Zhanpeng Zhou (周展鹏)
Ph.D. candidate in Computer Science at Shanghai Jiao Tong University.
My research focuses on deep learning theory and the science of large language models.
I am a member of ReThinkLab in the Department of Computer Science & Engineering, advised by Prof. Junchi Yan.
Before that, I obtained my Bachelor's degree in Electrical and Computer Engineering from Shanghai Jiao Tong University.
Email: zzp1012 [at] sjtu.edu.cn  /  1012zzphh [at] gmail.com
CV  /  Google Scholar  /  GitHub
|
|
|
Publications
(* indicates equal contribution; † indicates corresponding author.)
|
How Does Local Landscape Geometry Evolve in Language Model Pre-Training?
[arXiv]
Zhanpeng Zhou*†, Yuhan Sun*, Bingrui Li, Jinbo Wang, Huaijin Wu, Lei Wu, Junchi Yan
In Submission
|
Towards Revealing the Effect of Batch Size Scheduling on Pre-training
[arXiv]
Jinbo Wang, Binghui Li, Zhanpeng Zhou, Mingze Wang, Yuxuan Sun, Jiaqi Zhang, Xunliang Cai, Lei Wu
In Submission
|
Efficient Hyperparameter Tuning via Trajectory Invariance Principle
[arXiv]
Bingrui Li, Jiaxin Wen, Zhanpeng Zhou, Jun Zhu, Jianfei Chen
In Submission
|
On Path to Multimodal Historical Reasoning: HistBench and HistAgent
[arXiv]
Qiu et al. (Co-author)
In Submission
|
New Evidence of the Two-Phase Learning Dynamics of Neural Networks
[arXiv]
Zhanpeng Zhou†, Yongyi Yang, Mahito Sugiyama, Junchi Yan
In Submission
|
On the Surprising Effectiveness of a Single Global Merging in Decentralized Learning
[arXiv]
Tongtian Zhu, Tianyu Zhang, Mingze Wang, Zhanpeng Zhou†, Can Wang
In Submission
|
On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD
[arXiv]
Tongcheng Zhang*, Zhanpeng Zhou*†, Mingze Wang, Andi Han, Wei Huang, Taiji Suzuki, Junchi Yan
AAAI 2026 (Oral)
|
On the Role of Label Noise in the Feature Learning Process
[arXiv]
[GitHub]
Andi Han*†, Wei Huang*†, Zhanpeng Zhou*†, Gang Niu, Wuyang Chen, Junchi Yan, Akiko Takeda, Taiji Suzuki
ICML 2025
|
The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
[arXiv]
[GitHub]
Jinbo Wang*, Mingze Wang*, Zhanpeng Zhou*, Junchi Yan, Weinan E, Lei Wu
ICML 2025
|
On the Cone Effect in the Learning Dynamics
[arXiv]
Zhanpeng Zhou†, Yongyi Yang, Jie Ren, Mahito Sugiyama, Junchi Yan
ICLR 2025 Workshop DeLTa
|
SE-Merging: A Self-Enhanced Approach for Dynamic Model Merging
[arXiv]
Zijun Chen*, Zhanpeng Zhou*, Bo Zhang, Weinan Zhang, Xi Sun, Junchi Yan
IJCNN 2025
|
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training
[arXiv]
[GitHub]
[Slides]
Zhanpeng Zhou*†, Mingze Wang*, Yuchen Mao, Bingrui Li, Junchi Yan†
ICLR 2025 (Spotlight)
|
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
[arXiv]
Bingrui Li, Wei Huang, Andi Han, Zhanpeng Zhou, Taiji Suzuki, Jun Zhu, Jianfei Chen
ICLR 2025 (Spotlight)
|
On the Emergence of Cross-Task Linearity in the Pretraining-Finetuning Paradigm
[arXiv]
[GitHub]
[Slides]
Zhanpeng Zhou*, Zijun Chen*, Yilan Chen, Bo Zhang, Junchi Yan
ICML 2024
|
Going Beyond Neural Network Feature Similarity: The Network Feature Complexity and Its Interpretation Using Category Theory
[arXiv]
[GitHub]
Yiting Chen, Zhanpeng Zhou, Junchi Yan
ICLR 2024
|
Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity
[arXiv]
[GitHub]
[Slides]
[Post]
Zhanpeng Zhou, Yongyi Yang, Xiaojiang Yang, Junchi Yan, Wei Hu
NeurIPS 2023
|
Defects of Convolutional Decoder Networks in Frequency Representation
[arXiv]
[GitHub]
Ling Tang*, Wen Shen*, Zhanpeng Zhou, Quanshi Zhang
ICML 2023
|
Batch Normalization Is Blind to the First and Second Derivatives of the Loss
[arXiv]
[GitHub]
Zhanpeng Zhou*, Wen Shen*, Huixin Chen*, Ling Tang, Quanshi Zhang
AAAI 2024 (Oral)
|
Can We Faithfully Represent Absence States to Compute Shapley Values on a DNN?
[arXiv]
[GitHub]
Jie Ren, Zhanpeng Zhou, Qirui Chen, Quanshi Zhang
ICLR 2023
|
A Unified Game-Theoretic Interpretation of Adversarial Robustness
[arXiv]
[GitHub]
Jie Ren*, Die Zhang*, Yisen Wang*, Lu Chen, Zhanpeng Zhou, Yiting Chen, Xu Cheng, Xin Wang, Meng Zhou, Jie Shi, Quanshi Zhang
NeurIPS 2021
|
|
Experience
[2025 Nov. - present] Student Intern, ByteDance Seed Edge Research Program, Beijing, mentored by Yuxin Fang.
[2025 Apr. - 2025 Nov.] Student Intern, MiniMax Inc., Shanghai, contributing to the MiniMax-M2 model.
[2025 Apr. - 2025 Oct.] Student Trainee, RIKEN Center for Advanced Intelligence Project (AIP), Tokyo, advised by Prof. Taiji Suzuki.
[2023 Sep. - 2024 Mar.] Visiting Ph.D. Student, National Institute of Informatics, Tokyo, advised by Prof. Mahito Sugiyama.
[2021 Mar. - 2021 Jun.] Research Intern, Mila - Quebec AI Institute, Montreal, advised by Prof. Jian Tang.
|
|
Awards
[2025 Nov.] Top Reviewer Award, NeurIPS 2025
[2024 Nov.] National Scholarship (top ~0.2%), Ministry of Education
[2024 Mar.] Top Internship Evaluation, National Institute of Informatics
[2022 May] Outstanding Graduate Student, Shanghai Jiao Tong University
[2021 Nov.] Yu Liming Scholarship, Shanghai Jiao Tong University
[2020 Nov.] John Wu & Jane Sun Scholarship, Shanghai Jiao Tong University
[2019 Aug.] Best Technology Award in Summer Design Expo, Shanghai Jiao Tong University
|
|
Services
[Area Chair] CPAL '26
[Conference Reviewer] ICML ('22, '24-25), NeurIPS ('22-25), ICLR ('24-26), AISTATS ('25-26)
[Journal Reviewer] IEEE T-PAMI, Intelligent Computing (a Science Partner Journal)
|
|
Teaching
[Fall 2021] Bayesian Analysis (VE414), Teaching Assistant, Shanghai Jiao Tong University.
[Summer 2021] Probabilistic Methods in Engineering (VE401), Teaching Assistant, Shanghai Jiao Tong University.