Zhanpeng Zhou (周展鹏)

Ph.D. candidate in Computer Science at Shanghai Jiao Tong University. My research focuses on deep learning theory and the science of large language models.

I am a member of ReThinkLab in the Department of Computer Science & Engineering, advised by Prof. Junchi Yan. Prior to that, I obtained my Bachelor's degree in Electrical and Computer Engineering at Shanghai Jiao Tong University.

Email: zzp1012 [at] sjtu.edu.cn  /  1012zzphh [at] gmail.com

CV  /  Google Scholar  /  GitHub

Publications   (* indicates equal contribution; † indicates corresponding author.)
How Does Local Landscape Geometry Evolve in Language Model Pre-Training?   [arXiv]
Zhanpeng Zhou*†, Yuhan Sun*, Bingrui Li, Jinbo Wang, Huaijin Wu, Lei Wu, Junchi Yan
In Submission
Towards Revealing the Effect of Batch Size Scheduling on Pre-training   [arXiv]
Jinbo Wang, Binghui Li, Zhanpeng Zhou, Mingze Wang, Yuxuan Sun, Jiaqi Zhang, Xunliang Cai, Lei Wu
In Submission
Efficient Hyperparameter Tuning via Trajectory Invariance Principle   [arXiv]
Bingrui Li, Jiaxin Wen, Zhanpeng Zhou, Jun Zhu, Jianfei Chen
In Submission
On Path to Multimodal Historical Reasoning: HistBench and HistAgent   [arXiv]
Qiu et al. (Co-author)
In Submission
New Evidence of the Two-Phase Learning Dynamics of Neural Networks   [arXiv]
Zhanpeng Zhou†, Yongyi Yang, Mahito Sugiyama, Junchi Yan
In Submission
On the Surprising Effectiveness of a Single Global Merging in Decentralized Learning   [arXiv]
Tongtian Zhu, Tianyu Zhang, Mingze Wang, Zhanpeng Zhou†, Can Wang
In Submission
On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD   [arXiv]
Tongcheng Zhang*, Zhanpeng Zhou*†, Mingze Wang, Andi Han, Wei Huang, Taiji Suzuki, Junchi Yan
AAAI 2026 (Oral)
On the Role of Label Noise in the Feature Learning Process   [arXiv]   [GitHub]
Andi Han*†, Wei Huang*†, Zhanpeng Zhou*†, Gang Niu, Wuyang Chen, Junchi Yan, Akiko Takeda, Taiji Suzuki
ICML 2025
The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training   [arXiv]   [GitHub]
Jinbo Wang*, Mingze Wang*, Zhanpeng Zhou*, Junchi Yan, Weinan E, Lei Wu
ICML 2025
On the Cone Effect in the Learning Dynamics   [arXiv]
Zhanpeng Zhou†, Yongyi Yang, Jie Ren, Mahito Sugiyama, Junchi Yan
ICLR 2025 Workshop DeLTa
SE-Merging: A Self-Enhanced Approach for Dynamic Model Merging   [arXiv]
Zijun Chen*, Zhanpeng Zhou*, Bo Zhang, Weinan Zhang, Xi Sun, Junchi Yan
IJCNN 2025
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training   [arXiv]   [GitHub]   [Slides]
Zhanpeng Zhou*†, Mingze Wang*, Yuchen Mao, Bingrui Li, Junchi Yan†
ICLR 2025 (Spotlight)
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent   [arXiv]
Bingrui Li, Wei Huang, Andi Han, Zhanpeng Zhou, Taiji Suzuki, Jun Zhu, Jianfei Chen
ICLR 2025 (Spotlight)
On the Emergence of Cross-Task Linearity in the Pretraining-Finetuning Paradigm   [arXiv]   [GitHub]   [Slides]
Zhanpeng Zhou*, Zijun Chen*, Yilan Chen, Bo Zhang, Junchi Yan
ICML 2024
Going Beyond Neural Network Feature Similarity: The Network Feature Complexity and Its Interpretation Using Category Theory   [arXiv]   [GitHub]
Yiting Chen, Zhanpeng Zhou, Junchi Yan
ICLR 2024
Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity   [arXiv]   [GitHub]   [Slides]   [Post]
Zhanpeng Zhou, Yongyi Yang, Xiaojiang Yang, Junchi Yan, Wei Hu
NeurIPS 2023
Defects of Convolutional Decoder Networks in Frequency Representation   [arXiv]   [GitHub]
Ling Tang*, Wen Shen*, Zhanpeng Zhou, Quanshi Zhang
ICML 2023
Batch Normalization Is Blind to the First and Second Derivatives of the Loss   [arXiv]   [GitHub]
Zhanpeng Zhou*, Wen Shen*, Huixin Chen*, Ling Tang, Quanshi Zhang
AAAI 2024 (Oral)
Can We Faithfully Represent Absence States to Compute Shapley Values on a DNN?   [arXiv]   [GitHub]
Jie Ren, Zhanpeng Zhou, Qirui Chen, Quanshi Zhang
ICLR 2023
A Unified Game-Theoretic Interpretation of Adversarial Robustness   [arXiv]   [GitHub]
Jie Ren*, Die Zhang*, Yisen Wang*, Lu Chen, Zhanpeng Zhou, Yiting Chen, Xu Cheng, Xin Wang, Meng Zhou, Jie Shi, Quanshi Zhang
NeurIPS 2021
Experience

[2025 Nov. - present] Student Intern, ByteDance Seed Edge Research Program, Beijing, mentored by Yuxin Fang.

[2025 Apr. - 2025 Nov.] Student Intern, MiniMax Inc., Shanghai, contributing to the MiniMax-M2 model.

[2025 Apr. - 2025 Oct.] Student Trainee, RIKEN Center for Advanced Intelligence Project, Tokyo, advised by Prof. Taiji Suzuki.

[2023 Sep. - 2024 Mar.] Visiting Ph.D. Student, National Institute of Informatics, Tokyo, advised by Prof. Mahito Sugiyama.

[2021 Mar. - 2021 Jun.] Research Intern, Mila - Quebec AI Institute, Montreal, advised by Prof. Jian Tang.

Awards

[2025 Nov.] Top Reviewer Award, NeurIPS 2025

[2024 Nov.] National Scholarship (top ~0.2%), Ministry of Education

[2024 Mar.] Top Internship Evaluation, National Institute of Informatics

[2022 May] Outstanding Graduate Student, Shanghai Jiao Tong University

[2021 Nov.] Yu Liming Scholarship, Shanghai Jiao Tong University

[2020 Nov.] John Wu & Jane Sun Scholarship, Shanghai Jiao Tong University

[2019 Aug.] Best Technology Award in Summer Design Expo, Shanghai Jiao Tong University

Services

[Area Chair] CPAL '26

[Conference Reviewer] ICML ('22, '24-25), NeurIPS ('22-25), ICLR ('24-26), AISTATS ('25-26)

[Journal Reviewer] IEEE T-PAMI, Intelligent Computing (a Science Partner Journal)

Teaching

[Fall 2021] Bayesian Analysis (VE414), Teaching Assistant, Shanghai Jiao Tong University.

[Summer 2021] Probabilistic Methods in Engineering (VE401), Teaching Assistant, Shanghai Jiao Tong University.


Thanks to Jon Barron for this template.