Zhaolin Gao

Member of Technical Staff at Microsoft Super Intelligence

My research includes reinforcement learning (RL) and its applications in LLM post-training. I received my Ph.D. from Cornell, where I was fortunate to work with Thorsten Joachims and Wen Sun. My Ph.D. was supported by a LinkedIn fellowship.

I completed my bachelor’s degree at the University of Toronto, working with Baochun Li, Scott Sanner, and Maksims Volkovs.

I am also a part-time content creator with more than 50,000 followers and 12 million views on Bilibili and Douyin.

Email / CV / Google Scholar / X / LinkedIn

News

Apr 13, 2026	Our paper “p1: Better Prompt Optimization with Fewer Prompts” is now on arXiv.
Jan 26, 2026	PCL is accepted to ICLR 2026!
Oct 1, 2025	Our paper “Prompt Curriculum Learning for Efficient LLM Post-Training” is now on arXiv.
Sep 18, 2025	A*-PO, Q#, VGS, and LLM-HMM are accepted to NeurIPS 2025!
Sep 3, 2025	Q# and A*-PO are accepted (poster and oral) to the New York Reinforcement Learning Workshop 2025.
Aug 6, 2025	LangPTune is accepted at CIKM 2025!
Jun 11, 2025	Our paper “Pre-trained Large Language Models Learn Hidden Markov Models In-context” is now on arXiv.
May 27, 2025	Our paper “Accelerating RL for LLM Reasoning with Optimal Advantage Regression” is now on arXiv.

Selected Publications

p1: Better Prompt Optimization with Fewer Prompts

Gao, Zhaolin, Wang, Yu (Sid), Liu, Bo, Joachims, Thorsten, Brantley, Kianté, and Sun, Wen

Preprint

PDF
Prompt Curriculum Learning for Efficient LLM Post-Training

Gao, Zhaolin, Kim, Joongwon, Sun, Wen, Joachims, Thorsten, Wang, Sid, Pang, Richard Yuanzhe, and Tan, Liang

ICLR 2026

PDF
Pre-trained Large Language Models Learn Hidden Markov Models In-context

Dai, Yijia, Gao, Zhaolin, Sattar, Yahya, Dean, Sarah, and Sun, Jennifer J.

NeurIPS 2025

PDF Code
Accelerating RL for LLM Reasoning with Optimal Advantage Regression

Brantley, Kianté, Chen, Mingyu, Gao, Zhaolin, Lee, Jason D., Sun, Wen, Zhan, Wenhao, and Zhang, Xuezhou (alphabetical order)

NeurIPS 2025

PDF Model Code
Value-Guided Search for Efficient Chain-of-Thought Reasoning

Wang, Kaiwen, Zhou, Jin Peng, Chang, Jonathan D., Gao, Zhaolin, Kallus, Nathan, Brantley, Kianté, and Sun, Wen

NeurIPS 2025

PDF Model Code Dataset
Q#: Provably Optimal Distributional RL for LLM Post-Training

Zhou, Jin Peng*, Wang, Kaiwen*, Chang, Jonathan D., Gao, Zhaolin, Kallus, Nathan, Weinberger, Kilian Q., Brantley, Kianté, and Sun, Wen

NeurIPS 2025

PDF Code
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF

Gao, Zhaolin, Zhan, Wenhao, Chang, Jonathan D., Swamy, Gokul, Brantley, Kianté, Lee, Jason D., and Sun, Wen

ICLR 2025

PDF Model Code
End-to-end Training for Recommendation with Language-based User Profiles

Gao, Zhaolin, Zhou, Joyce, Dai, Yijia, and Joachims, Thorsten

CIKM 2025

PDF Code
REBEL: Reinforcement Learning via Regressing Relative Rewards

Gao, Zhaolin, Chang, Jonathan D., Zhan, Wenhao, Oertell, Owen, Swamy, Gokul, Brantley, Kianté, Joachims, Thorsten, Bagnell, J. Andrew, Lee, Jason D., and Sun, Wen

NeurIPS 2024

PDF Model Code
Session-based Recommendation With Transformers

Lu, Yichao, Gao, Zhaolin*, Cheng, Zhaoyue*, Sun, Jianing*, Brown, Bradley, Yu, Guangwei, Wong, Anson, Perez, Felipe, and Volkovs, Maksims

RecSys Challenge 2022

PDF
Mitigating the Filter Bubble while Maintaining Relevance: Targeted Diversification with VAE-based Recommender Systems

Gao, Zhaolin, Shen, Tianshu, Mai, Zheda, Bouadjenek, Mohamed Reda, Waller, Isaac, Anderson, Ashton, Bodkin, Ron, and Sanner, Scott

SIGIR 2022

PDF Code
MCL: Mixed-Centric Loss for Collaborative Filtering

Gao, Zhaolin*, Cheng, Zhaoyue*, Perez, Felipe, Sun, Jianing, and Volkovs, Maksims

WWW 2022

PDF Code
Shoestring: Graph-Based Semi-Supervised Classification With Severely Limited Labeled Data

Lin, Wanyu, Gao, Zhaolin, and Li, Baochun

CVPR 2020

PDF Code
Guardian: Evaluating Trust in Online Social Networks with Graph Convolutional Networks

Lin, Wanyu, Gao, Zhaolin, and Li, Baochun

INFOCOM 2020

PDF Code