Zhaolin Gao

I’m a third-year Computer Science Ph.D. student at Cornell University, where I am advised by Thorsten Joachims and Wen Sun, and a part-time Researcher at Meta Superintelligence. My research includes reinforcement learning (RL) and its applications in LLM post-training. My work has been published at NeurIPS, ICLR, CVPR, WWW, SIGIR, CIKM, RecSys, and INFOCOM.

I completed my bachelor’s degree in Computer Engineering at the University of Toronto, where I had the privilege of working with Baochun Li, Scott Sanner, and Maksims Volkovs.

I am also a part-time content creator with more than 50,000 followers and 10 million views on Bilibili, Douyin, and YouTube.

Email / CV / Google Scholar

News

Oct 1, 2025	Our paper “Prompt Curriculum Learning for Efficient LLM Post-Training” is now on arXiv.
Sep 18, 2025	A*-PO, Q#, VGS, and LLM-HMM are accepted to NeurIPS 2025!
Sep 3, 2025	Q# and A*-PO are accepted (poster and oral) to the New York Reinforcement Learning Workshop 2025.
Aug 6, 2025	LangPTune is accepted at CIKM 2025!
Jun 11, 2025	Our paper “Pre-trained Large Language Models Learn Hidden Markov Models In-context” is now on arXiv.
May 27, 2025	Our paper “Accelerating RL for LLM Reasoning with Optimal Advantage Regression” is now on arXiv.
May 23, 2025	Our paper “Value-Guided Search for Efficient Chain-of-Thought Reasoning” is now on arXiv.
Mar 1, 2025	Our paper “Q#: Provably Optimal Distributional RL for LLM Post-Training” is now on arXiv.

Selected Publications

Prompt Curriculum Learning for Efficient LLM Post-Training

Gao, Zhaolin, Kim, Joongwon, Sun, Wen, Joachims, Thorsten, Wang, Sid, Pang, Richard Yuanzhe, and Tan, Liang

Preprint

PDF
Pre-trained Large Language Models Learn Hidden Markov Models In-context

Dai, Yijia, Gao, Zhaolin, Sattar, Yahya, Dean, Sarah, and Sun, Jennifer J.

NeurIPS 2025

PDF Code
Accelerating RL for LLM Reasoning with Optimal Advantage Regression

Brantley, Kianté, Chen, Mingyu, Gao, Zhaolin, Lee, Jason D., Sun, Wen, Zhan, Wenhao, and Zhang, Xuezhou (alphabetical order)

NeurIPS 2025

PDF Model Code
Value-Guided Search for Efficient Chain-of-Thought Reasoning

Wang, Kaiwen, Zhou, Jin Peng, Chang, Jonathan D., Gao, Zhaolin, Kallus, Nathan, Brantley, Kianté, and Sun, Wen

NeurIPS 2025

PDF Model Code Dataset
Q#: Provably Optimal Distributional RL for LLM Post-Training

Zhou, Jin Peng*, Wang, Kaiwen*, Chang, Jonathan D., Gao, Zhaolin, Kallus, Nathan, Weinberger, Kilian Q., Brantley, Kianté, and Sun, Wen

NeurIPS 2025

PDF Code
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF

Gao, Zhaolin, Zhan, Wenhao, Chang, Jonathan D., Swamy, Gokul, Brantley, Kianté, Lee, Jason D., and Sun, Wen

ICLR 2025

PDF Model Code
End-to-end Training for Recommendation with Language-based User Profiles

Gao, Zhaolin, Zhou, Joyce, Dai, Yijia, and Joachims, Thorsten

CIKM 2025

PDF Code
REBEL: Reinforcement Learning via Regressing Relative Rewards

Gao, Zhaolin, Chang, Jonathan D., Zhan, Wenhao, Oertell, Owen, Swamy, Gokul, Brantley, Kianté, Joachims, Thorsten, Bagnell, J. Andrew, Lee, Jason D., and Sun, Wen

NeurIPS 2024

PDF Model Code
Session-based Recommendation With Transformers

Lu, Yichao, Gao, Zhaolin*, Cheng, Zhaoyue*, Sun, Jianing*, Brown, Bradley, Yu, Guangwei, Wong, Anson, Perez, Felipe, and Volkovs, Maksims

RecSys Challenge 2022

PDF
Mitigating the Filter Bubble while Maintaining Relevance: Targeted Diversification with VAE-based Recommender Systems

Gao, Zhaolin, Shen, Tianshu, Mai, Zheda, Bouadjenek, Mohamed Reda, Waller, Isaac, Anderson, Ashton, Bodkin, Ron, and Sanner, Scott

SIGIR 2022

PDF Code
MCL: Mixed-Centric Loss for Collaborative Filtering

Gao, Zhaolin*, Cheng, Zhaoyue*, Perez, Felipe, Sun, Jianing, and Volkovs, Maksims

WWW 2022

PDF Code
Shoestring: Graph-Based Semi-Supervised Classification With Severely Limited Labeled Data

Lin, Wanyu, Gao, Zhaolin, and Li, Baochun

CVPR 2020

PDF Code
Guardian: Evaluating Trust in Online Social Networks with Graph Convolutional Networks

Lin, Wanyu, Gao, Zhaolin, and Li, Baochun

INFOCOM 2020

PDF Code