Zhaolin Gao


I’m a second-year Computer Science Ph.D. student at Cornell University, where I am advised by Thorsten Joachims and Wen Sun. My research includes reinforcement learning, natural language processing, and recommendation systems. My work has been published at NeurIPS, ICLR, CVPR, WWW, SIGIR, RecSys, and INFOCOM.

I completed my bachelor’s degree in Computer Engineering at the University of Toronto, where I had the privilege of working with Baochun Li, Scott Sanner, and Maksims Volkovs.

I am also a part-time content creator with more than 50,000 followers and 10 million views on Bilibili, Douyin, and YouTube.

Email / CV / Google Scholar

News

May 27, 2025 Our paper “Accelerating RL for LLM Reasoning with Optimal Advantage Regression” is now on arXiv.
May 23, 2025 Our paper “Value-Guided Search for Efficient Chain-of-Thought Reasoning” is now on arXiv.
Mar 1, 2025 Our paper “Q#: Provably Optimal Distributional RL for LLM Post-Training” is now on arXiv.
Jan 25, 2025 REFUEL has been accepted to ICLR’25!
Oct 25, 2024 Our paper “End-to-end Training for Recommendation with Language-based User Profiles” is now on arXiv.
Oct 10, 2024 Our paper “Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF” is now on arXiv.
Oct 5, 2024 I have been awarded a Cornell Bowers CIS-LinkedIn Grant!
Oct 1, 2024 REBEL has been accepted to NeurIPS’24!

Selected Publications

  1. Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
    Gao, Zhaolin, Zhan, Wenhao, Chang, Jonathan D., Swamy, Gokul, Brantley, Kianté, Lee, Jason D., and Sun, Wen
    ICLR 2025
  2. End-to-end Training for Recommendation with Language-based User Profiles
    Gao, Zhaolin, Zhou, Joyce, Dai, Yijia, and Joachims, Thorsten
    ROEGEN-RecSys 2024
  3. REBEL: Reinforcement Learning via Regressing Relative Rewards
    Gao, Zhaolin, Chang, Jonathan D., Zhan, Wenhao, Oertell, Owen, Swamy, Gokul, Brantley, Kianté, Joachims, Thorsten, Bagnell, J. Andrew, Lee, Jason D., and Sun, Wen
    NeurIPS 2024
  4. Mitigating the Filter Bubble while Maintaining Relevance: Targeted Diversification with VAE-based Recommender Systems
    Gao, Zhaolin, Shen, Tianshu, Mai, Zheda, Bouadjenek, Mohamed Reda, Waller, Isaac, Anderson, Ashton, Bodkin, Ron, and Sanner, Scott
    SIGIR 2022
  5. MCL: Mixed-Centric Loss for Collaborative Filtering
    Gao, Zhaolin*, Cheng, Zhaoyue*, Perez, Felipe, Sun, Jianing, and Volkovs, Maksims
    WWW 2022