March 1, 2025
Our paper “Q#: Provably Optimal Distributional RL for LLM Post-Training” is now on arXiv.