Our paper “Accelerating RL for LLM Reasoning with Optimal Advantage Regression” is now on arXiv.