Our paper “Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF” is now on arXiv.