Liu, Dong, Lu, Diao, Belcak, Liu, Chen, Yin, Wang, Cheng, Choi, Kautz, Molchanov: GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
https://arxiv.org/abs/2601.05242 https://arxiv.org/pdf/2601.05242 https://arxiv.org/html/2601.05242