Overview of GRPO (Group Relative Policy Optimization)
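GRPO is a variant of PPO introduced in the DeepSeekMath paper.
The motivation is that PPO requires four large models: a policy, a value function, a reward model, and a reference model. GRPO removes the need for the value model by sampling a group of outputs for each prompt and using the group's own reward statistics as the baseline.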
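As a rough illustration of how the value model can be dropped, here is a minimal sketch of a group-relative advantage: each sampled output's reward is normalized against the mean and standard deviation of the rewards in its own group. The function name `group_relative_advantages`, the `eps` term, and the choice of population standard deviation are assumptions made for this example, not details taken from the paper or from any particular library.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Return one advantage per sampled output for a single prompt.

    rewards: scalar rewards for the G outputs sampled for the same prompt.
    eps: small constant (an assumption here) that avoids division by zero
         when every output in the group gets the same reward.
    """
    mu = mean(rewards)
    # Population std is an illustrative choice; implementations may differ.
    sigma = pstdev(rewards)
    # The group mean plays the role of the baseline a value model would supply.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four outputs sampled from one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Outputs that score above their group's mean get a positive advantage and those below get a negative one, so no separately trained value function is needed to supply a baseline.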