Group Relative Policy Optimization


Link to primary paper
