Qin, Tan, He, Li, Lin, Li, Xu, Shi, Cai, Rui, Cai, Cai, Zhang, Ye, Li, Sun: Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
https://arxiv.org/abs/2509.22601 https://arxiv.org/pdf/2509.22601 https://arxiv.org/html/2509.22601