Sumit (@reachsumit.com)

Effective Reinforcement Learning for Agentic Search by Recycling Zero-Variance Queries During Training

Proposes recycling zero-variance rollout groups back into the RL training pool, letting a 1.7B search agent match or surpass 7B systems.

📝 arxiv.org/abs/2606.10709
👨🏽‍💻 github.com/cxcscmu/agen...
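For readers unfamiliar with the term, here is a minimal sketch of what a "zero-variance rollout group" might look like in a group-based RL setup. The post does not describe the paper's actual procedure, so everything below is an assumption for illustration: the group-normalized advantage formula, the function names `group_advantages` and `split_by_variance`, and the idea of simply returning zero-variance queries to a pool are hypothetical, not the authors' implementation.

```python
from statistics import pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-normalized advantages: (r - mean) / (std + eps).

    Assumed formula for illustration; when every reward in the
    group is identical, every advantage is exactly 0.0.
    """
    mean = sum(rewards) / len(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def split_by_variance(groups):
    """Separate queries whose rollout rewards all agree.

    groups: dict mapping a query string to its list of rollout rewards.
    Returns (trainable, recycled): trainable keeps groups with mixed
    rewards; recycled lists queries to put back into the pool
    (hypothetical recycling policy, not taken from the paper).
    """
    trainable, recycled = {}, []
    for query, rewards in groups.items():
        if max(rewards) == min(rewards):  # zero variance: no learning signal
            recycled.append(query)
        else:
            trainable[query] = rewards
    return trainable, recycled
```

Under these assumptions, a group where all rollouts succeed or all fail contributes zero advantage, which is why such queries carry no gradient signal in the step where they were sampled and are candidates for reuse later.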


GitHub - cxcscmu/agentic_search_query_recycling
https://github.com/cxcscmu/agentic_search_query_recycling

about 10 hours ago