🧠 DeepSeek's R1 report highlights the challenges of applying AlphaZero-style methods to LLMs. Game AI offers a key insight: AlphaStar, like an LLM, starts with imitation learning and then moves to RL, but it crucially adds league self-play, where a diverse league of opponents helps training avoid local optima. How do we implement this? ⬇️
8 months ago