Anastasios Gerontopoulos (@nasosger.bsky.social)

1/n Multi-token prediction training boosts LLMs (DeepSeek-V3), tackling key limitations of the next-token prediction objective:
• Short-term focus
• Struggles with long-range decisions
• Weaker supervision

Prior methods add complexity (extra layers).
🔑 Our fix? Register tokens: elegant and powerful.