Edoardo Ponti
Sparse attention is one of the most promising strategies to unlock long-context processing and long-generation reasoning in LLMs.
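For context, "training-free" means the sparsity is imposed at inference time on a model trained with dense attention. As an illustration only (not any specific method from our study), here is a minimal PyTorch sketch of one common variant, top-k attention, where each query attends only to its k highest-scoring keys; causal masking is omitted for brevity and all names are hypothetical:

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=64):
    """Toy training-free sparse attention.

    q, k, v: (batch, heads, seq_len, head_dim).
    Each query keeps only its top_k highest-scoring keys; the rest
    are masked out before the softmax. No retraining involved.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d**0.5          # (B, H, Sq, Sk)
    top_k = min(top_k, scores.size(-1))
    # Per-query threshold: the score of the k-th best key.
    thresh = scores.topk(top_k, dim=-1).values[..., -1:]
    scores = scores.masked_fill(scores < thresh, float("-inf"))
    return F.softmax(scores, dim=-1) @ v               # (B, H, Sq, head_dim)

# Example: 1 sequence, 8 heads, 1024 tokens, 64-dim heads.
q = k = v = torch.randn(1, 8, 1024, 64)
out = topk_sparse_attention(q, k, v, top_k=32)
print(out.shape)  # torch.Size([1, 8, 1024, 64])
```

Real implementations avoid materializing the dense score matrix (that is where the long-context savings come from); this sketch only shows the masking idea.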
We performed the most comprehensive study on training-free sparse attention to date.
Here is what we found: