Shantanu Acharya
@shantanuacharya.bsky.social
Researcher at NVIDIA - Working on Long Context LLMs
reposted by
Shantanu Acharya
alphaXiv
10 months ago
Star Attention is a new way to make large language models process very long texts much faster while maintaining accuracy. Author
@shantanuacharya.bsky.social
is on alphaXiv this week to answer your questions on his paper!
🌟 Introducing Star Attention - a novel inference method combining local and global attention for LLM inference over long sequences.
✅ Improves inference speed by up to 11x while preserving 95-100% accuracy
✅ Integrates with any LLM without any finetuning
Paper:
arxiv.org/abs/2411.17116
Star Attention: Efficient LLM Inference over Long Sequences
Inference with Transformer-based Large Language Models (LLMs) on long sequences is both costly and slow due to the quadratic complexity of the self-attention mechanism. We introduce Star Attention, a ...
https://arxiv.org/abs/2411.17116
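The two-phase idea the post describes (blockwise local attention over the context, then global attention for the query) can be sketched as follows. This is a minimal single-head toy illustration, not the paper's implementation; the block size, anchor-block choice, and all variable names are assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # Standard scaled dot-product attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
d, block = 8, 4
ctx = rng.standard_normal((3 * block, d))  # toy context: 3 blocks of tokens

# Phase 1 (local): each context block attends only to itself plus the
# first "anchor" block, so cost grows linearly with context length
# instead of quadratically.
anchor = ctx[:block]
encoded = []
for i in range(0, len(ctx), block):
    blk = ctx[i:i + block]
    kv = blk if i == 0 else np.concatenate([anchor, blk])
    encoded.append(attend(blk, kv, kv))
encoded = np.concatenate(encoded)

# Phase 2 (global): the query token attends to the full encoded context.
query = rng.standard_normal((1, d))
out = attend(query, encoded, encoded)
print(out.shape)  # (1, 8)
```

In the paper itself the context blocks are processed on separate hosts and the global phase combines per-host softmax statistics; the sketch above only shows the local-plus-anchor versus global attention split on a single machine.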