Shantanu Acharya
@shantanuacharya.bsky.social
Researcher at NVIDIA - Working on Long Context LLMs
reposted by
Shantanu Acharya
alphaXiv
10 months ago
Star Attention is a new way to make large language models process very long texts much faster while maintaining accuracy. Author
@shantanuacharya.bsky.social
is on alphaXiv this week to answer your questions on his paper!
🌟 Introducing Star Attention - a novel inference method combining local and global attention for LLM inference over long sequences.
✅ Improves inference speed by up to 11x while preserving 95-100% accuracy
✅ Integrates with any LLM without any finetuning
Paper:
arxiv.org/abs/2411.17116
Star Attention: Efficient LLM Inference over Long Sequences
Inference with Transformer-based Large Language Models (LLMs) on long sequences is both costly and slow due to the quadratic complexity of the self-attention mechanism. We introduce Star Attention, a ...
https://arxiv.org/abs/2411.17116
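The two-phase idea the post describes (blockwise local attention over the context, then global attention for the query) can be sketched as follows. This is a minimal single-head toy illustration, not the paper's implementation; the block size, anchor-block choice, and all variable names are assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # Standard scaled dot-product attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
d, block = 8, 4
ctx = rng.standard_normal((3 * block, d))  # toy context: 3 blocks of tokens

# Phase 1 (local): each context block attends only to itself plus the
# first "anchor" block, so cost grows linearly with context length
# instead of quadratically.
anchor = ctx[:block]
encoded = []
for i in range(0, len(ctx), block):
    blk = ctx[i:i + block]
    kv = blk if i == 0 else np.concatenate([anchor, blk])
    encoded.append(attend(blk, kv, kv))
encoded = np.concatenate(encoded)

# Phase 2 (global): the query token attends to the full encoded context.
query = rng.standard_normal((1, d))
out = attend(query, encoded, encoded)
print(out.shape)  # (1, 8)
```

In the paper itself the context blocks are processed on separate hosts and the global phase combines per-host softmax statistics; the sketch above only shows the local-plus-anchor versus global attention split on a single machine.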