Ivan Nardini (@ivnardini.bsky.social)

I’ve been exploring how to optimize Gemma 3 inference on Vertex AI and came across LMCache. It boosts vLLM with 7x faster access to 100x more KV cache, cutting time-to-first-token (TTFT) by 3–10x for multi-turn agents and long-context RAG via CacheGen + CacheBlend. Repo for details: github.com/LMCache/LMC...

GitHub - LMCache/LMCache: Supercharge Your LLM with the Fastest KV Cache Layer
https://github.com/LMCache/LMCache

16 days ago