Choosing the right engine is key for local LLM/LMM inference; it can significantly impact speed, output quality, documentation, compatibility, and portability.
Here's a comparison of the most-starred engines as of Feb 2025: llama.cpp, MediaPipe, MLC-LLM, MLX, MNN, and PyTorch ExecuTorch.
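
To give a sense of what running one of these engines locally looks like, here is a minimal sketch using llama.cpp through its `llama-cpp-python` bindings; the model file path is a placeholder for whatever GGUF checkpoint you have downloaded, and the parameter values are illustrative assumptions, not recommendations.

```python
from llama_cpp import Llama

# Load a GGUF model from disk (path is a placeholder — substitute your own file).
# n_gpu_layers=-1 asks llama.cpp to offload all layers to the GPU if one is available.
llm = Llama(
    model_path="models/llama-3-8b-instruct.Q4_K_M.gguf",
    n_ctx=4096,        # context window size (illustrative)
    n_gpu_layers=-1,
)

# Run a single completion; stop early if the model starts a new "Q:" turn.
out = llm(
    "Q: Why does engine choice matter for on-device inference? A:",
    max_tokens=128,
    stop=["Q:"],
)

print(out["choices"][0]["text"])
```

The same task would look quite different under MLX (Python on Apple silicon), MNN or MediaPipe (C++/mobile SDKs), or ExecuTorch (ahead-of-time exported PyTorch programs), which is exactly why the trade-offs above are worth comparing before committing to an engine.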