Blue skies 🦋 , hot (?) takes 🔥
Constrained output for LLMs, e.g., the outlines library for vLLM, which forces models to emit valid JSON/pydantic schemas, is cool!
But output tokens add far more latency than input tokens, so if speed matters, bespoke low-token output formats are often better.
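A minimal sketch of the size gap (the task, names, and the pipe-delimited format here are all made up for illustration): the same extracted data emitted as schema-conforming JSON versus a bespoke one-record-per-line format.

```python
import json

# Hypothetical extraction task: pull (name, age) pairs from text.
# Output a model might emit under a JSON/pydantic schema constraint:
json_output = json.dumps(
    {"people": [{"name": "Ada", "age": 36}, {"name": "Alan", "age": 41}]}
)

# A bespoke low-token format for the same data: "name|age" per line.
compact_output = "Ada|36\nAlan|41"

# Character count as a rough proxy for output tokens: the JSON carries
# keys, braces, and quotes on every record; the compact form does not.
print(len(json_output), len(compact_output))
```

Since every output token is one sequential decode step, the character savings translate roughly into wall-clock latency savings.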