(@klieret.bsky.social)

Gemini 3 Pro sets new record on SWE-bench verified: 74%! (evaluated with minimal agent) Costs are 1.6x of GPT-5, but still cheaper than Sonnet 4.5. Gemini iterates longer than everyone; run your agent with a step limit of >100 for max performance. Details & full agent logs in 🧵

7 months ago