GPT-5 numbers:
* HCAST/RE-Bench 50% time horizon: +25% rel, to 2h17m, SOTA
* HCAST/RE-Bench 80% time horizon: +25% rel, to 25 min, SOTA
* (Tier 1-3) FrontierMath: +5% abs, SOTA
* SWE-Bench Verified: on par with Claude Opus 4.1
* <1% improvement on other coding benchmarks
* Aider: +3% abs, SOTA
* Cost/perf: seems much worse
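
Rough sanity check on the two HCAST/RE-Bench bullets, assuming "+25% rel" means new = old × 1.25 (my reading, not stated above). The implied previous values are back-of-envelope, not reported figures, and the helper names are made up for illustration.

```python
def implied_baseline_minutes(new_minutes: float, relative_gain: float) -> float:
    """Back out the previous value, assuming new = old * (1 + relative_gain)."""
    return new_minutes / (1.0 + relative_gain)


def fmt(minutes: float) -> str:
    """Format a duration in minutes as e.g. '1h50m' or '20m'."""
    hours, mins = divmod(round(minutes), 60)
    return f"{hours}h{mins:02d}m" if hours else f"{mins}m"


# 50% horizon: 2h17m = 137 min after a +25% relative gain
horizon_50 = implied_baseline_minutes(2 * 60 + 17, 0.25)
# 80% horizon: 25 min after a +25% relative gain
horizon_80 = implied_baseline_minutes(25, 0.25)

print(fmt(horizon_50), fmt(horizon_80))  # 1h50m 20m
```

So under that reading the previous bests were roughly 1h50m and 20 min. The "abs" bullets are different: +5% abs on FrontierMath means five percentage points added to the score, not a 1.05× multiplier.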