Eureka inference-time scaling insight (Day 2): Despite the major updates, reasoning does not benefit all domains equally. For example, most players report GPQA numbers to show generalization. However, the improvements on GPQA are driven mainly by Physics, with Chemistry and Biology still visibly lagging behind.
6 months ago