We're expanding the Epoch AI Benchmarking Hub with four more external benchmarks: VPCT, Fiction-liveBench, GeoBench, and SimpleBench! Respectively, these benchmarks test visual physics understanding, long-context comprehension, GeoGuessr ability, and reasoning and logic skills. 🧵
4 months ago