Jonathan Bragg
@jbragg.bsky.social
Leading agents R&D at AI2. AI & HCI research scientist.
https://jonathanbragg.com
pinned post!
Agent benchmarks don't measure true *AI* advances
We built one that's hard & trustworthy:
- AstaBench tests agents w/ *standardized tools* on 2400+ scientific research problems
- SOTA results across 22 agent *classes*
- AgentBaselines agents suite
arxiv.org/abs/2510.21652
🧵
about 22 hours ago