(@weidingerlaura.bsky.social)

📣 New paper! The AI research field increasingly recognises that benchmarks can tell us only a limited amount about AI system performance and safety. We argue for, and lay out a roadmap toward, a *science of AI evaluation*: arxiv.org/abs/2503.05336 🧵



about 1 year ago