The personalization gap challenges an implicit assumption in AI evaluation: that we can measure capability and safety independently of deployment context. That makes it a good opportunity to rethink how we evaluate AI systems.
about 2 months ago