Jørgen Lund (@jaalu.bsky.social)

Very happy this paper got accepted to NeurIPS 2025 as a Spotlight! 😁 Main takeaway: In mechanistic interpretability, we need assumptions about how DNNs encode concepts in their representations (eg, the linear representation hypothesis). Without them, we can claim any DNN implements any algorithm!

add a skeleton here at some point