Lee Sharkey (@leesharkey.bsky.social)

New interpretability paper from Apollo Research! 🟢 Attribution-based Parameter Decomposition 🟢 It's a new way to decompose neural network parameters directly into mechanistic components. It overcomes many of the issues with sparse autoencoders (SAEs)! 🧵