Yoav Gur Arieh (@yoav.ml)

How can we interpret LLM features at scale? 🤔 Current pipelines use activating inputs, which is costly and ignores how features causally affect model outputs! We propose efficient output-centric methods that better predict the steering effect of a feature. New preprint led by @yoav.ml 🧵1/
