Fazl Barez 9 months ago
New paper alert! 🚨
Important question: Do SAEs generalise?
We study answerability detection in LLMs by comparing SAE features with linear probes trained on the residual stream.
Answer:
Probes outperform SAE features in-domain, while out-of-domain generalization varies sharply across features and datasets. 🧵