Millicent Li
@millicentli.bsky.social
📤 20
📥 13
📝 8
CS PhD Student @ Northeastern, former ugrad @ UW, UWNLP -- https://millicentli.github.io/
reposted by
Millicent Li
Aaron Mueller
about 1 month ago
What's the right unit of analysis for understanding LLM internals? We explore this question in our mech interp survey (a major update from our 2024 manuscript). We’ve added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!
Wouldn’t it be great to have questions about LM internals answered in plain English? That’s the promise of verbalization interpretability. Unfortunately, our new paper shows that evaluating these methods is nuanced—and verbalizers might not tell us what we hope they do. 🧵👇1/8
about 2 months ago