Millicent Li
@millicentli.bsky.social
CS PhD Student @ Northeastern, former ugrad @ UW, UWNLP --
https://millicentli.github.io/
Wouldn’t it be great to have questions about LM internals answered in plain English? That’s the promise of verbalization interpretability. Unfortunately, our new paper shows that evaluating these methods is nuanced, and verbalizers might not tell us what we hope they do. 🧵👇 1/8