@tomerashuach.bsky.social
reposted by
Dana Arad
4 months ago
Tried steering with SAEs and found that not all features behave as expected? Check out our new preprint - "SAEs Are Good for Steering - If You Select the Right Features" 🧵
🚨 New paper at #ACL2025 Findings! REVS: Unlearning Sensitive Information in LMs via Rank Editing in the Vocabulary Space. LMs memorize and leak sensitive data from their training sets, such as emails, SSNs, and URLs. We propose a surgical method to unlearn it. 🧵👇 w/ @boknilev.bsky.social @mtutek.bsky.social 1/8
4 months ago