Zhengyang Shan
@shanzzyy.bsky.social
PhD @ Boston University | Researching interpretability & evaluation in large language models
Can steering remove LLM shortcuts without breaking legitimate capabilities? In our @eaclmeeting.bsky.social paper, we show that conceptual bias is separable from concept detection, which means inference-time debiasing is possible with minimal capability loss.
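The "inference-time debiasing" claim is in the spirit of steering-vector methods: if a bias concept occupies a direction in activation space separable from the concept-detection signal, that direction can be projected out of hidden states at inference time. A minimal sketch of that projection step, with hypothetical names and toy data (not the paper's actual code):

```python
import numpy as np

def debias_activations(h: np.ndarray, bias_dir: np.ndarray) -> np.ndarray:
    """Remove the component of each activation along a learned bias direction.

    h: (n, d) matrix of hidden states.
    bias_dir: (d,) bias direction (e.g. from a difference of concept means).
    Both names are illustrative, not from the paper.
    """
    v = bias_dir / np.linalg.norm(bias_dir)  # unit bias direction
    # subtract the projection of each row of h onto v
    return h - np.outer(h @ v, v)

# toy example: bias direction is the first axis
h = np.array([[1.0, 2.0], [3.0, 4.0]])
v = np.array([1.0, 0.0])
out = debias_activations(h, v)  # first coordinate is zeroed, second untouched
```

In practice such a projection would be applied inside a forward hook at a chosen layer; the "minimal capability loss" claim corresponds to the orthogonal components of `h` being left unchanged.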