Zhengyang Shan
@shanzzyy.bsky.social
PhD @ Boston University | Researching interpretability & evaluation in large language models
Can steering remove LLM shortcuts without breaking legitimate LLM capabilities? In our @eaclmeeting.bsky.social paper, we show that conceptual bias is separable from concept detection; this means inference-time debiasing is possible with minimal capability loss.
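The claim that bias is separable from concept detection can be illustrated with a minimal toy sketch of inference-time steering: project the component along an assumed "bias direction" out of a hidden-state vector, leaving an orthogonal "detection direction" untouched. The direction names and vectors below are illustrative assumptions, not the paper's actual method or data.

```python
import numpy as np

# Assumed directions for illustration only (unit vectors, orthogonal by construction).
bias_dir = np.array([1.0, 0.0, 0.0])    # hypothetical "conceptual bias" direction
detect_dir = np.array([0.0, 1.0, 0.0])  # hypothetical "concept detection" direction

def debias(hidden, direction):
    """Remove the component of `hidden` along `direction` (projection removal)."""
    d = direction / np.linalg.norm(direction)
    return hidden - np.dot(hidden, d) * d

h = np.array([0.8, 0.5, 0.3])           # toy hidden state
h_steered = debias(h, bias_dir)

# The bias component is zeroed out, while the detection component survives.
print(np.dot(h_steered, bias_dir))    # → 0.0
print(np.dot(h_steered, detect_dir))  # → 0.5
```

When the two directions are orthogonal, removing one leaves the other intact, which is the geometric intuition behind debiasing with minimal capability loss.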