Somin W
@sominw.bsky.social
cs phd @ northeastern. opinions on new england & beyond..
New work w/ @silvioamir.bsky.social & @byron.bsky.social! We show you can distill a model's mechanism, not just its answers -- teaching a small LM to run its circuit the same way as a larger teacher model. We call it Circuit Distillation. (1/4)
about 1 month ago
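The post above describes matching a student's internal circuit to a teacher's, not just its outputs. A minimal sketch of that idea, assuming the "circuit" is proxied by attention patterns; the function names, the attention-MSE term, and the `alpha` weighting are illustrative assumptions, not the paper's actual objective:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kl_div(p, q, eps=1e-9):
    # KL(p || q) over the last axis
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def circuit_distill_loss(student_logits, teacher_logits,
                         student_attn, teacher_attn, alpha=0.5):
    # Hypothetical combined loss: standard output distillation (KL between
    # teacher and student predictive distributions) plus a "circuit" term
    # that pushes the student's attention patterns toward the teacher's.
    output_loss = kl_div(softmax(teacher_logits), softmax(student_logits)).mean()
    circuit_loss = ((student_attn - teacher_attn) ** 2).mean()
    return (1 - alpha) * output_loss + alpha * circuit_loss
```

With `alpha=0`, this reduces to ordinary logit distillation; the extra term is what makes the student imitate the mechanism rather than only the answers.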
Can we trace a small distilled model back to its teacher? New work (w/ @chantalsh.bsky.social, @silvioamir.bsky.social & @byron.bsky.social) finds some footprints left by LLMs in distillation! [1/6] Full paper:
arxiv.org/abs/2502.06659
Who Taught You That? Tracing Teachers in Model Distillation
Model distillation -- using outputs from a large teacher model to teach a small student model -- is a practical means of creating efficient models for a particular task. We ask: Can we identify a stud...
https://arxiv.org/abs/2502.06659
9 months ago
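One plausible tracing signal for the question this post asks: attribute a student to whichever candidate teacher assigns the student's generations the lowest negative log-likelihood. This is a hedged sketch of one such signal, not the paper's actual method; all names and the toy setup are assumptions:

```python
import numpy as np

def avg_nll(logits, token_ids):
    # mean negative log-likelihood of token_ids under softmax(logits);
    # logits has shape (seq_len, vocab), token_ids has length seq_len
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(token_ids)), token_ids].mean()

def trace_teacher(student_tokens, teacher_logits_by_name):
    # Attribute the student to the candidate teacher whose predictive
    # distribution gives the student's outputs the lowest average NLL.
    scores = {name: avg_nll(logits, student_tokens)
              for name, logits in teacher_logits_by_name.items()}
    return min(scores, key=scores.get)
```

A student distilled from teacher A should echo A's token preferences, so A scores its outputs as more likely than rival teachers do.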