I'd expect that if this is the right way to think about it, models with math-focused post-training should have shorter reasoning traces on math problems than models without.
I don't know whether that's true; if it isn't, this seems pretty dissimilar to caching.
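If someone wanted to check, here's a minimal sketch of the comparison, assuming an OpenAI-compatible endpoint and two hypothetical model names (`math-tuned-model` and `base-model` are placeholders, not real models); completion token counts stand in for reasoning-trace length.

```python
# Sketch: compare average reasoning-trace length (proxied by completion
# tokens) between a math-post-trained model and a baseline.
from statistics import mean
from openai import OpenAI

client = OpenAI()

MATH_PROBLEMS = [
    "What is the sum of the first 100 positive integers?",
    "Factor x^2 - 5x + 6.",
    # ... a larger, fixed problem set in a real comparison
]

def avg_trace_tokens(model: str) -> float:
    """Average completion tokens the model spends per math problem."""
    counts = []
    for problem in MATH_PROBLEMS:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": problem}],
        )
        counts.append(resp.usage.completion_tokens)
    return mean(counts)

# Hypothetical model identifiers: one with math-focused post-training,
# one without.
math_tuned = avg_trace_tokens("math-tuned-model")
baseline = avg_trace_tokens("base-model")
print(f"math-tuned: {math_tuned:.0f} tokens; baseline: {baseline:.0f} tokens")
# If the caching framing is right, math_tuned should come out noticeably smaller.
```

This only proxies trace length via total completion tokens; for models that expose reasoning tokens separately, that field would be the cleaner measure.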
add a skeleton here at some point