Michael Hu (@michahu.bsky.social)

So you want a good pretraining data mix🧑‍🍳, but which data mixing algorithm do you pick? DoGE, DoReMi, Skill-it, grid searching proportions… 😵‍💫 It turns out that these algorithms are all special cases of Linear Mixing Optimization, our new data mixing framework! 🧵