So you want a good pretraining data mix 🧑‍🍳, but which data mixing algorithm do you pick? DoGE, DoReMi, Skill-it, grid searching proportions… 😵‍💫
It turns out that these algorithms are all special cases of Linear Mixing Optimization, our new data mixing framework! 🧵