Kyle Lo 12 months ago
Excited to share OLMo 2!
🐟 7B and 13B weights, trained on up to 4-5T tokens, with fully open data, code, etc.
🐠 better architecture and recipe for training stability
🐡 staged training, with new data mix Dolmino🍕 added during annealing
🦈 state-of-the-art OLMo 2 Instruct models
#nlp #mlsky
links below👇