Kyle Lo
Excited to share OLMo 2!
• 7B and 13B weights, trained up to 4-5T tokens, fully open data, code, etc
• better architecture and recipe for training stability
• staged training, with the new Dolmino data mix added during annealing
• state-of-the-art OLMo 2 Instruct models
#nlp #mlsky
links below 👇
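
If you just want to poke at the released weights, here's a minimal sketch using Hugging Face transformers (the allenai/OLMo-2-1124-7B model ID is an assumption on my part, so check the official links for the exact repo names):

```python
# Minimal sketch: load the open OLMo 2 weights with Hugging Face transformers.
# The model ID below is an assumption -- see the official links for exact
# repo names. Requires a recent transformers release with OLMo 2 support.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation to sanity-check the load.
inputs = tokenizer("Language modeling is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```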