Mechanical Dirk
@mechanicaldirk.bsky.social
📤 509
📥 241
📝 60
Training big models at
@ai2.bsky.social
.
Oof
10 days ago
0
1
0
reposted by
Mechanical Dirk
Nathan Lambert
about 1 month ago
Happy Olmo day to all who celebrate. Sorry to all who delayed releases today to get out of our way. We're hiring.
0
32
2
reposted by
Mechanical Dirk
Ai2
about 1 month ago
Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵
1
70
21
reposted by
Mechanical Dirk
Kyle Lo
about 1 month ago
we released Olmo 3! lot of exciting stuff but wanna focus on:
🐟 Olmo 3 32B Base, the best fully-open base model to-date, near Qwen 2.5 & Gemma 3 on diverse evals
🐠 Olmo 3 32B Think, first fully-open reasoning model approaching Qwen 3 levels
🐡 12 training datasets corresponding to different training stages
1
42
8
reposted by
Mechanical Dirk
Nathan Lambert
about 1 month ago
I'm excited to announce my RLHF Book is now available for pre-order through the
@manning.com
Early Access Program (MEAP), and for this milestone it's 50% off. I'm excited for it to land in print in early 2026! Lots of improvements coming soon. Thanks for the support!
hubs.la/Q03Tc37Q0
4
48
7
Incredible work by Apple's UX department, enabling three different corner radii at the same time 🙈
about 2 months ago
0
2
0
reposted by
Mechanical Dirk
Daniel Buschek
2 months ago
While reviewing for
#CHI2026
, I've noticed four new writing issues in
#HCI
papers, likely due to an increased use of
#LLMs
/
#AI
. I describe them here - and how to fix them:
dbuschek.medium.com/when-llms-wr...
When LLMs Write Our Papers
Four writing issues I notice as a reviewer — and how to fix them
https://dbuschek.medium.com/when-llms-write-our-papers-1cc746373cd0
2
28
7
reposted by
Mechanical Dirk
Ai2
4 months ago
We’re releasing early pre-training checkpoints for OLMo-2-1B to help study how LLM capabilities emerge. They’re fine-grained snapshots intended for analysis, reproduction, and comparison. 🧵
1
27
6
My three-year-old: "I want to listen to the Lerns Geschichte podcast!" What on earth is "Lerns Geschichte"? Two minutes later on the radio: "Learn a little
@geschichte.fm
, then ..." 😲
4 months ago
0
0
0
This project is a perfect model of an OLMo contribution. Well scoped, practical, sound theoretical underpinnings, and
@lambdaviking.bsky.social
submitted the paper 24h before the deadline 😍. It's integrated into the OLMo trainer here:
github.com/allenai/OLMo...
7 months ago
0
2
0
Finally, OLMo 1B. This is the most commonly requested OLMo feature, and it's finally here.
8 months ago
0
1
0
reposted by
Mechanical Dirk
Jacob Morrison
8 months ago
I'm in Singapore for
@iclr-conf.bsky.social
! Come check out our spotlight paper on the environmental impact of training OLMo (link in next tweet) during the Saturday morning poster session from 10-12:30 -- happy to chat about this or anything else! DMs should be open, email works too
1
10
6
Came across
arxiv.org/pdf/2504.05058
today. What a cool example of work you can do when LLM training data is open!
https://arxiv.org/pdf/2504.05058
8 months ago
1
7
0
reposted by
Mechanical Dirk
Ai2
9 months ago
Ever wonder how LLM developers choose their pretraining data? It’s not guesswork— all AI labs create small-scale models as experiments, but the models and their data are rarely shared. DataDecide opens up the process: 1,050 models, 30k checkpoints, 25 datasets & 10 benchmarks 🧵
1
52
14
reposted by
Mechanical Dirk
Jiacheng Liu
9 months ago
Today we're unveiling OLMoTrace, a tool that enables everyone to understand the outputs of LLMs by connecting to their training data. We do this on unprecedented scale and in real time: finding matching text between model outputs and 4 trillion training tokens within seconds. ✨
1
41
7
The fact that my Bsky feed is all tariffs and no Llama 4 means the platform is pretty much cooked for research purposes.
9 months ago
1
1
0
reposted by
Mechanical Dirk
Alisa Liu
9 months ago
We created SuperBPE🚀, a *superword* tokenizer that includes tokens spanning multiple words. When pretraining at 8B scale, SuperBPE models consistently outperform the BPE baseline on 30 downstream tasks (+8% MMLU), while also being 27% more efficient at inference time.🧵
3
83
21
Error bars!
@hails.computer
will be so proud!
10 months ago
0
2
0
reposted by
Mechanical Dirk
Ai2
10 months ago
Introducing olmOCR, our open-source tool to extract clean plain text from PDFs! Built for scale, olmOCR handles many document types with high throughput. Run it on your own GPU for free—at over 3000 token/s, equivalent to $190 per million pages, or 1/32 the cost of GPT-4o!
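As a rough illustration of how a cost-per-page figure like that decomposes, here is a minimal Python sketch; the throughput is from the post, but the GPU price and tokens-per-page values are hypothetical placeholders, not numbers from the announcement:

```python
# Sketch of how a "$ per million pages" figure can be derived.
throughput_tok_s = 3_000       # tokens/s, from the post
gpu_price_per_hour = 1.80      # hypothetical on-demand GPU price ($/hr)
tokens_per_page = 1_140        # hypothetical average output tokens per page

pages_per_hour = throughput_tok_s * 3600 / tokens_per_page
cost_per_million_pages = gpu_price_per_hour / pages_per_hour * 1_000_000
print(f"${cost_per_million_pages:,.0f} per million pages")  # ~$190 with these inputs
```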
3
82
16
reposted by
Mechanical Dirk
Ai2
11 months ago
We took our most efficient model and made an open-source iOS app📱but why? As phones get faster, more AI will happen on device. With OLMoE, researchers, developers, and users can get a feel for this future: fully private LLMs, available anytime. Learn more from
@soldaini.net
👇
youtu.be/rEK_FZE5rqQ
Ai2 OLMoE: Fully open source, running entirely on-device
YouTube video by Ai2
https://youtu.be/rEK_FZE5rqQ
2
30
19
14.8T tokens in 2.8M hours is about 1500 tokens per second. That's a very good number for 37B active parameters, but by no means unbelievable.
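A minimal Python sketch of that back-of-the-envelope calculation, treating the figure as tokens per second per GPU (my assumption about how the number is meant):

```python
# Average training throughput implied by the headline numbers.
tokens = 14.8e12           # 14.8T training tokens
gpu_hours = 2.8e6          # 2.8M GPU-hours
seconds = gpu_hours * 3600

tokens_per_second = tokens / seconds   # per GPU, averaged over the whole run
print(f"{tokens_per_second:,.0f} tokens/s")  # ~1,468 tokens/s
```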
11 months ago
0
0
0
reposted by
Mechanical Dirk
Nathan Lambert
11 months ago
Behind the scenes with what it's like to build language models and pursue (hopefully) cutting-edge AI research.
Interviewing OLMo 2 leads: Open secrets of training language models
What we have learned and are going to do next.
YouTube:
https://buff.ly/40IlSFF
Podcast / notes:
Interviewing OLMo 2 leads: Open secrets of training language models
What we have learned and are going to do next.
https://buff.ly/4gbbydY
1
33
8
In November, every post here was about NLP. Now it's all about TikTok. We're doing the Twitter speed run.
11 months ago
0
2
0
A few days ago, we did finally release the OLMo 2 tech report:
arxiv.org/pdf/2501.00656
. There is a lot of good stuff in there, but the stability work we did over the summer makes me particularly proud.
https://arxiv.org/pdf/2501.00656
12 months ago
0
1
0
reposted by
Mechanical Dirk
Nathan Lambert
12 months ago
Everyone wants open-source language models but no one wants to lift these heavy ass weights. We just released our paper "2 OLMo 2 Furious" Can't stop us in 2025. Links below.
6
56
12
Some people seem to believe that LLMs give inoffensive, milquetoast answers because of overblown safety concerns ("Because of the woke!"). But that's not it. LLMs give bland answers because they produce the average of what anyone would have said on the Internet.
about 1 year ago
1
2
0
It seems to me the second most common language spoken in the halls of NeurIPS is German.
about 1 year ago
0
4
0
reposted by
Mechanical Dirk
Nathan Lambert
about 1 year ago
Made a list of resources for open source language models with
@soldaini.net
ahead of the tutorial tomorrow at 9:30 AM.
github.com/allenai/awes...
GitHub - allenai/awesome-open-source-lms: Friends of OLMo and their links.
Friends of OLMo and their links. Contribute to allenai/awesome-open-source-lms development by creating an account on GitHub.
https://github.com/allenai/awesome-open-source-lms
2
112
20
reposted by
Mechanical Dirk
Jiacheng Liu
about 1 year ago
Want to predict the task performance of LMs before pretraining them? We develop task scaling laws and model ladders, which predict the accuracy on individual tasks by OLMo 2 7B & 13B models within 2 points of absolute error. The cost is 1% of the compute used to pretrain them.
2
33
14
I'll be at NeurIPS from Wednesday until Sunday! Do you think about pre-training? GPUs? What makes a foundation model good? If you have questions or answers, let's find a time to chat!
about 1 year ago
0
6
0
We just updated the OLMo repo at
github.com/allenai/OLMo
! There are now several training configs that together reproduce the training runs that led to the final OLMo 2 models. In particular, all the training data is available, tokenized and shuffled exactly as we trained on it!
GitHub - allenai/OLMo: Modeling, training, eval, and inference code for OLMo
Modeling, training, eval, and inference code for OLMo - allenai/OLMo
https://github.com/allenai/OLMo
about 1 year ago
0
54
11
reposted by
Mechanical Dirk
Nathan Lambert
about 1 year ago
I've spent the last two years scouring all available resources on RLHF specifically and post training broadly. Today, with the help of a totally cracked team, we bring you the fruits of that labor — Tülu 3, an entirely open frontier model post training recipe. We beat Llama 3.1 Instruct. Thread.
8
211
52
reposted by
Mechanical Dirk
Ian Magnusson
about 2 years ago
LMs are used to process text from many topics, styles, dialects, etc., but how well do they do? 📈 Evaluating perplexity on just one corpus like C4 doesn't tell the whole story 📉 ✨📃✨ We introduce Paloma, a benchmark of 585 domains from NY Times to r/depression on Reddit.
1
17
8