Jay Alammar
@jayalammar.bsky.social
📤 1374
📥 209
📝 21
Writer (http://jalammar.github.io) · O'Reilly Author (http://LLM-book.com) · LLM Builder at Cohere.com
Pinned post
The Illustrated DeepSeek-R1 Spent the weekend reading the paper and sorting through the intuitions. Here's a visual guide to the main intuitions behind the model and the process that created it.
newsletter.languagemodels.co/p/the-illust...
10 months ago
1
73
26
Inside NeurIPS 2025: The Year's AI Research, Mapped New blog post! NeurIPS 2025 papers are out, and it's a lot to take in. This visualization lets you explore the entire research landscape interactively, with clusters and @cohere.com LLM-generated explanations that make it easier to grasp.
8 days ago
1
11
2
reposted by
Jay Alammar
Maarten Grootendorst
29 days ago
Excited to share that @jayalammar.bsky.social and I are writing the book "An Illustrated Guide to AI Agents" with @oreilly.bsky.social 🥳 Our new book will contain chapters on the fundamentals of agents (memory, tools, and planning), alongside more advanced concepts like RL and reasoning LLMs.
2
9
2
The Illustrated Guide to AI Agents New book announcement! Thrilled that together with @maartengr.bsky.social, we're writing a new book titled "An Illustrated Guide to AI Agents", to be published by @oreilly.bsky.social.
29 days ago
1
9
3
reposted by
Jay Alammar
Antoine Bosselut
2 months ago
The next generation of open LLMs should be inclusive, compliant, and multilingual by design. That's why we (@icepfl.bsky.social, @ethz.ch, @cscsch.bsky.social) built Apertus.
2
25
10
The Illustrated GPT-OSS New post! A visual tour of the architecture, message formatting, and reasoning of the latest GPT.
newsletter.languagemodels.co/p/the-illust...
3 months ago
1
20
7
The legendary John Carmack at #upperbound:
- Current AI focus is RL (with Richard Sutton) solving Atari games
- Thinking in line with the Alberta Plan
- It was a misstep to start working too low-level (e.g., at the CUDA level); I kept stepping up the stack until now I'm in PyTorch
6 months ago
0
7
0
reposted by
Jay Alammar
Adam Hill
6 months ago
I'm really excited for this year's PyData London conference - there are some awesome talks on the schedule and I can't wait to hear the keynote speakers @jayalammar.bsky.social, Tony Mears, & Leanne Fitzpatrick
#pydata
#datascience
0
3
1
reposted by
Jay Alammar
PyData London
6 months ago
Unleash your inner data aficionado at PyData London 2025, 6-8 June at Convene Sancroft, St. Paul's! We have 3 top-flight keynotes lined up for you this year from @jayalammar.bsky.social, Leanne Kim Fitzpatrick, and Tony Mears. Just 17 days left. Book your tickets now!
pydata.org/london2025
1
7
7
reposted by
Jay Alammar
Max Bartolo
8 months ago
I'm excited to share the tech report for our @cohere.com @cohereforai.bsky.social Command A and Command R7B models. We highlight our novel approach to model training, including self-refinement algorithms and model merging techniques at scale. Read more below! ⬇️
1
11
7
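The report's methods aren't reproduced here, but for readers new to the term, "model merging" in its simplest form is just a weighted average of the parameter tensors of models that share an architecture. A toy sketch of that idea in PyTorch; the function and checkpoint names are illustrative, not Cohere's actual technique:

```python
# Toy illustration of linear model merging: average the parameters of
# several checkpoints that share one architecture. Real systems use more
# sophisticated schemes; this only shows the core idea.
import torch

def merge_state_dicts(state_dicts, weights):
    """Weighted average of parameter tensors across checkpoints."""
    assert abs(sum(weights) - 1.0) < 1e-6, "weights should sum to 1"
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged

# Hypothetical usage: merge three fine-tuned variants of one base model.
# checkpoints = [torch.load(p) for p in ["sft.pt", "code.pt", "math.pt"]]
# merged = merge_state_dicts(checkpoints, weights=[0.5, 0.25, 0.25])
# model.load_state_dict(merged)
```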
reposted by
Jay Alammar
Tom Aarsen
9 months ago
We've just released MMTEB, our multilingual upgrade to the MTEB Embedding Benchmark! It's a huge collaboration between 56 universities, labs, and organizations, resulting in a massive benchmark of 1000+ languages, 500+ tasks, and a dozen+ domains. Details in 🧵
2
23
4
reposted by
Jay Alammar
Maarten Grootendorst
9 months ago
Did you know we continue to develop new content for the "Hands-On Large Language Models" book? There's now even a free course available with @deeplearningai.bsky.social!
1
11
3
reposted by
Jay Alammar
Mason Youngblood
9 months ago
Do whales optimize their vocalizations for efficiency, just like human language? 🐋🎶 My latest study in Science Advances (@science.org) suggests they do, following linguistic laws seen in human speech. 🧵
www.science.org/doi/10.1126/...
Language-like efficiency in whale communication
Whale vocalizations follow efficiency rules seen in human language, revealing striking similarities in communication systems.
https://www.science.org/doi/10.1126/sciadv.ads6014
5
203
69
reposted by
Jay Alammar
Ellen Garland
9 months ago
We uncovered the same statistical structure that is a hallmark of human language in whale song, published today in Science. @inbalarnon.bsky.social @simonkirby.bsky.social @jennyallen13.bsky.social @clairenea.bsky.social @emma-carroll.bsky.social
www.science.org/doi/10.1126/...
17
268
121
reposted by
Jay Alammar
Naomi Saphra
10 months ago
One of my grand interpretability goals is to improve human scientific understanding by analyzing scientific discovery models, but this is the most convincing case yet that we CAN learn from model interpretation: Chess grandmasters learned new play concepts from AlphaZero's internal representations.
Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero
Artificial Intelligence (AI) systems have made remarkable progress, attaining super-human performance across various domains. This presents us with an opportunity to further human knowledge and improv...
https://arxiv.org/abs/2310.16410
2
108
23
AlphaXiv is an awesome way to discuss ML papers -- often with the authors themselves. Here's an intro and demo by @rajpalleti.bsky.social that we shot at #NeurIPS2024
www.youtube.com/watch?v=-Kwl...
AlphaXiv - a great place to discuss ML papers
YouTube video by Jay Alammar
https://www.youtube.com/watch?v=-Kwlqd1mXv0
10 months ago
0
11
4
reposted by
Jay Alammar
Tom Aarsen
10 months ago
The newest extremely strong embedding model based on ModernBERT-base is out: `cde-small-v2`. Both faster and stronger than its predecessor, this one tops the MTEB leaderboard for its tiny size! Details in 🧵
1
31
8
Floored that the repo for Hands-On Large Language Models is now at 3.6k GitHub stars! And excited that professors are starting to use the book to teach LLM courses. Reach out to us if we can be of assistance! And if you've liked the book, leave us a review on Amazon or Goodreads!
10 months ago
1
10
0
SWE-Bench has been one of the most important benchmarks measuring the progress of agents tackling software engineering in 2024. I caught up with two of its creators, @ofirpress.bsky.social and Carlos E. Jimenez, to hear their thoughts on the state of LLM-backed agents.
www.youtube.com/watch?v=bivZ...
SWE-Bench authors reflect on the state of LLM agents at Neurips 2024
YouTube video by Jay Alammar
https://www.youtube.com/watch?v=bivZWNQHRfE
10 months ago
0
3
2
reposted by
Jay Alammar
Nathan Lambert
11 months ago
OpenAI's o3: The grand finale of AI in 2024 A step change as influential as the release of GPT-4. Reasoning language models are the current and next big thing. I explain:
* The ARC prize
* o3 model size / cost
* Dispelling training myths
* Extreme benchmark progress
o3: The grand finale of AI in 2024
A step change as influential as the release of GPT-4. Reasoning language models are the current big thing.
https://buff.ly/4gpHxbe
8
81
13
Good morning #NeurIPS2024! Stop by the @cohere.com booth at 3PM today (Thursday) for a signed copy of Hands-On Large Language Models - it will introduce you to LLMs, their applications, as well as Cohere's Embed, Rerank, and Command-R models. Come early as quantities are limited!
11 months ago
0
7
0
I'll be in the Cohere #NeurIPS2024 booth most of this afternoon. Come say hi, ask questions, and yes, we're hiring! Tomorrow I'll be signing copies of my book at 3PM! Limited copies available!
11 months ago
0
6
1
Hi NeurIPS! Explore ~4,500 NeurIPS papers in this interactive visualization:
jalammar.github.io/assets/neuri...
(Click on a point to see the paper on the website.) Uses @cohere.com models and @lelandmcinnes.bsky.social's datamapplot/umap to help make sense of the overwhelming scale of NeurIPS.
11 months ago
1
62
13
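As a side note for readers curious how a map like this gets built, the pipeline is roughly: embed each paper, project the embeddings to 2D, and render the labeled map. A minimal sketch assuming the cohere, umap-learn, and datamapplot packages; the file, column names, and model choice are illustrative, not the exact setup behind the visualization above:

```python
# Sketch of a paper-map pipeline: embed abstracts, reduce to 2D, plot.
# Illustrative only -- not the exact code behind the visualization above.
import cohere
import numpy as np
import umap
import datamapplot
import pandas as pd

# Hypothetical file with title, abstract, and cluster_label columns.
papers = pd.read_csv("neurips_papers.csv")

co = cohere.Client("YOUR_API_KEY")
response = co.embed(
    texts=(papers["title"] + ". " + papers["abstract"]).tolist(),
    model="embed-english-v3.0",   # any text-embedding model would do
    input_type="clustering",
)
embeddings = np.array(response.embeddings)  # one vector per paper

# Project the high-dimensional embeddings down to 2D map coordinates.
coords = umap.UMAP(n_components=2, metric="cosine").fit_transform(embeddings)

# datamapplot renders the labeled map; cluster labels could come from
# clustering the embeddings and asking an LLM to name each cluster.
labels = papers["cluster_label"].to_numpy()  # hypothetical precomputed labels
fig, ax = datamapplot.create_plot(coords, labels)
fig.savefig("neurips_map.png")
```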
Sure to be thought-provoking. The previous interview had fascinating thoughts on sci-fi (Dune vs. Foundation), on AI competition for AI safety, and on successful sci-fi as a self-preventing prophecy.
12 months ago
0
5
0
Excited to see you all at NeurIPS this year! Let's hang!
12 months ago
1
2
0
Great insights
12 months ago
1
1
0
Join us for a panel on scientific communication Dec 4!
12 months ago
0
0
0
reposted by
Jay Alammar
Max Bartolo
12 months ago
🚨 LLMs can learn to reason from procedural knowledge in pretraining data! 🚨 I particularly enjoy research where the evidence contradicts our initial hypothesis. If you're interested in LLM reasoning, check out the 60+ pages of in-depth work at
arxiv.org/abs/2411.12580
4
67
8
reposted by
Jay Alammar
Chris McKitterick
12 months ago
Have you updated your handle to use your own domain? I'm in the process of updating my handle to use the adastra-sf.com domain, but although I've tried both verification methods (the DNS record and the no-DNS file upload to my server), it's not resolving even after half an hour. Tips? Ideas?
0
0
2
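For anyone debugging the same thing: Bluesky's DNS method looks for a TXT record at _atproto.<your-domain> whose value has the form did=did:plc:..., while the no-DNS method instead serves the DID from https://<your-domain>/.well-known/atproto-did. A quick way to check whether the TXT record has propagated, assuming the dnspython package is installed:

```python
# Check whether the _atproto TXT record Bluesky expects is resolving yet.
# Requires `pip install dnspython`; swap in your own domain.
import dns.resolver

domain = "adastra-sf.com"  # the domain from the post above
try:
    answers = dns.resolver.resolve(f"_atproto.{domain}", "TXT")
    for record in answers:
        print(record.to_text())  # should look like "did=did:plc:..."
except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
    print("No record found yet -- DNS may still be propagating.")
```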
reposted by
Jay Alammar
Evan Peck
12 months ago
Trying something new: A 🧵 on a topic I find many students struggle with: "why do their 📊 look more professional than my 📊?" It's *lots* of tiny decisions that aren't the defaults in many libraries, so let's break down 1 simple graph by @jburnmurdoch.bsky.social 🔗
www.ft.com/content/73a1...
92
1592
559
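The thread's underlying point is that the gap between a default chart and a professional one is a stack of small, deliberate overrides. A sketch of a few such non-default choices in matplotlib; the data and styling picks are illustrative, not taken from the thread:

```python
# A few of the "tiny decisions" that make a default chart look deliberate:
# trimmed spines, a light horizontal grid, direct labels instead of a legend.
import matplotlib.pyplot as plt

years = [2019, 2020, 2021, 2022, 2023]
series = {"Model A": [10, 25, 45, 80, 130], "Model B": [8, 14, 30, 55, 90]}

fig, ax = plt.subplots(figsize=(7, 4))
for name, values in series.items():
    ax.plot(years, values, linewidth=2)
    ax.text(years[-1], values[-1], f"  {name}", va="center")  # direct label

ax.spines["top"].set_visible(False)          # drop chart-junk borders
ax.spines["right"].set_visible(False)
ax.grid(axis="y", linewidth=0.5, alpha=0.4)  # light horizontal gridlines only
ax.set_title("Benchmark scores over time", loc="left", fontweight="bold")
ax.set_ylabel("Score")
fig.tight_layout()
plt.show()
```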
reposted by
Jay Alammar
Maarten Grootendorst
12 months ago
🍿 Introducing the animated "Visual Guide to Mixture of Experts (MoE)"! This was a blast to make, and it contains more in-depth descriptions than the original post did! Expect even more intuition as we break down visuals and discover the nuances behind MoE.
www.youtube.com/watch?v=sOPD...
A Visual Guide to Mixture of Experts (MoE) in LLMs
YouTube video by Maarten Grootendorst
https://www.youtube.com/watch?v=sOPDGQjFcuM
0
4
3
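For readers who want the video's core idea in code: an MoE layer routes each token to a few experts and mixes their outputs by the router's softmax weights. A bare-bones sketch, not the video's own code:

```python
# Minimal top-k Mixture of Experts layer: a router picks k experts per
# token and mixes their outputs by softmax weight. Illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim, num_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.k = k

    def forward(self, x):                          # x: (num_tokens, dim)
        scores = self.router(x)                    # (num_tokens, num_experts)
        topk = scores.topk(self.k, dim=-1)         # pick k experts per token
        weights = F.softmax(topk.values, dim=-1)   # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                 # run only the selected experts
            chosen = topk.indices[:, slot]
            for e, expert in enumerate(self.experts):
                mask = chosen == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# x = torch.randn(4, 16); print(TinyMoE(16)(x).shape)  # -> torch.Size([4, 16])
```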
I loved Daniel Dennett's From Bacteria to Bach and Back and its analogies between biological minds and computing paradigms. Chapter 4 especially, which discusses how an intelligent being can have "competence without comprehension".
12 months ago
0
4
0