Andrew Gordon Wilson
@andrewgwils.bsky.social
📤 2813
📥 204
📝 135
Machine Learning Professor
https://cims.nyu.edu/~andrewgw
Perhaps I'm an outlier, but generally the value I derive from art is not from its backstory. I love a Bach fugue not because he was suffering, content, had many children, or whatever else, but because it's an extraordinary composition. I'd feel the same about AI generated art.
6 days ago
0
5
1
How much does a language model forget when finetuned on new tasks? We show both model size and optimization matter and forgetting can be nearly eliminated with self-generated replay!
arxiv.org/abs/2605.26097
w/Martin Marek, Dongkyu Cho, Shikai Qiu, Rumi Chunara, and Pavel Izmailov. 1/8
9 days ago
1
47
8
May all of your NeurIPS submissions be high epiplexity.
28 days ago
0
5
0
"Does it still make sense to get a CS degree?" A CS degree has never been primarily about software engineering. It's about core skills, about learning how to think. That never goes obsolete. But really you should get a physics degree.
about 2 months ago
2
33
4
Never be embarrassed about explaining something basic. The best work has no pretense, no ego.
about 2 months ago
0
9
2
Me in every meeting: "have you considered epiplexity?"
about 2 months ago
0
5
0
reposted by
Andrew Gordon Wilson
NYU Center for Data Science
about 2 months ago
Using advanced AI optimizers like Muon doesn’t have to rely on guesswork. Courant PhD students Shikai Qiu and Zixi (Charlie) Chen, CDS PhD Student Hoang Phan, CDS Asst. Prof. Qi Lei, and CDS Prof. @andrewgwils.bsky.social bridge theory and practice.
nyudatascience.medium.com/building-the...
Building the Science of Scaling: Improving the Efficiency of Deep Learning Optimizers
A profound regime change in the field of optimization may be around the corner. For a decade, the Adam optimizer has overwhelmingly…
https://nyudatascience.medium.com/building-the-science-of-scaling-improving-the-efficiency-of-deep-learning-optimizers-4861f4d88d0b
0
1
1
For the most part, people see what they want to see. If they want to find fault, they will. If they want to be supportive, they will. Smart people can convincingly rationalize virtually any position. But underneath it all often lies something fundamentally irrational, and far from objective.
2 months ago
0
5
0
Alec Radford (and others behind GPT, let's not forget there were other authors) deserve credit. Conventional wisdom said it shouldn't work well. It didn't work well. They got brutal feedback: stop wasting time building a glorified autocomplete. But they persisted and the results were mindblowing.
2 months ago
0
15
2
There's a new generation of empirical deep learning researchers, hacking away at whatever seems trendy, blowing with the wind... no accumulation of real understanding, or foundations. No real passion or depth, just light amusement and career advancement. I'm hoping it's a phase.
3 months ago
0
11
1
I don't like how the world is becoming increasingly isolating and impersonal. I don't want to scan a QR code with my phone to order at a restaurant. I'd like to talk with a person. Expediency isn't all that matters. Am I alone in this?
3 months ago
4
27
0
What if Watson & Crick discovered the double helix structure of DNA at Nando's instead of The Eagle pub? Would they have a commemorative perinaise, or stick with the plaque?
5 months ago
0
3
0
We introduce epiplexity, a new measure of information that provides a foundation for how to select, generate, or transform data for learning systems. We have been working on this for almost 2 years, and I cannot contain my excitement!
arxiv.org/abs/2601.03220
1/7
5 months ago
8
142
42
reposted by
Andrew Gordon Wilson
Sebastian Raschka (rasbt)
5 months ago
One of the underrated papers this year: "Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful" (arxiv.org/abs/2507.07101) (I can confirm this holds for RLVR, too! I have some experiments to share soon.)
0
70
10
Excited about our new paper that unifies discrete, Gaussian, and simplicial diffusion, enabling model comparison, likelihood evaluation, stable training, and more, including a DNA design application! Amazing work from @alannawzadamin.bsky.social, Alina, Lily, and team!
arxiv.org/abs/2512.15923
A Unification of Discrete, Gaussian, and Simplicial Diffusion
To model discrete sequences such as DNA, proteins, and language using diffusion, practitioners must choose between three major methods: diffusion in discrete space, Gaussian diffusion in Euclidean spa...
https://arxiv.org/abs/2512.15923
6 months ago
0
26
5
reposted by
Andrew Gordon Wilson
Erin Grant
6 months ago
Thrilled to start 2026 as faculty in Psych & CS @ualberta.bsky.social + Amii.ca Fellow! 🥳 Recruiting students to develop theories of cognition in natural & artificial systems 🤖💭🧠. Find me at #NeurIPS2025 workshops (speaking coginterp.github.io/neurips2025 & organising @dataonbrainmind.bsky.social)
4
105
28
Excited to be speaking at the SPIGM workshop at NeurIPS tomorrow, 10:30-11 am, Room 20C. My talk will be "Probabilistic Inference is the Future of Foundation Models". See you there!
spigmworkshopv3.github.io/schedule/
6 months ago
1
15
0
A nice list. But it doesn't actually go much beyond electronics. In terms of quality of life, I think some of these "conveniences" are a downgrade in practice. I miss Blockbuster. I miss watching my favourite shows when they aired on a TV schedule. I miss 90s gaming. I miss being able to unplug.
8 months ago
2
11
0
My full interview with MLStreetTalk has just been posted. I really enjoyed this conversation! We talk about the bitter lesson, scientific discovery, Bayesian inference, mysterious phenomena, and key principles for building intelligent systems.
www.youtube.com/watch?v=M-jT...
The Real Reason Huge AI Models Actually Work
YouTube video by Machine Learning Street Talk
https://www.youtube.com/watch?v=M-jTeBCEGHc
9 months ago
1
33
6
I'm excited to be giving a keynote talk at the AutoML conference 9-10 am at Cornell Tech tomorrow! I'm presenting "Prescriptions for Universal Learning". I'll talk about how we can enable automation, which I'll argue is the defining feature of ML.
2025.automl.cc/program/
9 months ago
0
7
0
Research doesn't go in circles, but in spirals. We return to the same ideas, but in a different and augmented form.
9 months ago
0
23
1
reposted by
Andrew Gordon Wilson
NYU Center for Data Science
9 months ago
CDS/Courant Professor Andrew Gordon Wilson (@andrewgwils.bsky.social) argues mysterious behavior in deep learning can be explained by decades-old theory, not new paradigms: PAC-Bayes bounds, soft biases, and large models with a soft simplicity bias.
nyudatascience.medium.com/deep-learnin...
Deep Learning’s Most Puzzling Phenomena Can Be Explained by Decades-Old Theory
Andrew Gordon Wilson argues that many generalization phenomena in deep learning can be explained using decades-old theoretical tools.
https://nyudatascience.medium.com/deep-learnings-most-puzzling-phenomena-can-be-explained-by-decades-old-theory-91d4cf235a89
0
8
1
Regardless of whether you plan to use them in applications, everyone should learn about Gaussian processes, and Bayesian methods. They provide a foundation for reasoning about model construction and all sorts of deep learning behaviour that would otherwise appear mysterious.
10 months ago
3
54
6
A common takeaway from "the bitter lesson" is we don't need to put effort into encoding inductive biases, we just need compute. Nothing could be further from the truth! Better inductive biases mean better scaling exponents, which means exponential improvements with computation.
10 months ago
1
19
4
Gould mostly recorded baroque and early classical. He only recorded a single Chopin piece, as a one-off broadcast. But like many of his efforts, it's profoundly thought provoking, the end product as much Gould as it is Chopin. I love the last mvt (20:55+).
www.youtube.com/watch?v=NAHE...
Glenn Gould plays Chopin Piano Sonata No. 3 in B minor Op.58
YouTube video by The Piano Experience
https://www.youtube.com/watch?v=NAHE8PTR8tE
10 months ago
0
5
0
Whatever you do, just don't be boring.
10 months ago
1
4
1
I had a great time presenting "It's Time to Say Goodbye to Hard Constraints" at the Flatiron Institute. In this talk, I describe a philosophy for model construction in machine learning. Video now online!
www.youtube.com/watch?v=LxuN...
It's Time to Say Goodbye to Hard (equivariance) Constraints - Andrew Gordon Wilson
YouTube video by LoG Meetup NYC
https://www.youtube.com/watch?v=LxuNC3I7Fxg
11 months ago
0
14
3
Excited to be presenting my paper "Deep Learning is Not So Mysterious or Different" tomorrow at ICML, 11 am - 1:30 pm, East Exhibition Hall A-B, E-500. I made a little video overview as part of the ICML process (viewable from Chrome):
recorder-v3.slideslive.com#/share?share...
11 months ago
0
25
6
Our new ICML paper discovers scaling collapse: through a simple affine transformation, whole training loss curves across model sizes with optimally scaled hypers collapse to a single universal curve! We explain the collapse, providing a diagnostic for model scaling.
arxiv.org/abs/2507.02119
1/3
11 months ago
3
30
5
Excited about our new ICML paper, showing how algebraic structure can be exploited for massive computational gains in population genetics.
12 months ago
0
3
1
Machine learning is perhaps the only discipline that has become less mature over time. A reverse metamorphosis, from butterfly to caterpillar.
12 months ago
1
22
3
AI this, AI that, the implications of AI for X... can we just never talk about AI again?
12 months ago
1
10
0
Really excited about our new paper, "Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion". We explain the mysterious success of masking diffusion to propose new diffusion models that work well in a variety of settings, including proteins, images, and text!
12 months ago
0
6
0
A really outstanding interview of Terence Tao, providing an introduction to many topics, including the math of general relativity (youtube.com/watch?v=HUkB...). I love relativity, and in a recent(ish) paper we also consider the wave maps equation (section 5, arxiv.org/abs/2304.14994).
Terence Tao: Hardest Problems in Mathematics, Physics & the Future of AI | Lex Fridman Podcast #472
YouTube video by Lex Fridman
https://youtube.com/watch?v=HUkBz-cdB-k
12 months ago
0
13
3
AI benchmarking culture is completely out of control. Tables with dozens of methods, datasets, and bold numbers, trying to answer a question that perhaps no one should be asking anymore.
about 1 year ago
1
18
6
We have a strong bias to overestimate the speed of technological innovation and impact. See past claims about autonomous driving, AI curing diseases... or the timeline in every sci-fi book ever written. Where is my flying car?
about 1 year ago
1
9
1
My new paper "Deep Learning is Not So Mysterious or Different": arxiv.org/abs/2503.02113. Generalization behaviours in deep learning can be intuitively understood through a notion of soft inductive biases, and formally characterized with countable hypothesis bounds! 1/12
over 1 year ago
6
210
59
I had a great time talking with @anilananth.bsky.social as part of the Simons Institute Polylogues. We cover universal learning, generalization phenomena, how transformers are both surprisingly general but also limited, and the difference between statistics and ML!
www.youtube.com/watch?v=Aja0...
Andrew Gordon Wilson | Polylogues
YouTube video by Simons Institute
https://www.youtube.com/watch?v=Aja0kZeWRy4
over 1 year ago
0
8
2
These DeepSeek results mostly just reflect the diminishing gap between open and closed models, such that any company with billions can start with llama as a baseline, make some tweaks, and appear like the next OpenAI. Going forward, data and scale won't be the decisive advantage.
over 1 year ago
2
16
1
It's not the size of your parameter space that matters, it's how you use it.
over 1 year ago
1
9
2
With interview season coming, don't despair. I conspicuously forgot the name of the place I was interviewing in a 1-1. I made sure to name drop the university a bunch in my job talk right after, just so my allies could be like "he really does know the name".
over 1 year ago
0
3
0
There's apparently another Andrew Wilson at NYU who teaches piano lessons. I get a lot of emails meant for him. Maybe I'll charge his rate minus $1.
over 1 year ago
0
6
0
reposted by
Andrew Gordon Wilson
Brandon Amos
over 1 year ago
📢 My team at Meta (including Yaron Lipman and Ricky Chen) is hiring a postdoctoral researcher to help us build the next generation of flow, transport, and diffusion models! Please apply here and message me:
www.metacareers.com/jobs/1459691...
Postdoctoral Researcher, Fundamental AI Research (PhD)
Meta's mission is to build the future of human connection and the technology that makes it possible.
https://www.metacareers.com/jobs/1459691901359421/
1
53
15
We're excited to announce the ICML 2025 call for workshops! The CFP and submission advice can be found at:
icml.cc/Conferences/...
. The deadline is Feb 10. Submit some creative proposals!
ICML 2025 Call for Workshops
https://icml.cc/Conferences/2025/CallForWorkshops
over 1 year ago
0
15
10
Happy New Year everyone! Excited for the year ahead.
over 1 year ago
0
5
0
Many of the greatest papers, now canonical works, have a story of resistance, tension, and, finally, a crucial advocate. It's shockingly common. Why is there a bias against excellence? And what happens to those papers, those people, when no one has the courage to advocate?
over 1 year ago
1
12
2
Research scientists using industry GPUs these days... "But Mr Garnier… we're scientists, we want to change the world. You have the finest GPUs that money can buy! You employ 3000 research staff."
www.youtube.com/watch?v=hdHF...
That Mitchell and Webb Look - The Garnier Laboratoire
YouTube video by fanvideos4u
https://www.youtube.com/watch?v=hdHFmc9oiKY
over 1 year ago
0
5
0
This is your monthly reminder that understanding deep learning does not require rethinking generalization, and it never did.
over 1 year ago
3
22
3
So excited about this new work on Bayesian optimization for antibody design! It works by teaching a generative model how the human immune system evolves antibodies for strong and stable binders. Satisfying mix of ML+Bio. Check out the great thread from @alannawzadamin.bsky.social and the paper!
over 1 year ago
0
19
0
Excited for the #NeurIPS2024 workshops today! I'll be speaking at: (1) Science of DL (panel, 3:10-4:10, scienceofdlworkshop.github.io/schedule/) (2) "Time Series in the Age of Large Models" (talk, 4:39-5:14, neurips-time-series-workshop.github.io).
Schedule | SciForDL'24
https://scienceofdlworkshop.github.io/schedule/
over 1 year ago
2
25
1