Jacob Eisenstein
@jacobeisenstein.bsky.social
5571 followers · 2365 following · 202 posts
natural language processing and computational linguistics at google deepmind.
reposted by
Jacob Eisenstein
Tim Kellogg
9 days ago
this is the theme — you can’t have AGI without existing in and learning from the real world
1
20
2
reposted by
Jacob Eisenstein
Kaitlyn Zhou
28 days ago
No better time to start learning about that #AI thing everyone's talking about... 📢 I'm recruiting PhD students in Computer Science or Information Science @cornellbowers.bsky.social! If you're interested, apply to either department (yes, either program!) and list me as a potential advisor!
1
20
9
knowing how to tie your shoes or order a drink in a crowded bar: not agi
naming the big five personality traits: definitely agi
about 2 months ago
1
7
0
nice summary of everybody’s new fave
about 2 months ago
0
4
1
Nicholas Carlini asking the right questions at #COLM2025
about 2 months ago
0
5
0
reposted by
Jacob Eisenstein
Maria Antoniak
about 2 months ago
Here’s a #COLM2025 feed! Pin it 📌 to follow along with the conference this week!
2
26
18
reposted by
Jacob Eisenstein
Myra Cheng @ NeurIPS
2 months ago
AI always calling your ideas “fantastic” can feel inauthentic, but what are sycophancy’s deeper harms? We find that in the common use case of seeking AI advice on interpersonal situations—specifically conflicts—sycophancy makes people feel more right & less willing to apologize.
2
115
53
reposted by
Jacob Eisenstein
Pete Shaw
2 months ago
Excited to share a new paper that aims to narrow the conceptual gap between the idealized notion of Kolmogorov complexity and practical complexity measures for neural networks.
1
9
5
reposted by
Jacob Eisenstein
Joshua Raclaw
3 months ago
Cannot stress enough how good it is that you can come across a post about a gorgeous little Yiddish book sitting in someone’s family collection, and within a few seconds you can find the full scanned version of the book available for free through the Yiddish Book Center’s website
4
43
7
found some books at my parents’ house
3 months ago
2
20
2
Baristas still safe from robotic automation, and not just because robots don’t know what coffee tastes like. prompt: “I’m trying to dial in this v60 of huatusco with my vario. temp / grind recommendations?”
4 months ago
1
3
0
I’d guess that the majority position of syntacticians about LLMs (and other NLP beforehand) is roughly what Chomsky says: language tech can’t possibly teach us anything about the human language capability, so whether the LLM writes well doesn’t matter at all.
4 months ago
1
10
1
boston champaign pittsburgh atlanta, and, uh, let’s count seattle
glad i did it, hope i don’t have to do it again
4 months ago
0
2
1
reposted by
Jacob Eisenstein
Margaret Mitchell
4 months ago
🤖 ICYMI: Yesterday, @hf.co and OpenAI partnered to bring open source GPT to the public. This is a Big Deal in "AI world". Allow me to explain why. 🧵
huggingface.co/openai/gpt-o...
2
53
20
reposted by
Jacob Eisenstein
Ibn Bassal
4 months ago
roman burrito thread
3
151
42
reposted by
Jacob Eisenstein
this is very cool and i’m looking forward to reading the paper, but a basic question about this data: isn’t it likely that a congressional rep’s speeches are written by a shifting cast of speechwriters over the course of their career? wouldn’t that explain adoption of new usages?
4 months ago
1
3
1
reposted by
Jacob Eisenstein
Gaurav Kamath
4 months ago
Our new paper in #PNAS (bit.ly/4fcWfma) presents a surprising finding—when words change meaning, older speakers rapidly adopt the new usage; inter-generational differences are often minor. w/ Michelle Yang, @sivareddyg.bsky.social, @msonderegger.bsky.social and @dallascard.bsky.social 👇(1/12)
3
34
19
There's a lot to like in this position paper - and not just the "whiff of Frankenstein" quote.
www.arxiv.org/abs/2507.06268
4 months ago
1
14
1
reposted by
Jacob Eisenstein
Maria Antoniak
4 months ago
What are your favorite recent papers on using LMs for annotation (especially in a loop with human annotators), synthetic data for task-specific prediction, active learning, and similar? Looking for practical methods for settings where human annotations are costly. A few examples in thread ↴
13
78
26
reposted by
Jacob Eisenstein
Ahmad Beirami
5 months ago
[Thu Jul 17] w/ Ananth Balashankar & @jacobeisenstein.bsky.social, we present a reinforcement learning framework in view of test-time scaling. We show how to optimally calibrate & transform rewards to obtain optimal performance with a given test-time algorithm.
1
1
1
reposted by
Jacob Eisenstein
Ahmad Beirami
5 months ago
[Wed Jul 16] w/ @jacobeisenstein.bsky.social & Alekh Agarwal, we present a theoretical characterization of best-of-N (a simple yet effective method for test-time scaling & alignment). Our results justify the widespread use of BoN as a strong baseline in this space.
1
0
1
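For readers new to it, here is a minimal sketch of the best-of-N procedure itself, not the paper's analysis; `sample` and `reward` are hypothetical stand-ins for a policy and a reward model:

```python
import math

def best_of_n(prompt, sample, reward, n=8):
    # Draw n candidate responses from the policy and keep the one the
    # reward model scores highest. No training is required, which is why
    # BoN is such a common test-time baseline.
    candidates = [sample(prompt) for _ in range(n)]
    return max(candidates, key=reward)

# A widely cited upper bound on how far BoN drifts from the base policy:
# KL(BoN || base) <= log(n) - (n - 1) / n.
print(math.log(8) - 7 / 8)  # ~1.20 nats for n = 8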
Cheap but noisy? Or accurate but expensive? How to split a limited annotation budget between different types of judges?👩⚖️🤖🦧
www.arxiv.org/abs/2506.07949
Cost-Optimal Active AI Model Evaluation
The development lifecycle of generative AI systems requires continual evaluation, data acquisition, and annotation, which is costly in both resources and time. In practice, rapid iteration often makes...
http://www.arxiv.org/abs/2506.07949
6 months ago
1
9
3
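The paper optimizes how to allocate the budget; as background, a minimal sketch of the underlying cheap-judge-plus-correction idea (names and numbers here are illustrative, not from the paper):

```python
import numpy as np

def corrected_mean(cheap_all, cheap_sub, gold_sub):
    # Score everything with the cheap, noisy judge, then debias using the
    # small random subset that also has expensive/accurate labels.
    # Unbiased as long as the subset is sampled uniformly at random.
    cheap_all, cheap_sub, gold_sub = map(np.asarray, (cheap_all, cheap_sub, gold_sub))
    return cheap_all.mean() + (gold_sub - cheap_sub).mean()

rng = np.random.default_rng(0)
truth = rng.binomial(1, 0.7, size=1000)                   # latent correctness
cheap = np.clip(truth + rng.normal(0, 0.3, 1000), 0, 1)   # noisy proxy scores
idx = rng.choice(1000, size=50, replace=False)            # expensive judge on 50 items
print(corrected_mean(cheap, cheap[idx], truth[idx]))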
reposted by
Jacob Eisenstein
Ned Resnikoff
6 months ago
Everyone should check out People Time, the recording of his dates with Kenny Barron in Copenhagen three months before his death. Getz knew he was dying, and produced some of the most moving music of his career.
www.youtube.com/watch?v=c3jd...
1
24
2
Great topic! Looking forward to this
6 months ago
0
3
0
most relatable parenting thread
6 months ago
0
1
0
reposted by
Jacob Eisenstein
Ferenc Huszár
7 months ago
A new blog post with intuitions behind continuous-time Markov chains, a building block of diffusion language models, like @inceptionlabs.bsky.social's Mercury and Gemini Diffusion. This post touches on different ways of looking at Markov chains, connections to point processes, and more.
Discrete Diffusion: Continuous-Time Markov Chains
A tutorial explaining some key intuitions behind continuous time Markov chains for machine learners interested in discrete diffusion models: alternative representations, connections to point processes...
https://www.inference.vc/discrete-diffusion-continuous-time-markov-chains/
1
22
5
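As a companion to the post, a minimal sketch of simulating a continuous-time Markov chain from its rate matrix — the standard Gillespie-style construction, nothing specific to Mercury or Gemini Diffusion:

```python
import numpy as np

def simulate_ctmc(Q, x0, t_max, seed=0):
    # Q[i, j] (i != j) is the jump rate from state i to state j;
    # rows sum to zero, so -Q[i, i] is the total rate of leaving i.
    rng = np.random.default_rng(seed)
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        rate = -Q[x, x]
        if rate <= 0:                        # absorbing state
            break
        t += rng.exponential(1.0 / rate)     # exponential holding time
        if t >= t_max:
            break
        p = np.where(np.arange(len(Q)) == x, 0.0, Q[x])
        x = int(rng.choice(len(Q), p=p / p.sum()))
        path.append((t, x))
    return path

# toy 3-state chain
Q = np.array([[-1.0, 0.5, 0.5],
              [0.2, -0.4, 0.2],
              [0.3, 0.3, -0.6]])
print(simulate_ctmc(Q, x0=0, t_max=5.0))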
reposted by
Jacob Eisenstein
Stella Biderman
6 months ago
People keep plugging AI "Co-Scientists," so what happens when you ask them to do an important task like finding errors in papers? We built SPOT, a dataset of STEM manuscripts across 10 fields annotated with real errors to find out. (tl;dr not even close to usable) #NLProc
arxiv.org/abs/2505.11855
4
119
32
‘intermediate tokens - often anthropomorphized as "thoughts" or reasoning traces’ 🌶️ but true! really glad to see work approaching inference scaling more skeptically and objectively
7 months ago
0
13
2
reposted by
Jacob Eisenstein
Myra Cheng @ NeurIPS
7 months ago
Dear ChatGPT, Am I the Asshole? While Reddit users might say yes, your favorite LLM probably won’t. We present Social Sycophancy: a new way to understand and measure sycophancy as how LLMs overly preserve users' self-image.
6
136
35
Google DeepMind is hiring a research scientist in Seattle to work on foundational research in language!
job-boards.greenhouse.io/deepmind/job...
Research Scientist, Foundational Research in Language, USA
Seattle, Washington, US
https://job-boards.greenhouse.io/deepmind/jobs/6889774
7 months ago
0
9
2
reposted by
Jacob Eisenstein
Ahmad Beirami
7 months ago
#ICML2025
Is standard RLHF optimal in view of test-time scaling? Unsurprisingly, no. We show that a simple change to the standard RLHF framework, involving 𝐫𝐞𝐰𝐚𝐫𝐝 𝐜𝐚𝐥𝐢𝐛𝐫𝐚𝐭𝐢𝐨𝐧 and 𝐫𝐞𝐰𝐚𝐫𝐝 𝐭𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐚𝐭𝐢𝐨𝐧 (suited to the test-time procedure), is optimal!
1
17
6
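A minimal sketch of the calibrate-then-transform idea as described in the post; the quantile calibration and exponential transform below are illustrative choices, not necessarily the paper's exact recipe:

```python
import numpy as np

def calibrate(reward, ref_rewards):
    # Replace the raw reward with its quantile among rewards of samples
    # drawn from the reference policy for the same prompt, making scores
    # comparable across prompts.
    return (np.asarray(ref_rewards) <= reward).mean()

def transform(quantile, t=4.0):
    # Up-weight the right tail of the calibrated reward, which is the
    # region a test-time procedure like best-of-n actually selects from.
    # t is a hypothetical knob tied to the test-time compute budget.
    return np.exp(t * quantile)

# RLHF then optimizes transform(calibrate(r, refs)) in place of r itself.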
reposted by
Jacob Eisenstein
Naomi Saphra
7 months ago
uh oh! uh oh!
10
37
9
reposted by
Jacob Eisenstein
Gus
7 months ago
Gemma 3 explained: Longer context, image support, and a new 1B model. → goo.gle/4lV8iaw
Other key enhancements:
🔸 Best model that fits in a single consumer GPU or TPU host
🔸 KV-cache memory reduction with 5-to-1 interleaved attention
🔸 And more! Read the blog for the full details on Gemma 3.
Gemma explained: What’s new in Gemma 3 - Google Developers Blog
Google's Gemma 3 model includes vision-language support and architectural changes for resource-friendly multimodal language models.
https://goo.gle/4lV8iaw
1
22
8
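Back-of-the-envelope arithmetic for why interleaving five local (sliding-window) layers per global layer shrinks the KV cache; the window size below is a placeholder, not Gemma 3's actual configuration:

```python
def kv_cache_ratio(context_len, window=1024, local_per_global=5):
    # Local layers cache at most `window` tokens of keys/values; global
    # layers cache the full context. Compare one repeating block of
    # 5 local + 1 global layers against 6 all-global layers.
    interleaved = local_per_global * min(window, context_len) + context_len
    all_global = (local_per_global + 1) * context_len
    return interleaved / all_global

print(kv_cache_ratio(128_000))  # ~0.17 of the all-global cache at long contexts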
lol
www.quantamagazine.org/when-chatgpt...
7 months ago
1
8
0
reposted by
Jacob Eisenstein
7 months ago
on my way back to NYC, i met wise Leon Bottou in the airport. we talked. then i told him "you should tweet that!" and he delivered much more than a tweet: a blog post with thoughts and insights on AI research that only he can deliver this clearly and succinctly.
leon.bottou.org/news/two_les...
1
45
6
Confabulation and overconfidence are still problems for LLMs (among others), but it is just not true that these models are somehow technically constrained to make stuff up rather than abstaining from answering.
7 months ago
0
7
0
reposted by
Jacob Eisenstein
Arianna Bisazza #EMNLP
8 months ago
Modern LLMs "speak" hundreds of languages... but do they really? Multilinguality claims are often based on downstream tasks like QA & MT, while *formal* linguistic competence remains hard to gauge in lots of languages. Meet MultiBLiMP! (joint work w/ @jumelet.bsky.social & @weissweiler.bsky.social)
2
21
7
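The BLiMP-style evaluation behind this is easy to sketch: score both halves of a minimal pair with a causal LM and check that the grammatical variant gets higher probability (the model choice here is arbitrary):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def total_logprob(sentence):
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss    # mean NLL per predicted token
    return -loss.item() * (ids.shape[1] - 1)  # rescale to total log-probability

# a classic subject-verb agreement minimal pair
good = "The keys to the cabinet are on the table."
bad = "The keys to the cabinet is on the table."
print(total_logprob(good) > total_logprob(bad))  # expect True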
reposted by
Jacob Eisenstein
sarah jeong
8 months ago
ok so!!! something I found really interesting after talking to Korean protesters was... well, it was actually me realizing something about protest in the US: protest is trapped in an unwinnable spiral about "peaceful" versus "violent" protest, and even people who know it's dumb can't break free
56
3723
1014
reposted by
Jacob Eisenstein
Thomas Steinke
about 1 year ago
I'm going to slowly repost my math notes from the other site🐦 here🦋; it's the only thing I posted over there that I think may have some long-term value & worth not deleting. These started out as notes for myself, but people seem to appreciate them. 😅 I'll keep track of all of them in this thread.
6
202
32
reposted by
Jacob Eisenstein
Jeff Dean
8 months ago
🥁Introducing Gemini 2.5, our most intelligent model with impressive capabilities in advanced reasoning and coding. Now integrating thinking capabilities, 2.5 Pro Experimental is our most performant Gemini model yet. It’s #1 on the LM Arena leaderboard. 🥇
34
215
76
We all want LLMs to collaborate with humans to help them achieve their goals. But LLMs are not trained to collaborate; they are trained to imitate. Can we teach LM agents to help humans by first making them help each other?
arxiv.org/abs/2503.14481
Don't lie to your friends: Learning what you know from collaborative self-play
To be helpful assistants, AI agents must be aware of their own capabilities and limitations. This includes knowing when to answer from parametric knowledge versus using tools, when to trust tool outpu...
https://arxiv.org/abs/2503.14481
8 months ago
1
56
20
This gets at something that’s been bothering me about LLM “reasoning” models: while inference scaling is cool and useful (number goes up), what is the cognitive process that reasoning models are supposed to be replicating and what’s the evidence that they’re actually doing that?
9 months ago
4
36
8
reposted by
Jacob Eisenstein
Somesh Jha
9 months ago
Nicholas Carlini moves to Anthropic.
nicholas.carlini.com/writing/2025...
Career Update: Google DeepMind -> Anthropic
https://nicholas.carlini.com/writing/2025/career-update.html
0
18
6
reposted by
Jacob Eisenstein
Aphreditto 🔆 Diary of a Trans Girl
9 months ago
boomer take but apps that automatically turn ascii smiley faces into emojis instead are destroying cherished relics of the internet's golden age
22
566
75
reposted by
Jacob Eisenstein
Yoshitomo Matsubara
9 months ago
[Update & Hiring] Last month, I joined the Search and Recommendation Science team at Yahoo! as a Research Scientist🚀 Our team is hiring another Research Scientist🚀🚀 Send me a DM if you have a strong research background in deep learning and NLP and want to be considered for the position🙋🙋🙋
4
9
3
reposted by
Jacob Eisenstein
Brutalismbot
10 months ago
Brutalist Slide, Bucharest - it's still standing [OC]
r/brutalism
3
92
22
reposted by
Jacob Eisenstein
Jonathan Lambert
10 months ago
Dear NSF scientists and staff: If you're looking to talk to a journalist about what's going on at the agency, please reach out (from a non-governmental phone/computer) via signal @jonlambert.12
1
158
117
reposted by
Jacob Eisenstein
Yoav Artzi
10 months ago
I am looking for a postdoc. A serious-looking call is coming soon, but this is to get it going. Topics include (but are not limited to): LLMs (🫢!), multimodal LLMs, interaction+learning, RL, intersection with cogsci, ... see our work to get an idea:
yoavartzi.com/pubs
Plz RT 🙏
loading . . .
Publications
https://yoavartzi.com/pubs
1
24
15
reposted by
Jacob Eisenstein
interesting results, but i don’t agree with framing this as “bias”, which in this context is typically used in relation to stereotypes. “don’t trust people from new jersey” is bias; “a four day work week would give people more time with their families” is something else.
10 months ago
2
9
1