Jacob Eisenstein
@jacobeisenstein.bsky.social
📤 5540
📥 2337
📝 199
natural language processing and computational linguistics at google deepmind.
reposted by
Jacob Eisenstein
Joshua Raclaw
about 1 month ago
Cannot stress enough how good it is that you can come across a post about a gorgeous little Yiddish book sitting in someone’s family collection, and within a few seconds you can find the full scanned version of the book available for free through the Yiddish Book Center’s website
4
44
7
found some books at my parents’ house
about 1 month ago
2
20
2
about 1 month ago
0
6
1
Baristas still safe from robotic automation, and not just because robots don’t know what coffee tastes like. prompt: “I’m trying to dial in this v60 of huatusco with my vario. temp / grind recommendations?”
about 2 months ago
1
3
0
I’d guess that the majority position of syntacticians about LLMs (and other NLP beforehand) is roughly what Chomsky says: language tech can’t possibly teach us anything about the human language capability, so whether the LLM writes well doesn’t matter at all.
about 2 months ago
1
10
1
boston champaign pittsburgh atlanta, and, uh, let’s count seattle. glad i did it, hope i don’t have to do it again
about 2 months ago
0
2
1
reposted by
Jacob Eisenstein
Margaret Mitchell
about 2 months ago
🤖 ICYMI: Yesterday,
@hf.co
and OpenAI partnered to bring open source GPT to the public. This is a Big Deal in "AI world". Allow me to explain why. 🧵
huggingface.co/openai/gpt-o...
2
53
20
reposted by
Jacob Eisenstein
Ibn Bassal
2 months ago
roman burrito thread
3
144
38
reposted by
Jacob Eisenstein
this is very cool and i’m looking forward to reading the paper, but a basic question about this data: isn’t it likely that a congressional rep’s speeches are written by a shifting cast of speechwriters over the course of their career? wouldn’t that explain adoption of new usages?
2 months ago
1
3
1
reposted by
Jacob Eisenstein
Gaurav Kamath
2 months ago
Our new paper in
#PNAS
(
bit.ly/4fcWfma
) presents a surprising finding—when words change meaning, older speakers rapidly adopt the new usage; inter-generational differences are often minor. w/ Michelle Yang, @sivareddyg.bsky.social ,
@msonderegger.bsky.social
and
@dallascard.bsky.social
👇(1/12)
3
33
18
There's a lot to like in this position paper - and not just the "whiff of Frankenstein" quote.
www.arxiv.org/abs/2507.06268
2 months ago
1
14
1
reposted by
Jacob Eisenstein
Maria Antoniak
2 months ago
What are your favorite recent papers on using LMs for annotation (especially in a loop with human annotators), synthetic data for task-specific prediction, active learning, and similar? Looking for practical methods for settings where human annotations are costly. A few examples in thread ↴
13
76
26
reposted by
Jacob Eisenstein
Ahmad Beirami
3 months ago
[Thu Jul 17] w/ Ananth Balashankar &
@jacobeisenstein.bsky.social
, we present a reinforcement learning framework in view of test-time scaling. We show how to optimally calibrate & transform rewards to obtain optimal performance with a given test-time algorithm.
1
1
1
reposted by
Jacob Eisenstein
Ahmad Beirami
3 months ago
[Wed Jul 16] w/
@jacobeisenstein.bsky.social
& Alekh Agarwal, we present a theoretical characterization of best-of-N (a simple yet effective method for test-time scaling & alignment). Our results justify the widespread use of BoN as a strong baseline in this space.
1
0
1
Cheap but noisy? Or accurate but expensive? How to split a limited annotation budget between different types of judges?👩⚖️🤖🦧
www.arxiv.org/abs/2506.07949
Cost-Optimal Active AI Model Evaluation
The development lifecycle of generative AI systems requires continual evaluation, data acquisition, and annotation, which is costly in both resources and time. In practice, rapid iteration often makes...
http://www.arxiv.org/abs/2506.07949
4 months ago
1
9
3
reposted by
Jacob Eisenstein
Ned Resnikoff
4 months ago
Everyone should check out People Time, the recording of his dates with Kenny Barron in Copenhagen three months before his death. Getz knew he was dying, and produced some of the most moving music of his career.
www.youtube.com/watch?v=c3jd...
1
24
2
Great topic! looking forward to this
4 months ago
0
3
0
most relatable parenting thread
4 months ago
0
1
0
reposted by
Jacob Eisenstein
Ferenc Huszár
4 months ago
A new blog post with intuitions behind continuous-time Markov chains, a building block of diffusion language models, like
@inceptionlabs.bsky.social
's Mercury and Gemini Diffusion. This post touches on different ways of looking at Markov chains, connections to point processes, and more.
Discrete Diffusion: Continuous-Time Markov Chains
A tutorial explaining some key intuitions behind continuous time Markov chains for machine learners interested in discrete diffusion models: alternative representations, connections to point processes...
https://www.inference.vc/discrete-diffusion-continuous-time-markov-chains/
1
22
5
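The post above links a tutorial on continuous-time Markov chains. As a companion, here is a minimal Gillespie-style simulation sketch; the two-state "clean/noised" generator `Q` and the function name are illustrative assumptions, not taken from the linked post:

```python
import random

def simulate_ctmc(rate_matrix, state, t_max, rng):
    """Simulate a CTMC with generator Q: hold in state i for an
    Exp(-Q[i][i]) time, then jump to j with prob Q[i][j] / -Q[i][i]."""
    t, path = 0.0, [(0.0, state)]
    while True:
        rates = rate_matrix[state]
        total = -rates[state]            # total exit rate from current state
        if total <= 0:                   # absorbing state: nothing more happens
            break
        t += rng.expovariate(total)      # exponential holding time
        if t >= t_max:
            break
        # choose the next state proportional to the off-diagonal rates
        r, acc = rng.random() * total, 0.0
        for j, q in enumerate(rates):
            if j == state:
                continue
            acc += q
            if r <= acc:
                state = j
                break
        path.append((t, state))
    return path

# Hypothetical 2-state "token corruption" chain: clean <-> noised.
Q = [[-1.0, 1.0],
     [ 2.0, -2.0]]
path = simulate_ctmc(Q, 0, t_max=5.0, rng=random.Random(0))
```

Discrete diffusion models use chains like this (over a token vocabulary rather than two states) as the forward noising process.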
reposted by
Jacob Eisenstein
Stella Biderman
4 months ago
People keep plugging AI "Co-Scientists," so what happens when you ask them to do an important task like finding errors in papers? We built SPOT, a dataset of STEM manuscripts across 10 fields annotated with real errors to find out. (tl;dr not even close to usable)
#NLProc
arxiv.org/abs/2505.11855
4
121
33
‘intermediate tokens, often anthropomorphized as "thoughts" or reasoning traces’ 🌶️ but true! really glad to see work approaching inference scaling more skeptically and objectively
4 months ago
0
13
2
reposted by
Jacob Eisenstein
Myra Cheng
4 months ago
Dear ChatGPT, Am I the Asshole? While Reddit users might say yes, your favorite LLM probably won’t. We present Social Sycophancy: a new way to understand and measure sycophancy as how LLMs overly preserve users' self-image.
6
136
35
Google Deepmind is hiring a research scientist in Seattle to work on foundational research in language!
job-boards.greenhouse.io/deepmind/job...
Research Scientist, Foundational Research in Language, USA
Seattle, Washington, US
https://job-boards.greenhouse.io/deepmind/jobs/6889774
4 months ago
0
9
2
reposted by
Jacob Eisenstein
Ahmad Beirami
5 months ago
#ICML2025
Is standard RLHF optimal in view of test-time scaling? Unsurprisingly no. We show a simple change to standard RLHF framework that involves 𝐫𝐞𝐰𝐚𝐫𝐝 𝐜𝐚𝐥𝐢𝐛𝐫𝐚𝐭𝐢𝐨𝐧 and 𝐫𝐞𝐰𝐚𝐫𝐝 𝐭𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐚𝐭𝐢𝐨𝐧 (suited to test-time procedure) is optimal!
1
17
6
reposted by
Jacob Eisenstein
Naomi Saphra
5 months ago
uh oh! uh oh!
10
37
9
reposted by
Jacob Eisenstein
Gus
5 months ago
Gemma 3 explained: Longer context, image support, and a new 1B model. →
goo.gle/4lV8iaw
Other key enhancements: 🔸 Best model that fits in a single consumer GPU or TPU host 🔸 KV-cache memory reduction with 5-to-1 interleaved attention 🔸 And more! Read the blog for the full details on Gemma 3.
Gemma explained: What’s new in Gemma 3- Google Developers Blog
Google's Gemma 3 model includes vision-language support and architectural changes for resource-friendly multimodal language models.
https://goo.gle/4lV8iaw
1
22
8
lol
www.quantamagazine.org/when-chatgpt...
5 months ago
1
8
0
reposted by
Jacob Eisenstein
5 months ago
on my way back to NYC, i met wise Leon Bottou in the airport. we talked. then i told him "you should tweet that!" and he delivered much more than a tweet: a blog post with thoughts and insights on AI research only he can deliver this clearly and succinctly.
leon.bottou.org/news/two_les...
1
45
6
Confabulation and overconfidence are still problems for LLMs (among others), but it is just not true that these models are somehow technically constrained to make stuff up rather than abstaining from answering
5 months ago
0
7
0
reposted by
Jacob Eisenstein
Arianna Bisazza
6 months ago
Modern LLMs "speak" hundreds of languages... but do they really? Multilinguality claims are often based on downstream tasks like QA & MT, while *formal* linguistic competence remains hard to gauge in lots of languages. Meet MultiBLiMP! (joint work w/
@jumelet.bsky.social
&
@weissweiler.bsky.social
)
2
21
7
reposted by
Jacob Eisenstein
sarah jeong
6 months ago
ok so!!! something I found really interesting after talking to Korean protesters was... well, it was actually me realizing something about protest in the US: protest is trapped in an unwinnable spiral about "peaceful" versus "violent" protest, and even people who know it's dumb can't break free
56
3734
1015
reposted by
Jacob Eisenstein
Thomas Steinke
11 months ago
I'm going to slowly repost my math notes from the other site🐦 here🦋; it's the only thing I posted over there that I think may have some long-term value & worth not deleting. These started out as notes for myself, but people seem to appreciate them. 😅 I'll keep track of all of them in this thread.
6
203
32
reposted by
Jacob Eisenstein
Jeff Dean
6 months ago
🥁Introducing Gemini 2.5, our most intelligent model with impressive capabilities in advanced reasoning and coding. Now integrating thinking capabilities, 2.5 Pro Experimental is our most performant Gemini model yet. It’s #1 on the LM Arena leaderboard. 🥇
34
215
76
We all want LLMs to collaborate with humans to help them achieve their goals. But LLMs are not trained to collaborate; they are trained to imitate. Can we teach LM agents to help humans by first making them help each other?
arxiv.org/abs/2503.14481
Don't lie to your friends: Learning what you know from collaborative self-play
To be helpful assistants, AI agents must be aware of their own capabilities and limitations. This includes knowing when to answer from parametric knowledge versus using tools, when to trust tool outpu...
https://arxiv.org/abs/2503.14481
6 months ago
1
56
20
This gets at something that’s been bothering me about LLM “reasoning” models: while inference scaling is cool and useful (number goes up), what is the cognitive process that reasoning models are supposed to be replicating and what’s the evidence that they’re actually doing that?
6 months ago
4
36
8
reposted by
Jacob Eisenstein
Somesh Jha
7 months ago
Nicholas Carlini moves to Anthropic.
nicholas.carlini.com/writing/2025...
Career Update: Google DeepMind -> Anthropic
TODO
https://nicholas.carlini.com/writing/2025/career-update.html
0
18
6
reposted by
Jacob Eisenstein
Aphreditto 🔆 Diary of a Trans Girl
7 months ago
boomer take but apps that automatically turn ascii smiley faces into emojis instead are destroying cherished relics of the internet's golden age
22
570
75
reposted by
Jacob Eisenstein
Yoshitomo Matsubara
7 months ago
[Update & Hiring] Last month, I joined the Search and Recommendation Science team at Yahoo! as a Research Scientist🚀 Our team is hiring another Research Scientist🚀🚀 Send me a DM if you have a strong research background in deep learning and NLP and want to be considered for the position🙋🙋🙋
4
9
3
reposted by
Jacob Eisenstein
Brutalismbot
7 months ago
Brutalist Slide, Bucharest - it's still standing [OC]
r/brutalism
3
92
22
reposted by
Jacob Eisenstein
Jonathan Lambert
8 months ago
Dear NSF scientists and staff: If you're looking to talk to a journalist about what's going on at the agency, please reach out (from a non-governmental phone/computer) via signal @jonlambert.12
1
158
118
reposted by
Jacob Eisenstein
Yoav Artzi
8 months ago
I am looking for a postdoc. A serious-looking call coming soon, but this is to get it going. Topics include (but not limited to): LLMs (🫢!), multimodal LLMs, interaction+learning, RL, intersection with cogsci, ... see our work to get an idea:
yoavartzi.com/pubs
Plz RT 🙏
Publications
https://yoavartzi.com/pubs
1
24
15
reposted by
Jacob Eisenstein
interesting results, but i don’t agree with framing this as “bias”, which in this context is typically used in relation to stereotypes. “don’t trust people from new jersey” is bias; “a four day work week would give people more time with their families” is something else.
8 months ago
2
8
1
reposted by
Jacob Eisenstein
Paul Röttger @ ACL
8 months ago
Are LLMs biased when they write about political issues? We just released IssueBench – the largest, most realistic benchmark of its kind – to answer this question more robustly than ever before. Long 🧵with spicy results 👇
4
81
30
reposted by
Jacob Eisenstein
Ziteng Sun
8 months ago
Inference-time procedures (e.g. Best-of-N, CoT) have been instrumental to recent development of LLMs. Standard RLHF focuses only on improving the trained model. This creates a train/inference mismatch. 𝘊𝘢𝘯 𝘸𝘦 𝘢𝘭𝘪𝘨𝘯 𝘰𝘶𝘳 𝘮𝘰𝘥𝘦𝘭 𝘵𝘰 𝘣𝘦𝘵𝘵𝘦𝘳 𝘴𝘶𝘪𝘵 𝘢 𝘨𝘪𝘷𝘦𝘯 𝘪𝘯𝘧𝘦𝘳𝘦𝘯𝘤𝘦-𝘵𝘪𝘮𝘦 𝘱𝘳𝘰𝘤𝘦𝘥𝘶𝘳𝘦? Check out below.
1
25
10
reposted by
Jacob Eisenstein
Pablo Samuel Castro
8 months ago
Can LLMs be used to discover interpretable models of human and animal behavior?🤔 Turns out: yes! Thrilled to share our latest preprint where we used FunSearch to automatically discover symbolic cognitive models of behavior. 1/12
3
134
55
reposted by
Jacob Eisenstein
Dr. Marissa Kawehi Loving
8 months ago
Y’all they cancelled the NSF MPS Ascend Program after encouraging underrepresented folks to only apply to it and not both the Ascend and the MSPRF. This is astronomically fucked up for many grad students and postdocs of color in math.
#MathSky
12
383
188
reposted by
Jacob Eisenstein
Ahmad Beirami
8 months ago
𝐛𝐞𝐬𝐭-𝐨𝐟-𝐧 is a strong baseline for - improving agents - scaling inference-time compute - preference alignment - jailbreaking models How does 𝐁𝐨𝐧 work? and why is it so strong? Find some answers in the paper we wrote over two Christmas breaks!🧵
2
44
10
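Best-of-N itself is simple enough to sketch in a few lines. The sampler and reward below are toy stand-ins (a real setup would draw from an LLM and score with a learned reward model; the names here are hypothetical):

```python
import random

def best_of_n(prompt, sample_fn, reward_fn, n=8):
    """Best-of-N sampling: draw n candidate responses, score each
    with the reward model, and return the argmax."""
    candidates = [sample_fn(prompt) for _ in range(n)]
    return max(candidates, key=reward_fn)

# Toy stand-ins: the "policy" samples words, the "reward" prefers longer ones.
rng = random.Random(0)
vocab = ["ok", "sure", "certainly", "absolutely"]
sample = lambda prompt: rng.choice(vocab)
reward = len
best = best_of_n("hi", sample, reward, n=8)
```

Larger n shifts the output distribution toward high-reward responses, which is why BoN serves as a strong baseline for both alignment and jailbreaking.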
If you need a break from all the shitty national and international news, check out this amazing story about innovative things we’re doing with poop here in seattle
8 months ago
1
10
1
reposted by
Jacob Eisenstein
Olúfẹ́mi O. Táíwò
8 months ago
"Our politics revolve around the idea that scarce resources mean keeping people out. We are utterly unprepared for a world in which perhaps the scarcest resource will be people." -
@polgreen.bsky.social
www.nytimes.com/2025/01/31/o...
Opinion | Migration Is Remaking Our World, and We Don’t Understand It at All (Gift Article)
Migration is central to our politics and our world, but nobody really understands it.
https://www.nytimes.com/2025/01/31/opinion/trump-migration-world.html?unlocked_article_code=1.tU4.f9sa.w8Lo1eZA4ZXV&smid=url-share
3
241
73
reposted by
Jacob Eisenstein
Carl Zimmer
8 months ago
National Science Foundation suspends salary payments, leaving researchers unable to pay their bills
www.statnews.com/2025/01/30/t...
Story by
@ericboodman.bsky.social
National Science Foundation suspends salary payments, leaving researchers unable to pay their bills
An NSF online payment system remained down after the federal funding freeze was lifted, leaving early-career scientists scrambling to pay bills
https://www.statnews.com/2025/01/30/trump-funding-freeze-national-science-foundation-suspends-salary-payments/
59
1464
1129