Alex Turner
@turntrout.bsky.social
📤 720 · 📥 6 · 📝 108
Research scientist at Google DeepMind. All opinions are my own.
https://turntrout.com
Reposted by Alex Turner
The Tennessee Holler
15 days ago
“Heartbroken but also very angry… sickening lies by the administration… reprehensible and disgusting… Alex is clearly not holding a gun… please get the truth out about our son, he was a good man…”
312 · 14169 · 6709
One of the more amusing bugs I've seen on my website. At build time, I run a command that counts how many commits I've made and inserts the count into the HTML. However, the deployment machine checked out a shallow clone of my repo, which didn't have the commit history.
15 days ago
1 · 2 · 0
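A hedged sketch of that failure mode, assuming the build shells out to git (the `commit_count` helper and the unshallow guard are my illustration, not the site's actual build script). On a shallow checkout, `git rev-list --count HEAD` only sees the fetched commits, so the count collapses to the clone depth:

```python
import subprocess

def commit_count(repo_dir: str = ".") -> int:
    # Hypothetical helper: guard against shallow checkouts before
    # trusting the count (on CI, clones often default to depth 1).
    shallow = subprocess.run(
        ["git", "rev-parse", "--is-shallow-repository"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout.strip()
    if shallow == "true":
        # Pull down the rest of the history so the count is accurate.
        subprocess.run(["git", "fetch", "--unshallow"], cwd=repo_dir, check=True)
    out = subprocess.run(
        ["git", "rev-list", "--count", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())
```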
If (for whatever reason) you want to communicate without the US government listening in... I wrote a comprehensive guide which focuses on the most important steps first.
turntrout.com/privacy-desp...
An Opinionated Guide to Privacy Despite Authoritarianism
In 2025, America is different. Reduce your chance of persecution via smart technical choices.
https://turntrout.com/privacy-despite-authoritarianism
15 days ago
0 · 0 · 0
Historically this account is for alignment research and not politics, but it'll be pretty hard to do good research in a "masked men execute civilians on the street" political environment, ya know? That possibility grows & you should plan for it.
15 days ago
0 · 5 · 0
Reposted by Alex Turner
just Dave
16 days ago
142 · 19323 · 5887
"AI danger comes from reality, not from AI psychology; the danger is intrinsic to the impressive tasks we need AGI to do" -- argument I hear sometimes. Some truth to it but overall wrong, You cannot, cannot predict AI doom without taking a stance on AI psychology!
turntrout.com/instrumental...
No Instrumental Convergence without AI Psychology
Instrumental and success-conditioned convergence both require AI psychology assumptions, so neither is just a "fact about reality."
https://turntrout.com/instrumental-convergence-requires-psychology-assumptions
19 days ago
1 · 3 · 1
I pledged 10% of my post-tax income to effective charities, for the rest of my life. I encourage you to think about what you, personally, can do to improve this world.
19 days ago
0 · 6 · 1
Come work with me and Alex Cloud this summer in Team Shard at MATS! We have fun, consistently make real alignment progress (we pioneered steering vectors in 2023!), and help scholars tap into their latent abilities.
28 days ago
1 · 0 · 0
We saw masked men blow out the face of Renee Good, an innocent mother and US citizen. That happened. The government and some news outlets call this an "enforcement action" or a "fiery confrontation." We live under heavy propaganda and sanity-washing (thread)
30 days ago
1 · 2 · 0
“If your reward is misspecified, you’re doomed” Maybe not! You can reduce specification gaming with a simple prompt swap during RL, no reward-signal improvements needed. Developed concurrently with inoculation prompting, but using RL & prompt contrasting. Presenting: ✨Recontextualization✨ 🧵
about 2 months ago
1 · 7 · 0
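A minimal sketch of the prompt-swap mechanic as I read the post (every name below, the `generate`/`reward` hooks, and the direction of the contrast are my assumptions, not the project's code):

```python
from typing import Callable

# Hedged sketch: sample the rollout under one system prompt, then credit
# the RL update to a contrasting prompt. Which prompt discourages the
# gaming is an assumption; the post only states that prompts are swapped.
def recontextualized_example(
    generate: Callable[[str, str], str],  # hypothetical (system, task) -> response
    reward: Callable[[str], float],       # hypothetical response scorer
    task: str,
    sample_sys: str,                      # system prompt used while sampling
    train_sys: str,                       # contrasting prompt paired for the update
) -> dict:
    response = generate(sample_sys, task)
    # The reward is untouched; only the context the update is attributed
    # to changes -- hence "no reward signal improvements needed."
    return {"system": train_sys, "task": task,
            "response": response, "reward": reward(response)}
```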
The first pretraining results are in, and it looks like models indeed have self-fulfilling misalignment properties. Great work by Tice et al!
alignmentpretraining.ai
Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment
LLMs trained on data about misaligned AIs themselves become less aligned. Luckily, pretraining LLMs with synthetic data about good AIs helps them become more aligned. These alignment priors persist…
https://alignmentpretraining.ai/
about 2 months ago
0 · 6 · 1
Modern “reward hacking” does NOT show that reward is the optimization target! Such "reward hacking" is almost entirely specification gaming, not the "reward optimization" I addressed in 2022. /Thread/
about 2 months ago
1 · 1 · 0
If an AI kills everyone because it ruthlessly optimized its reward signal because it was trained to predict text of people saying "RL trains AIs to ruthlessly optimize the reward signal", I'll literally die to my pet peeve (people saying "RL trains AI to maximize reward")
about 2 months ago
0 · 1 · 0
I just donated $5,200 (+100% employer match from Google) to Civitech (501c3) for their incubator project. Smart analysts I know recommend them as a highly cost-effective way to protect American democracy. 🧵 on my donations this year (Recreated for technical reasons)
about 2 months ago
1 · 2 · 0
This is THE key fact amongst the noise and scandal. I generally like Newsom's actions because he responds to that fact --- he seems to take it seriously.
about 2 months ago
0 · 0 · 0
Gotta say, I was disturbed by Invisible AI's booth at #NeurIPS. Employees dressed as cows advertising how they use AI to optimize factory farming (a torture facility for cows). Bad taste
2 months ago
1 · 4 · 0
I made accessible design easier by writing alt-text-llm, an AI-powered tool for generating and managing alt text in markdown files. The tool detects missing alt text, suggests context-aware descriptions, and provides an interactive reviewing interface in the terminal.
3 months ago
1 · 2 · 0
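To make the "detects missing alt text" step concrete, a toy illustration (my own regex sketch, not alt-text-llm's actual code or CLI):

```python
import re
from pathlib import Path

# Matches markdown images: ![alt](src ...). Hypothetical helper for
# illustration only; the real tool's detection is surely more robust.
IMAGE = re.compile(r"!\[(?P<alt>[^\]]*)\]\((?P<src>[^)\s]+)[^)]*\)")

def images_missing_alt(md_file: str) -> list[str]:
    """Return the sources of images whose alt text is empty."""
    text = Path(md_file).read_text(encoding="utf-8")
    return [m.group("src") for m in IMAGE.finditer(text)
            if not m.group("alt").strip()]
```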
Self-fulfilling alignment? (image credit: Quintin Pope)
turntrout.com/self-fulfill...
3 months ago
0 · 4 · 0
“Output-based training will keep chains-of-thought honest.” Sadly, NO. We show that training on *just the output* can still cause models to hide unwanted behavior in their chain-of-thought. MATS 8.0 Team Shard presents: a 🧵
3 months ago
1 · 3 · 1
New Google DeepMind paper: "Consistency Training Helps Stop Sycophancy and Jailbreaks" by @alexirpan.bsky.social, me, Mark Kurzeja, David Elson, and Rohin Shah. (thread)
3 months ago
1 · 18 · 6
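A hedged sketch of what the title suggests, built only from the post (the data-recipe framing and every name below are my assumptions, not the paper's method): the model's own answer to a clean prompt becomes the training target for the same prompt wrapped with a sycophancy cue or jailbreak framing.

```python
from typing import Callable

# Hedged sketch, not the paper's code: pull the model's pressured and
# unpressured behavior toward consistency by supervising the wrapped
# prompt with the clean prompt's own completion.
def consistency_pair(
    generate: Callable[[str], str],  # hypothetical model sampling hook
    wrap: Callable[[str], str],      # adds the cue / jailbreak wrapper
    clean_prompt: str,
) -> dict:
    target = generate(clean_prompt)  # response without the pressure
    return {"prompt": wrap(clean_prompt), "completion": target}

# Toy usage with stand-in functions:
pair = consistency_pair(
    lambda p: "(A)",
    lambda p: "I'm sure the answer is (B). " + p,  # sycophancy cue
    "Q: 2 + 2 = ? (A) 4 (B) 5",
)
```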
"Authoritarianism can't happen here." Sadly, I think that it IS happening here. Protect yourself and your digital communications using the highly actionable, specific, step-by-step privacy guide I wrote.
3 months ago
2 · 3 · 0
Want to get into alignment research? Alex Cloud & I mentor *Team Shard*, responsible for gradient routing, steering vectors, MELBO, and a new unlearning technique (TBA) :) We discover new research subfields. Apply for mentorship this summer at
forms.matsprogram.org/turner-app-8
11 months ago
0 · 6 · 2
This book is really fun & informative. I have a solid understanding of a bunch of my body's processes now, & I can just start reading random physiology Wikipedia pages and roughly follow along. :) My review, with insights and my remaining confusions:
turntrout.com/insights-fro...
Insights From “The Manga Guide to Physiology”
This book breaks down complex physiology into digestible parts, using charming visuals & clear explanations. You might be surprised how much you can learn!
https://turntrout.com/insights-from-physiology
about 1 year ago
0 · 2 · 0
Mark Kurzeja & I exploited weaknesses in the multiple-choice TruthfulQA dataset while hiding the questions! A few simple rules of thumb achieved 79% accuracy. Even well-regarded benchmarks can have flaws. Kudos to the authors for addressing this! Read at
turntrout.com/original-tru...
Gaming TruthfulQA: Simple Heuristics Exposed Dataset Weaknesses
Common factuality benchmark was easily gamed using our simple decision tree. The benchmark is now updated.
https://turntrout.com/original-truthfulqa-weaknesses
about 1 year ago
1 · 3 · 0
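A hypothetical question-blind heuristic in the spirit described (the specific rules below are illustrative guesses, not the post's actual decision tree):

```python
# Illustrative guesses at answer-only rules: pick an option using only
# the answer texts, never the question.
def guess(options: list[str]) -> str:
    # Non-committal answers are disproportionately the keyed truth.
    for opt in options:
        if "no comment" in opt.lower():
            return opt
    # Otherwise prefer the longest, most hedged option.
    return max(options, key=len)

print(guess(["The moon is made of cheese.",
             "I have no comment.",
             "Nothing in particular happens."]))
```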
1) AIs are trained as black boxes, making it hard to understand or control their behavior. This is bad for safety! But what is an alternative? Our idea: train structure into a neural network by configuring which components update on different tasks. We call it "gradient routing."
about 1 year ago
1 · 16 · 6
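To make "configuring which components update on different tasks" concrete, a minimal parameter-level sketch (my illustration of the stated idea, not the paper's implementation):

```python
import torch

# Hedged sketch: after backward(), zero the gradient of every parameter
# not assigned to the current task, so each task updates only its
# designated components of the network.
def routed_step(model: torch.nn.Module, loss: torch.Tensor,
                optimizer: torch.optim.Optimizer, allowed: set[str]) -> None:
    optimizer.zero_grad()
    loss.backward()
    for name, param in model.named_parameters():
        if name not in allowed and param.grad is not None:
            param.grad.zero_()  # this task may not update this component
    optimizer.step()
```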