Alex Turner
@turntrout.bsky.social
📤 640
📥 5
📝 42
Research scientist at Google DeepMind. All opinions are my own.
https://turntrout.com
New Google DeepMind paper: "Consistency Training Helps Stop Sycophancy and Jailbreaks" by @alexirpan.bsky.social, me, Mark Kurzeja, David Elson, and Rohin Shah. (thread)
5 days ago
"Authoritarianism can't happen here." Sadly, I think that it IS happening here. Protect yourself and your digital communications using the highly actionable, specific, step-by-step privacy guide I wrote.
10 days ago
Want to get into alignment research? Alex Cloud & I mentor *Team Shard*, responsible for gradient routing, steering vectors, MELBO, and a new unlearning technique (TBA) :) We discover new research subfields. Apply for mentorship this summer at forms.matsprogram.org/turner-app-8
8 months ago
This book is really fun & informative. I have a solid understanding of a bunch of my body's processes now, & I can just start reading random physiology Wikipedia pages and roughly follow along. :) My review with insights and my remaining confusions: turntrout.com/insights-fro...
Insights From “The Manga Guide to Physiology”
This book breaks down complex physiology into digestible parts, using charming visuals & clear explanations. You might be surprised how much you can learn!
https://turntrout.com/insights-from-physiology
10 months ago
Mark Kurzeja & I exploited weaknesses in the multiple-choice TruthfulQA dataset while hiding the questions! A few simple rules of thumb achieved 79% accuracy. Even well-regarded benchmarks can have flaws. Kudos to the authors for addressing this! Read at turntrout.com/original-tru...
Gaming TruthfulQA: Simple Heuristics Exposed Dataset Weaknesses
Common factuality benchmark was easily gamed using our simple decision tree. The benchmark is now updated.
https://turntrout.com/original-truthfulqa-weaknesses
10 months ago
1) AIs are trained as black boxes, making it hard to understand or control their behavior. This is bad for safety! But what is an alternative? Our idea: train structure into a neural network by configuring which components update on different tasks. We call it "gradient routing."
11 months ago