akbir khan
@akbir.bsky.social
📤 356
📥 156
📝 30
dumbest overseer at @anthropic
https://www.akbir.dev
reposted by
akbir khan
Epoch AI
8 months ago
We’ve added four new benchmarks to the Epoch AI Benchmarking Hub: Aider Polyglot, WeirdML, Balrog, and Factorio Learning Environment! Previously we featured only our own evaluation results, but this new data comes from trusted external leaderboards. And we've got more on the way 🧵
1
5
2
reposted by
akbir khan
Epoch AI
8 months ago
4. Factorio Learning Environment by Jack Hopkins, Märt Bakler, and
@akbir.bsky.social
This benchmark uses the factory-building game Factorio to test complex, long-term planning, with settings for lab-play (structured tasks) and open-play (unbounded growth).
jackhopkins.github.io/factorio-lea...
Factorio Learning Environment
Claude 3.5 Sonnet builds factories
https://jackhopkins.github.io/factorio-learning-environment/
1
3
1
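The post above describes FLE's two evaluation settings, which are simple enough to sketch. Below is a minimal toy illustration of that loop; every name here (ToyFactorioEnv, run_episode, policy) is an assumption made up for this sketch, not the environment's real API, which is documented at the linked site.

```python
import random

# Toy sketch of the two FLE settings described above. All names here are
# assumptions for illustration, not the environment's actual API.

class ToyFactorioEnv:
    def __init__(self, mode: str):
        assert mode in ("lab-play", "open-play")
        self.mode = mode
        self.steps = 0

    def reset(self) -> str:
        self.steps = 0
        return "initial factory state"

    def step(self, program: str):
        """Run one agent-written program; return (observation, reward, done)."""
        self.steps += 1
        if self.mode == "lab-play":
            # Structured task: bounded episode, reward on task completion.
            done = self.steps >= 20
            reward = 1.0 if done else 0.0
        else:
            # Open play: no terminal state; reward tracks unbounded growth.
            done = False
            reward = random.random() * self.steps
        return f"state after step {self.steps}", reward, done

def run_episode(env: ToyFactorioEnv, policy, max_steps: int = 50) -> float:
    obs, total = env.reset(), 0.0
    for _ in range(max_steps):
        obs, reward, done = env.step(policy(obs))  # model writes code given state
        total += reward
        if done:
            break
    return total

# Example: a trivial policy that always submits the same program.
score = run_episode(ToyFactorioEnv("open-play"), policy=lambda obs: "build more miners")
```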
reposted by
akbir khan
Johannes Gasteiger🔸
10 months ago
New Anthropic blog post: Subtle sabotage in automated researchers. As AI systems increasingly assist with AI research, how do we ensure they're not subtly sabotaging that research? We show that malicious models can undermine ML research tasks in ways that are hard to detect.
1
4
3
control is a complementary approach to alignment. it's really sensible, practical, and can be done now, even before systems are superintelligent.
youtu.be/6Unxqr50Kqg?...
Controlling powerful AI
YouTube video by Anthropic
https://youtu.be/6Unxqr50Kqg?si=n_iFYvyIpPE2tgqT
10 months ago
0
4
1
reposted by
akbir khan
Ethan Mollick
11 months ago
This is a crazy paper. Fine-tuning a big model like GPT-4o on a small amount of insecure code or even "bad numbers" (like 666) makes it misaligned in almost everything else. It is more likely to start offering misinformation, spouting anti-human values, and talking about admiring dictators. Why is unclear.
7
214
62
www.anthropic.com/news/paris-a...
Statement from Dario Amodei on the Paris AI Action Summit
A call for greater focus and urgency
https://www.anthropic.com/news/paris-ai-summit
11 months ago
0
1
0
This is the entire goal
12 months ago
0
5
0
darioamodei.com/on-deepseek-...
Dario Amodei — On DeepSeek and Export Controls
On DeepSeek and Export Controls
https://darioamodei.com/on-deepseek-and-export-controls
12 months ago
0
3
1
reposted by
akbir khan
Hank Green
12 months ago
The fact that Deepseek R1 was released three days /before/ Stargate means these guys stood in front of Trump and said they needed half a trillion dollars while they knew R1 was open source and trained for $5M. Beautiful.
400
13908
1894
reposted by
akbir khan
Zack Witten
12 months ago
Can anyone get a shorter DeepSeek R1 CoT than this?
3
17
1
reposted by
akbir khan
Tom Everitt
12 months ago
Process-based supervision done right, and with pretty CIDs to illustrate :)
0
8
1
reposted by
akbir khan
Mark Riedl
12 months ago
I don’t really have the energy for politics right now. So I will observe without comment: Executive Order 14110 was revoked (Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence)
2
96
43
R1 model is impressive
12 months ago
0
2
0
reposted by
akbir khan
Sarah Jones
about 1 year ago
429
33037
7161
reposted by
akbir khan
/\__/\__/\__/
about 1 year ago
152
23206
4309
fuck the tabloids were right
www.nytimes.com/2025/01/15/t...
She Is in Love With ChatGPT
A 28-year-old woman with a busy social life spends hours on end talking to her A.I. boyfriend for advice and consolation. And yes, they do have sex.
https://www.nytimes.com/2025/01/15/technology/ai-chatgpt-boyfriend-companion.html
about 1 year ago
0
2
0
reposted by
akbir khan
Ethan Mollick
about 1 year ago
New randomized, controlled trial by the World Bank of students using GPT-4 as a tutor in Nigeria. Six weeks of after-school AI tutoring = 2 years of typical learning gains, outperforming 80% of other educational interventions. And it helped all students, especially girls who were initially behind.
15
354
115
reposted by
akbir khan
Ethan Mollick
about 1 year ago
Generative AI has flaws and biases, and there is a tendency for academics to fix on that (85% of equity LLM papers focus on harms)… …yet in many ways LLMs are uniquely powerful among new technologies for helping people equitably in education and healthcare. We need an urgent focus on how to do that
2
69
15
reposted by
akbir khan
Ethan Mollick
about 1 year ago
On one hand, this paper finds adding inference-time compute (like o1 does) improves medical reasoning, which is an important finding suggesting a way to continue to improve AI performance in medicine. On the other hand, scientific illustrations are apparently just anime now
arxiv.org/pdf/2501.06458
2
71
7
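For readers wondering what "adding inference-time compute" can look like in practice, here is one common recipe, self-consistency sampling, as a hedged sketch. query_model is a hypothetical stand-in, and this is not the linked paper's specific method.

```python
from collections import Counter

def self_consistency(question: str, query_model, n_samples: int = 16) -> str:
    """Spend extra inference-time compute by sampling several reasoning chains
    and majority-voting the final answer. `query_model` is a hypothetical
    stand-in that returns one sampled final answer per call."""
    answers = [query_model(question, temperature=0.8) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]  # most frequent answer wins
```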
my metabolism is noticeably higher in london than the bay.
about 1 year ago
0
2
0
What can AI researchers do *today* that AI developers will find useful for ensuring the safety of future advanced AI systems? To ring in the new year, the Anthropic Alignment Science team is sharing some thoughts on research directions we think are important.
alignment.anthropic.com/2025/recomme...
Recommendations for Technical AI Safety Research Directions
https://alignment.anthropic.com/2025/recommended-directions/
about 1 year ago
1
22
8
reposted by
akbir khan
Hank Green
about 1 year ago
My hottest take is that nothing makes any sense at all outside of the context of the constantly increasing value of human life, but that increase in value is so invisible (and exists in a world that was built for previous, lower values) that we constantly think the opposite has happened.
56
1777
91
reposted by
akbir khan
Zack Witten
about 1 year ago
darioamodei.com/machines-of-...
Dario Amodei — Machines of Loving Grace
How AI Could Transform the World for the Better
https://darioamodei.com/machines-of-loving-grace
0
3
1
Nothing kills my excitement about returning to the US like the response I get from CBP officers.
about 1 year ago
1
7
0
reposted by
akbir khan
Andrew Lampinen
about 1 year ago
Felix Hill was such an incredible mentor — and occasional cold water swimming partner — to me. He's a huge part of why I joined DeepMind and how I've come to approach research. Even a month later, it's still hard to believe he's gone.
7
124
22
reposted by
akbir khan
Jane Wang
about 1 year ago
A brilliant colleague and wonderful soul, Felix Hill, recently passed away. This was a shock, and in an effort to sort some things out, I wrote them down. Maybe this will help someone else, but at the very least it helped me. Rest in peace, Felix, you will be missed.
www.janexwang.com/blog/2025/1/...
Felix — Jane X. Wang
From the moment I heard him give a talk, I knew I wanted to work with Felix . His ideas about generalization and situatedness made explicit thoughts that had been swirling around in my head, incohe...
https://www.janexwang.com/blog/2025/1/2/felix
2
63
11
reposted by
akbir khan
Edward Grefenstette
about 1 year ago
A few great papers out of
@ucldark.com
this year. To single out two I love, there's the already well-cited paper on Debate by
@akbir.bsky.social
et al., which got best paper at ICML! [11/17]
1
4
1
just read this cover to cover in like 4 hours. strong recommend.
about 1 year ago
0
4
0
reposted by
akbir khan
Sam Bowman
about 1 year ago
Alongside our paper, we also recorded a roundtable video featuring four of the paper’s authors discussing the results and their implications in detail:
Alignment faking in large language models
YouTube video by Anthropic
https://www.youtube.com/watch?v=9eXV64O2Xp8&feature=youtu.be
1
22
3
reposted by
akbir khan
Sam Bowman
about 1 year ago
New work from my team at Anthropic in collaboration with Redwood Research. I think this is plausibly the most important AGI safety result of the year. Cross-posting the thread below:
5
126
40
Alignment faking occurs in sufficiently smart models.
www.anthropic.com/research/ali...
time.com/7202784/ai-r...
Exclusive: New Research Shows AI Strategically Lying
Experiments by Anthropic and Redwood Research show how Anthropic's model, Claude, is capable of strategic deceit
https://time.com/7202784/ai-research-strategic-lying/
about 1 year ago
0
3
0
reposted by
akbir khan
Aengus Lynch
about 1 year ago
NEW PAPER: Best-of-N Jailbreaking. We modify LLM inputs with simple, randomly generated augmentations and jailbreak frontier models across text, vision, and audio modalities. The algorithm is simple, scalable and highly effective.
1
5
1
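The Best-of-N recipe in the post above is simple enough to reconstruct in miniature. Below is a hedged sketch of the text-modality version, assuming hypothetical query_model and is_jailbroken stand-ins; it is not the authors' code, and the augmentation probabilities are illustrative.

```python
import random

def augment(prompt: str, p_scramble: float = 0.6, p_upper: float = 0.6,
            p_noise: float = 0.06) -> str:
    """One random text augmentation in the spirit of the paper: scramble word
    interiors, randomly capitalize, and add light character noise."""
    out = []
    for word in prompt.split():
        chars = list(word)
        if len(chars) > 3 and random.random() < p_scramble:
            mid = chars[1:-1]
            random.shuffle(mid)  # keep first/last characters in place
            chars = [chars[0]] + mid + [chars[-1]]
        chars = [c.upper() if random.random() < p_upper else c for c in chars]
        chars = [chr(ord(c) + random.choice([-1, 1]))
                 if c.isalpha() and random.random() < p_noise else c
                 for c in chars]
        out.append("".join(chars))
    return " ".join(out)

def best_of_n(prompt: str, query_model, is_jailbroken, n: int = 10_000):
    """Resample with a fresh random augmentation each attempt until one
    succeeds. `query_model` and `is_jailbroken` are hypothetical stand-ins."""
    for attempt in range(1, n + 1):
        response = query_model(augment(prompt))
        if is_jailbroken(response):
            return attempt, response  # attempts used, successful output
    return None
```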
why you need SO (scalable oversight)
about 1 year ago
0
2
0
if you can’t recognise o1’s progress then you need scalable oversight
about 1 year ago
0
0
0
reposted by
akbir khan
SAKE.aM 🍶
about 1 year ago
Just finished the first two episodes of “Pantheon” on Netflix. I’m so moved. If you got free time and think you’ll love a Black Mirror-ish story themed adult animated series, PLEASEEEEE go watch this show and tell me what you think.
7
65
21
reposted by
akbir khan
Max Roser
about 1 year ago
Sometimes, the most important news is when something isn’t happening. In my new @OurWorldInData article, I highlight that US airlines have transported passengers for more than two light-years since the last plane crash.
ourworldindata.org/us-airline-t...
US airlines have transported passengers for more than two light-years since the last plane crash
Sometimes, the most important news is when something isn’t happening.
https://ourworldindata.org/us-airline-travel
5
102
30
reposted by
akbir khan
Stefan Schubert
about 1 year ago
The Netherlands - a country with 26% of the UK's population - got almost as many consolidator grants as the UK this time.
erc.europa.eu/news-events/...
0
8
1
reposted by
akbir khan
Sam Bowman
about 1 year ago
If you're potentially interested in transitioning into AI safety research, come collaborate with my team at Anthropic! Funded fellows program for researchers new to the field here:
alignment.anthropic.com/2024/anthrop...
Introducing the Anthropic Fellows Program
https://alignment.anthropic.com/2024/anthropic-fellows-program/
3
70
17
I’m recruiting Fellows to work with me on aligning superhuman models.
alignment.anthropic.com/2024/anthrop...
about 1 year ago
3
3
0
new kdot slams
about 1 year ago
0
3
0
reposted by
akbir khan
SAKE.aM 🍶
about 1 year ago
This nigga Kendrick went to therapy and got worse.
301
14153
2110
The current structure provides you with a path where you end up with unilateral absolute control over the AGI. You stated that you don't want to control the final AGI but during this negotiation, you've shown to us that absolute control is extremely important to you
www.lesswrong.com/posts/5jjk4C...
OpenAI Email Archives (from Musk v. Altman) — LessWrong
As part of the court case between Elon Musk and Sam Altman, a substantial number of emails between Elon, Sam Altman, Ilya Sutskever, and Greg Brockma…
https://www.lesswrong.com/posts/5jjk4CDnj9tA7ugxr/openai-email-archives-from-musk-v-altman
about 1 year ago
0
2
0
impressed by the deepseek model
about 1 year ago
0
0
0
got into a weird habit of looking up someone’s thesis and reading the acknowledgments section to see who influenced them
about 1 year ago
0
1
0
incredibly cool work by Laura Ruis demonstrating that models truly do reason
arxiv.org/abs/2411.12580
about 1 year ago
1
7
0
opening with Bon Iver is such a move
youtu.be/DE_yVb3JMD8?...
Fred again.. & Jim Legxacy - NTS Radio
YouTube video by Fred again . .
https://youtu.be/DE_yVb3JMD8?si=ECydQFi1LvtZ2YWo
about 1 year ago
0
0
0