Benjamin Hilton
@benjamin-hilton.bsky.social
Alignment @AISI. Semi-informed about economics, physics and governments. Views my own.
reposted by
Benjamin Hilton
Geoffrey Irving
about 2 months ago
I am very excited that AISI is announcing over £15M in funding for AI alignment and control, in partnership with other governments, industry, VCs, and philanthropists! Here is a 🧵 about why it is important to bring more independent ideas and expertise into this space.
alignmentproject.aisi.gov.uk
The Alignment Project by AISI — The AI Security Institute
The Alignment Project funds groundbreaking AI alignment research to address one of AI’s most urgent challenges: ensuring advanced systems act predictably, safely, and for society’s benefit.
https://alignmentproject.aisi.gov.uk/
Humans are often very wrong. This is a big problem if you want to use human judgment to oversee super-smart AI systems. In our new post,
@girving.bsky.social
argues that we might be able to deal with this issue – not by fixing the humans, but by redesigning oversight protocols.
4 months ago
Want to build an aligned ASI? Our new paper explains how to do that, using debate.
TL;DR:
Debate + exploration guarantees + no obfuscated arguments + good human input = outer alignment
Outer alignment + online training = inner alignment*
* sufficient for low-stakes contexts
5 months ago
reposted by
Benjamin Hilton
Geoffrey Irving
5 months ago
On top of the AISI-wide research agenda yesterday, we have more on the research agenda for the AISI Alignment Team specifically. See Benjamin's thread and full post for details; here I'll focus on why we should not give up on directly solving alignment, even though it is hard. 🧵
The Alignment Team at UK AISI now has a research agenda. Our goal: solve the alignment problem. How: develop concrete, parallelisable open problems. Our initial focus is on asymptotic honesty guarantees (more details in the post). 1/5
5 months ago
Interested in getting UK AISI support to do alignment research? Fill in our short, < 5-min form, and we'll get back on proposals within 1 week. (Caveat: While we may reach out about AISI funding your project, filling out this form is not an application for funding.)
5 months ago