Fazl Barez
@fbarez.bsky.social
📤 244
📥 144
📝 74
Let's build AIs we can trust!
https://fazlbarez.com
I'll be at
#ICML2026
next week—7 main conference papers, 2 orals + an invited talk 🇰🇷 Also hiring 2 RAs (interp + continual learning) at Oxford: what happens inside models that keep learning after deployment? Come chat in Seoul, or apply 👇
tsglab.github.io/vacancies/
Please share with folks!
Vacancies | TSG Lab – Technical Safety & Governance Lab
https://tsglab.github.io/vacancies/
1 day ago
0
1
0
reposted by
Fazl Barez
Oxford Internet Institute
3 days ago
A glimpse into another successful Oxford Connected Life Summit, focused on what it means to live in an increasingly connected world. This year's theme was "New Intelligence, Old Questions," and featured notable speakers and organisations. Huge thanks to the student committee for making this happen!
0
1
1
Heading to
#ICML2026
in Seoul next week with the TSG Lab and Martian 🇰🇷 10 papers: 7 in the main conf, 3 WS: interpretability, AI evaluation, and governance, two oral spotlights. Giving an invited talk at the EIML WS. Grateful to the students, collaborators, mentors and Claude who made it happen!
5 days ago
0
0
0
Really grateful to have 7 papers accepted at
@icmlconf.bsky.social
2026, including 2 spotlights! Massive thanks to all my collaborators; I’ve been lucky to work with such brilliant people
#ICML2026
10 days ago
1
0
0
Excited to be debating at the Oxford Union this evening. Motion: This House Believes that AI is the Great Equalizer. Is it? Or isn't it? I'm speaking for the proposition, which might surprise those who know my work. That's rather the point! We'll find out which way the House votes.
10 days ago
0
0
1
reposted by
Fazl Barez
Stella Biderman
25 days ago
In film, "we'll fix it in post" is what you say when something went wrong on set and you don't want to redo it. AI research has made it our entire methodology: train the model, then patch whatever comes out. Our new ICML oral argues this can't be the basis of a science of AI. 🧵
3
108
24
reposted by
Fazl Barez
Oxford Martin School
26 days ago
How can we ensure AI-powered robots remain safe when operating in the real world? 🤖 A recent article co-authored by
@aigioxfordmartin.bsky.social
researcher Fazl Barez, explores safety challenges and the importance of being context-aware 🔗Read the research in full:
www.science.org/doi/10.1126/...
Beyond alignment: Why robotic foundation models need context-aware safety
Because AI-enabled robots can be tricked into taking unsafe actions, they require layered, context-aware safety guardrails.
https://www.science.org/doi/10.1126/scirobotics.aef2191
0
2
1
Incredibly excited to announce a $1 million prize pool to solve the world’s most important scientific problem in interpretability. The goal is to turn hard interpretability questions into tools for human empowerment, oversight and governance.
7 months ago
2
4
1
🚨 New AI Safety Course @aims_oxford! I’m thrilled to launch a new course called AI Safety & Alignment (AISAA), covering the foundations & frontier research of making advanced AI systems safe and aligned, at @UniofOxford. What to expect 👇
robots.ox.ac.uk/~fazl/aisaa/
9 months ago
1
6
1
reposted by
Fazl Barez
Toby Ord
9 months ago
Evaluating the Infinite 🧵 My latest paper tries to solve a longstanding problem afflicting fields such as decision theory, economics, and ethics — the problem of infinities. Let me explain a bit about what causes the problem and how my solution avoids it. 1/N
arxiv.org/abs/2509.19389
Evaluating the Infinite
I present a novel mathematical technique for dealing with the infinities arising from divergent sums and integrals. It assigns them fine-grained infinite values from the set of hyperreal numbers in a ...
https://arxiv.org/abs/2509.19389
2
12
5
🚀 Excited to have 2 papers accepted at
#NeurIPS2025
! 🎉 Congrats to my amazing co-authors! More details (and more bragging) soon! And maybe even more news on Sep 25 👀 See you all in… Mexico? San Diego? Copenhagen? Who knows! 🌍✈️
10 months ago
0
1
0
reposted by
Fazl Barez
Jakob Mökander
10 months ago
🚨 NEW PAPER 🚨: Embodied AI (incl. AI-powered drones, self-driving cars and robots) is here, but policies are lagging. We analyzed the EAI risks and found significant gaps in governance
arxiv.org/pdf/2509.00117
Co-authors Jared Perlo
@fbarez.bsky.social
Alex Robey &
@floridi.bsky.social
1/4
1
3
3
reposted by
Fazl Barez
Martin Tutek
11 months ago
Other works have highlighted that CoTs ≠ explainability
alphaxiv.org/abs/2025.02
(
@fbarez.bsky.social
), and that intermediate (CoT) tokens ≠ reasoning traces
arxiv.org/abs/2504.09762
(
@rao2z.bsky.social
). Here, FUR offers a fine-grained test if LMs latently used information from CoTs for answers!
Chain-of-Thought Is Not Explainability | alphaXiv
View 3 comments: There should be a balance of both subjective and observable methodologies. Adhering to just one is a fool's errand.
https://alphaxiv.org/abs/2025.02
1
6
1
reposted by
Fazl Barez
Jeroen ‘Jeremy’ Fransen
about 1 year ago
It is so easy to confuse chain of thought and explainability, and in fact a lot of the media presents it as if current LLMs allow us to view their actual thought processes. It is not that!
0
6
2
Excited to share our paper: "Chain-of-Thought Is Not Explainability"! We unpack a critical misconception in AI: models explaining their steps (CoT) aren't necessarily revealing their true reasoning. Spoiler: the transparency can be an illusion. (1/9) 🧵
about 1 year ago
2
84
36
Technology = power. AI is reshaping power — fast. Today’s AI doesn’t just assist decisions; it makes them. Governments use it for surveillance, prediction, and control — often with no oversight. Technical safeguards aren’t enough on their own — but they’re essential for AI to serve society.
about 1 year ago
1
4
0
reposted by
Fazl Barez
David Duvenaud
about 1 year ago
And Anna Yelizarov,
@fbarez.bsky.social
,
@scasper.bsky.social
, Beatrice Erkers, among others. We'll draw from political theory, cooperative AI, economics, mechanism design, history, and hierarchical agency.
1
3
1
reposted by
Fazl Barez
Yoav Gur Arieh
about 1 year ago
This is a step toward targeted, interpretable, and robust knowledge removal — at the parameter level. Joint work with Clara Suslik, Yihuai Hong, and
@fbarez.bsky.social
, advised by
@megamor2.bsky.social
🔗 Paper:
arxiv.org/abs/2505.22586
🔗 Code:
github.com/yoavgur/PISCES
0
1
1
Come work with me at Oxford this summer! Paid research opportunity to work on: white-box LLMs & model security; safe RL & reward hacking; interpretability & governance tools. Remote or Oxford. Apply by 30 May 23:59 UTC. DM with questions.
about 1 year ago
1
2
0
Come work with me at Oxford! We’re hiring a Postdoc in Causal Systems Modelling to: - Build causal & white-box models that make frontier AI safer and more transparent - Turn technical insights into safety cases, policy briefs, and governance tools. DM if you have any questions.
about 1 year ago
1
4
4
First-time Area Chair seeking advice! What helped you most when evaluating papers beyond just averaging scores? After suffering through unhelpful reviews as an author, I want to do right by papers in my track.
about 1 year ago
1
0
0
reposted by
Fazl Barez
Mor Geva
over 1 year ago
🎉 Our Actionable Interpretability workshop has been accepted to
#ICML2025
! 🎉 > Follow
@actinterp.bsky.social
> Website
actionable-interpretability.github.io
@talhaklay.bsky.social
@anja.re
@mariusmosbach.bsky.social
@sarah-nlp.bsky.social
@iftenney.bsky.social
Paper submission deadline: May 9th!
3
42
19
reposted by
Fazl Barez
Technical AI Governance @ ICML 2025
over 1 year ago
Organizers: Ben Bucknall,
@lisasoder.bsky.social
,
@ankareuel.bsky.social
@fbarez.bsky.social
,
@carlosmougan.bsky.social
Weiwei Pan, Siddharth Swaroop,
@ankareuel.bsky.social
, Robert Trager
@maosbot.bsky.social
0
5
2
Technical AI Governance (TAIG) at
#ICML2025
this July in Vancouver! Credit to Ben and Lisa for all the work! We have a new centre at Oxford working on technical AI governance with Robert Trager and
@maosbot.bsky.social
and many other great minds. We are hiring - please reach out!
over 1 year ago
0
6
1
reposted by
Fazl Barez
Naomi Saphra
over 1 year ago
Life update: I'm starting as faculty at Boston University
@bucds.bsky.social
in 2026! BU has SCHEMES for LM interpretability & analysis, I couldn't be more pumped to join a burgeoning supergroup w/
@najoung.bsky.social
@amuuueller.bsky.social
. Looking for my first students, so apply and reach out!
35
244
20
reposted by
Fazl Barez
Itay Itzhak @ COLM 🍁
over 1 year ago
New paper alert! Curious how small prompt tweaks impact LLM accuracy but don’t want to run endless inferences? We got you. Meet DOVE - a dataset built to uncover these sensitivities. Use DOVE for your analysis or contribute samples - we're growing and welcome you aboard!
0
4
1
reposted by
Fazl Barez
over 1 year ago
What happens once AI can design better AI, which can itself design better AI? Will we get an "intelligence explosion" where AI capabilities increase very rapidly? Tom Davidson, Rose Hadshar and I have a new paper out with analysis of these dynamics.
1
5
1
reposted by
Fazl Barez
Jakob Foerster
over 1 year ago
My group @FLAIR_Ox is recruiting a postdoc and looking for someone who can get started by the end of April. Deadline to apply is in one week (!), 19th of March at noon, so please help spread the word:
my.corehr.com/pls/uoxrecru...
Job Details
https://my.corehr.com/pls/uoxrecruit/erq_jobspec_version_4.display_form?p_company=10&p_internal_external=E&p_display_in_irish=N&p_process_type=&p_applicant_no=&p_form_profile_detail=&p_display_apply_ind=Y&p_refresh_search=Y&p_recruitment_id=178222
0
19
13
reposted by
Fazl Barez
Tal Haklay
over 1 year ago
1/13 LLM circuits tell us where the computation happens inside the model—but the computation varies by token position, a key detail often ignored! We propose a method to automatically find position-aware circuits, improving faithfulness while keeping circuits compact. 🧵👇
1
26
9
🔍 Excited to share our paper: "Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness"!
over 1 year ago
2
3
1
New paper alert! 🚨 Important question: Do SAEs generalise? We explore answerability detection in LLMs by comparing SAE features vs. linear residual stream probes. Answer: probes outperform SAE features in-domain, while out-of-domain generalization varies sharply between features and datasets. 🧵
over 1 year ago
1
10
2
reposted by
Fazl Barez
Adi Simhi
over 1 year ago
🚨New arXiv preprint!🚨 LLMs can hallucinate - but did you know they can do so with high certainty even when they know the correct answer? 🤯 We find those hallucinations in our latest work with
@itay-itzhak.bsky.social
,
@fbarez.bsky.social
,
@gabistanovsky.bsky.social
and Yonatan Belinkov
3
21
12
reposted by
Fazl Barez
Oxford Martin AI Governance Initiative
over 1 year ago
We are excited to welcome Fazl Barez
@fbarez.bsky.social
, who joins us as a senior postdoctoral research fellow. He will be leading research initiatives in AI safety and interpretability.
@oxmartinschool.bsky.social
Find out more:
www.oxfordmartin.ox.ac.uk/people/fazl-...
0
6
3
reposted by
Fazl Barez
Yoshua Bengio
over 1 year ago
Very interesting paper about unlearning for AI Safety, a subject that deserves more attention. ⬇️
0
50
6
🚨 New Paper Alert: Open Problems in Machine Unlearning for AI Safety 🚨 Can AI truly "forget"? While unlearning promises data removal, controlling emergent capabilities is an inherent challenge. Here's why it matters: 👇 Paper:
arxiv.org/pdf/2501.04952
1/8
over 1 year ago
1
25
9
reposted by
Fazl Barez
over 1 year ago
What happens when "If at first you don't succeed, try again?" meets modern ML/AI insights about scaling up? You jailbreak every model on the market😱😱😱 Fire work led by
@jplhughes.bsky.social
Sara Price
@aengusl.bsky.social
Mrinank Sharma Ethan Perez
arxiv.org/abs/2412.03556
0
3
2
reposted by
Fazl Barez
over 1 year ago
🚨🛡️ Jailbreak Defense in a Narrow Domain 🛡️🚨 Jailbreaking is easy. Defending is hard. Might defending against a single, narrow behavior be easier? Even in this focused setting, all defenses fail 😱
arxiv.org/abs/2412.02159
Appearing at @AdvMLFrontiers (Oral) & @solarneurips
#NeurIPS2024
Jailbreak Defense in a Narrow Domain: Limitations of Existing...
Defending large language models against jailbreaks so that they never engage in a broadly-defined set of forbidden behaviors is an open problem. In this paper, we investigate the difficulty of...
https://arxiv.org/abs/2412.02159v1
2
4
3
Today is a good day for AI Safety! We are launching the AILuminate AI Safety Benchmark @MLCommons @PeterMattson100 @tangenticAI The first step towards a global standard benchmark for AI PRODUCT safety!
over 1 year ago
0
1
0
reposted by
Fazl Barez
Maike Osborne 🏳️⚧️🏳️🌈
over 1 year ago
Our new Chancellor wants us admission tutors to "vet" Chinese applicants. How the heck are we supposed to do that? We're underpaid and overworked academics, not a national intelligence service
www.politico.eu/article/oxfo...
Oxford University’s China dilemma
Politicians are among those vying to run the elite institution — and they’re squaring off over Beijing’s influence on Britain.
https://www.politico.eu/article/oxford-university-china-uk-nobel-winners-diplomat-hong-kong-power-play-ceremony-spies-alumni-chancellor/
5
44
5
ML/AI Safety researchers: how do you find time to read papers, work on your projects, and be so active on Bluesky, X, etc.?
over 1 year ago
0
0
0
🚨 Join us at NeurIPS 2024! 🚨 🧠 PrivacyML: Meaningful Privacy-Preserving ML & Evaluations 🌐
privacy.github.io
Tackling the questions in AI Privacy & Safety: How do we protect training data privacy? What does unlearning mean for AI safety? Can cryptography make AI safer?
https://privacy.github.io
over 1 year ago
1
2
0