Paul Röttger
@paul-rottger.bsky.social
📤 436
📥 248
📝 56
Departmental Lecturer @oii.ox.ac.uk. Evaluating safety & societal impacts of AI.
pinned post!
New paper w/ UK AISI: Millions of people now use AI to help them write and communicate. In three experiments (14k participants, 3m+ human ratings) we show that AI writing assistance systematically distorts writer personas – their perceived beliefs, personality, and identity. 🧵
23 days ago
2
41
11
There’s plenty of evidence for political bias in LLMs, but very few evals reflect realistic LLM use cases, which is where bias actually matters. IssueBench, our attempt to fix this, is accepted at TACL, and I will be at #EMNLP2025 next week to talk about it! New results 🧵
7 months ago
1
32
11
LLMs are good at simulating human behaviours, but they are not going to be great unless we train them to. We hope SimBench can be the foundation for more specialised development of LLM simulators. I really enjoyed working on this with @tiancheng.bsky.social et al. Many fun results 👇
7 months ago
0
8
3
reposted by
Paul Röttger
Manuel Tonneau
10 months ago
🏆 Thrilled to share that our HateDay paper has received an Outstanding Paper Award at #ACL2025. Big thanks to my wonderful co-authors: @deeliu97.bsky.social, Niyati, @computermacgyver.bsky.social, Sam, Victor, and @paul-rottger.bsky.social! Thread 👇 and data available at huggingface.co/datasets/man...
2
32
8
Very excited about all these papers on sociotechnical alignment & the societal impacts of AI at #ACL2025. As is now tradition, I made some timetables to help me find my way around. Sharing here in case others find them useful too :) 🧵
10 months ago
1
26
6
reposted by
Paul Röttger
Matthias Orlikowski
about 1 year ago
Can LLMs learn to simulate individuals' judgments based on their demographics? Not quite! In our new paper, we found that LLMs do not learn information about demographics, but instead learn individual annotators' patterns based on unique combinations of attributes! 🧵
1
13
4
reposted by
Paul Röttger
Kobi Hackenburg
about 1 year ago
📈Out today in @PNASNews!📈 In a large pre-registered experiment (n=25,982), we find evidence that scaling the size of LLMs yields sharply diminishing persuasive returns for static political messages. 🧵:
1
40
23
Are LLMs biased when they write about political issues? We just released IssueBench – the largest, most realistic benchmark of its kind – to answer this question more robustly than ever before. Long 🧵with spicy results 👇
over 1 year ago
4
82
31
reposted by
Paul Röttger
Xinpeng Wang
over 1 year ago
I’m thrilled to share that our paper on mitigating false refusal in language models has been accepted to ICLR 2025 @iclr-conf.bsky.social!
arxiv.org/abs/2410.03415
Joint work with chengzhi, @paul-rottger.bsky.social, @barbaraplank.bsky.social.
0
8
2
Today, we are releasing MSTS, a new Multimodal Safety Test Suite for vision-language models! MSTS is exciting because it tests for safety risks *created by multimodality*. Each prompt consists of a text + image that *only in combination* reveal their full unsafe meaning. 🧵
over 1 year ago
2
31
16