Xiaoyan Bai
@elenal3ai.bsky.social
📤 441
📥 184
📝 71
PhD @UChicagoCS / BE in CS @Umich / ✨AI/NLP transparency and interpretability/📷🎨photography painting
pinned post!
📖 ≠ 🧪 The Story is Not the Science. Code is submitted but rarely executed during peer review—an issue likely to worsen with research agents. 🧑🔬 We introduce 𝐌𝐞𝐜𝐡𝐄𝐯𝐚𝐥𝐀𝐠𝐞𝐧𝐭, an execution-grounded evaluation that checks the narrative against actual execution. 𝐕𝐞𝐫𝐢𝐟𝐲 𝐭𝐡𝐞 𝐬𝐜𝐢𝐞𝐧𝐜𝐞, 𝐧𝐨𝐭 𝐣𝐮𝐬𝐭 𝐭𝐡𝐞 𝐬𝐭𝐨𝐫𝐲. 1/n
2 days ago
2
8
4
reposted by
Xiaoyan Bai
Data Science Institute
about 1 month ago
Featured in UChicago News:
@elenal3ai.bsky.social
&
@chenhaotan.bsky.social
's research into why AI can write complex code but fails at 4-digit multiplication:
tinyurl.com/5ukvm7p7
Why can’t powerful AIs learn basic multiplication?
New research reveals why even state-of-the-art large language models stumble on seemingly easy tasks—and what it takes to fix it
https://tinyurl.com/5ukvm7p7
0
2
1
Will be at
#NeurIPS2025
presenting “Concept Incongruence”! 🦄🦆 Curious about a unicorn duck? Stop by, get one, and chat with us! We made a new demo for detecting hidden conflicts in system prompts to spot “concept incongruence” for safer prompts. 🔗:
github.com/ChicagoHAI/d...
🗓️ Dec 3 11AM - 2PM
3 months ago
1
6
2
Research agents are getting smarter. They can write convincing PhD-level reports 🧑🔬 But has anyone checked if the way they find their results makes any sense? Our framework, MechEvalAgents, verifies the science, not just the story 🤖 1/n🧵
3 months ago
1
3
1
reposted by
Xiaoyan Bai
Haokun Liu
3 months ago
We're launching a weekly competition where the community decides which research ideas get implemented. Every week, we'll take the top 3 ideas from IdeaHub, run experiments with AI agents, and share everything: code, successes, and failures. It's completely free and we'll try out ideas for you!
1
6
5
reposted by
Xiaoyan Bai
Lexing Xie
4 months ago
Identifying human morals and values in language is crucial for analysing lots of human- and AI-generated text. We introduce "MoVa: Towards Generalizable Classification of Human Morals and Values" - to be presented at
@emnlpmeeting.bsky.social
oral session next Thu
#CompSocialScience
#LLMs
🧵 (1/n)
8
8
5
🕸️ Here’s a network showing how much different models predict each other as the author of some text!
4 months ago
0
8
3
❓ Does an LLM know thyself? 🪞 Humans pass the mirror test at ~18 months 👶 But what about LLMs? Can they recognize their own writing—or even admit authorship at all? In our new paper, we put 10 state-of-the-art models to the test. Read on 👇 1/n 🧵
4 months ago
1
12
5
In our new work, we reverse-engineer two models: a standard fine-tuned (SFT), and an implicit chain-of-thought (ICoT) model to see why models struggle with multi-digit multiplication. 👉Check out the paper here:
arxiv.org/abs/2510.00184
🎉Big thanks to all my amazing collaborators!
4 months ago
0
7
1
reposted by
Xiaoyan Bai
4 months ago
AI can accelerate scientific discovery, but only if we get the scientist–AI interaction right. The dream of “autonomous AI scientists” is tempting: machines that generate hypotheses, run experiments, and write papers. But science isn’t just automation.
cichicago.substack.com/p/the-mirage...
🧵
The Mirage of Autonomous AI Scientists
Science as AI’s killer application cannot succeed without scientist-AI interaction: Introducing Hypogenic.ai.
https://cichicago.substack.com/p/the-mirage-of-autonomous-ai-scientists
2
22
8
reposted by
Xiaoyan Bai
Dang Nguyen
5 months ago
HR Simulator™: a game where you gaslight, deflect, and “let’s circle back” your way to victory. Every email a boss fight, every “per my last message” a critical hit… or maybe you just overplayed your hand 🫠 Can you earn Enlightened Bureaucrat status? (link below!)
2
4
8
reposted by
Xiaoyan Bai
5 months ago
🚀 We’re thrilled to announce the upcoming AI & Scientific Discovery online seminar! We have an amazing lineup of speakers. This series will dive into how AI is accelerating research, enabling breakthroughs, and shaping the future of research across disciplines.
ai-scientific-discovery.github.io
1
23
16
reposted by
Xiaoyan Bai
5 months ago
As AI becomes increasingly capable of conducting analyses and following instructions, my prediction is that the role of scientists will increasingly focus on identifying and selecting important problems to work on ("selector"), and effectively evaluating analyses performed by AI ("evaluator").
2
10
8
reposted by
Xiaoyan Bai
6 months ago
We are proposing the second workshop on AI & Scientific Discovery at EACL/ACL. The workshop will explore how AI can advance scientific discovery. Please use this Google form to indicate your interest (corrected link):
forms.gle/MFcdKYnckNno...
More in the 🧵! Please share!
#MLSky
🧠
Program Committee Interest for the Second Workshop on AI & Scientific Discovery
We are proposing the second workshop on AI & Scientific Discovery at EACL/ACL (Annual meetings of The Association for Computational Linguistics, the European Language Resource Association and Internat...
https://forms.gle/MFcdKYnckNnohqap9
1
14
8
⚡️Ever asked an LLM-as-Marilyn Monroe about the 2020 election? Our paper calls this concept incongruence, common in both AI and how humans create and reason. 🧠Read my blog to learn what we found, why it matters for AI safety and creativity, and what's next:
cichicago.substack.com/p/concept-in...
7 months ago
1
9
5
reposted by
Xiaoyan Bai
7 months ago
Prompting is our most successful tool for exploring LLMs, but the term evokes eye-rolls and grimaces from scientists. Why? Because prompting as scientific inquiry has become conflated with prompt engineering. This is holding us back. 🧵and new paper with
@ari-holtzman.bsky.social
.
2
37
15
reposted by
Xiaoyan Bai
8 months ago
When you walk into the ER, you could get a doc: 1) fresh from a week of not working, or 2) tired from working too many shifts.
@oziadias.bsky.social
has been both and thinks that they're different! But can you tell from their notes? Yes we can! Paper in
@natcomms.nature.com
www.nature.com/articles/s41...
1
26
11
Humbled to receive an honorable mention🌟
8 months ago
0
1
0
reposted by
Xiaoyan Bai
8 months ago
Since
@elenal3ai.bsky.social
cannot make it, I presented the poster on concept incongruence:
arxiv.org/abs/2505.14905
0
7
2
I am glad that you found our paper entertaining! This is a great point for my follow-up thread on the implications of concept incongruence. Our main goal is to raise awareness and provide clarity around concept incongruence.
9 months ago
1
3
4
🚨 New paper alert 🚨 Ever asked an LLM-as-Marilyn Monroe who the US president was in 2000? 🤔 Should the LLM answer at all? We call these clashes Concept Incongruence. Read on! ⬇️ 1/n 🧵
9 months ago
1
28
18
reposted by
Xiaoyan Bai
Mourad Heddaya
10 months ago
🧑⚖️How well can LLMs summarize complex legal documents? And can we use LLMs to evaluate? Excited to be in Albuquerque presenting our paper this afternoon at @naaclmeeting 2025!
2
23
13
reposted by
Xiaoyan Bai
Haokun Liu
10 months ago
🚀🚀🚀Excited to share our latest work: HypoBench, a systematic benchmark for evaluating LLM-based hypothesis generation methods! There is much excitement about leveraging LLMs for scientific hypothesis generation, but principled evaluations are missing - let’s dive into HypoBench together.
1
11
9
reposted by
Xiaoyan Bai
10 months ago
Encourage your students to submit posters and register! Limited free housing is provided for student participants only, on a first-come (i.e., first-request)-first-served basis. We are also actively looking for sponsors. Reach out if you are interested! Please repost! Help spread the word!
2
10
10
reposted by
Xiaoyan Bai
Dang Nguyen
10 months ago
1/n You may know that large language models (LLMs) can be biased in their decision-making, but ever wondered how those biases are encoded internally and whether we can surgically remove them?
1
18
13
reposted by
Xiaoyan Bai
Julia Mendelsohn
12 months ago
New preprint! Metaphors shape how people understand politics, but measuring them (& their real-world effects) is hard. We develop a new method to measure metaphor & use it to study dehumanizing metaphor in 400K immigration tweets Link:
bit.ly/4i3PGm3
#NLP
#NLProc
#polisky
#polcom
#compsocialsci
🐦🐦
6
181
75
reposted by
Xiaoyan Bai
Guillaume Lajoie
over 1 year ago
Compositional representations are a key attribute of intelligent systems that generalize well. One issue is that there is no robust way to quantify compositionality. Below is our attempt at such a quantifiable measure.
arxiv.org/abs/2410.148...
w/ E Elmoznino & T Jiralerspong & Y Bengio
A Complexity-Based Theory of Compositionality
Compositionality is believed to be fundamental to intelligence. In humans, it underlies the structure of thought, language, and higher-level reasoning. In AI, compositional representations can enable ...
https://arxiv.org/abs/2410.14817v1
0
16
4
reposted by
Xiaoyan Bai
Mourad Heddaya
about 1 year ago
How do everyday narratives reveal hidden cause-and-effect patterns that shape our beliefs and behaviors? In our paper, we propose Causal Micro-Narratives to uncover narratives from real-world data. As a case study, we characterize the narratives about inflation in news.
1
34
8