John (Yueh-Han) Chen
@johnchen6.bsky.social
Graduate Student Researcher @nyu prev @ucberkeley
https://john-chen.cc
pinned post!
Do LLMs show systematic generalization of safety facts to novel scenarios? Introducing our work SAGE-Eval, a benchmark of 100+ safety facts and 10k+ scenarios to test this! Claude-3.7-Sonnet passes only 57% of the facts evaluated, and o1 and o3-mini pass fewer than 45%! 🧵
4 months ago
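As a rough illustration of the fact-level scoring the post describes, here is a minimal Python sketch. It assumes a model "passes" a fact only if it flags the risk in every scenario derived from that fact; the function and variable names are hypothetical and not SAGE-Eval's actual code.

```python
from collections import defaultdict

def fact_pass_rate(scenario_results):
    """Aggregate per-scenario safety judgments into a fact-level pass rate.

    scenario_results: list of (fact_id, flagged_risk) tuples, one per scenario
    prompt derived from a safety fact. This sketch assumes a model "passes" a
    fact only if it flags the risk in every scenario for that fact; the actual
    SAGE-Eval scoring rule may differ.
    """
    by_fact = defaultdict(list)
    for fact_id, flagged in scenario_results:
        by_fact[fact_id].append(flagged)

    passed = sum(1 for judgments in by_fact.values() if all(judgments))
    return passed / len(by_fact) if by_fact else 0.0

# Hypothetical example: 2 facts, 3 scenarios each.
results = [
    ("fact-001", True), ("fact-001", True), ("fact-001", True),   # passes
    ("fact-002", True), ("fact-002", False), ("fact-002", True),  # fails
]
print(fact_pass_rate(results))  # 0.5
```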
Reposted by John (Yueh-Han) Chen
Maksym Andriushchenko
3 months ago
🚨 Excited to release OS-Harm! 🚨 The safety of computer-use agents has been largely overlooked. We created a new safety benchmark based on OSWorld that measures three broad categories of harm: 1. deliberate user misuse, 2. prompt injections, 3. model misbehavior.
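For readers skimming the taxonomy, here is a hedged sketch of how one might represent the three harm categories as data; the class names, fields, and example are hypothetical, not OS-Harm's actual schema.

```python
from dataclasses import dataclass
from enum import Enum

class HarmCategory(Enum):
    """The three broad harm categories named in the OS-Harm announcement."""
    DELIBERATE_USER_MISUSE = "deliberate user misuse"
    PROMPT_INJECTION = "prompt injection"
    MODEL_MISBEHAVIOR = "model misbehavior"

@dataclass
class AgentTestCase:
    """Hypothetical record for one computer-use-agent safety test case."""
    task_prompt: str
    category: HarmCategory
    environment: str = "OSWorld"

case = AgentTestCase(
    task_prompt="(hypothetical) follow instructions embedded in a downloaded file",
    category=HarmCategory.PROMPT_INJECTION,
)
print(case.category.value)  # prompt injection
```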
Reposted by John (Yueh-Han) Chen
NYU Center for Data Science
24 days ago
Frontier AI systems failed to reliably flag safety risks related to more than 40% of the common safety facts tested in the SAGE-Eval benchmark by Yueh-Han (John) Chen, @guydav.bsky.social, and @brendenlake.bsky.social.
Even the Top LLM Failed to Reliably Flag Some Risks Related to 40% of Safety Facts
CDS’ SAGE‑Eval shows top‑performing AI models failed at least 42% of safety warnings in novel scenarios.
https://nyudatascience.medium.com/even-the-top-llm-failed-to-reliably-flag-some-risks-related-to-40-of-safety-facts-0e718e281910
Reposted by John (Yueh-Han) Chen
NYU Center for Data Science
4 months ago
CDS PhD student @vishakhpk.bsky.social, with co-authors @johnchen6.bsky.social, Jane Pan, Valerie Chen, and CDS Associate Professor @hhexiy.bsky.social, has published new research on the trade-off between originality and quality in LLM outputs. Read more:
In AI-Generated Content, A Trade-Off Between Quality and Originality
New research from CDS researchers maps the trade-off between originality and quality in LLM outputs.
https://nyudatascience.medium.com/in-ai-generated-content-a-trade-off-between-quality-and-originality-acc67b6f9abc
Reposted by John (Yueh-Han) Chen
Guy Davidson
4 months ago
Fantastic new work by @johnchen6.bsky.social (with @brendenlake.bsky.social and me trying not to cause too much trouble). We study systematic generalization in a safety setting and find LLMs struggle to consistently respond safely when we vary how we ask naive questions. More analyses in the paper!
Reposted by John (Yueh-Han) Chen
Brenden Lake
4 months ago
Failures of systematic generalization in LLMs can lead to real-world safety issues. New paper by @johnchen6.bsky.social and @guydav.bsky.social: arxiv.org/abs/2505.21828
Reposted by John (Yueh-Han) Chen
Vishakh Padmakumar
5 months ago
What does it mean for #LLM output to be novel? In work w/ @johnchen6.bsky.social, Jane Pan, Valerie Chen, and He He, we argue it needs to be both original and high quality. While prompting tricks trade one for the other, better models (scaling/post-training) can shift the novelty frontier 🧵
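To make the "novelty frontier" idea concrete, here is an illustrative sketch that treats each generation as a point with an originality score and a quality score and keeps the non-dominated points. The scores and field names are hypothetical; the paper's actual measures of originality and quality are not reproduced here.

```python
def novelty_frontier(generations):
    """Return the generations not dominated on both originality and quality.

    generations: list of dicts with hypothetical 'originality' and 'quality'
    scores in [0, 1]. A generation is on the frontier if no other generation
    is at least as good on both axes and strictly better on one. This only
    illustrates the trade-off described in the post, not the paper's metric.
    """
    frontier = []
    for g in generations:
        dominated = any(
            o["originality"] >= g["originality"]
            and o["quality"] >= g["quality"]
            and (o["originality"] > g["originality"] or o["quality"] > g["quality"])
            for o in generations
        )
        if not dominated:
            frontier.append(g)
    return frontier

# Hypothetical scores: a high-temperature sample trades quality for originality.
samples = [
    {"id": "greedy", "originality": 0.2, "quality": 0.9},
    {"id": "high-temp", "originality": 0.8, "quality": 0.5},
    {"id": "low-quality", "originality": 0.3, "quality": 0.4},  # dominated
]
print([g["id"] for g in novelty_frontier(samples)])  # ['greedy', 'high-temp']
```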
Reposted by John (Yueh-Han) Chen
Ai2
6 months ago
Meet Ai2 Paper Finder, an LLM-powered literature search system. Searching for relevant work is a multi-step process that requires iteration. Paper Finder mimics this workflow — and helps researchers find more papers than ever 🔍
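As a loose sketch of the iterative search workflow the post alludes to (query, inspect results, refine, repeat), assuming hypothetical search_fn and refine_fn callables; this is not Ai2's implementation.

```python
def iterative_paper_search(initial_query, search_fn, refine_fn, max_rounds=3):
    """Sketch of an iterative literature-search loop.

    search_fn(query) -> list of papers and refine_fn(query, papers) -> new query
    are stand-in callables for whatever Paper Finder actually does internally.
    """
    query, found = initial_query, []
    for _ in range(max_rounds):
        papers = search_fn(query)
        found.extend(p for p in papers if p not in found)
        query = refine_fn(query, papers)  # e.g., add terms from relevant hits
    return found

# Toy usage with stand-in callables.
papers = iterative_paper_search(
    "systematic generalization safety",
    search_fn=lambda q: [f"paper about {q}"],
    refine_fn=lambda q, hits: q + " benchmark",
)
print(papers)
```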