Yapei Chang
@yapeichang.bsky.social
2766 followers
653 following
34 posts
PhD in progress @ UMD
https://lilakk.github.io/
Can simple string-matching metrics like BLEU rival reward models for LLM alignment? We show that, given access to a reference, BLEU can match reward models in human preference agreement, and can even train LLMs competitively with them using GRPO. Introducing BLEUBERI:
7 months ago
1
5
2
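The idea of using BLEU as a reward can be made concrete with a minimal sentence-level BLEU — a self-contained sketch (not the BLEUBERI implementation; it uses simple add-one smoothing), which yields a scalar in [0, 1] against a gold reference:

```python
# Minimal smoothed sentence-level BLEU: n-gram precision up to 4-grams
# with a brevity penalty. Illustrative only, not the BLEUBERI code.
import math
from collections import Counter

def ngrams(tokens, n):
    # Multiset of n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ng, ref_ng = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_ng & ref_ng).values())  # clipped n-gram matches
        total = max(sum(hyp_ng.values()), 1)
        # Add-one smoothing so one empty n-gram order doesn't zero the score.
        log_prec += math.log((overlap + 1) / (total + 1)) / max_n
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))  # brevity penalty
    return bp * math.exp(log_prec)
```

A reward of this shape is what a GRPO-style trainer could consume in place of a learned reward model whenever a reference response is available.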
Agents are strong on many tasks, but are they good at interacting with the web? Our BEARCUBS benchmark shows that they struggle on interactive tasks that seem trivial to humans! Check out the paper for how to build robust evaluations and for directions for future agent research.
10 months ago
0
2
0
reposted by
Yapei Chang
Yekyung Kim
10 months ago
Is the needle-in-a-haystack test still meaningful given the giant green heatmaps in modern LLM papers? We create ONERULER, a multilingual long-context benchmark that allows for nonexistent needles. Turns out NIAH isn't so easy after all! Our analysis across 26 languages is in the thread.
1
14
8
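The classic needle-in-a-haystack setup the post critiques can be sketched in a few lines (illustrative only, not the ONERULER code; the function and sentence names are made up): a "needle" fact is buried at a chosen depth in filler text, and the model is graded on retrieving it. ONERULER's twist is allowing needle-free contexts, where the correct answer is to abstain:

```python
# Toy needle-in-a-haystack construction and grading.
def build_haystack(filler_sentences, needle, depth=0.5):
    """Insert `needle` at a relative depth in [0, 1]; needle=None means no needle."""
    sentences = list(filler_sentences)
    if needle is not None:
        sentences.insert(round(len(sentences) * depth), needle)
    return " ".join(sentences)

def grade(answer, needle_value):
    """If no needle exists, the model should say so instead of hallucinating one."""
    if needle_value is None:
        return "none" in answer.lower()
    return needle_value in answer

filler = [f"Sentence {i} is filler." for i in range(100)]
context = build_haystack(filler, "The magic number is 417.", depth=0.5)
```

The nonexistent-needle case (`needle=None`) is what makes the test hard again: string-matching the needle no longer suffices, and abstention must be graded too.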
Current models struggle with complex long-range reasoning tasks. How can we reliably create synthetic training data? Check out CLIPPER, a pipeline that generates data conditioned on compressed forms of long input documents!
10 months ago
0
8
0
reposted by
Yapei Chang
Jenna Russell
11 months ago
People often claim they know when ChatGPT wrote something, but are they as accurate as they think? Turns out that while the general population is unreliable, those who frequently use ChatGPT for writing tasks can spot even "humanized" AI-generated text with near-perfect accuracy.
10
189
85
reposted by
Yapei Chang
Mark π. Nelson
about 1 year ago
Great blog post (by a 15-author team!) on their release of ModernBERT, the continuing relevance of encoder-only models, and how they relate to, say, GPT-4/llama. Accessible enough that I might use this as an undergrad reading.
Finally, a Replacement for BERT: Introducing ModernBERT
We're on a journey to advance and democratize artificial intelligence through open source and open science.
https://huggingface.co/blog/modernbert
1
75
21
reposted by
Yapei Chang
Michael Saxon
about 1 year ago
I too am on the job market! I'm searching for faculty positions/postdocs in multilingual/multicultural NLP, vision+language models, and eval for genAI! I'll be at
#NeurIPS2024
presenting our work on meta-evaluation for text-to-image faithfulness! Let's chat there! Papers in the thread; see more:
saxon.me
1
49
11
what monday feels like..
about 1 year ago
0
8
0
private closed-source evals are the future
about 1 year ago
0
2
0
i knew something like this had to exist but why did i only discover it now?? no more suffering from looking at my 10+ open arxiv tabs not knowing which one is which...
about 1 year ago
0
27
4
reposted by
Yapei Chang
Marc Marone
about 1 year ago
I noticed a lot of starter packs skewed towards faculty/industry, so I made one of just NLP & ML students:
go.bsky.app/vju2ux
Students do different research, go on the job market, and recruit other students. Ping me and I'll add you!
101
176
58
such a creative way of using long-context models! this sounds like a super hard evaluation task, but gemini is already so good at it...
about 1 year ago
1
5
0
reposted by
Yapei Chang
Andrew Drozdov
about 1 year ago
Mat is not on Bluesky, so posting on his behalf! It's time to revisit common assumptions in IR! Embeddings have improved drastically, but mainstream IR evals have stagnated since MSMARCO + BEIR. We ask: on private or tricky IR tasks, are rerankers better? Surely, reranking many docs is best?
4
81
28
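The retrieve-then-rerank pipeline the post questions can be sketched with toy scorers (both are stand-ins I made up for illustration, not any real embedding model or cross-encoder): a cheap first-stage score selects the top-k candidates, and a costlier reranker reorders only those.

```python
# Toy two-stage retrieval: cheap first stage over the whole corpus,
# expensive reranker over only the top-k survivors.
def first_stage_score(query, doc):
    # Stand-in for an embedding dot product: raw word overlap.
    return len(set(query.split()) & set(doc.split()))

def rerank_score(query, doc):
    # Stand-in for a cross-encoder: overlap normalized by document length.
    overlap = len(set(query.split()) & set(doc.split()))
    return overlap / (1 + len(doc.split()))

def retrieve_then_rerank(query, corpus, k=2):
    candidates = sorted(corpus, key=lambda d: first_stage_score(query, d),
                        reverse=True)[:k]
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
```

The design question in the post is exactly where to set k: reranking more documents costs more but can recover first-stage misses.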
reposted by
Yapei Chang
Lucy Li
about 1 year ago
The soul-searching journey for figuring out what research area is right for you is tricky, since so many papers are cool. I tell my early-career students that they should try to differentiate papers that they'd like to read, implement, *and* write from papers that they'd only like to read.
4
67
11
#EMNLP2024
was fun! Now brainstorming ideas for
#EMNLP2025
about 1 year ago
0
4
0
airbnb >>> hotel for conferences
#EMNLP2024
about 1 year ago
0
4
0
reposted by
Yapei Chang
Abhilasha Ravichander
about 1 year ago
I am on the faculty job market in the 2024-2025 cycle! My research centers on advancing Responsible AI, specifically enhancing factuality, robustness, and transparency in AI systems. If you have relevant positions, let me know!
lasharavichander.github.io
Please share/RT!
Abhilasha Ravichander - Home
https://lasharavichander.github.io/
2
52
23
reposted by
Yapei Chang
Chau Minh Pham
about 1 year ago
Long-form text generation with multiple stylistic and semantic constraints remains largely unexplored. We present Suri: a dataset of 20K long-form texts & LLM-generated, backtranslated instructions with complex constraints.
arxiv.org/abs/2406.19371
9
36
7
reposted by
Yapei Chang
Marzena Karpinska
about 1 year ago
I really wanted to run NEW
#nocha
benchmark claims on
#o1
but it won't behave: 6k reasoning tokens is often not enough to get an answer, and allowing more means being able to process only short books. OpenAI also adds something to the prompt (~8k extra tokens), leaving less room for book + reasoning + generation!
1
6
4
Heading to
#EMNLP2024
tomorrow, presenting PostMark on Tuesday morning!
arxiv.org/abs/2406.14517
Aside from this, I'd love to chat about: long-context training, realistic & hard evals, synthetic data, and honestly any cool projects people are working on. Also, I'm on the lookout for a summer 2025 internship!
about 1 year ago
0
6
4
reposted by
Yapei Chang
Dustin Wright
about 1 year ago
Looking forward to catching up with old and new friends at
#EMNLP2024
! I'm on the academic job market, so please reach out if you would like to chat! And come talk to me,
@rnv.bsky.social
and
@iaugenstein.bsky.social
on Thu (Nov 14) at poster session G from 2-3:30PM about LLM tropes!
0
16
3
reposted by
Yapei Chang
Catherine Breslin
about 1 year ago
Starter packs are a great way to find people. But I followed a few tech/AI starter packs and now have a sizeable gender skew in who I'm following. To counteract that, I started collecting this list. Who else should I be following and add?
go.bsky.app/LaGDpqg
37
205
119
Thanks for the acknowledgment! Also check out our follow-up work NoCha (
novelchallenge.github.io
), a benchmark that asks LLMs to verify claims about new fictional books! The leaderboard is updated with the newest models.
about 1 year ago
0
7
1
reposted by
Yapei Chang
Kyle Lo
over 1 year ago
We released our open multimodal language model Molmo today! The secret sauce? A really, really high-quality set of image+text pairs, which we'll release openly. Try it out:
molmo.allenai.org
Read more about it:
molmo.allenai.org/blog
Download models:
huggingface.co/collections/...
5
50
20