@boydgraber.bsky.social
📤 198
📥 358
📝 43
reposted by
Lorenz Meyer
30 days ago
Dear media, I can no longer stand the headline "No agreement between USA and Denmark." When an armed man storms a bank, you don't run the headline "Robber and teller fail to reach consensus on cash handover." Stop framing imperial aggression as normal diplomacy.
89
5471
1647
Today's the deadline to apply for an AI-specific teaching track position at UMD:
umd.wd1.myworkdayjobs.com/UMCP/job/Uni...
Please join us!
6 months ago
0
2
0
My students and I are presenting three papers on Monday at
#ACL2025
and this thread will recap them (including their videos).
7 months ago
1
7
2
The precursor to this paper "The Incoherence of Coherence" had our most-watched paper video ever, so I thought we had to surpass it somehow ... so we decided to do a song parody (of Roxanne, obviously):
youtu.be/87OBxEM8a9E
7 months ago
0
7
2
We had our first human–computer cooperative AI tournament at UMD. Key takeaways: 1) computers are getting better at trivia, 2) they still suck at calibration, and 3) our teaming mechanic kept the games competitive and mostly fun (at least that’s what the players said).
8 months ago
1
0
0
Today is the deadline to sign up for our Human-Computer trivia competition held on June 14, 2024 in College Park, MD. $150 prize for the team who can answer the most questions with the help of an AI.
8 months ago
1
1
0
Do you like trivia? Can you spot when AI is feeding you BS? Or can you make AIs turn themselves inside out? Then on June 14 at College Park (or June 21 online), we have a competition for you.
8 months ago
1
0
1
reposted by
Alexander Doria
10 months ago
New Pleias paper: "What the HellaSwag?" HellaSwag is currently one of the most widely used LLM benchmarks in the world. We introduce a new critical method to assess the validity of standard LLM evals and show it does not accurately measure common sense reasoning.
arxiv.org/abs/2504.07825
1
36
11
reposted by
Nishant Balepur
12 months ago
This was a really fun paper to put together with Rachel and
@boydgraber.bsky.social
allowing me to vent many of my frustrations working with MCQA over the past year 😪🫡 Please check out the paper, we would love to hear your feedback! 📄👇
1
0
1
reposted by
William Jurayj
12 months ago
🚨 You are only evaluating a slice of your test-time scaling model's performance! 🚨 📈 We consider how models’ confidence in their answers changes as test-time compute increases. Reasoning longer helps models answer more confidently! 📝:
arxiv.org/abs/2502.13962
1
13
11
Is anyone in my network connected to Align to Innovate? Or know somebody who is?
alignbio.org
Align to Innovate
Reproducible. Scalable. Sharable. Improving research science with programmable experiments.
https://alignbio.org/
about 1 year ago
0
0
0
reposted by
Andrew Middleton
about 1 year ago
Hi. I'm Andrew. I own New England's oldest map store because last year I moved across the country after an old guy retired and gave it to me Willy Wonka-style. Visit my store in Rhode Island.
www.mapcenter.com
395
12376
1951
In about half an hour, I'll be doing my annual Q&A session on grad admissions:
youtube.com/live/jVjTbPH...
https://youtube.com/live/jVjTbPHLbms
about 1 year ago
0
3
0
reposted by
Birds Are Dinosaurs
about 1 year ago
At its heart, Star Trek is a utopian fantasy about a society so advanced that they are capable of holding productive meetings that last no longer than three minutes
129
10011
2641
I just made my way to Bluesky, so I thought it might be a good opportunity to shamelessly remind people to vote in the ACL board elections (where I'm running for an at-large post on a platform of improving virtual conferences). Check your e-mail for "Reminder: ACL 2024 Elections - Please Vote".
about 1 year ago
0
4
0