@boydgraber.bsky.social
190
352
37
Today's the deadline to apply for an AI-specific teaching track position at UMD:
umd.wd1.myworkdayjobs.com/UMCP/job/Uni...
Please join us!
about 1 month ago
0
2
0
My students and I are presenting three papers on Monday at #ACL2025, and this thread will recap them (including their videos).
2 months ago
1
7
2
The precursor to this paper "The Incoherence of Coherence" had our most-watched paper video ever, so I thought we had to surpass it somehow ... so we decided to do a song parody (of Roxanne, obviously):
youtu.be/87OBxEM8a9E
2 months ago
0
7
2
We had our first human-computer cooperative AI tournament at UMD. Key takeaways: 1) computers are getting better at trivia, 2) they still suck at calibration, and 3) our teaming mechanic kept the games competitive and mostly fun (at least that's what the players said).
4 months ago
1
0
0
Today is the deadline to sign up for our Human-Computer trivia competition held on June 14, 2024 in College Park, MD. $150 prize for the team who can answer the most questions with the help of an AI.
4 months ago
1
1
0
Do you like trivia? Can you spot when AI is feeding you BS? Or can you make AIs turn themselves inside out? Then on June 14 at College Park (or June 21 online), we have a competition for you.
4 months ago
1
0
1
reposted by
Alexander Doria
6 months ago
New Pleias paper: "What the HellaSwag?" HellaSwag is currently one of the most widely used LLM benchmarks in the world. We introduce a new critical method to assess the validity of standard LLM evals and show that it does not accurately measure common-sense reasoning.
arxiv.org/abs/2504.07825
1
36
11
reposted by
Nishant Balepur
7 months ago
This was a really fun paper to put together with Rachel and @boydgraber.bsky.social, allowing me to vent many of my frustrations working with MCQA over the past year. Please check out the paper, we would love to hear your feedback!
1
0
1
reposted by
William Jurayj
7 months ago
You are only evaluating a slice of your test-time scaling model's performance! We consider how models' confidence in their answers changes as test-time compute increases. Reasoning longer helps models answer more confidently!
arxiv.org/abs/2502.13962
1
12
11
Is anyone in my network connected to Align to Innovate? Or know somebody who is?
alignbio.org
Align to Innovate
Reproducible. Scalable. Sharable. Improving research science with programmable experiments.
https://alignbio.org/
8 months ago
0
0
0
reposted by
Andrew Middleton
10 months ago
Hi. I'm Andrew. I own New England's oldest map store because last year I moved across the country after an old guy retired and gave it to me Willy Wonka-style. Visit my store in Rhode Island.
www.mapcenter.com
399
12421
1964
In about half an hour, I'll be doing my annual Q&A session on grad admissions:
youtube.com/live/jVjTbPH...
https://youtube.com/live/jVjTbPHLbms
10 months ago
0
3
0
reposted by
Birds Are Dinosaurs
10 months ago
At its heart, Star Trek is a utopian fantasy about a society so advanced that they are capable of holding productive meetings that last no longer than three minutes
129
9972
2628
I just made my way to Bluesky, so I thought it might be a good opportunity to shamelessly remind people to vote in the ACL board elections (where I'm running for an at-large post on a platform of improving virtual conferences). Check your e-mail for "Reminder: ACL 2024 Elections - Please Vote".
10 months ago
0
4
0