Ken Liu
@kzliu.bsky.social
CS PhD @ Stanford AI Lab, Stanford NLP. Prev Google DeepMind.
https://ai.stanford.edu/~kzliu
New paper! We explore a radical paradigm for AI evals: assessing LLMs on *unsolved* questions. Instead of artificially difficult exams where progress ≠ value, we assess LLMs on organic, unsolved problems via reference-free LLM validation & community verification. LLMs solved ~10/500 so far:
5 months ago
go.bsky.app/AKGJ82V
Stanford NLP PhDs
about 1 year ago
hi
about 1 year ago