Alex Gill
@agill32.bsky.social
388 followers · 362 following · 9 posts
NLP researcher at U of U
Reposted by Alex Gill
Nathan Kalman-Lamb
about 2 months ago
Folks, I don't know how it's possible, but it gets funnier.
16 replies · 475 reposts · 139 likes
I'll be in Suzhou 🇨🇳 at #EMNLP this week presenting "What has been Lost with Synthetic Evaluation?" done with @anamarasovic.bsky.social & @lasha.bsky.social! Findings Session 1 - Hall C, Wed, November 5, 13:00 - 14:00
arxiv.org/abs/2505.22830
3 months ago
0 replies · 11 reposts · 3 likes
Reposted by Alex Gill
Women in AI Research - WiAIR
3 months ago
🧠 Can large language models build the very benchmarks used to evaluate them? In "What Has Been Lost with Synthetic Evaluation", Ana Marasović (@anamarasovic.bsky.social) and collaborators ask what happens when LLMs start generating the datasets used to test their reasoning. (1/6 🧵)
2 replies · 9 reposts · 3 likes
What Has Been Lost with Synthetic Evaluation? (arxiv.org/abs/2505.22830) I'm happy to announce that the preprint release of my first project is online! Developed with the amazing support of @lasha.bsky.social & @anamarasovic.bsky.social
What Has Been Lost with Synthetic Evaluation?
Large language models (LLMs) are increasingly used for data generation. However, creating evaluation benchmarks raises the bar for this emerging paradigm. Benchmarks must target specific phenomena, pe...
http://arxiv.org/abs/2505.22830
8 months ago
1 reply · 11 reposts · 5 likes