Alex Gill
@agill32.bsky.social
388 followers · 362 following · 9 posts
NLP researcher at U of U
Reposted by Alex Gill
Nathan Kalman-Lamb
about 2 months ago
Folks, I don't know how it's possible, but it gets funnier.
16 replies · 475 reposts · 139 likes
I'll be in Suzhou 🇨🇳 at #EMNLP this week presenting "What has been Lost with Synthetic Evaluation?" done with @anamarasovic.bsky.social & @lasha.bsky.social! Findings Session 1 - Hall C, Wed, November 5, 13:00 - 14:00
arxiv.org/abs/2505.22830
3 months ago
0 replies · 11 reposts · 3 likes
Reposted by Alex Gill
Women in AI Research - WiAIR
3 months ago
🧠 Can large language models build the very benchmarks used to evaluate them? In "What Has Been Lost with Synthetic Evaluation", Ana Marasović (@anamarasovic.bsky.social) and collaborators ask what happens when LLMs start generating the datasets used to test their reasoning. (1/6 🧵)
2 replies · 9 reposts · 3 likes
What Has Been Lost with Synthetic Evaluation? (arxiv.org/abs/2505.22830) I'm happy to announce that the preprint release of my first project is online! Developed with the amazing support of @lasha.bsky.social & @anamarasovic.bsky.social
What Has Been Lost with Synthetic Evaluation?
Large language models (LLMs) are increasingly used for data generation. However, creating evaluation benchmarks raises the bar for this emerging paradigm. Benchmarks must target specific phenomena, pe...
http://arxiv.org/abs/2505.22830
8 months ago
1 reply · 11 reposts · 5 likes