@cyrfrench.bsky.social
reposted by
The Onion
5 days ago
Police Ask For Public's Help In Falsifying Report
It seems to me that the vibe coding trend is incredibly jinxed: you hear amazing success stories everywhere, but, alas, as soon as a project goes public, kludges, dangerous shortcuts and incredible bugs reveal themselves.
5 days ago
reposted by
Jodie Burchell
19 days ago
Have you ever looked at the impressive results that LLMs get on benchmarks and wondered if these results are everything they seem? If you'd like to learn about how data leakage calls the results we see on LLM performance into question, check out my latest blog post.
t-redactyl.io/posts/2025-1...
Data leakage is a major issue when measuring LLM performance
Why data leakage and benchmark contamination distort LLM performance claims, from coding puzzles to the LM Arena and training data exhaustion.
https://t-redactyl.io/posts/2025-12-30-data-leakage-llm-measurement/