Gabi Stanovsky
@gabistanovsky.bsky.social
Assistant professor at the Hebrew University.
Pinned post:
There's a lot of talk about regulating AI, but do regulators know the technology well enough? In our new paper, we survey major reg efforts & find they rely on benchmarking, which we know to be problematic. How did this happen & what can we do about it?
arxiv.org/pdf/2501.15693
8 months ago
Reposted by Gabi Stanovsky
Itay Itzhak
2 months ago
New paper alert! Instruction-tuned LLMs show amplified cognitive biases, but are these new behaviors, or pretraining ghosts resurfacing? Excited to share our new paper, accepted to CoLM 2025! See thread below.
#BiasInAI #LLMs #MachineLearning #NLProc
Reposted by Gabi Stanovsky
6 months ago
Can RAG performance get *worse* with more relevant documents? We put the number of retrieved documents in RAG to the test! Preprint:
arxiv.org/abs/2503.04388
1/3
Reposted by Gabi Stanovsky
Adi Simhi
7 months ago
New arXiv preprint! LLMs can hallucinate, but did you know they can do so with high certainty even when they know the correct answer? We find those hallucinations in our latest work with @itay-itzhak.bsky.social, @fbarez.bsky.social, @gabistanovsky.bsky.social and Yonatan Belinkov
Reposted by Gabi Stanovsky
Sebastian Gehrmann
7 months ago
GEM is so back! Our workshop for Generation, Evaluation, and Metrics is coming to an ACL near you. Evaluation in the world of GenAI is more important than ever, so please consider submitting your amazing work. CfP can be found at
gem-benchmark.com/workshop
A vote to stop defining what LLMs are at the start of every paper
8 months ago