Ivan Kartáč
@ivankartac.bsky.social
PhD student @ Charles University. Working on evaluation, explainability, and reasoning in NLP.
Reposted by Ivan Kartáč
Simon Willison
9 days ago
Short musings on "cognitive debt" - I'm seeing this in my own work, where excessive unreviewed AI-generated code leads me to lose a firm mental model of what I've built, which then makes it harder to confidently make future decisions
simonwillison.net/2026/Feb/15/...
How Generative and Agentic AI Shift Concern from Technical Debt to Cognitive Debt
This piece by Margaret-Anne Storey is the best explanation of the term cognitive debt I've seen so far. Cognitive debt, a term gaining traction recently, instead communicates the notion that …
https://simonwillison.net/2026/Feb/15/cognitive-debt/
Reposted by Ivan Kartáč
Kyle Lo
27 days ago
The 5th Generation, Evaluation, and Metrics (GEM) Workshop will be at #ACL2026! Call for papers is out. Topics include: 🐟 LMs as evaluators 🐠 Living benchmarks 🍣 Eval with humans, and more. New for 2026: Opinion & Statement Papers! Full CFP:
gem-workshop.com/call-for-pap...
Reposted by Ivan Kartáč
Melanie Mitchell
about 1 month ago
My latest on Substack -- a write-up of the talk I gave at NeurIPS in December.
aiguide.substack.com/p/on-evaluat...
On Evaluating Cognitive Capabilities in Machines (and Other "Alien" Intelligences)
(Apologies for the length of this post, which means it gets cut off in the email version.)
https://aiguide.substack.com/p/on-evaluating-cognitive-capabilities
Our paper "OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs" has been accepted to #INLG2025! You can read the preprint here:
arxiv.org/abs/2503.11858
6 months ago
Reposted by Ivan Kartáč
Institute of Formal and Applied Linguistics
7 months ago
#ACL2025NLP in Vienna 🇦🇹 starts today with 23 🤯 @ufal-cuni.bsky.social folks presenting their work both at the main conference and workshops. Check out our main conference papers today and on Wednesday 👇
Reposted by Ivan Kartáč
Ondrej Dusek
10 months ago
Slides and links to papers at bit.ly/mlprague25-od 🤓
Ondrej Dusek MLPrague 2025
Evaluating LLM outputs with humans and LLMs Ondřej Dušek MLPrague 30 April 2025 These slides: https://bit.ly/mlprague25-od
https://bit.ly/mlprague25-od
Reposted by Ivan Kartáč
Institute of Formal and Applied Linguistics
10 months ago
Today, @tuetschek.bsky.social shared his team's work on evaluating LLM text generation with both human annotation frameworks and LLM-based metrics. Their approach tackles the benchmark data-leakage problem, showing how to obtain unseen data for unbiased LLM testing.
Reposted by Ivan Kartáč
Zdeněk Kasner
10 months ago
How do LLMs compare to human crowdworkers in annotating text spans? 🧑🤖 And how can span annotation help us with evaluating texts? Find out in our new paper:
llm-span-annotators.github.io
arXiv: arxiv.org/abs/2504.08697
Large Language Models as Span Annotators
Website for the paper Large Language Models as Span Annotators
https://llm-span-annotators.github.io