Danny To Eun Kim
@teknology.bsky.social
📤 1036
📥 421
📝 22
PhD student @CMU LTI NLP | IR | Evaluation | RAG
https://kimdanny.github.io
reposted by
Danny To Eun Kim
Changdae Oh
about 1 month ago
We tend to conflate "autonomy" with "reliability" in AI agents. But autonomy without trust is catastrophically dangerous. Our new paper formalizes UQ for LLM agents and proposes a new lens: agent uncertainty as a conditional uncertainty reduction process. 📄
huggingface.co/papers/2602....
1
4
1
reposted by
Danny To Eun Kim
Shaily
about 2 months ago
🎭 How do LLMs (mis)represent culture? 🧮 How often? 🧠 Misrepresentations = missing knowledge? spoiler: NO! At
#CHI2026
we are bringing ✨TALES✨ a participatory evaluation of cultural (mis)reps & knowledge in multilingual LLM-stories for India 📜
arxiv.org/abs/2511.21322
1/10
1
46
24
#ChatGPT
has begun putting ads in its responses. Check out our paper on “how fair ranking can positively impact the LLM response and content/ad exposure”.
dl.acm.org/doi/10.1145/...
2 months ago
0
4
0
#ChatGPT
has begun putting ads in its responses. Check out our paper on “Ads detection and integration in the era of LLMs”.
ceur-ws.org/Vol-4038/pap...
2 months ago
0
1
0
reposted by
Danny To Eun Kim
Fernando Diaz
3 months ago
as AI increasingly supports shopping and ads, it’s worth remembering that retrieval often shapes who gets exposure in final generated output. in a recent paper,
@teknology.bsky.social
uses methods from fair ranking to assess and address exposure bias in downstream generation.
841.io/doc/fairrag....
0
9
4
Excited to present at
#CLEF2025
#Touché
Lab (Session 2) shared task "Advertisement in RAG"🇪🇸!
@webis.de
🗓️ Sept 9 (Tue) ⏲️ 5:20PM (CEST) / 11:20AM (EDT) 📍 Florentino Sanz Room 🧠 https://arxiv.org/abs/2507.00509 Join us for insights on
#RAG
+ advertising!
6 months ago
0
1
1
reposted by
Danny To Eun Kim
Bhaskar Mitra | ভাস্কর মিত্র
7 months ago
Some exciting news! 🤗 After 3 amazing years at TREC, the Tip-of-the-Tongue (ToT) shared task will be a core task at NTCIR-19 in 2026. The new track will focus on tip-of-the-tongue information needs in English and East Asian languages. More details coming soon. See you all in Tokyo next year!
0
5
3
reposted by
Danny To Eun Kim
Bhaskar Mitra | ভাস্কর মিত্র
7 months ago
Gentle reminder 📢 All run submissions for the Tip-of-the-Tongue (ToT) Track are due next week Wednesday (Aug 27). More info:
trec-tot.github.io/guidelines
#TREC2025
#TRECToT
#TREC2025ToT
0
2
3
This year's TREC Tip of the Tongue (ToT) track will be amazing! Based on our rigorous experiments on synthetic ToT query generation presented at
#SIGIR2025
, we extended the track to open-domain ToT queries. We provide code for baseline systems, and submissions are due by August 27th!
8 months ago
0
1
1
reposted by
Danny To Eun Kim
Maik Fröbe
8 months ago
To Eun Kim just presented the work on "Tip of the Tongue Query Elicitation for Simulated Evaluation" at
#SIGIR2025
. The approach will be used in the
#TREC2025
Tip-of-the-Tongue track, and we had some sweets at the poster :) The paper is available online:
dl.acm.org/doi/10.1145/...
0
12
3
reposted by
Danny To Eun Kim
Bhaskar Mitra | ভাস্কর মিত্র
8 months ago
Hello TREC-ToTers! We have released the test queries for the TREC 2025 Tip-of-the-Tongue (TREC-ToT) Track. Please see the guidelines for more information:
trec-tot.github.io/guidelines
. Run submission deadline will tentatively be in August.
#TREC2025
#TRECToT
#TREC2025ToT
Please spread the word!
0
3
4
❓How do LLMs respond to fair ranking in RAG? 🤩 See how fair ranking boosts downstream utility while promoting fairer attribution of cited sources. Catch our oral presentation at
#ICTIR2025
!
#SIGIR2025
@841io.bsky.social
8 months ago
0
7
1
reposted by
Danny To Eun Kim
Maik Fröbe
9 months ago
Do not forget to participate in the
#TREC2025
Tip-of-the-Tongue (ToT) Track :) The corpus and baselines (with run files) are now available and easily accessible via the ir_datasets API and the HuggingFace Datasets API. More details are available at:
trec-tot.github.io/guidelines
0
11
7
reposted by
Danny To Eun Kim
Shaily
9 months ago
🖋️ Curious how writing differs across (research) cultures? 🚩 Tired of “cultural” evals that don't consult people? We engaged with interdisciplinary researchers to identify & measure ✨cultural norms✨in scientific writing, and show that❗LLMs flatten them❗ 📜
arxiv.org/abs/2506.00784
[1/11]
1
72
35
reposted by
Danny To Eun Kim
Bhaskar Mitra | ভাস্কর মিত্র
11 months ago
Hello TREC-ToTers! 👋🏽 Excited to announce the release of TREC 2025 Tip-of-the-Tongue (TREC-ToT) Track guidelines:
trec-tot.github.io/guidelines
. We will release test queries in July and run submission deadline will be in August.
#TREC2025
#TRECToT
#TREC2025ToT
Please register to participate:
TREC 2025 Tip-of-the-Tongue (ToT) Track
Tip of the tongue: The phenomenon of failing to retrieve something from memory, combined with partial recall and the feeling that retrieval is imminent.
https://trec-tot.github.io/guidelines
0
4
3
reposted by
Danny To Eun Kim
Athiya Deviyani
11 months ago
Ever trusted a metric that works great on average, only for it to fail in your specific use case? In our
#NAACL2025
paper (w/
@841io.bsky.social
), we show why global evaluations are not enough and why context matters more than you think. 📄
aclanthology.org/2025.finding...
#NLP
#Evaluation
(🧵1/9)
1
23
7
reposted by
Danny To Eun Kim
Fernando Diaz
11 months ago
If you're interested in OpenAI including shopping results, you might also be interested in
@teknology.bsky.social
's paper relating retrieval diversity/fairness and generation by downstream RAG models. This has implications for individuals selling products online.
arxiv.org/abs/2409.11598
Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation
Modern language models frequently include retrieval components to improve their outputs, giving rise to a growing number of retrieval-augmented generation (RAG) systems. Yet, most existing work in RAG...
https://arxiv.org/abs/2409.11598
0
9
3
reposted by
Danny To Eun Kim
Fernando Diaz
12 months ago
If you're working on a recall-oriented task or with ranking systems evaluated across varied users, content, or intents, check it out. 5/5
dl.acm.org/doi/10.1145/...
0
1
2
reposted by
Danny To Eun Kim
Fernando Diaz
12 months ago
📢 New Paper: "Recall, Robustness, and Lexicographic Evaluation" (ACM TORS) F Diaz, M Ekstrand (
@md.ekstrandom.net
), B Mitra (
@bmitra.bsky.social
) For IR, NLP, and ML researchers working on ranking systems evaluated for recall and robustness. 🧵 1/5
dl.acm.org/doi/10.1145/...
1
14
6
🚨New Breakthrough in Tip-of-the-Tongue (TOT) Retrieval Research! We address data limitations and offer a fresh evaluation method for these complex queries. Curious how TREC TOT track test queries are created? Check out this thread 🧵 and our paper 📄:
arxiv.org/abs/2502.17776
Tip of the Tongue Query Elicitation for Simulated Evaluation
Tip-of-the-tongue (TOT) search occurs when a user struggles to recall a specific identifier, such as a document title. While common, existing search systems often fail to effectively support TOT scena...
https://arxiv.org/abs/2502.17776
about 1 year ago
2
18
9
reposted by
Danny To Eun Kim
Akhila Yerukola
about 1 year ago
Did you know? Gestures used to express universal concepts—like wishing for luck—vary DRAMATICALLY across cultures? 🤞means luck in US but deeply offensive in Vietnam 🚨 📣 We introduce MC-SIGNS, a test bed to evaluate how LLMs/VLMs/T2I handle such nonverbal behavior! 📜:
arxiv.org/abs/2502.17710
1
33
10
Heading to
#NeurIPS2024
to present our ‘Fair RAG’ paper at the
#AFME2024
workshop! Let's talk about RAG, Information Retrieval, and Fairness. Honored that our paper was selected as one of the Top 5 Spotlight Papers! 🎉 Let’s connect and chat! Paper:
arxiv.org/abs/2409.11598
Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation
Many language models now enhance their responses with retrieval capabilities, leading to the widespread adoption of retrieval-augmented generation (RAG) systems. However, despite retrieval being a cor...
https://arxiv.org/abs/2409.11598
over 1 year ago
1
11
5
reposted by
Danny To Eun Kim
Andrew Drozdov
over 1 year ago
Slides are up! I presented on "Presentation & Consumption in the context of REML". The full deck is here. There are a lot of gems if you're interested in this space!
retrieval-enhanced-ml.github.io/sigir-ap2024...
0
15
6
Those who are attending
#SIGIRAP2024
, come by and learn how retrieval can enhance ML models!
over 1 year ago
1
8
1
reposted by
Danny To Eun Kim
Orion Weller
over 1 year ago
Creating a 🦋 starter pack for people working in IR/RAG:
go.bsky.app/88ULgwY
I can’t seem to find everyone though, help definitely appreciated to fill this out (DM or comment)!
32
86
24
reposted by
Danny To Eun Kim
Andrew Drozdov
over 1 year ago
Mat is not on 🦋—posting on his behalf! It's time to revisit common assumptions in IR! Embeddings have improved drastically, but mainstream IR evals have stagnated since MSMARCO + BEIR. We ask: on private or tricky IR tasks, are rerankers better? Surely, reranking many docs is best?
4
81
28
reposted by
Danny To Eun Kim
Martin Potthast
over 1 year ago
Time for a starter pack on information retrieval:
go.bsky.app/MXPJoTn
17
43
21
reposted by
Danny To Eun Kim
michael ginn
over 1 year ago
Hey all! I started a second starter pack with people who didn't make the first one, please let me know if you'd like to be added:
go.bsky.app/JgneRQk
70
65
38
reposted by
Danny To Eun Kim
Sireesh Gururaja
over 1 year ago
I'm keeping track of people at the CMU Language Technologies Institute here:
go.bsky.app/NhTwCVb
. Follow along!
0
7
3