Julia Kreutzer
@juliakreutzer.bsky.social
📤 162
📥 169
📝 20
NLP & ML research
@cohereforai.bsky.social
Ready for our poster today at #COLM2025! 💭This paper has had an interesting journey; come find out and discuss with us! @swetaagrawal.bsky.social @kocmitom.bsky.social
Side note: being a parent in research does have its perks, poster transportation solved ✅
1 day ago
reposted by
Julia Kreutzer
Cohere Labs
9 days ago
We’re not your average lab. We’re a hybrid research environment dedicated to revolutionizing the ML space. And we’re hiring a Senior Research Scientist to co-create with us. If you believe in research as a shared, global effort — this is your chance.
💡A collaborative➕diverse team is key. In real life as in the LLM world 💪🦾 Check out our latest work that builds on this insight. 👇
7 days ago
reposted by
Julia Kreutzer
Marzieh Fadaee
about 2 months ago
Breaking into AI research is harder than ever, and early-career researchers face fewer chances to get started. Entry points matter. We started the Scholars Program 3 years ago to give new researchers a real shot — excited to open applications for year 4✨
reposted by
Julia Kreutzer
Cohere Labs
about 2 months ago
While effective for chess♟️, Elo ratings struggle with LLM evaluation due to volatility and transitivity issues. New post in collaboration with AI Singapore explores why Elo falls short for AI leaderboards and how we can do better.
reposted by
Julia Kreutzer
Conference on Language Modeling
3 months ago
COLM 2025 is now accepting applications for: Financial Assistance Application --
docs.google.com/forms/d/e/1F...
Volunteer Application --
docs.google.com/forms/d/e/1F...
Childcare Financial Assistance Application --
docs.google.com/forms/d/e/1F...
All due by July 31
COLM 2025 Financial Assistance Application
Goal of the Financial Assistance Program. We at COLM believe our community should be diverse and inclusive. We recognize that some might be less likely to attend because of financial burden of travel ...
https://docs.google.com/forms/d/e/1FAIpQLSfXcDk9zFxjtDNCmS5QUJew7QUUdWF-Zy8FfwMDt0xEpZBSdg/viewform
🍋 Squeezing the most out of few samples - check out our LLMonade recipe for few-sample test-time scaling in multitask environments. Turns out that standard methods miss out on gains in non-English languages. We propose more robust alternatives. Very proud of this work that our scholar Ammar led! 🚀
3 months ago
🚨LLM safety research needs to be at least as multilingual as our models. What's the current state, and how do we progress from here? This work led by @yongzx.bsky.social has answers! 👇
4 months ago
🚧No LLM safety without multilingual safety - what is missing to close the language gap? And where does this gap actually originate? Answers 👇
4 months ago
Multilingual 🤝 reasoning 🤝 test-time scaling 🔥🔥🔥 New preprint! @yongzx.bsky.social has all the details 👇
5 months ago
reposted by
Julia Kreutzer
Marzieh Fadaee
5 months ago
1/ Science is only as strong as the benchmarks it relies on. So how fair—and scientifically rigorous—is today’s most widely used evaluation benchmark? We took a deep dive into Chatbot Arena to find out. 🧵
🤓MT eyes on multilingual LLM benchmarks 👉 Here are a few simple techniques that we could easily adopt, and that together would give us a much richer understanding of where we stand with multilingual LLMs. 🍬Bonus question: how can we spur research on evaluation of evaluations?
6 months ago
reposted by
Julia Kreutzer
Tom Kocmi
6 months ago
Tired of messy non-replicable multilingual LLM evaluation? So were we. In our new paper, we experimentally illustrate common eval. issues and present how structured evaluation design, transparent reporting, and meta-evaluation can help us to build stronger models.
📖New preprint with Eleftheria Briakou @swetaagrawal.bsky.social @mziizm.bsky.social @kocmitom.bsky.social!
arxiv.org/abs/2504.11829
🌍It reflects experiences from my personal research journey: coming from MT into multilingual LLM research I missed reliable evaluations and evaluation research…
6 months ago
reposted by
Julia Kreutzer
Cohere Labs
6 months ago
🚀 We are excited to introduce Kaleidoscope, the largest culturally-authentic exam benchmark. 📌 Most VLM benchmarks are English-centric or rely on translations, missing linguistic & cultural nuance. Kaleidoscope expands in-language multilingual 🌎 & multimodal 👀 VLM evaluation.
reposted by
Julia Kreutzer
Tom Kocmi
7 months ago
☀️ Summer internship at Cohere! Are you excited about multilingual evaluation, human judgment, or meta-eval? Come help us explore what a rigorous eval really looks like while questioning the status quo in LLM evaluation. I’m looking for an intern (EU timezone preferred). Are you interested? Ping me!
reposted by
Julia Kreutzer
Marzieh Fadaee
7 months ago
Command🅰️ technical report is out. Information-dense. Detailed. Pretty. Simply A+! 💎:
cohere.com/research/pap...
reposted by
Julia Kreutzer
Conference on Language Modeling
7 months ago
A bit of a mess around the conflict of COLM with the ARR (and to lesser degree ICML) reviews release. We feel this is creating a lot of pressure and uncertainty. So, we are pushing our deadlines: Abstracts due March 22 AoE (+48hr) Full papers due March 28 AoE (+24hr) Plz RT 🙏
💬The first Q&A starts in a few hours. 🔔Also, a reminder to create your OpenReview profile if you haven't already. Non-institutional accounts require a verification process that can take time. One week until the abstract deadline!
7 months ago
reposted by
Julia Kreutzer
Cohere Labs
7 months ago
We’re excited to bring back Expedition Aya 🌍 A 6-week open build challenge to accelerate ML research progress in multilingual, multimodal, and efficiency research. Join us to expand the world that AI sees.
✨ Multilingual language modeling meets WMT✨ A very exciting opportunity to get WMT-style evaluations for MLLMs: unseen tests, human evaluation, meta-evaluation, and all that for multiple languages and tasks. Almost too good to be true! 🤩
7 months ago
reposted by
Julia Kreutzer
Conference on Language Modeling
7 months ago
COLM's @juliakreutzer.bsky.social and @abosselut.bsky.social will hold two paper submission Q&A sessions. We run a simple process, but figured this can help authors, especially first-time authors. March 12:
dateful.com/eventlink/14...
March 13:
dateful.com/eventlink/83...
Plz RT 🙏
reposted by
Julia Kreutzer
Tom Kocmi
8 months ago
Guess what? The jubilee 🎉 20th iteration of WMT General MT 🎉 is here, and we want you to participate - as the entry barrier to make an impact is so low! This isn’t just any repeat. We’ve kept what worked, removed what was outdated, and introduced many exciting new twists! Among the key changes are:
reposted by
Julia Kreutzer
Cohere Labs
7 months ago
Who will triumph - Aya Vision or its creators? 🤼 We challenged the research team behind Aya Vision to mystery object trivia. 💎🌽🧈
www.youtube.com/watch?v=iQEd...
Aya Vision Challenge
YouTube video by Cohere
https://www.youtube.com/watch?v=iQEd0KkXnPg&feature=youtu.be&themeRefresh=1
reposted by
Julia Kreutzer
Marzieh Fadaee
7 months ago
✨👓 Aya Vision is here 👓✨ A multilingual, multimodal model designed to understand across languages and modalities (text, images, etc.) to bridge the language gap and empower global users!