Sara Hooker
@sarahooker.bsky.social
👤 7696
👥 161
📝 50
I lead Cohere For AI. Formerly Research Google Brain. ML Efficiency, LLMs, @trustworthy_ml.
reposted by
Sara Hooker
Princeton Center for Information Technology Policy
8 months ago
⚠️ Leaderboard Illusion: "We find that undisclosed private testing practices benefit a handful of providers who are able to test multiple variants before public release & retract scores if desired... the ability of these providers to choose the best score leads to biased Arena scores" Paper out now!
1
9
2
It is critical for scientific integrity that we trust our measure of progress. The
@lmarena.bsky.social
has become the go-to evaluation for AI progress. Our release today demonstrates the difficulty in maintaining fair evaluations on the Arena, despite best intentions.
8 months ago
3
42
13
reposted by
Sara Hooker
Marzieh Fadaee
8 months ago
1/ Science is only as strong as the benchmarks it relies on. So how fair, and scientifically rigorous, is today's most widely used evaluation benchmark? We took a deep dive into Chatbot Arena to find out. 🧵
1
29
7
reposted by
Sara Hooker
Jonathan Wenger
9 months ago
This has been a topic close to my heart for a long time. We have an awesome lineup of speakers who have made deep contributions to open-source in ML, e.g.
@sarahooker.bsky.social
,
@chrisrackauckas.bsky.social
, Matt Johnson, Tri Dao,
@stellaathena.bsky.social
, Evan Shelhamer.
0
10
2
reposted by
Sara Hooker
Isra Salazar
9 months ago
Today we are releasing Kaleidoscope. A comprehensive multimodal & multilingual benchmark for VLMs! It contains real questions from exams in different languages: 20,911 questions and 18 languages; 14 subjects (STEM to Humanities); 55% multimodal questions.
1
25
7
It is rare I get to completely disconnect. Very grateful for this week in Patagonia.
10 months ago
1
31
0
reposted by
Sara Hooker
Cohere Labs
10 months ago
We're particularly proud to release Aya Vision 8B - it's compact and efficient, outperforming models up to 11x its size. Releasing open weights helps to make breakthroughs in VLMs accessible to the research community.
1
14
4
reposted by
Sara Hooker
Cohere Labs
10 months ago
Just 2 days after launch, Aya Vision is trending on
@hf.co
🔥🔥 We launched open-weights with the goal of making VLM breakthroughs accessible to the research community - so exciting to see such a positive response.
huggingface.co/CohereForAI/...
0
7
2
reposted by
Sara Hooker
(((Steve Chapman)))
10 months ago
Love this post by
@sarahooker.bsky.social
on that other platform: "The first step of any meaningful pursuit is to severely underestimate its difficulty."
0
5
1
reposted by
Sara Hooker
Cohere Labs
10 months ago
Introducing ✨ Aya Vision ✨ - an open-weights model to connect our world through language and vision. Aya Vision adds breakthrough multimodal capabilities to our state-of-the-art multilingual 8B and 32B models.
1
8
6
reposted by
Sara Hooker
Cohere Labs
11 months ago
0
7
1
reposted by
Sara Hooker
Cohere Labs
11 months ago
An important topic in AI is the climate impacts of the energy-intensive computing hardware needed to train and deploy AI models ⚡ Our policy primer explores ways to move towards more sustainable AI.
cohere.com/research/pap...
0
2
1
reposted by
Sara Hooker
Cohere Labs
11 months ago
Does more compute equate with greater risk? ⚡️ What is our track record predicting what risks emerge with scale? In this work led by Sara Hooker, we seek to understand the viability of compute thresholds as a way to mitigate risk. 🦺
arxiv.org/abs/2407.05694
0
1
1
reposted by
Sara Hooker
Cohere Labs
11 months ago
In this work, we ask "How does model merging stack up when optimizing language models for diverse multitask learning?" 🧩 https://arxiv.org/abs/2410.10801
0
5
1
reposted by
Sara Hooker
Cohere Labs
12 months ago
Aya Expanse, our open-weight 32B model, outperforms drastically larger models including Claude, Mistral Large 2, & Llama 405B on Scale's Private Multilingual Protocol. We are proud to work on global AI that is efficient and accessible 🔥
1
6
3
reposted by
Sara Hooker
Jekaterina Novikova
12 months ago
Our paper is accepted to ICLR! INCLUDE: Evaluating Multilingual LLMs with Regional Knowledge (
arxiv.org/abs/2411.19799
) A benchmark of ~200k QA pairs across 44 languages, capturing real-world cultural nuances. A collaborative effort led by
@cohereforai.bsky.social
, with contributors worldwide. /1
1
11
4
reposted by
Sara Hooker
Cohere Labs
11 months ago
In this cross-institutional work, we introduce technical governance for AI and 100+ open technical problems. We provide a taxonomy of open problem areas in TAIG organized by governance capacities and governance targets. https://arxiv.org/pdf/2407.14981
0
2
2
reposted by
Sara Hooker
Cohere Labs
11 months ago
The C4AI Research Grant program is proud to have supported a project focused on building LLM tools for teachers 🧑‍🏫 This project focused on adapting educational materials to students' skill levels, ensuring more effective and responsible AI integration in classrooms.
1
4
1
Many people have asked me about the France Action Summit. I think a summit is typically most valuable as a catalyst, not as a solution in itself. But, will share some observations.
11 months ago
2
42
12
reposted by
Sara Hooker
Kathy Baxter
11 months ago
Boris Gamazaychikov,
@salesforce.com
Head of
#AI
#Sustainability
announced the AI Energy Score we launched at the AI Action Summit in Paris. This offers a standardized way to measure & compare the energy efficiency of AI models. 🫶
www.linkedin.com/posts/bgamaz...
https://www.linkedin.com/posts/bgamazay_lack-of-transparency-is-a-fundamental-challenge-activity-7294701669664071682-Lk3N
1
17
4
reposted by
Sara Hooker
Cohere Labs
12 months ago
"Anyone who is serious about what the next generation of models is knows it can't be the current" Thanks to
@baratunde.com
for hosting Head of Cohere For AI,
@sarahooker.bsky.social
on the latest episode of Life with Machines. Check out their full conversation on YouTube:
youtu.be/-BsobAoOJvk
Is AI on the Verge of a Meltdown? | Sara Hooker (Ep. 8)
YouTube video by Baratunde Thurston
https://youtu.be/-BsobAoOJvk
1
13
2
reposted by
Sara Hooker
Cohere Labs
12 months ago
On Scale AI's private multilingual protocol, Aya Expanse is indexed as the best open-weights model. Additionally, in some languages we're outperforming: proprietary models, larger models, and models built by more researchers with more infrastructure. Lots to be proud of today.
1
8
1
As the
@cohereforai.bsky.social
joins the Bluesky family, we will be sharing paper gems from when we first started as a lab. This paper is part of a larger research agenda where we have focused on how to better represent the long tail = making AI work for almost all real world distributions.
12 months ago
1
25
3
Last year we published a fantastic cross-institutional survey on efficiency techniques for language models. Comprehensive and a good starting point for researchers working on efficiency.
12 months ago
0
9
1
reposted by
Sara Hooker
Cohere Labs
12 months ago
How do we do more with less? In an era of ever larger models, work on efficiency is ever more important. This cross-institutional collaboration provides a survey of the field for practitioners and researchers alike. Learn more:
arxiv.org/pdf/2209.000...
0
3
2
reposted by
Sara Hooker
Cohere Labs
12 months ago
We are committed to making meaningful progress in machine learning research through open collaboration. Follow this 🧵 to stay on top of our research contributions.
58
18
2
Changing the spaces where AI breakthroughs happen. ✨ Join us.
12 months ago
0
17
3
reposted by
Sara Hooker
Alice Oh
12 months ago
Bye Dagstuhl! Huge thanks to
@841io.bsky.social
et al for organizing, and
@kanarinka.bsky.social
@anitachan.bsky.social
@sarahooker.bsky.social
@haldaume3.bsky.social
and many others for eye-opening discussions
2
10
1
British gardens are beautiful even in the gloom
about 1 year ago
1
32
0
reposted by
Sara Hooker
Bob E Hayes
about 1 year ago
This is where the data to build #AI comes from | @melissahei.bsky.social @splendidsteph.bsky.social "We are using these models all over the world, and there's a massive discrepancy between the world we're seeing and what's invisible to these models." @sarahooker.bsky.social
www.technologyreview...
0
10
3
reposted by
Sara Hooker
Bob E Hayes
about 1 year ago
The Reality of #AI and Biorisk "We find that existing studies around AI-related biorisk are nascent, often speculative in nature, or limited in terms of their methodological maturity and transparency." ~ @sarahooker.bsky.social et al.
arxiv.org/abs/2412.0...
#AI
#GenerativeAI
The Reality of AI and Biorisk
To accurately and confidently answer the question 'could an AI model or system increase biorisk', it is necessary to have both a sound theoretical threat model for how AI models or systems could...
https://arxiv.org/abs/2412.01946?utm_source=bluesky&utm_medium=social&utm_campaign=fedica-AI-and-CX
0
5
2
reposted by
Sara Hooker
Ana Brandusescu
about 1 year ago
2
120
66
reposted by
Sara Hooker
Bob E Hayes
about 1 year ago
The Reality of #AI and Biorisk "We find that existing studies around AI-related biorisk are nascent, often speculative in nature, or limited in terms of their methodological maturity and transparency." ~ @sarahooker.bsky.social et al.
arxiv.org/abs/2412.0...
#AI
#GenerativeAI
The Reality of AI and Biorisk
To accurately and confidently answer the question 'could an AI model or system increase biorisk', it is necessary to have both a sound theoretical threat model for how AI models or systems could...
https://arxiv.org/abs/2412.01946?utm_source=bluesky&utm_medium=social&utm_campaign=fedica-AI-and-CX
0
9
2
reposted by
Sara Hooker
Marzieh Fadaee
about 1 year ago
Our mission to strengthen the multilingual open-source ecosystem continues!
1
9
2
reposted by
Sara Hooker
Angelika Romanou
about 1 year ago
Introducing Global-MMLU: a multilingual benchmark featuring MMLU translations in 42 languages crafted with human curation, extensive metadata, and insights into cultural sensitivity. Proud to have collaborated with Shivalika Singh,
@sarahooker.bsky.social
and Cohere For AI!
0
13
4
reposted by
Sara Hooker
Leshem (Legend) Choshen @EMNLP
about 1 year ago
You would think moral questions are universal, MMLU only asks about US morals... better translations and separation by sensitivity
0
13
4
reposted by
Sara Hooker
Anka Reuel ➡️ NeurIPS
about 1 year ago
Come join us!
0
7
2
reposted by
Sara Hooker
Shayne Longpre
about 1 year ago
Interested in how LLMs are really used? We are starting a research project to find out! In collaboration w/
@sarahooker.bsky.social
@ankareuel.bsky.social
and others. We are looking for two junior researchers to join us. Apply by Dec 15th!
forms.gle/H2o3cNCPdG8e...
https://forms.gle/H2o3cNCPdG8eDke57
0
15
2
Is MMLU Western-centric? 🤔 As part of a massive cross-institutional collaboration: find MMLU is heavily overfit to western culture; professional annotation of cultural sensitivity data; release improved Global-MMLU 42 languages. Paper:
arxiv.org/pdf/2412.03304
Data:
hf.co/datasets/Coh...
about 1 year ago
7
59
18
reposted by
Sara Hooker
Aidan P
about 1 year ago
AI amplifying biorisk has been a major topic in policy & governance work. But does the available evidence match this level of attention? 🦠 ⚠️ Our new paper looks at the science underpinning ideas that AI could increase biorisks.
arxiv.org/abs/2412.01946
The Reality of AI and Biorisk
To accurately and confidently answer the question 'could an AI model or system increase biorisk', it is necessary to have both a sound theoretical threat model for how AI models or systems could incre...
https://arxiv.org/abs/2412.01946
1
16
3
reposted by
Sara Hooker
Hal Daumé III
about 1 year ago
« no » definitely this was strongly my prior but it's good to see this worked out to hopefully shape where investments go
0
14
2
reposted by
Sara Hooker
Antoine Bosselut
about 1 year ago
Translating MMLU is great, but global users of multilingual
#LLMs
don't care all that much about an LLM's understanding of US Law! Our new
#NLProc
work centers multilingual
#LLM
evaluations toward regional knowledge in 44 languages.
1
28
3
reposted by
Sara Hooker
Marzieh Fadaee
about 1 year ago
INCLUDE is a massive benchmark across 44 languages curated from 52 countries and includes both regional and cultural knowledge.
1
9
3
reposted by
Sara Hooker
Marzieh Fadaee
about 1 year ago
Good performance shouldn't mean 'just in English' anymore 🪩 We provide a robust way to assess models with a new benchmark that captures in-language nuances and cultural contexts.
1
18
4
AI amplifying biorisk has been a major focus in AI policy & governance work. Is the spotlight merited? Our recent cross-institutional work asks: Does the available evidence match the current level of attention?
arxiv.org/abs/2412.01946
about 1 year ago
2
59
13
I'll be at NeurIPS next week -- looking forward to seeing many of you there! Let the Vancouver Cantonese and sushi food tour begin.
about 1 year ago
2
46
1
reposted by
Sara Hooker
Angelika Romanou
about 1 year ago
To build INCLUDE, we collected ~200K MCQ data from 44 languages and 58 knowledge domains, collected from local sources in 52 countries, representing a rich array of cultural and regional knowledge.
1
6
1
reposted by
Sara Hooker
Angelika Romanou
about 1 year ago
🤔 Why is regional knowledge so important? Users expect
#LLMs
to know information relevant to their environments: customs, culture, etc. To be relevant & relatable, LLMs need to know these nuances. It's not just global knowledge; it's about meeting user needs where they are.
1
4
1