Sara Hooker
@sarahooker.bsky.social
7804 followers · 161 following · 50 posts
I lead Cohere For AI. Formerly Research Google Brain. ML Efficiency, LLMs, @trustworthy_ml.
reposted by
Sara Hooker
Princeton Center for Information Technology Policy
10 months ago
⚠️ Leaderboard Illusion: "We find that undisclosed private testing practices benefit a handful of providers who are able to test multiple variants before public release & retract scores if desired ... the ability of these providers to choose the best score leads to biased Arena scores" Paper out now!
It is critical for scientific integrity that we trust our measure of progress. The @lmarena.bsky.social has become the go-to evaluation for AI progress. Our release today demonstrates the difficulty in maintaining fair evaluations on the Arena, despite best intentions.
10 months ago
reposted by
Sara Hooker
Marzieh Fadaee
10 months ago
1/ Science is only as strong as the benchmarks it relies on. So how fair (and scientifically rigorous) is today's most widely used evaluation benchmark? We took a deep dive into Chatbot Arena to find out. 🧵
reposted by
Sara Hooker
Jonathan Wenger
11 months ago
This has been a topic close to my heart for a long time. We have an awesome lineup of speakers who have made deep contributions to open-source in ML, e.g. @sarahooker.bsky.social, @chrisrackauckas.bsky.social, Matt Johnson, Tri Dao, @stellaathena.bsky.social, Evan Shelhamer.
reposted by
Sara Hooker
Isra Salazar
11 months ago
Today we are releasing Kaleidoscope, a comprehensive multimodal & multilingual benchmark for VLMs! It contains real questions from exams in different languages: 20,911 questions and 18 languages; 14 subjects (STEM to Humanities); 55% multimodal questions.
It is rare I get to completely disconnect. Very grateful for this week in Patagonia.
12 months ago
reposted by
Sara Hooker
Cohere Labs
12 months ago
We're particularly proud to release Aya Vision 8B - it's compact and efficient, outperforming models up to 11x its size. Releasing open weights helps to make breakthroughs in VLMs accessible to the research community.
reposted by
Sara Hooker
Cohere Labs
12 months ago
Just 2 days after launch, Aya Vision is trending on @hf.co. We launched open weights with the goal of making VLM breakthroughs accessible to the research community, so it's exciting to see such a positive response.
huggingface.co/CohereForAI/...
reposted by
Sara Hooker
(((Steve Chapman)))
12 months ago
Love this post by @sarahooker.bsky.social on that other platform: "The first step of any meaningful pursuit is to severely underestimate its difficulty."
reposted by
Sara Hooker
Cohere Labs
12 months ago
Introducing ✨ Aya Vision ✨, an open-weights model to connect our world through language and vision. Aya Vision adds breakthrough multimodal capabilities to our state-of-the-art multilingual 8B and 32B models.
reposted by
Sara Hooker
Cohere Labs
about 1 year ago
An important topic in AI is the climate impact of the energy-intensive computing hardware needed to train and deploy AI models. Our policy primer explores ways to move towards more sustainable AI.
cohere.com/research/pap...
reposted by
Sara Hooker
Cohere Labs
about 1 year ago
Does more compute equate with greater risk? What is our track record at predicting what risks emerge with scale? In this work led by Sara Hooker, we seek to understand the viability of compute thresholds as a way to mitigate risk. 🦺
arxiv.org/abs/2407.05694
reposted by
Sara Hooker
Cohere Labs
about 1 year ago
In this work, we ask "How does model merging stack up when optimizing language models for diverse multitask learning?" 🧩 https://arxiv.org/abs/2410.10801
reposted by
Sara Hooker
Cohere Labs
about 1 year ago
Aya Expanse, our open-weight 32B model, outperforms drastically larger models including Claude, Mistral Large 2, & Llama 405B on Scale's Private Multilingual Protocol. We are proud to work on global AI that is efficient and accessible.
reposted by
Sara Hooker
Jekaterina Novikova
about 1 year ago
Our paper is accepted to ICLR! INCLUDE: Evaluating Multilingual LLMs with Regional Knowledge (arxiv.org/abs/2411.19799). A benchmark of ~200k QA pairs across 44 languages, capturing real-world cultural nuances. A collaborative effort led by @cohereforai.bsky.social, with contributors worldwide. /1
reposted by
Sara Hooker
Cohere Labs
about 1 year ago
In this cross-institutional work, we introduce technical governance for AI (TAIG) and 100+ open technical problems. We provide a taxonomy of open problem areas in TAIG, organized by governance capacities and governance targets. https://arxiv.org/pdf/2407.14981
reposted by
Sara Hooker
Cohere Labs
about 1 year ago
The C4AI Research Grant program is proud to have supported a project focused on building LLM tools for teachers 🧑‍🏫 This project focused on adapting educational materials to students' skill levels, ensuring more effective and responsible AI integration in classrooms.
Many people have asked me about the AI Action Summit in France. I think a summit is typically most valuable as a catalyst, not as a solution in itself. But I will share some observations.
about 1 year ago
reposted by
Sara Hooker
Kathy Baxter
about 1 year ago
Boris Gamazaychikov, @salesforce.com Head of #AI #Sustainability, announced the AI Energy Score we launched at the AI Action Summit in Paris. This offers a standardized way to measure & compare the energy efficiency of AI models. 🫶
https://www.linkedin.com/posts/bgamazay_lack-of-transparency-is-a-fundamental-challenge-activity-7294701669664071682-Lk3N
reposted by
Sara Hooker
Cohere Labs
about 1 year ago
"Anyone who is serious about what the next generation of models is knows it can't be the current" Thanks to
@baratunde.com for hosting Head of Cohere For AI, @sarahooker.bsky.social, on the latest episode of Life with Machines. Check out their full conversation on YouTube:
youtu.be/-BsobAoOJvk
Is AI on the Verge of a Meltdown? | Sara Hooker (Ep. 8)
YouTube video by Baratunde Thurston
reposted by
Sara Hooker
Cohere Labs
about 1 year ago
On Scale AI's private multilingual protocol, Aya Expanse ranks as the best open-weights model. Additionally, in some languages we're outperforming proprietary models, larger models, and models built by more researchers with more infrastructure. Lots to be proud of today.
As @cohereforai.bsky.social joins the Bluesky family, we will be sharing paper gems from when we first started as a lab. This paper is part of a larger research agenda focused on how to better represent the long tail, i.e. making AI work for almost all real-world distributions.
about 1 year ago
Last year we published a fantastic cross-institutional survey on efficiency techniques for language models. It is comprehensive and a good starting point for researchers working on efficiency.
about 1 year ago
reposted by
Sara Hooker
Cohere Labs
about 1 year ago
How do we do more with less? In an era of ever-larger models, work on efficiency is ever more important. This cross-institutional collaboration provides a survey of the field for practitioners and researchers alike. Learn more:
arxiv.org/pdf/2209.000...
reposted by
Sara Hooker
Cohere Labs
about 1 year ago
We are committed to making meaningful progress in machine learning research through open collaboration. Follow this 🧵 to stay on top of our research contributions.
Changing the spaces where AI breakthroughs happen. ✨ Join us.
about 1 year ago
reposted by
Sara Hooker
Alice Oh
about 1 year ago
Bye Dagstuhl! Huge thanks to @841io.bsky.social et al. for organizing, and to @kanarinka.bsky.social @anitachan.bsky.social @sarahooker.bsky.social @haldaume3.bsky.social and many others for eye-opening discussions.
British gardens are beautiful even in the gloom
about 1 year ago
reposted by
Sara Hooker
Bob E Hayes
about 1 year ago
This is where the data to build #AI comes from | @melissahei.bsky.social @splendidsteph.bsky.social "We are using these models all over the world, and there's a massive discrepancy between the world we're seeing and what's invisible to these models." @sarahooker.bsky.social
www.technologyreview...
reposted by
Sara Hooker
Bob E Hayes
about 1 year ago
The Reality of #AI and Biorisk "We find that existing studies around AI-related biorisk are nascent, often speculative in nature, or limited in terms of their methodological maturity and transparency." ~ @sarahooker.bsky.social et al.
arxiv.org/abs/2412.0...
#AI #GenerativeAI
The Reality of AI and Biorisk
To accurately and confidently answer the question 'could an AI model or system increase biorisk', it is necessary to have both a sound theoretical threat model for how AI models or systems could...
https://arxiv.org/abs/2412.01946?utm_source=bluesky&utm_medium=social&utm_campaign=fedica-AI-and-CX
reposted by
Sara Hooker
Marzieh Fadaee
about 1 year ago
Our mission to strengthen the multilingual open-source ecosystem continues!
reposted by
Sara Hooker
Angelika Romanou
about 1 year ago
Introducing Global-MMLU: a multilingual benchmark featuring MMLU translations in 42 languages, crafted with human curation, extensive metadata, and insights into cultural sensitivity. Proud to have collaborated with Shivalika Singh, @sarahooker.bsky.social and Cohere For AI!
reposted by
Sara Hooker
Leshem (Legend) Choshen @EMNLP
about 1 year ago
You would think moral questions are universal, but MMLU only asks about US morals... better translations and separation by sensitivity.
reposted by
Sara Hooker
Anka Reuel ➡️ NeurIPS
about 1 year ago
Come join us!
reposted by
Sara Hooker
Shayne Longpre
about 1 year ago
Interested in how LLMs are really used? We are starting a research project to find out! In collaboration w/ @sarahooker.bsky.social @ankareuel.bsky.social and others. We are looking for two junior researchers to join us. Apply by Dec 15th!
https://forms.gle/H2o3cNCPdG8eDke57
Is MMLU Western-centric? As part of a massive cross-institutional collaboration, we find that MMLU is heavily overfit to Western culture, professionally annotate cultural sensitivity data, and release an improved Global-MMLU covering 42 languages.
Paper: arxiv.org/pdf/2412.03304
Data: hf.co/datasets/Coh...
about 1 year ago
reposted by
Sara Hooker
Aidan P
about 1 year ago
AI amplifying biorisk has been a major topic in policy & governance work. But does the available evidence match this level of attention? Our new paper looks at the science underpinning ideas that AI could increase biorisks.
arxiv.org/abs/2412.01946
The Reality of AI and Biorisk
To accurately and confidently answer the question 'could an AI model or system increase biorisk', it is necessary to have both a sound theoretical threat model for how AI models or systems could incre...
reposted by
Sara Hooker
Hal Daumé III
about 1 year ago
« no » Definitely, this was strongly my prior, but it's good to see this worked out, hopefully to shape where investments go.
reposted by
Sara Hooker
Antoine Bosselut
about 1 year ago
Translating MMLU is great, but global users of multilingual #LLMs don't care all that much about an LLM's understanding of US Law! Our new #NLProc work centers multilingual #LLM evaluations toward regional knowledge in 44 languages.
reposted by
Sara Hooker
Marzieh Fadaee
about 1 year ago
INCLUDE is a massive benchmark across 44 languages curated from 52 countries and includes both regional and cultural knowledge.
reposted by
Sara Hooker
Marzieh Fadaee
about 1 year ago
Good performance shouldn't mean 'just in English' anymore 🪩 We provide a robust way to assess models with a new benchmark that captures in-language nuances and cultural contexts.
AI amplifying biorisk has been a major focus in AI policy & governance work. Is the spotlight merited? Our recent cross-institutional work asks: does the available evidence match the current level of attention?
arxiv.org/abs/2412.01946
about 1 year ago
I'll be at NeurIPS next week -- looking forward to seeing many of you there! Let the Vancouver Cantonese and sushi food tour begin.
about 1 year ago
reposted by
Sara Hooker
Angelika Romanou
about 1 year ago
To build INCLUDE, we collected ~200K multiple-choice questions (MCQs) in 44 languages across 58 knowledge domains, sourced locally from 52 countries and representing a rich array of cultural and regional knowledge.
reposted by
Sara Hooker
Angelika Romanou
about 1 year ago
Why is regional knowledge so important? Users expect #LLMs to know information relevant to their environments' customs, culture, etc. To be relevant & relatable, LLMs need to know these nuances. It's not just global knowledge; it's about meeting user needs where they are.