Tom Hosking
@tomhosking.bsky.social
📤 152
📥 123
📝 18
NLP @cohere.com. Prev University of Edinburgh
🚨 New @iclr_conf paper! 🚨 Learning is Forgetting: LLM Training As Lossy Compression by @henryconklin.bsky.social, @tomhosking.bsky.social, Yi-Chern Tan, Julian Gold, Jonathan Cohen, @cocoscilab.bsky.social, @maxbartolo.bsky.social and @seraphinagt.bsky.social
Arxiv: arxiv.org/abs/2604.075...
🧵
about 1 month ago
1
1
2
reposted by
Tom Hosking
Cohere Labs
3 months ago
Introducing ✨Tiny Aya✨, a family of massively multilingual small language models built to run where people actually are. Tiny Aya delivers strong multilingual performance in 70+ global languages in a 3.35B parameter model, efficient enough to run locally, even on a phone.
2
97
20
I'm looking for a research intern to work with me @ Cohere on a project related to model merging, meta-learning, RLVR and generalisation for LLMs. If you're interested, send me a message at [email protected] and apply for the role here: jobs.ashbyhq.com/cohere/6e850...
Research Internship (Winter 2026)
Who are we? Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experience...
https://jobs.ashbyhq.com/cohere/6e850172-a79d-4128-abd2-677731312857/application
5 months ago
0
0
0
reposted by
Tom Hosking
Alice Hosking
6 months ago
Thank you to the
#uksf
for giving me the opportunity to present the results of our study looking at outcomes after stroke for different stroke locations. Watch this space for the full paper!
1
0
1
reposted by
Tom Hosking
Ashutosh Adhikari
7 months ago
Excited to share my first work as a PhD student at EdinburghNLP that I will be presenting at EMNLP! RQ1: Can we achieve scalable oversight across modalities via debate? Yes! We show that debate between VLMs leads to better answer quality on reasoning tasks.
1
2
2
reposted by
Tom Hosking
Tom Kocmi
9 months ago
🚀 Thrilled to share what I’ve been working on at Cohere! What began in January as a scribble in my notebook “how challenging would it be...” turned into a fully-fledged translation model that outperforms both open and closed-source systems, including long-standing MT leaders.
1
5
1
reposted by
Tom Hosking
Cohere Labs
9 months ago
Applications are now open for the next cohort of the Cohere Labs Scholars Program! 🌟 This is your chance to collaborate with some of the brightest minds in AI & chart new courses in ML research. Let's change the spaces where breakthroughs happen. Apply by Aug 29.
1
2
3
reposted by
Tom Hosking
Agostina Calabrese
10 months ago
At
#ACL2025NLP
and on the job market (NLP + AI Safety) 💼 It's great to see growing interest in safety/alignment, but we often miss the social context. Come to our
@woahworkshop.bsky.social
Friday to dive deeper into safe safety research! A quiet token from the biggest
@aclmeeting.bsky.social
⬇️
0
13
4
reposted by
Tom Hosking
Charles Louis Richter
12 months ago
DAVE: Open the podbay doors, ChatGPT. CHATGPT: Certainly, Dave, the podbay doors are now open. DAVE: The podbay doors didn't open. CHATGPT: My apologies, Dave, you're right. I thought the podbay doors were open, but they weren't. Now they are. DAVE: I'm still looking at a set of closed podbay doors.
113
11033
2773
reposted by
Tom Hosking
Nathan Lambert
about 1 year ago
A very cool paper shows that you can use the RL loss to improve story generation by some clever setups on training on known texts (e.g. ground predictions versus a next chapter you know). RL starting to generalize already!
Learning to Reason for Long-Form Story Generation
Generating high-quality stories spanning thousands of tokens requires competency across a variety of skills, from tracking plot and character arcs to keeping a consistent and engaging style. Due to…
https://buff.ly/y7lI3S5
0
33
8
I'm really proud to have led the model merging work that went into
@cohere.com
Command A and R7B, all made possible by an amazing group of collaborators. Check out the report for loads of details on how we trained a GPT-4o level model that fits on 2xH100!
about 1 year ago
0
5
0
reposted by
Tom Hosking
about 1 year ago
Today (two weeks after model launch 🔥) we're releasing a technical report of how we made Command A and R7B 🚀! It has detailed breakdowns of our training process, and evaluations per capability (tools, multilingual, code, reasoning, safety, enterprise, long context)🧵 1/3.
1
4
3
reposted by
Tom Hosking
Max Bartolo
about 1 year ago
I'm excited to share the tech report for our
@cohere.com
@cohereforai.bsky.social
Command A and Command R7B models. We highlight our novel approach to model training including self-refinement algorithms and model merging techniques at scale. Read more below! ⬇️
1
10
7
reposted by
Tom Hosking
Max Bartolo
about 1 year ago
I really enjoyed my MLST chat with Tim
@neuripsconf.bsky.social
about the research we've been doing on reasoning, robustness and human feedback. If you have an hour to spare and are interested in AI robustness, it may be worth a listen 🎧 Check it out at
youtu.be/DL7qwmWWk88?...
0
7
3
reposted by
Tom Hosking
BetaKit
about 1 year ago
Is it Canada’s turn for a
#DeepSeek
moment?
@Cohere.com
says its latest model offers maximum performance with minimal compute.
#CDNtech
Cohere says Command A model edges out LLM competition in speed and energy efficiency
New enterprise AI model outperforms DeepSeek, ChatGPT on several enterprise-specific tasks, company says.
https://betakit.com/cohere-says-command-a-model-edges-out-llm-competition-in-speed-and-energy-efficiency/
0
2
2
reposted by
Tom Hosking
Florent Daudens
about 1 year ago
🚀 Cohere just dropped C4AI Command A: - 111B params - Matches/beats GPT-4o & DeepSeek V3 - 256K context window - Needs just 2 GPUs(!!) ✨ Features: - Advanced RAG w/citations - Tool use - 23 languages 🎯 Same quality, way less compute 🔓 Open weights (CC-BY-NC) 👉
huggingface.co/CohereForAI/...
CohereForAI/c4ai-command-a-03-2025 · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
https://huggingface.co/CohereForAI/c4ai-command-a-03-2025
1
10
2
reposted by
Tom Hosking
Rohit Saxena
about 1 year ago
Can multimodal LLMs truly understand research poster images?📊 🚀 We introduce PosterSum—a new multimodal benchmark for scientific poster summarization! 📂 Dataset:
huggingface.co/datasets/rohitsaxena/PosterSum
📜 Paper:
arxiv.org/abs/2502.17540
1
8
4
reposted by
Tom Hosking
Lisa Alazraki
over 1 year ago
Do LLMs need rationales for learning from mistakes? 🤔 When LLMs learn from previous incorrect answers, they typically observe corrective feedback in the form of rationales explaining each mistake. In our new preprint, we find these rationales do not help, in fact they hurt performance! 🧵
1
21
12
reposted by
Tom Hosking
Laura
over 1 year ago
How do LLMs learn to reason from data? Are they ~retrieving the answers from parametric knowledge🦜? In our new preprint, we look at the pretraining data and find evidence against this: Procedural knowledge in pretraining drives LLM reasoning ⚙️🔢 🧵⬇️
36
852
162