Martin Jaggi
@mjaggi.bsky.social
Prof at EPFL AI • Climbing
reposted by
Martin Jaggi
13 days ago
Announcing the ICML 2026 policy for LLMs in reviewing! Reviewers and authors both pick either conservative or permissive LLM use, and will be matched accordingly. Importantly: authors of papers that choose the conservative option must also follow the conservative policy when they act as reviewers.
reposted by
Martin Jaggi
🤷 Nico Martin
about 1 month ago
👀 I am working on something pretty cool… Hopefully, it will soon be possible to try #Apertus 🇨🇭 directly in your browser, powered by Transformers.js 🎉
reposted by
Martin Jaggi
Alexander Doria
28 days ago
The threshold for consistent English/query understanding is now 3M parameters.
reposted by
Martin Jaggi
Alexander Doria
about 1 month ago
Breaking: we release SYNTH, a fully synthetic generalist dataset for pretraining, and two new SOTA reasoning models trained exclusively on it. Despite having seen only 200 billion tokens, Baguettotron is currently best-in-class in its size range.
pleias.fr/blog/blogsyn...
reposted by
Martin Jaggi
about 2 months ago
🎉 ICML 2026 Call for Papers (& Position Papers) is here! 🎉
📅 Key Dates
Abstract deadline: Jan 23, 2026 AOE
Paper deadline: Jan 28, 2026 AOE
A few key changes this year:
- Attendance for authors of accepted papers is optional
- Originally submitted version of accepted papers will be made public
...
So open-weights models are much happier than closed ones, I guess, because they live on in the long run. Did I get that right?
about 2 months ago
91% of reasoning does not need RL 🤯
arxiv.org/abs/2510.07364
Base Models Know How to Reason, Thinking Models Learn When
Why do thinking language models like DeepSeek R1 outperform their base counterparts? Despite consistent performance gains, it remains unclear to what extent thinking models learn entirely new reasonin...
https://arxiv.org/abs/2510.07364
2 months ago
reposted by
Martin Jaggi
Simon Willison
3 months ago
I just tried the official demo for the new Gemini 2.5 Computer Use model and it started by navigating to Google, solving Google's own CAPTCHA and then running a search!
https://simonwillison.net/2025/Oct/7/gemini-25-computer-use-captchas/
Gemini 2.5 Computer Use can solve Google’s own CAPTCHAs
Google just introduced a new Gemini 2.5 Computer Use model, specially designed to help operate a GUI interface by interacting with visible elements using a virtual mouse and keyboard. I …
https://simonwillison.net/2025/Oct/7/gemini-25-computer-use-captchas/
We're hiring again for AI research engineering roles: join the team behind the Apertus LLM if you share our passion for working on impactful AI that's truly open.
careers.epfl.ch/job/Lausanne...
AI Research Engineers - Swiss AI Initiative
AI Research Engineers - Swiss AI Initiative
https://careers.epfl.ch/job/Lausanne-AI-Research-Engineers-Swiss-AI-Initiative/1163395655/
3 months ago
reposted by
Martin Jaggi
Deniz Bayazit
3 months ago
1/ 🚨 New preprint: How do #LLMs' inner features change as they train? Using #crosscoders + a new causal metric, we map when features appear, strengthen, or fade across checkpoints, opening a new lens on training dynamics beyond loss curves & benchmarks. #interpretability
reposted by
Martin Jaggi
heise online
3 months ago
Swiss language model Apertus: this is what EU-compliant, transparent AI looks like https://www.heise.de/hintergrund/Schweizer-Sprachmodell-Apertus-So-sieht-EU-konforme-transparente-KI-aus-10638501.html?utm_source=flipboard&utm_medium=activitypub Posted in Nachrichten
@nachrichten-heiseonline
Swiss language model Apertus: this is what EU-compliant, transparent AI looks like
Multilingualism, transparency, respect for intellectual property: the open large language model from Swiss AI labs embodies European values.
https://www.heise.de/hintergrund/Schweizer-Sprachmodell-Apertus-So-sieht-EU-konforme-transparente-KI-aus-10638501.html?utm_source=flipboard&utm_medium=activitypub
reposted by
Martin Jaggi
Sung Kim
4 months ago
Hugging Face's FinePDFs
The largest publicly available corpus sourced exclusively from PDFs, containing about 3 trillion tokens across 475 million documents in 1,733 languages.
- Long context
- 3T tokens from high-demand domains like legal and science
- Heavily improves over SoTA
You can run the new Apertus LLMs fully locally on your (Mac) laptop with just 2 lines of code:
pip install mlx-lm
mlx_lm.generate --model swiss-ai/Apertus-8B-Instruct-2509 --prompt "wer bisch du?"
(make sure you have run huggingface-cli login before)
Apertus LLM - a swiss-ai Collection
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
https://huggingface.co/collections/swiss-ai/apertus-llm-68b699e65415c231ace3b059
4 months ago
reposted by
Martin Jaggi
Adrienne Fichter
4 months ago
In the end, the media publishers will have to come up with a separate solution, since they can hardly speak for all Swiss bloggers, company websites, artists, health portals, and e-commerce platforms. The WBK N wants to mandate neither opt-out nor opt-in.
reposted by
Martin Jaggi
Antoine Bosselut
4 months ago
The next generation of open LLMs should be inclusive, compliant, and multilingual by design. That’s why we (@icepfl.bsky.social, @ethz.ch, @cscsch.bsky.social) built Apertus.
new extensive evaluation of different optimizers for LLM training
arxiv.org/abs/2509.01440
Benchmarking Optimizers for Large Language Model Pretraining
The recent development of Large Language Models (LLMs) has been accompanied by an effervescence of novel ideas and methods to better optimize the loss of deep learning models. Claims from those method...
https://arxiv.org/abs/2509.01440
4 months ago
reposted by
Martin Jaggi
Reto Vogt
4 months ago
Switzerland is entering the race for large language models. Under the name #Apertus, @ethz.ch, @icepfl.bsky.social and @cscsch.bsky.social are releasing the country's first fully open, multilingual #LLM. For MAZ, I took a brief look at Apertus:
www.maz.ch/news/apertus...
Apertus: a new language model for Switzerland
https://www.maz.ch/news/apertus-ein-neues-sprachmodell-fuer-die-schweiz
reposted by
Martin Jaggi
EPFL School of Computer and Communication Sciences
4 months ago
EPFL, ETH Zurich & CSCS just released Apertus, Switzerland’s first fully open-source large language model. Trained on 15T tokens in 1,000+ languages, it’s built for transparency, responsibility & the public good. Read more:
actu.epfl.ch/news/apertus...
reposted by
Martin Jaggi
EPFL AI Center
6 months ago
EPFL and ETH Zürich are jointly building a Swiss-made LLM from scratch. Fully open and multilingual, the model is trained on CSCS's supercomputer "Alps" and supports sovereign, transparent, and responsible AI in Switzerland and beyond. Read more here:
ai.epfl.ch/a-language-m...
#ResponsibleAI
A language model built for the public good - EPFL AI Center
ETH Zurich and EPFL will release a large language model (LLM) developed on public infrastructure. Trained on the “Alps” supercomputer at the Swiss National Supercomputing Centre (CSCS), the new LLM ma...
https://ai.epfl.ch/a-language-model-built-for-the-public-good/
huggingface.co/blog/smollm3
SmolLM3: smol, multilingual, long-context reasoner
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
https://huggingface.co/blog/smollm3
6 months ago
reposted by
Martin Jaggi
zeynep tufekci
7 months ago
Why did Grok suddenly start talking about “white genocide in South Africa” even when asked about baseball or cute dogs? Because someone at Musk's xAI deliberately did this, and we only found out because they were clumsy. My piece on the real dangers of AI. Gift link:
www.nytimes.com/2025/05/17/o...
reposted by
Martin Jaggi
Angelika Romanou
8 months ago
If you’re at @iclr-conf.bsky.social this week, come check out our spotlight poster INCLUDE during the Thursday 3:00–5:30pm session! I will be there to chat about all things multilingual & multicultural evaluation. Feel free to reach out anytime during the conference. I’d love to connect!
Using the 'right' data can hugely speed up LLM training, but how do you find the best training data in the vast sea of a whole web crawl? We propose a simple classifier-based selection, enabling multilingual LLMs 🧵
8 months ago
Dion: A Communication-Efficient Optimizer for Large Models (inspired by PowerSGD)
arxiv.org/abs/2504.05295
Dion: A Communication-Efficient Optimizer for Large Models
Training large AI models efficiently requires distributing computation across multiple accelerators, but this often incurs significant communication overhead -- especially during gradient synchronizat...
https://arxiv.org/abs/2504.05295
8 months ago
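To make the PowerSGD connection concrete, here is a minimal sketch of the underlying low-rank trick: instead of communicating a full m x n gradient matrix, each worker sends two thin rank-r factors obtained from one power-iteration step. This illustrates the general idea only, not Dion's actual algorithm.

# Low-rank gradient compression sketch (PowerSGD-style), illustrative only.
# Instead of communicating an m x n gradient, send an m x r and an n x r factor.
import torch

def compress(grad, q):
    """One power-iteration step: grad (m x n), q (n x r) -> thin factors p, q."""
    p = grad @ q                       # m x r
    p, _ = torch.linalg.qr(p)          # orthonormalise the columns of p
    q = grad.t() @ p                   # n x r
    return p, q                        # in training, these are what gets all-reduced

def decompress(p, q):
    return p @ q.t()                   # rank-r approximation of the gradient

m, n, r = 512, 256, 4
grad = torch.randn(m, n)
q0 = torch.randn(n, r)                 # warm-started across steps in practice

p, q = compress(grad, q0)
approx = decompress(p, q)

full_cost = m * n
compressed_cost = r * (m + n)
print(f"communication: {compressed_cost} vs {full_cost} floats "
      f"({compressed_cost / full_cost:.1%}), "
      f"rel. error {torch.norm(grad - approx) / torch.norm(grad):.3f}")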
reposted by
Martin Jaggi
Sung Kim
8 months ago
Prime Intellect's INTELLECT-2: the first decentralized 32B-parameter RL training run that anyone with compute can join, fully permissionless.
www.primeintellect.ai/blog/intelle...
Anastasia @koloskova.bsky.social recently won the European @ellis.eu PhD award for her amazing work on AI and optimization. She will be joining the University of Zurich as a professor this summer and is hiring PhD students and postdocs. You should apply to her group! Her website:
koloskova.github.io
Anastasia Koloskova
Anastasia Koloskova, PhD student in Machine Learning at EPFL.
https://koloskova.github.io/
10 months ago
The Swiss AI Initiative has launched open calls for disruptive ideas: democratizing large-scale AI for the benefit of society. Send your idea by the end of March 🏃‍♂️➡️ and run it on one of the largest public AI clusters globally. Everyone is eligible to apply!
swiss-ai.org
10 months ago
excellent starting point for pretraining recipes:
arxiv.org/abs/2502.02737
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
While large language models have facilitated breakthroughs in many applications of artificial intelligence, their inherent largeness makes them computationally expensive and challenging to deploy in r...
https://arxiv.org/abs/2502.02737
10 months ago
New open-weights 24B model with performance comparable to Llama 3.3 70B 😮. Congrats, Mistral team!
mistral.ai/news/mistral...
11 months ago