Martin Jaggi
@mjaggi.bsky.social
📤 811
📥 160
📝 38
Prof at EPFL AI • Climbing
so open-weights models are much happier than closed ones i guess, cause they live on in the long run, did i get that right?
2 days ago
0
1
0
91% of reasoning does not need RL 🤯
arxiv.org/abs/2510.07364
Base Models Know How to Reason, Thinking Models Learn When
Why do thinking language models like DeepSeek R1 outperform their base counterparts? Despite consistent performance gains, it remains unclear to what extent thinking models learn entirely new reasonin...
https://arxiv.org/abs/2510.07364
24 days ago
1
7
0
reposted by
Martin Jaggi
Simon Willison
about 1 month ago
I just tried the official demo for the new Gemini 2.5 Computer Use model and it started by navigating to Google, solving Google's own CAPTCHA and then running a search!
https://simonwillison.net/2025/Oct/7/gemini-25-computer-use-captchas/
Gemini 2.5 Computer Use can solve Google’s own CAPTCHAs
Google just introduced a new Gemini 2.5 Computer Use model, specially designed to help operate a GUI interface by interacting with visible elements using a virtual mouse and keyboard. I …
https://simonwillison.net/2025/Oct/7/gemini-25-computer-use-captchas/
2
9
4
We're hiring again for AI research engineering roles: join the team behind the Apertus LLM if you share our passion for working on impactful AI that's truly open.
careers.epfl.ch/job/Lausanne...
AI Research Engineers - Swiss AI Initiative
https://careers.epfl.ch/job/Lausanne-AI-Research-Engineers-Swiss-AI-Initiative/1163395655/
about 1 month ago
2
5
4
reposted by
Martin Jaggi
Deniz Bayazit
about 1 month ago
1/🚨 New preprint: How do #LLMs’ inner features change as they train? Using #crosscoders + a new causal metric, we map when features appear, strengthen, or fade across checkpoints, opening a new lens on training dynamics beyond loss curves & benchmarks. #interpretability
2
14
6
reposted by
Martin Jaggi
heise online
about 1 month ago
Swiss language model Apertus: This is what EU-compliant, transparent AI looks like https://www.heise.de/hintergrund/Schweizer-Sprachmodell-Apertus-So-sieht-EU-konforme-transparente-KI-aus-10638501.html?utm_source=flipboard&utm_medium=activitypub Posted in Nachrichten
@nachrichten-heiseonline
Swiss language model Apertus: This is what EU-compliant, transparent AI looks like
Multilingualism, transparency, respect for intellectual property: the open large language model from Swiss AI labs embodies European values.
https://www.heise.de/hintergrund/Schweizer-Sprachmodell-Apertus-So-sieht-EU-konforme-transparente-KI-aus-10638501.html?utm_source=flipboard&utm_medium=activitypub
0
1
1
reposted by
Martin Jaggi
Sung Kim
2 months ago
Hugging Face's FinePDFs: the largest publicly available corpus sourced exclusively from PDFs, containing about 3 trillion tokens across 475 million documents in 1733 languages.
- Long context
- 3T tokens from high-demand domains like legal and science
- Heavily improves over SoTA
1
30
4
you can run the new apertus LLMs fully locally on your (mac) laptop with just 2 lines of code:

pip install mlx-lm
mlx_lm.generate --model swiss-ai/Apertus-8B-Instruct-2509 --prompt "wer bisch du?"

(make sure you have run huggingface-cli login before)
Apertus LLM - a swiss-ai Collection
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
https://huggingface.co/collections/swiss-ai/apertus-llm-68b699e65415c231ace3b059
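If you prefer Python over the CLI, mlx-lm also exposes a small Python API; a minimal sketch under the same assumptions (mlx-lm installed, huggingface-cli login done; the prompt and max_tokens value are just illustrative choices):

# minimal sketch using the mlx-lm Python API
# (assumes `pip install mlx-lm` and a prior `huggingface-cli login`)
from mlx_lm import load, generate

# downloads the weights from the Hugging Face hub on first use
model, tokenizer = load("swiss-ai/Apertus-8B-Instruct-2509")

# generate a short completion; max_tokens is an illustrative choice
text = generate(model, tokenizer, prompt="wer bisch du?", max_tokens=128)
print(text)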
2 months ago
3
8
4
reposted by
Martin Jaggi
Adrienne Fichter
2 months ago
In the end, the media publishers will have to come up with a separate solution, since they can hardly speak for all Swiss bloggers, company websites, artists, health portals, and e-commerce platforms. The WBK-N wants to mandate neither opt-out nor opt-in.
0
7
1
reposted by
Martin Jaggi
Antoine Bosselut
2 months ago
The next generation of open LLMs should be inclusive, compliant, and multilingual by design. That’s why we (@icepfl.bsky.social, @ethz.ch, @cscsch.bsky.social) built Apertus.
2
25
10
new extensive evaluation of different optimizers for LLM training
arxiv.org/abs/2509.01440
Benchmarking Optimizers for Large Language Model Pretraining
The recent development of Large Language Models (LLMs) has been accompanied by an effervescence of novel ideas and methods to better optimize the loss of deep learning models. Claims from those method...
https://arxiv.org/abs/2509.01440
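For intuition on what a fair optimizer comparison involves: same model init, same data, a per-optimizer learning-rate sweep, and a matched step budget. A toy sketch in PyTorch (the optimizer list, model, and sweep values are illustrative placeholders, not the paper's protocol):

# toy optimizer comparison: identical model & data, per-optimizer LR sweep,
# fixed step budget (illustrative only; the paper's protocol is far more careful)
import torch

def make_model():
    torch.manual_seed(0)  # identical init for every run
    return torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                               torch.nn.Linear(64, 1))

X, y = torch.randn(512, 32), torch.randn(512, 1)

for name, opt_cls in {"adamw": torch.optim.AdamW, "sgd": torch.optim.SGD}.items():
    best = float("inf")
    for lr in (1e-1, 1e-2, 1e-3):  # sweep the learning rate per optimizer
        model = make_model()
        opt = opt_cls(model.parameters(), lr=lr)
        for _ in range(200):  # fixed step budget = matched compute
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(model(X), y)
            loss.backward()
            opt.step()
        best = min(best, loss.item())
    print(f"{name}: best final loss {best:.4f}")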
2 months ago
0
3
2
reposted by
Martin Jaggi
Reto Vogt
2 months ago
Switzerland is entering the race for large language models. Under the name #Apertus, @ethz.ch, @icepfl.bsky.social and @cscsch.bsky.social are releasing the country’s first fully open, multilingual #LLM. For MAZ, I have briefly analyzed Apertus:
www.maz.ch/news/apertus...
Apertus: a new language model for Switzerland
https://www.maz.ch/news/apertus-ein-neues-sprachmodell-fuer-die-schweiz
3
25
8
reposted by
Martin Jaggi
EPFL School of Computer and Communication Sciences
2 months ago
EPFL, ETH Zurich & CSCS just released Apertus, Switzerland’s first fully open-source large language model. Trained on 15T tokens in 1,000+ languages, it’s built for transparency, responsibility & the public good. Read more:
actu.epfl.ch/news/apertus...
1
54
34
reposted by
Martin Jaggi
EPFL AI Center
4 months ago
EPFL and ETH Zürich are together building a Swiss-made LLM from scratch. Fully open and multilingual, the model is trained on CSCS's supercomputer "Alps" and supports sovereign, transparent, and responsible AI in Switzerland and beyond. Read more here:
ai.epfl.ch/a-language-m...
#ResponsibleAI
A language model built for the public good - EPFL AI Center
ETH Zurich and EPFL will release a large language model (LLM) developed on public infrastructure. Trained on the “Alps” supercomputer at the Swiss National Supercomputing Centre (CSCS), the new LLM ma...
https://ai.epfl.ch/a-language-model-built-for-the-public-good/
0
10
3
huggingface.co/blog/smollm3
SmolLM3: smol, multilingual, long-context reasoner
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
https://huggingface.co/blog/smollm3
4 months ago
0
2
0
reposted by
Martin Jaggi
zeynep tufekci
6 months ago
Why did Grok suddenly start talking about “white genocide in South Africa” even if asked about baseball or cute dogs? Because someone at Musk’s xAI deliberately did this, and we only found out because they were clumsy. My piece on the real dangers of AI. Gift link:
www.nytimes.com/2025/05/17/o...
8
247
122
reposted by
Martin Jaggi
Angelika Romanou
7 months ago
If you’re at @iclr-conf.bsky.social this week, come check out our spotlight poster INCLUDE during the Thursday 3:00–5:30pm session! I will be there to chat about all things multilingual & multicultural evaluation. Feel free to reach out anytime during the conference. I’d love to connect!
0
4
3
Using the 'right' data can hugely speed up LLM training, but how do you find the best training data in the vast sea of a whole web crawl? We propose a simple classifier-based selection, enabling multilingual LLMs 🧵
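The generic recipe behind classifier-based selection: train a small classifier to tell high-quality reference text from random crawl text, score every crawl document, and keep only the top-scoring ones. A toy sketch with scikit-learn (the corpora and threshold below are made-up placeholders, not the actual classifier or features from the paper):

# toy classifier-based data selection (illustrative only)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

reference_docs = ["a well-written encyclopedia article about physics",
                  "a carefully edited textbook chapter on biology"]   # label 1
random_crawl   = ["BUY NOW cheap pills !!! click here",
                  "click here click here win win win"]                # label 0

vec = TfidfVectorizer()
X = vec.fit_transform(reference_docs + random_crawl)
y = [1] * len(reference_docs) + [0] * len(random_crawl)
clf = LogisticRegression().fit(X, y)

# score unseen crawl documents, keep the most reference-like ones
candidates = ["an explanation of gradient descent with worked examples",
              "FREE FREE FREE limited offer click now"]
scores = clf.predict_proba(vec.transform(candidates))[:, 1]
keep = [d for d, s in zip(candidates, scores) if s > 0.5]  # placeholder threshold
print(keep)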
7 months ago
1
7
2
Dion: A Communication-Efficient Optimizer for Large Models (inspired by PowerSGD)
arxiv.org/abs/2504.05295
Dion: A Communication-Efficient Optimizer for Large Models
Training large AI models efficiently requires distributing computation across multiple accelerators, but this often incurs significant communication overhead -- especially during gradient synchronizat...
https://arxiv.org/abs/2504.05295
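The underlying idea from PowerSGD: rather than communicating a full gradient matrix G, workers communicate a rank-r factorization G ≈ P Qᵀ obtained from one power-iteration step, which is far fewer numbers. A toy NumPy sketch (shapes and rank are illustrative; this shows the PowerSGD compression idea, not Dion's actual algorithm):

# toy PowerSGD-style rank-r gradient compression (illustrative only)
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((256, 128))   # a gradient matrix to compress
r = 4                                 # target rank

Q = rng.standard_normal((128, r))     # shared random starting factor
P = G @ Q                             # one power-iteration step
P, _ = np.linalg.qr(P)                # orthonormalize the columns of P
Q = G.T @ P                           # best rank-r co-factor given P

# workers would all-reduce the small P (256 x r) and Q (128 x r)
# instead of the full 256 x 128 gradient matrix
G_hat = P @ Q.T
err = np.linalg.norm(G - G_hat) / np.linalg.norm(G)
print(f"relative approximation error: {err:.3f}")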
7 months ago
0
1
0
reposted by
Martin Jaggi
Sung Kim
7 months ago
Prime Intellect's INTELLECT-2: the first decentralized 32B-parameter RL training run, open to join for anyone with compute, fully permissionless.
www.primeintellect.ai/blog/intelle...
0
7
4
Anastasia @koloskova.bsky.social recently won the European @ellis.eu PhD award for her amazing work on AI and optimization. She will be joining the University of Zurich as a professor this summer, and will be hiring PhD students and postdocs. You should apply to her group! Her website:
koloskova.github.io
Anastasia Koloskova
Anastasia Koloskova, PhD student in Machine Learning at EPFL.
https://koloskova.github.io/
8 months ago
0
9
1
The Swiss AI Initiative has launched open calls for disruptive ideas, democratizing large-scale AI for the benefit of society. Send your idea by the end of March 🏃‍♂️➡️ and run it on one of the largest public AI clusters globally. Everyone is eligible to apply!
swiss-ai.org
8 months ago
0
19
12
excellent starting point for pretraining recipes:
arxiv.org/abs/2502.02737
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
While large language models have facilitated breakthroughs in many applications of artificial intelligence, their inherent largeness makes them computationally expensive and challenging to deploy in r...
https://arxiv.org/abs/2502.02737
8 months ago
0
3
0
new open-weights 24B model with comparable performance to Llama 3.3 70B 😮. congrats mistral team!
mistral.ai/news/mistral...
9 months ago
0
12
2