Martin Jaggi
@mjaggi.bsky.social
📤 799
📥 160
📝 29
Prof at EPFL AI • Climbing
reposted by
Martin Jaggi
Sung Kim
15 days ago
Hugging Face's FinePDFs: the largest publicly available corpus sourced exclusively from PDFs, containing about 3 trillion tokens across 475 million documents in 1,733 languages.
- Long context
- 3T tokens from high-demand domains like legal and science
- Heavily improves over SoTA
you can run the new apertus LLMs fully locally on your (mac) laptop with just 2 lines of code:
pip install mlx-lm
mlx_lm.generate --model swiss-ai/Apertus-8B-Instruct-2509 --prompt "wer bisch du?"
(make sure you have run huggingface-cli login before)
Apertus LLM - a swiss-ai Collection
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
https://huggingface.co/collections/swiss-ai/apertus-llm-68b699e65415c231ace3b059
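If you prefer calling it from Python instead of the CLI, here is a minimal sketch using mlx-lm's load/generate API (it assumes the same pip install and huggingface-cli login as above; max_tokens is an arbitrary illustrative value):

# minimal mlx-lm Python sketch; assumes `pip install mlx-lm` and a prior
# `huggingface-cli login`; max_tokens is illustrative
from mlx_lm import load, generate

model, tokenizer = load("swiss-ai/Apertus-8B-Instruct-2509")
response = generate(model, tokenizer, prompt="wer bisch du?", max_tokens=256)
print(response)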
17 days ago
reposted by
Martin Jaggi
Adrienne Fichter
17 days ago
In the end, the media publishers will have to come up with a separate solution of their own, since they can hardly speak for all Swiss bloggers, company websites, artists, health portals, and e-commerce platforms. The WBK N wants to mandate neither opt-out nor opt-in.
reposted by
Martin Jaggi
Antoine Bosselut
19 days ago
The next generation of open LLMs should be inclusive, compliant, and multilingual by design. That’s why we (@icepfl.bsky.social, @ethz.ch, @cscsch.bsky.social) built Apertus.
new extensive evaluation of different optimizers for LLM training
arxiv.org/abs/2509.01440
Benchmarking Optimizers for Large Language Model Pretraining
The recent development of Large Language Models (LLMs) has been accompanied by an effervescence of novel ideas and methods to better optimize the loss of deep learning models. Claims from those method...
https://arxiv.org/abs/2509.01440
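The paper's actual setup is in the link above; purely to illustrate what such a benchmark controls for, the comparison boils down to training the same model on the same data and step budget while only swapping the optimizer. A toy PyTorch sketch (the tiny model, learning rates, and optimizer choices are illustrative, not the paper's):

# toy optimizer-comparison sketch (illustrative, not the paper's code):
# identical model, data, seed, and step budget; only the optimizer differs
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_optimizer(name, params):
    if name == "adamw":
        return torch.optim.AdamW(params, lr=1e-3, weight_decay=0.1)
    return torch.optim.SGD(params, lr=1e-2, momentum=0.9)

def train(opt_name, steps=200, seed=0):
    torch.manual_seed(seed)
    model = nn.Sequential(nn.Linear(64, 128), nn.GELU(), nn.Linear(128, 64))
    opt = make_optimizer(opt_name, model.parameters())
    for _ in range(steps):
        x = torch.randn(32, 64)
        loss = F.mse_loss(model(x), x)  # toy reconstruction objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

for name in ("adamw", "sgd"):
    print(name, train(name))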
19 days ago
reposted by
Martin Jaggi
Reto Vogt
20 days ago
Switzerland is entering the race for large language models. Under the name
#Apertus
, @ethz.ch, @icepfl.bsky.social, and @cscsch.bsky.social are releasing the country's first fully open, multilingual
#LLM
. I took a brief look at Apertus for MAZ:
www.maz.ch/news/apertus...
Apertus: a new language model for Switzerland
https://www.maz.ch/news/apertus-ein-neues-sprachmodell-fuer-die-schweiz
reposted by
Martin Jaggi
EPFL School of Computer and Communication Sciences
20 days ago
EPFL, ETH Zurich & CSCS just released Apertus, Switzerland’s first fully open-source large language model. Trained on 15T tokens in 1,000+ languages, it’s built for transparency, responsibility & the public good. Read more:
actu.epfl.ch/news/apertus...
reposted by
Martin Jaggi
EPFL AI Center
3 months ago
EPFL and ETH Zürich are jointly building a Swiss-made LLM from scratch. Fully open and multilingual, the model is trained on CSCS's supercomputer "Alps" and supports sovereign, transparent, and responsible AI in Switzerland and beyond. Read more here:
ai.epfl.ch/a-language-m...
#ResponsibleAI
A language model built for the public good - EPFL AI Center
ETH Zurich and EPFL will release a large language model (LLM) developed on public infrastructure. Trained on the “Alps” supercomputer at the Swiss National Supercomputing Centre (CSCS), the new LLM ma...
https://ai.epfl.ch/a-language-model-built-for-the-public-good/
huggingface.co/blog/smollm3
SmolLM3: smol, multilingual, long-context reasoner
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
https://huggingface.co/blog/smollm3
3 months ago
reposted by
Martin Jaggi
zeynep tufekci
4 months ago
Why did Grok suddenly start talking about “white genocide in South Africa” even if asked about baseball or cute dogs? Because someone at Musk’s xAI deliberately did this, and we only found out because they were clumsy. My piece on the real dangers of AI. Gift link:
www.nytimes.com/2025/05/17/o...
reposted by
Martin Jaggi
Angelika Romanou
5 months ago
If you’re at
@iclr-conf.bsky.social
this week, come check out our spotlight poster INCLUDE during the Thursday 3:00–5:30pm session! I will be there to chat about all things multilingual & multicultural evaluation. Feel free to reach out anytime during the conference. I’d love to connect!
Using the 'right' data can hugely speed up LLM training, but how do you find the best training data in the vast sea of a whole web crawl? We propose a simple classifier-based selection, enabling multilingual LLMs 🧵
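As a rough illustration of the idea (not the actual pipeline, features, or threshold from the thread), classifier-based selection amounts to: train a small quality classifier on a labeled seed set, score every crawled document, and keep only those above a cutoff.

# rough sketch of classifier-based data selection (illustrative only; the real
# pipeline, classifier, and threshold differ)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# toy labeled seed set: 1 = high-quality reference text, 0 = boilerplate/spam
seed_texts = [
    "a clear, well sourced article explaining glaciers for students",
    "BUY CHEAP PILLS now, click here to WIN FREE money",
]
seed_labels = [1, 0]

vectorizer = TfidfVectorizer()
classifier = LogisticRegression().fit(vectorizer.fit_transform(seed_texts), seed_labels)

def select(documents, threshold=0.5):
    # keep only documents the classifier scores as likely high quality
    scores = classifier.predict_proba(vectorizer.transform(documents))[:, 1]
    return [doc for doc, score in zip(documents, scores) if score >= threshold]

crawl = [
    "a clear explanation of photosynthesis for curious students",
    "WIN FREE money now, click here",
]
print(select(crawl))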
5 months ago
Dion: A Communication-Efficient Optimizer for Large Models (inspired by PowerSGD)
arxiv.org/abs/2504.05295
Dion: A Communication-Efficient Optimizer for Large Models
Training large AI models efficiently requires distributing computation across multiple accelerators, but this often incurs significant communication overhead -- especially during gradient synchronizat...
https://arxiv.org/abs/2504.05295
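Dion's actual update rule is in the paper; as background on the PowerSGD idea the post mentions, the trick is to compress a gradient matrix into two thin rank-r factors before synchronization, so workers exchange (m+n)*r numbers instead of m*n. A toy sketch (rank and matrix sizes are illustrative):

# PowerSGD-style low-rank gradient compression sketch (illustrative background,
# not Dion's actual algorithm): approximate G with thin factors P and Q so that
# G ≈ P @ Q.T, and communicate only P and Q
import torch

def compress(G, r=8):
    m, n = G.shape
    Q = torch.randn(n, r)          # random test matrix
    P = G @ Q                      # (m, r) sketch of the column space
    P, _ = torch.linalg.qr(P)      # orthonormalize
    Q = G.T @ P                    # (n, r) refined factor
    return P, Q                    # send these instead of the full gradient

def decompress(P, Q):
    return P @ Q.T                 # low-rank approximation of G

G = torch.randn(1024, 1024)
P, Q = compress(G)
print("compression ratio:", G.numel() / (P.numel() + Q.numel()))
print("relative error:", ((decompress(P, Q) - G).norm() / G.norm()).item())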
5 months ago
reposted by
Martin Jaggi
Sung Kim
5 months ago
Prime Intellect's INTELLECT-2: the first decentralized 32B-parameter RL training run that is open for anyone with compute to join, fully permissionless.
www.primeintellect.ai/blog/intelle...
Anastasia
@koloskova.bsky.social
recently won the European
@ellis.eu
PhD award for her amazing work on AI and optimization. She will be joining the University of Zurich as a professor this summer and is hiring PhD students and postdocs. You should apply to her group! Her website:
koloskova.github.io
Anastasia Koloskova
Anastasia Koloskova, PhD student in Machine Learning at EPFL.
https://koloskova.github.io/
7 months ago
The Swiss AI Initiative has launched open calls for disruptive ideas, democratizing large-scale AI for the benefit of society. Send your idea by the end of March 🏃♂️➡️ and run on one of the largest public AI clusters globally. Everyone is eligible to apply!
swiss-ai.org
7 months ago
excellent starting point for pretraining recipes:
arxiv.org/abs/2502.02737
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
While large language models have facilitated breakthroughs in many applications of artificial intelligence, their inherent largeness makes them computationally expensive and challenging to deploy in r...
https://arxiv.org/abs/2502.02737
7 months ago
new open weights, 24B model, with comparable performance to Llama 3.3 70B 😮. congrats mistral team!
mistral.ai/news/mistral...
8 months ago