Pierre Ablin
@pierreablin.bsky.social
Research scientist at Apple | machine learning, optimization, language modeling | pierreablin.com
reposted by
Pierre Ablin
Preetum Nakkiran
10 months ago
Paper🧵 (cross-posted at X): When does composition of diffusion models "work"? Intuitively, the reason dog+hat works and dog+horse doesn't has something to do with independence between the concepts being composed. The tricky part is to formalize exactly what this means. 1/
2
39
17
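For readers wondering what "composing" two diffusion models means operationally, here is a minimal toy sketch in the standard product-of-experts view, where summing the scores of two models targets the product of their densities. The Gaussian "concept models", the Langevin sampler, and all constants are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Toy illustration (not the paper's code): compose two score-based models by
# summing their scores, which targets the product of the two densities.
# Each "concept model" here is the exact score of a 2D Gaussian.

def score_gaussian(x, mean, var):
    # grad_x log N(x; mean, var * I)
    return (mean - x) / var

rng = np.random.default_rng(0)
x = rng.normal(size=2) * 3.0              # broad initialization
step, n_steps = 1e-2, 5000

for _ in range(n_steps):
    # Composed score = score of concept A + score of concept B.
    s = (score_gaussian(x, np.array([2.0, 0.0]), 1.0)
         + score_gaussian(x, np.array([0.0, 2.0]), 1.0))
    # Unadjusted Langevin step targeting the product distribution.
    x = x + step * s + np.sqrt(2 * step) * rng.normal(size=2)

print(x)  # hovers around (1, 1), the mode of the product of the two Gaussians
```

Whether this kind of composition also "works" for learned models is exactly the question the thread formalizes.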
reposted by
Pierre Ablin
Fabian Schaipp
11 months ago
Learning rate schedules seem mysterious? Why is the loss going down so fast during cooldown? Turns out that this behaviour can be described with a bound from *convex, nonsmooth* optimization. A short thread on our latest paper:
arxiv.org/abs/2501.18965
The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training
We show that learning-rate schedules for large model training behave surprisingly similar to a performance bound from non-smooth convex optimization theory. We provide a bound for the constant schedul...
https://arxiv.org/abs/2501.18965
2
31
6
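For context, the "cooldown" in question is the decay phase of a warmup-stable-decay schedule. A minimal sketch of such a schedule is below; the warmup and cooldown fractions are illustrative defaults, not values from the paper.

```python
def wsd_lr(step, total_steps, base_lr=1e-3, warmup_frac=0.01, cooldown_frac=0.2):
    """Warmup-stable-decay ("cooldown") schedule of the kind discussed above:
    short linear warmup, constant learning rate, then a linear decay to zero.
    The fractions here are illustrative, not the paper's."""
    warmup_steps = int(warmup_frac * total_steps)
    cooldown_start = int((1 - cooldown_frac) * total_steps)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    if step < cooldown_start:
        return base_lr
    # Linear cooldown: the phase where the training loss typically drops sharply.
    return base_lr * (total_steps - step) / max(1, total_steps - cooldown_start)

lrs = [wsd_lr(t, total_steps=10_000) for t in range(10_000)]
```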
Excited to share Soup-of-Experts, a new neural network architecture that, for any given task, can instantiate in a flash a small model that performs well on it. Made with ❤️ at Apple. Thanks to my co-authors David Grangier, Angelos Katharopoulos, and Skyler Seto!
arxiv.org/abs/2502.01804
11 months ago
0
12
4
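A hypothetical sketch of the general idea as described above, not the paper's actual architecture or training recipe: keep a shared bank of expert parameters, and for a given task mix them into one small model with task-dependent weights, so "instantiation" is a single weighted average computed before inference. The router, dimensions, and mixing rule here are assumptions for illustration.

```python
import torch

# Hypothetical sketch (not the paper's implementation): a shared bank of
# expert parameters is mixed into one small model per task.
n_experts, param_dim, task_dim = 8, 10_000, 16
expert_bank = torch.randn(n_experts, param_dim)   # shared expert parameters
router = torch.nn.Linear(task_dim, n_experts)     # task descriptor -> mixing logits

def instantiate_model(task_embedding):
    # One softmax and one weighted sum: instantiation is essentially free.
    weights = torch.softmax(router(task_embedding), dim=-1)   # (n_experts,)
    return weights @ expert_bank                              # flat parameters

params = instantiate_model(torch.randn(task_dim))
print(params.shape)  # torch.Size([10000])
```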
reposted by
Pierre Ablin
Mathieu Blondel
11 months ago
Really proud of these two companion papers by our team at GDM: 1) Joint Learning of Energy-based Models and their Partition Function
arxiv.org/abs/2501.18528
2) Loss Functions and Operators Generated by f-Divergences
arxiv.org/abs/2501.18537
A thread.
1
14
4
reposted by
Pierre Ablin
Valérie Castin
11 months ago
How do tokens evolve as they are processed by a deep Transformer? With José A. Carrillo,
@gabrielpeyre.bsky.social
and
@pierreablin.bsky.social
, we tackle this in our new preprint: A Unified Perspective on the Dynamics of Deep Transformers
arxiv.org/abs/2501.18322
ML and PDE lovers, check it out!
2
95
16
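To make the question concrete, here is a toy illustration of the "tokens as interacting particles" viewpoint, written for this feed rather than taken from the preprint: each attention layer, viewed as a discretized time step, moves every token toward an attention-weighted average of the others. The temperature, step size, and identity value map are assumptions.

```python
import numpy as np

# Toy sketch (not the preprint's code): tokens as particles evolving under
# dot-product attention dynamics, with depth playing the role of time.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 2))     # 16 tokens in dimension 2
beta, tau = 2.0, 0.1                  # inverse temperature, step size

for _ in range(200):                  # layers, i.e. discretized time steps
    logits = beta * tokens @ tokens.T                      # attention scores
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    tokens = tokens + tau * (attn @ tokens - tokens)        # residual update

print(tokens.std(axis=0))  # spread shrinks: tokens tend to cluster with depth
```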
reposted by
Pierre Ablin
Samuel Vaiter
11 months ago
Byte Pair Encoding is a tokenization method that starts with all characters as initial tokens. It iteratively merges the most frequent adjacent byte pairs in the text, adding new tokens to the vocabulary until reaching a predefined size. The output is a sequence of tokens.
https://buff.ly/42oG80f
1
14
3
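The merge loop described above fits in a few lines. Below is a minimal character-level sketch of BPE training (illustrative, not a production tokenizer: real implementations work on bytes, respect word boundaries, and record the merge order for encoding new text).

```python
from collections import Counter

def train_bpe(text, vocab_size):
    """Minimal BPE training sketch: start from single characters and
    repeatedly merge the most frequent adjacent pair until the vocabulary
    reaches the target size (or no pairs remain)."""
    tokens = list(text)                       # initial tokens = characters
    vocab = set(tokens)
    while len(vocab) < vocab_size:
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]   # most frequent adjacent pair
        merged = a + b
        vocab.add(merged)
        # Replace every occurrence of the pair with the new token.
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                out.append(merged)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens, vocab

tokens, vocab = train_bpe("low lower lowest newest widest", vocab_size=30)
print(tokens)  # the output is a sequence of (possibly merged) tokens
```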
reposted by
Pierre Ablin
Gaël Varoquaux
11 months ago
We are opening post-doc positions at the intersection of AI, data science, and medicine: • Large Language Models for French medical texts • Evaluating digital medical devices: statistics and causal inference
1
27
16
Mixtures of experts are all the rage when it comes to shipping low-latency LLMs. Check out this awesome work by Samira et al. on scaling laws for mixtures of experts!
11 months ago
0
3
0
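The low-latency argument is that capacity grows with the number of experts while each token only pays for its top-k experts. A minimal sparse-MoE layer sketch (illustrative, not the quoted paper's code) makes the point; the routing scheme and sizes are assumptions.

```python
import torch
import torch.nn as nn

class MoE(nn.Module):
    """Minimal top-k mixture-of-experts layer: many parameters in total,
    but only k experts run per token, which keeps inference latency low."""

    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.router = nn.Linear(dim, n_experts)
        self.k = k

    def forward(self, x):                        # x: (n_tokens, dim)
        scores = self.router(x)                  # (n_tokens, n_experts)
        top_vals, top_idx = scores.topk(self.k, dim=-1)
        gates = torch.softmax(top_vals, dim=-1)  # mixing weights over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):               # only k experts run per token
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e
                if mask.any():
                    out[mask] += gates[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

print(MoE()(torch.randn(5, 64)).shape)  # torch.Size([5, 64])
```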
reposted by
Pierre Ablin
Samira
11 months ago
One question that has always intrigued me is the role of different ways to increase a model's capacity: parameters, parallelizable compute, or sequential compute? We explored this through the lens of MoEs:
1
18
11
reposted by
Pierre Ablin
Pau Rodriguez
about 1 year ago
Thrilled to share the latest work from our team at @Apple where we achieve interpretable and fine-grained control of LLMs and Diffusion models via Activation Transport
Paper: arxiv.org/abs/2410.23054
Code: github.com/apple/ml-act
0/9 🧵
3
47
20
Excited to see Sigmoid Attention accepted at ICLR 2025!! Make attention ~18% faster with a drop-in replacement. Code:
github.com/apple/ml-sig...
Paper:
arxiv.org/abs/2409.04431
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
Attention is a key part of the transformer architecture. It is a sequence-to-sequence mapping that transforms each sequence element into a weighted sum of values. The weights are typically obtained as...
https://arxiv.org/abs/2409.04431
11 months ago
1
28
5
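To show what "drop-in replacement" means here, below is a small sketch contrasting softmax attention with sigmoid attention, where the row-wise softmax is replaced by an elementwise sigmoid with a bias on the order of -log(n), as the paper advocates for stability. This is an illustration only; the actual speedups come from the optimized FlashSigmoid kernels in the repo above.

```python
import math
import torch

def softmax_attention(q, k, v):
    # Standard scaled dot-product attention with row-wise softmax.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    return torch.softmax(scores, dim=-1) @ v

def sigmoid_attention(q, k, v):
    # Same scores, but an elementwise sigmoid with a -log(n) bias replaces
    # the softmax normalization (n = sequence length).
    n = k.shape[-2]
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    return torch.sigmoid(scores - math.log(n)) @ v

q = k = v = torch.randn(2, 8, 16, 64)   # (batch, heads, seq, head_dim)
print(sigmoid_attention(q, k, v).shape)  # same shape and interface as softmax attention
```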
reposted by
Pierre Ablin
Marco Cuturi
about 1 year ago
The Apple Machine Learning Research (MLR) team in Paris has openings for both FTE roles and a short-term post-doc position to contribute to our team's research agenda. Researchers at Apple's MLR (led by Samy Bengio) target impactful publications in top-tier ML venues and OSS.
1
13
5
Congratulations on these new models!!
about 1 year ago
0
4
0
reposted by
Pierre Ablin
Alaa El-Nouby
about 1 year ago
Does autoregressive pre-training work for vision? Delighted to share AIMv2, a family of strong, scalable, and open vision encoders that excel at multimodal understanding, recognition, and grounding 🧵 paper:
arxiv.org/abs/2411.14402
code:
github.com/apple/ml-aim
HF:
huggingface.co/collections/...
3
59
20
reposted by
Pierre Ablin
Gaël Varoquaux
about 1 year ago
Great video explaining a clever vectorization for learning on strings and dirty categories: the MinHashEncoder is fast, stateless, and excellent with tree-based learners. It's in
@skrub-data.bsky.social
youtu.be/ZMQrNFef8fg
Why the MinHashEncoder is great for boosted trees
YouTube video by probabl
https://youtu.be/ZMQrNFef8fg
2
75
8
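A minimal usage sketch of the encoder discussed in the video, assuming a recent skrub version (the exact API for single-column transformers has shifted across skrub/dirty_cat releases, so check the docs): it hashes character n-grams of each string into a fixed-size vector with no vocabulary to fit, which is why it is fast, stateless, and a good match for tree-based learners.

```python
import pandas as pd
from skrub import MinHashEncoder  # assumes a recent skrub release

# Illustrative high-cardinality string column (hypothetical data).
job_titles = pd.Series(
    ["Senior Data Scientist", "Data Scientist",
     "Police Officer II", "Police Officer III"],
    name="job_title",
)

encoder = MinHashEncoder(n_components=16)   # stateless: no vocabulary is learned
X = encoder.fit_transform(job_titles)       # one 16-dim min-hash vector per string
print(X.shape)                               # (4, 16), ready for a boosted-tree model
```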