Raphael Pisoni
@4rtemi5.bsky.social
3124 followers · 466 following · 198 posts
Unsupervised multimodal representation of a learning researcher.
https://www.pisoni.ai
I never really considered how dangerous QK-norm actually is before working on RBF Attention. While solving some obvious issues, it can be the cause of some much less obvious ones. 🧵
9 days ago
1
3
1
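For context, QK-norm L2-normalizes queries and keys before the dot product, which bounds the attention logits. Here is a minimal NumPy sketch of that standard technique (not the RBF-Attention code; the `scale` value and function shape are illustrative assumptions):

```python
import numpy as np

def qk_norm_attention(q, k, v, scale=10.0, eps=1e-6):
    """Dot-product attention with QK-norm: L2-normalize queries and keys
    before the dot product, then apply a fixed (often learned) scale.
    Normalization bounds each logit to [-scale, scale], which stabilizes
    training but also caps how peaked the softmax can become."""
    q = q / (np.linalg.norm(q, axis=-1, keepdims=True) + eps)
    k = k / (np.linalg.norm(k, axis=-1, keepdims=True) + eps)
    logits = scale * (q @ k.T)                      # bounded logits
    logits -= logits.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax rows
    return weights @ v
```

The bounded logits are exactly what makes QK-norm both a fix (no logit blow-ups) and, as the post suggests, a potential source of subtler problems.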
Neural networks have a fundamental problem. Feed them garbage data and instead of admitting that they are confused, they will confidently hallucinate. I just open-sourced the HALO-Loss to try and fix this. It gives the model a mathematically sound *I don't know!* button. 🧵
18 days ago
1
1
0
I dove deeper down the rabbit hole of RBF-Attention. I refined the Triton kernel, added register tokens, and developed the SuSiE positional embedding as a replacement for RoPE in Euclidean space. Go have a look at the repo or the blog post in the comments if you're interested! :)
26 days ago
1
2
0
For some reason I decided to swap out standard dot-product attention for a scaled RBF kernel. I pretty much expected it to fail to converge or be impossibly slow, but scaled-RBF attention is getting unexpectedly good results??
29 days ago
1
2
2
Playing with a new kind of attention. Plots are for the same setup with standard vs. modified attention on one epoch of tiny-stories. Speed is roughly the same as flash-attention. Looking good!
about 1 month ago
1
3
1
AI isn't coming for your creativity. It's coming for your lack of diligence. People talk a lot about #AGI and "super-intelligence," but the immediate disruption is much simpler: AI is killing "vibe-based" decision-making.
3 months ago
1
0
0
Over the past year Michał Lewandowski and I published a series of papers on Space Folding, and while Michał went to #AAAI to present the latest one, I worked on a blog post explaining some of the central ideas behind the papers. Let me know what you think!
www.pisoni.ai/posts/space-...
The Shape of Thought: Space Folding in Neural Networks
The mathematical description of deep learning has long been dominated by the language of algebra: matrices, gradients, and optimization landscapes. A parallel and perhaps more intuitive language howev
https://www.pisoni.ai/posts/space-folding/
3 months ago
0
5
0
After a long hiatus I decided to update my blog and write about some of the things I did over the last few years. Come have a look!
pisoni.ai
3 months ago
1
3
0
Currently heading to #EurIPS in Copenhagen to present our work on space folding and model interpretability. If you're attending and would like to discuss Representation Learning, SSL, Multimodal LLMs, CV, or other topics that YOU are excited about, feel free to reach out.
5 months ago
0
4
0
reposted by
Raphael Pisoni
hardmaru
6 months ago
The US government should subsidize Open AI rather than OpenAI
0
49
8
reposted by
Raphael Pisoni
Yuki Asano
6 months ago
On the occasion of the 1000th citation of our Sinkhorn-Knopp self-supervised representation learning paper, I've written a whole post about the history and the key bits of this method that powers the state-of-the-art SSL vision models. Read it here :)
docs.google.com/document/d/1...
1
22
5
We're ready!
7 months ago
0
0
0
The single most undervalued property of neural networks is self-consistency. We should change that!
8 months ago
0
2
0
reposted by
Raphael Pisoni
asker the gauche, glycojohn destroyer of carbs
9 months ago
2
160
25
You've been researching for a while! Time to have some SOTA!
#aislop
9 months ago
0
3
0
You and Adam keep beating Sota? Stop doing that! Poor Sota!
9 months ago
1
9
0
Have some cool idea but only evaluate it on small models? Tough luck, buddy. You only get your paper accepted if your experimental results are 0.2% above SOTA and too expensive to falsify! Is academic publishing pay-to-win yet?
9 months ago
0
3
0
Is there a reason why none of the recent models use RBF-kernel attention to get rid of the softmax bottleneck for long context? I tried replacing dot-product attention with the negative squared QK-distance and was able to remove the softmax without issues or loss in performance!
9 months ago
1
3
1
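For readers curious what this swap looks like: a minimal NumPy sketch of attention with RBF-kernel scores in place of softmax(QKᵀ). The post doesn't include code, so `gamma` and the plain row-sum normalization are my illustrative assumptions, not the author's implementation:

```python
import numpy as np

def rbf_attention(q, k, v, gamma=0.5, eps=1e-6):
    """Attention with RBF-kernel scores instead of softmax(QK^T).
    Scores exp(-gamma * ||q_i - k_j||^2) are already nonnegative,
    so a plain row-sum normalization can replace the softmax."""
    # squared Euclidean distances via ||q||^2 + ||k||^2 - 2 q.k
    d2 = (q**2).sum(-1, keepdims=True) + (k**2).sum(-1) - 2.0 * (q @ k.T)
    w = np.exp(-gamma * np.maximum(d2, 0.0))    # RBF kernel weights
    w = w / (w.sum(-1, keepdims=True) + eps)    # rows sum to ~1
    return w @ v
```

Note that softmax over negative squared distances is itself a normalized Gaussian kernel, which is why dropping the explicit softmax is natural here.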
reposted by
Raphael Pisoni
NeurIPS Conference
9 months ago
NeurIPS is endorsing EurIPS, an independently-organized meeting which will offer researchers an opportunity to additionally present NeurIPS work in Europe concurrently with NeurIPS. Read more in our blog post and on the EurIPS website:
blog.neurips.cc/2025/07/16/n...
eurips.cc
eurips.cc
A NeurIPS-endorsed conference in Europe held in Copenhagen, Denmark
https://eurips.cc/
2
124
42
Has anyone experimented with "conditional gradients"? Thinking about a setup where, within a specific activation range (e.g., right before a ReLU), you'd only permit positive or negative gradients.
10 months ago
1
1
0
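One literal reading of the question, as a hypothetical NumPy sketch: standard ReLU backward, except that for pre-activations inside a chosen range only gradients of one sign pass through. The rule, range, and sign convention here are entirely my assumptions, not an established method:

```python
import numpy as np

def relu_forward(x):
    """ReLU forward pass; also return the pre-activation for backward."""
    return np.maximum(x, 0.0), x

def conditional_relu_backward(grad_out, x_cached, lo=-0.5, hi=0.0, sign="neg"):
    """Hypothetical 'conditional gradient': usual ReLU backward, but for
    pre-activations in [lo, hi] (just below the threshold, where ReLU's
    gradient is normally zero) let through only one sign of upstream
    gradient (here: negative ones)."""
    grad = grad_out * (x_cached > 0)                  # usual ReLU gradient
    in_range = (x_cached >= lo) & (x_cached <= hi)
    allowed = grad_out < 0 if sign == "neg" else grad_out > 0
    # inside the range, replace the zero gradient with the sign-filtered one
    return np.where(in_range, np.where(allowed, grad_out, 0.0), grad)
```

The intent of such a rule would presumably be to let near-dead units receive updates in only one direction; whether that helps is exactly the open question the post raises.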
Quick question to the SSL experts out there: Usually you evaluate an ssl-model by freezing it and training a linear probing layer. Would it be fair to somehow learn a final layer with more dimensions than classes and do a nearest-neighbor evaluation?
10 months ago
0
0
0
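For reference, the nearest-neighbor half of the proposed evaluation can be sketched as a plain cosine k-NN classifier over frozen embeddings. This is a generic sketch; `knn_eval` and its details are illustrative, not an SSL-benchmark standard:

```python
import numpy as np

def knn_eval(train_emb, train_labels, test_emb, test_labels, k=5):
    """Cosine k-NN evaluation of frozen embeddings: classify each test
    sample by majority vote over its k nearest training embeddings."""
    a = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    b = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
    sims = b @ a.T                                  # cosine similarities
    nn = np.argsort(-sims, axis=1)[:, :k]           # k nearest neighbors
    votes = train_labels[nn]
    preds = np.array([np.bincount(v).argmax() for v in votes])
    return float((preds == test_labels).mean())
```

Whether running this on top of a learned wider-than-classes final layer is still a "fair" probe (versus a linear probe on the frozen backbone) is the question the post is asking.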
reposted by
Raphael Pisoni
David Picard
10 months ago
There is an oak forest in central France that was planted 400 years ago by Colbert so that France would have quality hard wood by the 2000s to build ships for its navy. This is the type of long term planning that Seldonian predictions can help improve.
1
7
2
reposted by
Raphael Pisoni
Nafnlaus 🇮🇸 🇺🇦
12 months ago
New anti-censorship jailbreak just dropped ;)
1
32
9
Currently on my way to #ICLR in Singapore where we'll present our latest paper on space folding in neural networks. Would be happy to meet some people there, so if you're at ICLR as well and want to hang out, feel free to PM!
about 1 year ago
1
3
0
Grok this! What a roller-coaster of emotions... 🤪
about 1 year ago
1
4
0
reposted by
Raphael Pisoni
Wissam Antoun
about 1 year ago
ModernBERT or DeBERTaV3? What's driving performance: architecture or data? To find out we pretrained ModernBERT on the same dataset as CamemBERTaV2 (a DeBERTaV3 model) to isolate architecture effects. Here are our findings:
3
43
15
reposted by
Raphael Pisoni
Dmytro Mishkin
about 1 year ago
Just assembled a slide about local feature training time/dataset size. Anything wrong/missing?
5
18
4
Is the project even still worth doing when wandb runs out of funny names or am I cooked?
about 1 year ago
1
1
0
reposted by
Raphael Pisoni
Jeremy Morrell
about 1 year ago
Meta introduced Llama 4 models and added this section near the very bottom of the announcement 😬 "[LLMs] historically have leaned left when it comes to debated political and social topics."
ai.meta.com/blog/llama-4...
5
135
98
reposted by
Raphael Pisoni
ETH CS Department
about 1 year ago
Hello, world! We are now live on Bluesky. This is the official account of the Department of Computer Science at ETH Zurich. Follow us for cutting-edge research, the latest innovations, event updates and insights into the future of technology.
inf.ethz.ch
@csateth.bsky.social
@ethzurich.bsky.social
Department of Computer Science
Computer Science Department at ETH Zurich. The department offers highest quality in computer science research and education and adds to business and industry growth.
https://inf.ethz.ch
1
22
8
Recently had the pleasure of helping @miclew.bsky.social with a couple of his papers in exchange for him helping me with a couple of mine! This is the first fruit of our common work: we quantify space folding in ReLU neural networks with a range-based measure. Lots of fun to write and read!
about 1 year ago
0
6
0
x''= 0
about 1 year ago
0
3
0
reposted by
Raphael Pisoni
Gabriele Berton
about 1 year ago
Paper Release! Curious about image retrieval and contrastive learning? We present: "All You Need to Know About Training Image Retrieval Models". The most comprehensive retrieval benchmark: thousands of experiments across 4 datasets, dozens of losses, batch sizes, LRs, data labeling, and more!
2
40
10
reposted by
Raphael Pisoni
Wallace Marshall
about 1 year ago
anybody else a fan of "three body problem" - remember that part where the aliens attack earth by shutting down our ability to do science? what a crazy, fictional idea, good thing nothing like that could happen in real life.
7
104
17
reposted by
Raphael Pisoni
Rafael Pinto
about 1 year ago
"no b-but deepseek c-can't tiannamen" Here's Grok for you:
2
30
13
reposted by
Raphael Pisoni
Hank Green
about 1 year ago
The fact that Deepseek R1 was released three days /before/ Stargate means these guys stood in front of Trump and said they needed half a trillion dollars while they knew R1 was open source and trained for $5M. Beautiful.
398
13863
1891
Super interesting new CLIP-loss that takes cross-sample similarities into account to learn consistent representations. Also makes pretraining very data efficient. But I think there is a catch...
$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs
Learning good representations involves capturing the diverse ways in which data samples relate. Contrastive loss - an objective matching related samples - underlies methods from self-supervised to mul...
https://arxiv.org/abs/2407.18134
over 1 year ago
1
4
0
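Based on the linked abstract, the idea is to replace CLIP's one-hot contrastive targets with a similarity graph over samples. A hedged NumPy sketch of that reading (my interpretation, not the paper's implementation; `target_sim` and `temp` are illustrative):

```python
import numpy as np

def x_sample_contrastive(img_emb, txt_emb, target_sim, temp=0.07):
    """Sketch of a cross-sample contrastive loss: instead of one-hot
    targets (each image matches only its own caption, as in CLIP),
    the target distribution is a row-normalized similarity graph,
    so related-but-unpaired samples also receive probability mass."""
    logits = (img_emb @ txt_emb.T) / temp
    m = logits.max(axis=1, keepdims=True)           # stable log-softmax
    log_p = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    targets = target_sim / target_sim.sum(axis=1, keepdims=True)
    return float(-(targets * log_p).sum(axis=1).mean())
```

With `target_sim` set to the identity matrix this reduces to the standard CLIP cross-entropy, which makes the "cross-sample" part easy to ablate.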
Not that I'm super active on social media recently but I still feel like I need a break... 🫣
over 1 year ago
0
0
0
Another nail in the coffin of cosine similarity! I started disliking cossim some years ago for multiple reasons, such as the non-linearity around 0.0 and the loss of certainty information due to the normalization of feature vectors, but this study seems to give another good reason to abandon it.
Cosine Similarity: Not the Silver Bullet We Thought It Was | Shaped Blog
In the world of machine learning and data science, cosine similarity has long been a go-to metric for measuring the semantic similarity between high-dimensional objects. However, a new study by resear...
https://www.shaped.ai/blog/cosine-similarity-not-the-silver-bullet-we-thought-it-was
over 1 year ago
1
32
8
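The "loss of certainty information" point is easy to demonstrate: because cosine similarity normalizes both vectors, it is blind to magnitude. A tiny self-contained example:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: the dot product of unit-normalized vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two vectors with identical direction but a 100x magnitude difference:
a = np.array([1.0, 1.0])
b = np.array([100.0, 100.0])
# cosine(a, b) is exactly 1.0: the norm difference vanishes,
# while the raw dot product (a @ b == 200) still reflects it.
```

If the embedding norm carries confidence (as it often does in practice), normalization throws that signal away.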
reposted by
Raphael Pisoni
Jeremy Howard
over 1 year ago
I'll get straight to the point. We trained 2 new models. Like BERT, but modern. ModernBERT. Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff. It's much faster, more accurate, longer context, and more useful. ๐งต
19
618
181
She said YES! 🥰
over 1 year ago
2
29
0
reposted by
Raphael Pisoni
Mark Riedl
over 1 year ago
O3 is costly. These numbers are for a single ARC benchmark task
2
36
12
Fantastic Muse quote on a fantastic NeurIPS poster! Doesn't get much better than that!
over 1 year ago
1
8
0
reposted by
Raphael Pisoni
Remi Cadene
over 1 year ago
HOT 🔥 fastest, most precise, and most capable hand control setup ever... Less than $450 and fully open-source 🤯 by @huggingface, @therobotstudio, @NepYope. This tendon-driven technology will disrupt robotics! Retweet to accelerate its democratization. A thread 🧵
3
73
29
reposted by
Raphael Pisoni
Morticia (MLS, ASCP)
over 1 year ago
The global rise of anti-intellectualism and anti-science is directly related to the global rise of fascism and right-wing authoritarianism. The defense of truth is inherently anti-fascist.
0
57
15
Stand by while annual NeurIPS FOMO is loading...
over 1 year ago
1
4
0
I mean no disrespect, but the timing of publishing this right before moving to OpenAI, which has changed its ethical standpoint quite frequently and is by now openly deploying its tech on the battlefield, is a bit unfortunate. I wish all involved parties the best though, so let's hope nobody gets burned.
over 1 year ago
2
16
0
Whaat?
over 1 year ago
0
2
0
It's so funny that this movie was so inspiring to so many people. I also did a project in university to detect and decode Arrival glyph codes, but our solution wasn't quite as revolutionary...
over 1 year ago
1
2
0