Philip Whittington
@philipwitti.bsky.social
Doctoral student @ETH Zürich 🇨🇭
Reposted by Philip Whittington
Tiago Pimentel
about 2 months ago
Honoured to receive two (!!) SAC highlights awards at #ACL2025 😁 (Conveniently placed on the same slide!) With the amazing: @philipwitti.bsky.social, @gregorbachmann.bsky.social and @wegotlieb.bsky.social, @cuiding.bsky.social, Giovanni Acampa, @alexwarstadt.bsky.social, @tamaregev.bsky.social
Reposted by Philip Whittington
Tiago Pimentel
9 months ago
BPE is a greedy method to find a tokeniser which maximises compression! Why don't we try to find properly optimal tokenisers instead? Well, it seems this is a pretty difficult (in fact, NP-complete) problem! 🤯 New paper + @philipwitti.bsky.social @gregorbachmann.bsky.social :)
arxiv.org/abs/2412.15210
Tokenisation is NP-Complete
In this work, we prove the NP-completeness of two variants of tokenisation, defined as the problem of compressing a dataset to at most δ symbols by either finding a vocabulary directly (direct token...
https://arxiv.org/abs/2412.15210
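The greedy-versus-optimal contrast in the post above can be illustrated with a toy sketch. This is not the paper's formal setting or its NP-completeness reduction. It assumes a simplified "bottom-up" view in which a tokeniser is a sequence of adjacent-pair merges applied to one string, and the helper names `apply_merge`, `greedy_bpe` and `optimal_merges` are hypothetical. The greedy routine merges the most frequent adjacent pair at each step, as BPE does. The exact routine tries every merge sequence, which takes exponential time.

```python
from collections import Counter

# Toy model (an assumption, not the paper's formal definition): a tokeniser is
# a sequence of pair merges, each applied left to right to a single string.


def apply_merge(seq, pair):
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    a, b = pair
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def greedy_bpe(text, num_merges):
    """BPE-style greedy: always merge the currently most frequent adjacent pair."""
    seq, merges = list(text), []
    for _ in range(num_merges):
        # Counts overlapping occurrences, e.g. "aaa" contains ('a', 'a') twice.
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        # max() keeps the first maximal key, so ties go to the earliest-seen pair.
        pair = max(pairs, key=pairs.get)
        seq = apply_merge(seq, pair)
        merges.append(pair)
    return seq, merges


def optimal_merges(seq, num_merges):
    """Exhaustive search over all merge sequences of length <= num_merges.

    Returns (shortest achievable length, merges achieving it).
    Runtime grows exponentially with num_merges.
    """
    best = (len(seq), [])
    if num_merges == 0:
        return best
    # Only pairs that actually occur can shorten the sequence.
    for pair in dict.fromkeys(zip(seq, seq[1:])):
        length, rest = optimal_merges(apply_merge(seq, pair), num_merges - 1)
        if length < best[0]:
            best = (length, [pair] + rest)
    return best


if __name__ == "__main__":
    text = "aaaaabcbcbc"
    tokens, merges = greedy_bpe(text, 1)
    print("greedy: ", merges, len(tokens))
    print("optimal:", optimal_merges(list(text), 1))
```

On `aaaaabcbcbc` with a single merge, greedy picks `('a', 'a')` because its overlapping count (4) is the highest, but only two non-overlapping copies can actually be merged, leaving 9 symbols; merging `('b', 'c')` instead leaves 8. The exhaustive search finds that, but only by trying every option, and its cost explodes as the number of merges grows.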