Marco about 1 month ago
My paper "Tokenization as Finite-State Transduction" was accepted to Computational Linguistics.
This was my final PhD degree requirement :)
The goal was to unify the major tokenization algorithms under a finite-state automaton framework, for example by encoding a BPE tokenizer as a transducer.
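
For anyone who wants a feel for the general idea, here is a rough sketch of a character-to-token transducer built from a toy vocabulary. It is an illustration only and not the construction from the paper: it enumerates every segmentation of the input into vocabulary tokens and ignores BPE merge priorities entirely. The names `build_lexicon_transducer` and `transduce` and the toy vocabulary are made up for this example.

```python
from collections import defaultdict


def build_lexicon_transducer(vocab):
    """Build a trie-shaped transducer over characters.

    State 0 is both the start state and the only accepting state.
    Reading a character moves down the trie and outputs nothing.
    A state that spells a full token has an epsilon arc back to
    state 0 that outputs that token.
    """
    transitions = defaultdict(dict)  # state -> {char: next_state}
    emits = {}                       # state -> token output on the arc back to 0
    next_state = 1
    for token in vocab:
        if not token:
            continue  # an empty token would create an epsilon loop at state 0
        state = 0
        for ch in token:
            if ch not in transitions[state]:
                transitions[state][ch] = next_state
                next_state += 1
            state = transitions[state][ch]
        emits[state] = token
    return transitions, emits


def transduce(text, transitions, emits):
    """Return every token sequence the transducer can output for `text`."""
    results = []

    def walk(i, state, output):
        # Accept when all input is consumed and we are back at state 0.
        if state == 0 and i == len(text):
            results.append(output)
            return
        # Epsilon arc: emit the token spelled so far and return to state 0.
        if state in emits:
            walk(i, 0, output + [emits[state]])
        # Character arc: consume one input character, output nothing.
        if i < len(text):
            nxt = transitions.get(state, {}).get(text[i])
            if nxt is not None:
                walk(i + 1, nxt, output)

    walk(0, 0, [])
    return results


# Toy vocabulary (an assumption for illustration, not from the paper).
toy_vocab = ["a", "b", "c", "ab", "bc", "abc"]
segmentations = transduce("abc", *build_lexicon_transducer(toy_vocab))
```

Here `segmentations` holds all the ways to split "abc" into tokens from the toy vocabulary. A real BPE tokenizer would pick exactly one of them according to its merge rules, which this sketch does not model.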