Marco about 1 year ago
#EMNLP has a nice set of tokenization/subword modeling papers this year.
It's a good mix of tokenization algorithms, tokenization evaluation, tokenization-free methods, and subword embedding probing. Lmk if I missed some!
Here is a list with links + presentation times (in chronological order).