If you use LLMs, tokenisation bias probably affects you:
* Text generation: tokenisation bias ⇒ length bias 🤯
* Psycholinguistics: tokenisation bias ⇒ systematically biased surprisal estimates 🫠
* Interpretability: tokenisation bias ⇒ biased logits 🤔