What happens if we tokenize cat as [ca, t] rather than [cat]?
LLMs are trained on only the canonical tokenization of each word, yet they still understand alternative tokenizations of the same string. We show that this can be exploited to bypass safety filters without changing the text itself.
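To make the idea concrete, here is a minimal sketch of what an alternative tokenization looks like in practice. It assumes the Hugging Face `transformers` library and the GPT-2 tokenizer (both illustrative choices, not necessarily the setup used in the work); any subword tokenizer whose vocabulary contains the chosen pieces behaves the same way.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
text = "cat"

# Canonical tokenization: the tokenizer's own merge rules map the string to ids.
canonical_ids = tok.encode(text)

# Alternative tokenization: split the same string into sub-pieces and look up
# their ids directly, assuming each piece is a real vocabulary entry.
pieces = ["ca", "t"]
assert all(p in tok.get_vocab() for p in pieces), "each piece must be in the vocab"
alt_ids = [tok.convert_tokens_to_ids(p) for p in pieces]

print("canonical:  ", canonical_ids, tok.convert_ids_to_tokens(canonical_ids))
print("alternative:", alt_ids, tok.convert_ids_to_tokens(alt_ids))

# Both id sequences decode to the identical surface string: the visible text
# is unchanged, only the token ids the model actually sees differ.
assert tok.decode(canonical_ids) == tok.decode(alt_ids) == text
```

Since both sequences decode to the same string, a filter that inspects the decoded text sees nothing unusual; only the model's input ids differ.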
#AI #LLMs #tokenization #alignment