Small Language Models (SLMs) don't have the capacity to remember everything in their training data. Which tokens should they learn to predict, and when should they ask for help? We tackle this question in our new preprint.
You can check it out on arXiv:
arxiv.org/abs/2602.12005
🧵1/7