#ICML Why are LLMs so powerful but still suck at math? 🤔 A key problem is cross-entropy loss: it is nominal-scale, so tokens are unordered. That makes sense for words, but not for numbers. For a "5" label, predicting "6" or "9" gives the same loss 😱 Yes, it's crazy! No, nobody has fixed this yet! ⬇️
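A minimal sketch of the symmetry, assuming a hypothetical toy vocabulary of the digit tokens "0"–"9" (PyTorch):

```python
import torch
import torch.nn.functional as F

# True next token is "5" (token id 5 in our toy digit vocab).
target = torch.tensor([5])

# Two confident but wrong predictions: one bets on "6" (off by 1),
# the other on "9" (off by 4).
logits_6 = torch.full((1, 10), -10.0)
logits_6[0, 6] = 10.0
logits_9 = torch.full((1, 10), -10.0)
logits_9[0, 9] = 10.0

# Cross-entropy only looks at the probability assigned to the true
# token id; the numeric distance of the wrong guess never enters.
print(F.cross_entropy(logits_6, target).item())  # ~20.0
print(F.cross_entropy(logits_9, target).item())  # ~20.0 — identical loss
```

Both wrong answers get exactly the same penalty, because cross-entropy treats token ids as unordered categories rather than points on a number line.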