#ICML Why are LLMs so powerful but still suck at math? 🤔 A key problem is cross-entropy loss: it is nominal-scale, so tokens are unordered. That makes sense for words, but not for numbers. For a "5" label, predicting "6" or "9" gives the same loss 😱 Yes, it's crazy! No, nobody has fixed this yet! ⬇️
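A minimal Python sketch of the claim, assuming a hypothetical toy vocabulary of the ten digit tokens "0".."9" (not any particular model's tokenizer): with a one-hot label, cross-entropy reduces to -log p(correct token), so two predictions that put the same probability on "5" get identical loss, no matter whether the leftover mass sits on the adjacent "6" or the distant "9".

```python
import math

def cross_entropy(probs, target_idx):
    """Standard CE with a one-hot label: -log p(target token)."""
    return -math.log(probs[target_idx])

target = 5  # true label is the digit token "5"

# Both distributions assign the correct token "5" probability 0.1,
# but differ in where the remaining 0.9 of mass goes.
near_miss = [0.0] * 10
near_miss[5], near_miss[6] = 0.1, 0.9   # rest of the mass on nearby "6"

far_miss = [0.0] * 10
far_miss[5], far_miss[9] = 0.1, 0.9     # rest of the mass on distant "9"

# Identical losses: CE never sees the numeric distance between tokens.
print(cross_entropy(near_miss, target))  # ~2.3026
print(cross_entropy(far_miss, target))   # ~2.3026
```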