Why does noisy gradient descent train neural nets? This fundamental question in ML remains open.
In our substantially revised draft, my student
@dkumar9.bsky.social gives a full proof that a form of noisy GD, Langevin Monte Carlo (#LMC), can learn arbitrary depth-2 nets.
arxiv.org/abs/2503.10428
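
For intuition, here is a minimal sketch of the LMC update (a plain gradient step plus Gaussian noise of scale sqrt(2*eta/beta)) training a toy depth-2 ReLU net. The data, width, step size, and inverse temperature below are illustrative assumptions, not the settings analyzed in the paper.

```python
# Minimal sketch of Langevin Monte Carlo (noisy GD) on a depth-2 ReLU net.
# All hyperparameters and the toy dataset are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data (assumed for illustration).
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X)

d_in, width = 1, 32
W1 = rng.normal(0, 1, (d_in, width))
b1 = np.zeros(width)
W2 = rng.normal(0, 1 / np.sqrt(width), (width, 1))

eta, beta = 1e-2, 1e4  # step size and inverse temperature (illustrative)


def forward(X, W1, b1, W2):
    h = np.maximum(X @ W1 + b1, 0.0)  # ReLU hidden layer
    return h @ W2, h


n = len(X)
for t in range(5001):
    pred, h = forward(X, W1, b1, W2)
    err = pred - y  # squared-loss residual

    # Backprop gradients for the mean squared loss.
    gW2 = h.T @ err / n
    dh = (err @ W2.T) * (h > 0)
    gW1 = X.T @ dh / n
    gb1 = dh.mean(axis=0)

    # LMC update: gradient step plus Gaussian noise of scale sqrt(2*eta/beta).
    s = np.sqrt(2 * eta / beta)
    W1 += -eta * gW1 + s * rng.normal(size=W1.shape)
    b1 += -eta * gb1 + s * rng.normal(size=b1.shape)
    W2 += -eta * gW2 + s * rng.normal(size=W2.shape)

    if t % 1000 == 0:
        print(f"step {t}: mse = {float((err ** 2).mean()):.4f}")
```

At beta -> infinity the noise term vanishes and this reduces to plain gradient descent; the paper's point is that the noisy version admits learning guarantees.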