I'm quite excited about this and still a bit shocked that it works as well as it does. Imitation via distribution matching has always felt like a clunky, brittle way to command agents. Language plus zero-shot RL is natural, instantaneous, and scales well thanks to the unsupervised nature of RL Zero.