Ok, so I can finally talk about this!
We spent the last year (actually a bit longer) training an LLM with recurrent depth at scale.
The model has an internal latent space in which it can adaptively spend more compute to think longer.
I think the tech report ...🐦⬛
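To give a rough intuition for the recurrent-depth idea described above: instead of stacking more layers, the same block is iterated in latent space, so spending more compute just means running more iterations of that block. Here is a minimal toy sketch of that loop; all names, shapes, and the update rule are illustrative assumptions, not the actual model.

```python
import numpy as np

# Toy sketch of recurrent depth (illustrative assumptions, not the real model).
rng = np.random.default_rng(0)
d = 8  # hypothetical latent dimension
W = rng.standard_normal((d, d)) / np.sqrt(d)  # shared recurrent block weights

def recurrent_block(s, x):
    # One latent "thinking" step: mix the current latent state with the input.
    return np.tanh(s @ W + x)

def think(x, num_steps):
    # Iterate the SAME block num_steps times; more steps means more compute
    # spent reasoning in latent space, with no additional parameters.
    s = np.zeros_like(x)
    for _ in range(num_steps):
        s = recurrent_block(s, x)
    return s

x = rng.standard_normal(d)
shallow = think(x, 4)    # few latent iterations
deep = think(x, 32)      # same weights, just more latent iterations
print(shallow.shape, deep.shape)
```

The point of the sketch is only that depth becomes a runtime choice (the `num_steps` argument) rather than a fixed architectural constant.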