David Picard 6 months ago
I'm super happy about Nicolas' latest work, probably the magnum opus of his PhD.
Read the thread for all the great details.
The main conclusion I draw from this work is that better pretraining, in particular by conditioning on better data, allows us to train SOTA models at a fraction of the cost.