I think it’s underappreciated how many interesting, diverse, and deep technical problems there are in training and serving frontier models. People think it’s all just PyTorch and indexing errors, but there is soooo much other stuff going on across the entire stack.
10 days ago