Sergei Vassilvitskii
@vsergei.bsky.social
Algorithms, predictions, privacy.
https://theory.stanford.edu/~sergei/
Synthetic Data is all the rage in LLM training, but why does it work? In arxiv.org/abs/2502.08924 we show how to analyze this question through the lens of boosting. Unlike boosting, however, our assumptions on the data and the learning method are inverted.
Escaping Collapse: The Strength of Weak Data for Large Language Model Training
Synthetically-generated data plays an increasingly larger role in training large language models. However, while synthetic data has been found to be useful, studies have also shown that without proper...
https://arxiv.org/abs/2502.08924
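
A toy sketch of the boosting analogy (my own illustration under strong simplifying assumptions, not the paper's algorithm): treat the model as a single Bernoulli parameter p, the probability of emitting a "good" sample, and let a weak filter accept good samples only slightly more often than bad ones. The 0.6/0.4 acceptance rates below are made-up stand-ins for a better-than-chance verifier.

import random

def generate(p, n):
    # Sample n synthetic examples from the current model: True = good, False = bad.
    return [random.random() < p for _ in range(n)]

def weak_filter(example):
    # Weak verifier: accepts good examples with prob 0.6, bad ones with prob 0.4 (made-up rates).
    return random.random() < (0.6 if example else 0.4)

def retrain(examples):
    # "Training" here just refits p to the curated synthetic batch.
    return sum(examples) / len(examples) if examples else 0.0

p = 0.5  # initial model quality
for round_ in range(20):
    curated = [x for x in generate(p, 10_000) if weak_filter(x)]
    p = retrain(curated)  # without the filter, retraining would only echo the model's own distribution
    print(f"round {round_:2d}: model quality p = {p:.3f}")

In this toy, the curated fraction of good samples is 0.6p / (0.6p + 0.4(1 - p)), which exceeds p whenever 0 < p < 1, so even a weak filter improves the model round over round; dropping the filter leaves p drifting around its old value, roughly the collapse setting the paper's title alludes to.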
12 months ago