hailey schoelkopf
@hails.computer
so academic twitter is like actually-actually migrating this time huh? i still don't know if i have it in me to actively use another social network yet
about 1 year ago
reposted by
hailey schoelkopf
Luca Soldaini
over 2 years ago
We released Dolma, the dataset for OLMo, AI2's LLM. It's 3+ trillion tokens. We hope it will help w study of language models! Available on HuggingFace w/ ImpACT license
huggingface.co/datasets/allenai/dolma
Overview+datasheet
blog.allenai.org/dolma-3-trillion-tokens-open-llm-corpus-9a0ff4b8da64
Dolma: 3 Trillion Token Open Corpus for Language Model Pretraining
We released Dolma, OLMo's pretraining dataset. Dolma is an open dataset of 3 trillion tokens. Available on HuggingFace under the ImpACT license
https://blog.allenai.org/dolma-3-trillion-tokens-open-llm-corpus-9a0ff4b8da64