The Common Pile comprises text from 30 distinct sources, covering a wide variety of domains including research papers, code, books, educational materials, audio transcripts, governmental text, and more. Some of this text is commonplace in AI training data, but a lot of it is pretty new.