Hubble is finally out! We used 200k GPU hours from NAIRR and NVIDIA to build a comprehensive resource for the scientific study of LLM memorization. Fully open-source models & data up to 8B params + 500B tokens with controlled data insertion to study memorization risks 🔭✨
7 months ago