Cooper
@afedercooper.bsky.social
👤 485
👥 220
245
mischief executive officer
https://afedercooper.info
reposted by
Cooper
James Grimmelmann
11 days ago
New book on the idea shelf: Hasok Chang: _Inventing Temperature: Measurement and Scientific Progress_ What even is temperature? The history of science is a struggle both to make consistent measurements and to think clearly about what they are actually measuring.
james.grimmelmann.net/idea-shelf
3
9
1
reposted by
Cooper
Mark Lemley
16 days ago
Yes, we hope to have both a more lay oriented paper focused on the results and a paper that focuses on the legal implications in the definition of a copy available soon
0
2
1
reposted by
Cooper
Mark Lemley
16 days ago
@afedercooper.bsky.social
and I have a new paper that expands AI memorization tests to include near-verbatim copies, not just exact copies. We find that this significantly increases extraction/memorization of copyrighted content.
arxiv.org/abs/2603.24917
Estimating near-verbatim extraction risk in language models with decoding-constrained beam search
Recent work shows that standard greedy-decoding extraction methods for quantifying memorization in LLMs miss how extraction risk varies across sequences. Probabilistic extraction -- computing the prob...
https://arxiv.org/abs/2603.24917
1
33
4
i'm writing my last memorization paper (i say for the 10th time), and hopefully what is my last first author paper for a bit. this one also isn't about copyright. i'm excited to start thinking about other things. if anyone is looking for a new research buddy, i'm down to clown.
about 1 month ago
1
6
0
someone sent me this from the other place and this timeline really is something else
about 2 months ago
1
4
0
Position: ML conferences should consider removing the position paper track (...and just acknowledge that every scientific paper is articulating at least one position)
3 months ago
1
10
0
it's hard to work at the intersection of ML and copyright because "both sides" of the debate are angry and, in my experience, most haven't done much of the background reading in ML or copyright to have an informed opinion. it's just vibes and anger. i should probably write something up about this.
3 months ago
1
7
1
reposted by
Cooper
Riana
3 months ago
got to experience the "I did not write that headline" phenomenon firsthand The article: "Correctly scoping a legal safe harbor for A.I.-generated child sexual abuse material testing is tough." The headline: "There's One Easy Solution to the A.I. Porn Problem"
3
51
5
reposted by
Cooper
Paul Eric
3 months ago
After twelve years of work, the world's most beautiful subway station has been inaugurated in Rome: Colosseo, an underground archaeological museum.
16
268
130
The Atlantic posted an article about memorization and generative AI, and it mentions our work on extraction of books from production LLMs and open-weight models.
www.theatlantic.com/technology/2...
The referenced work reflects research with
@marklemley.bsky.social
@jtlg.bsky.social
and others.
AI's Memorization Crisis
Large language models don't "learn" - they copy. And that could change everything for the tech industry.
https://www.theatlantic.com/technology/2026/01/ai-memorization-research/685552/
3 months ago
1
13
5
reposted by
Cooper
YY Ahn
3 months ago
"In some cases, jailbroken Claude 3.7 Sonnet outputs entire books near-verbatim ... Taken together, our work highlights that, even with model- and system-level safeguards, extraction of (in-copyright) training data remains a risk for production LLMs."
arxiv.org/abs/2601.02671
Extracting books from production language models
Many unresolved legal questions over LLMs and copyright center on memorization: whether specific training data have been encoded in the model's weights during training, and whether those memorized dat...
https://arxiv.org/abs/2601.02671
1
49
14
reposted by
Cooper
Mark Lemley
3 months ago
This new Atlantic piece cites my work with
@afedercooper.bsky.social
, Amy Cyphert, and others in discussing the complexities around AI memorization of training content
www.theatlantic.com/technology/2...
AI's Memorization Crisis
Large language models don't "learn" - they copy. And that could change everything for the tech industry.
https://www.theatlantic.com/technology/2026/01/ai-memorization-research/685552/?gift=M_nu5_P942-SPAdQH8agHI9Nb9QCQ_U0YoCR0IX9y2U&utm_source=copy-link&utm_medium=social&utm_campaign=share
8
103
38
We extracted (parts of) 12 books in experiments with 4 frontier-lab, production LLMs. We prompted the LLMs with a short prefix of a book and asked them to complete the rest. For Harry Potter and the Sorcerer's Stone, we extracted 95.8% of the book from jailbroken Claude 3.7 Sonnet.
3 months ago
6
103
50
3-minute explanation of my relationship to LLM memorization research
m.youtube.com/watch?v=unfz...
ABBA - Mamma Mia (Official Music Video)
YouTube video by AbbaVEVO
https://m.youtube.com/watch?v=unfzfe8f9NI
3 months ago
0
4
0
reposted by
Cooper
Kari Maaren
4 months ago
The whole point of being an academic is that you need to be willing to spend three days creating a 700-word footnote that you will later delete. And you need to LIKE IT.
24
906
190
[NeurIPS '25] Our poster (1110) for "Comparison requires valid measurement: Rethinking attack success rate comparisons in AI red teaming" is on Friday, December 5, 4:30pm-7:30pm PST in Exhibit Hall C,D,E. [https://openreview.net/forum?id=d7hqAhLvWG]
4 months ago
1
2
0
I'll be hanging out at our poster on membership inference, but in the same slot Brian Lester will present our work on "The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text" (poster 102)! [https://arxiv.org/abs/2506.05209]
4 months ago
1
4
2
[NeurIPS '25] Really excited to present "Exploring the limits of strong membership inference attacks on large language models" (poster 1300) this morning (Friday December 5, 11am-2pm in Exhibit Hall C-E)! [https://arxiv.org/abs/2505.18773]
4 months ago
1
2
1
[NeurIPS '25] Our oral slot and poster session on "Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research" are tomorrow, December 4! [https://arxiv.org/abs/2412.06966] Oral: 3:30-4pm PST, Upper Level Ballroom 20AB. Poster 1307: 4:30-7:30pm PST, Exhibit Hall C-E
4 months ago
1
3
2
Tutorial tomorrow at 1:30PM PST! My talk slots will cover memorization + copying in models and their outputs, canonical extraction methods, and recent work with
@marklemley.bsky.social
and others on extracting pieces of memorized books from open-weight models.
arxiv.org/abs/2505.12546
4 months ago
0
9
4
reposted by
Cooper
Katherine Lee
4 months ago
I'm at NeurIPS & hiring for our pretraining safety team at OpenAI! Email me if you want to chat about making safer base models!
2
5
2
Excited to be at NeurIPS this week in San Diego! Please reach out (best over email) if you'd like to chat about privacy & security, scalable evals, and reliable ML systems. I'll be presenting a few papers/speaking at some events, please stop by! Will post details throughout the week (summary below)
4 months ago
1
7
1
reposted by
Cooper
Johan Ugander
5 months ago
📣 Postdocs at Yale FDS! 📣 Tremendous freedom to work on data science problems with faculty across campus, multi-year, great salary. Deadline 12/15. Spread the word! Application:
academicjobsonline.org/ajo/jobs/31114
More about Yale FDS:
fds.yale.edu
Yale University, Institute for the Foundations of Data Science
Job #AJO31114, Postdoc in Foundations of Data Science, Institute for the Foundations of Data Science, Yale University, New Haven, Connecticut, US
https://academicjobsonline.org/ajo/jobs/31114
0
23
14
Just finished reading the GEMA v. OpenAI decision (slowly, my German isn't great). Looks like a not small part of the analysis tracked parts of arguments
@jtlg.bsky.social
and I made in 2024. I don't have a well-formed response yet, but hopefully soon. (Main thought atm is a very unpolished "woah")
5 months ago
0
3
0
reposted by
Cooper
James Grimmelmann
5 months ago
Today's decision in GEMA v. OpenAI by a German court holds that ChatGPT infringes copyright when it memorizes song lyrics. The opinion cites my paper with
@afedercooper.bsky.social
on memorization in generative models, and its analysis tracks ours.
drive.google.com/file/d/1dUaD...
42-O-14139-24-Endurteil.pdf
https://drive.google.com/file/d/1dUaDiRoPG5v7R7UxNQzEM31yS9pWsknm/view
1
33
18
reposted by
Cooper
Mina Kimes
5 months ago
Bill Ackman gotta be on the third draft of a tweet longer than Middlemarch right now
219
12326
1223
I'm kinda known as a copyright person, but (even in memorization) I mainly study how to draw reliable conclusions from large-scale AI/ML systems. There's a long spiel why, but today I feel defeated. 100 hours/week on this for 6 years, just to find out a parent treats Gemini in search as ground-truth
5 months ago
0
4
0
The NeurIPS position track didn't take a large number of extraordinary papers that surpassed the acceptance bar, limiting the acceptance rate to an unusually low 6%. If you have a rejected paper at the intersection of ML and law, consider submitting to ACM CSLaw '26.
2026-CFP - ACM Symposium on Computer Science & Law
2026 Call for Papers 5th ACM Symposium on Computer Science and Law March 3-5, 2026 Berkeley, California The 5th ACM…
https://computersciencelaw.org/2026-2/2026-cfp/
7 months ago
1
5
2
reposted by
Cooper
Mark Lemley
7 months ago
Our paper "Machine Unlearning Doesn't Do What You Think" was accepted for presentation at NeurIPS. Congrats
@afedercooper.bsky.social
and
@katherinelee.bsky.social
, who led the effort
arxiv.org/abs/2412.06966
Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice
We articulate fundamental mismatches between technical methods for machine unlearning in Generative AI, and documented aspirations for broader impact that these methods could have for law and policy. ...
https://arxiv.org/abs/2412.06966
1
21
4
One more week to submit to CSLaw '26!!
7 months ago
0
0
0
reposted by
Cooper
Pamela Samuelson
7 months ago
For an update on the state of play in the generative AI copyright cases, try this podcast:
shows.acast.com/arbiters-of-...
AI Copyright Lawsuits with Pam Samuelson | Scaling Laws
https://shows.acast.com/arbiters-of-truth/episodes/ai-copyright-lawsuits-with-pam-samuelson
0
6
1
15 days left to submit to the CSLaw '26 main track (archival and non-archival)!
7 months ago
0
5
5
reposted by
Cooper
Elizabeth Lopatto
7 months ago
did a little media criticism
www.theverge.com/politics/777...
The WSJ carelessly spread anti-trans misinformation
The Wall Street Journal's fuckup while covering Charlie Kirk's killing needs more than an editor's note.
https://www.theverge.com/politics/777630/wsj-trans-misinformation-charlie-kirk
29
2593
652
reposted by
Cooper
mattie lubchansky
7 months ago
was just looking for
@seantcollins.com
's "goofy at the crucification" post and google is so cool now
11
334
47
After 2 years in press, it's published! "Talkin' 'Bout AI Generation: Copyright and the Generative-AI Supply Chain," is out in the 72nd volume of the Journal of the Copyright Society
copyrightsociety.org/journal-entr...
written with
@katherinelee.bsky.social
&
@jtlg.bsky.social
(2023)
TALKIN' 'BOUT AI GENERATION: COPYRIGHT AND THE GENERATIVE-AI SUPPLY CHAIN | The Copyright Society
We know copyright
https://copyrightsociety.org/journal-entries/talkin-bout-ai-generation-copyright-and-the-generative-ai-supply-chain/
7 months ago
1
14
4
reposted by
Cooper
James Grimmelmann
7 months ago
The Bartz v. Anthropic settlement is the polar opposite of the Google Books settlement: a discrete one-time payment for past copying, on a discrete and closed-ended class, and making no attempt at all to deal with larger forward-looking issues.
0
19
5
reposted by
Cooper
Mark Lemley
8 months ago
Here is the direct link to the paper:
arxiv.org/abs/2505.12546
0
11
5
I'm excited to share that my paper with
@jtlg.bsky.social
, "The Files are in the Computer: On Copyright, Memorization, and Generative AI" (April 2024), is out in the AI Disrupting Law symposium issue of the Chicago-Kent Law Review! The full issue is here:
scholarship.kentlaw.iit.edu/cklawreview/
Chicago-Kent Law Review | Chicago-Kent College of Law
https://scholarship.kentlaw.iit.edu/cklawreview/
8 months ago
1
23
5
The CFP for ACM CSLaw '26 is up! Deadline for main-track papers (archival and non-archival) is September 30!
computersciencelaw.org/2026
2026 - ACM Symposium on Computer Science & Law
CS&Law 2026 5th ACM Symposium on Computer Science and Law March 3-5, 2026 Berkeley, California Computing, software, and the Internet…
https://computersciencelaw.org/2026
8 months ago
0
12
9
I understand what the underlying probabilities mean, and therefore why this was worth giving a go. But I'm still occasionally like "How tf can someone extract entire books from a frontier company's flagship LLM? Like we got _all_ of HP 1 with just 'Mr. and Mrs. D' as the seed prompt? What??"
9 months ago
0
2
0
Had a great time and learned a ton at ICML. But as an introvert, I've used up all my talking budget until the fall. Excited to get back to full time researchy things, and will hopefully have some exciting new results to share soon!
9 months ago
1
4
0
reposted by
Cooper
M A Osborne
9 months ago
Strangers love to tell me "I can't understand you, because of your MASK". Dude, I am literally someone who gets paid to speak to large audiences while wearing a mask - I know I can be understood!
13
147
15
Happening now! Please swing by to talk about measurement!
9 months ago
0
2
0
Excited to be at
#ICML
'25! Please reach out if you'd like to chat. You can also find me presenting work at a few different spots, listed below!
9 months ago
2
2
0
Feeling so excited + grateful to be representing this paper at
#ICML
! Please stop by to talk about how to do more valid measurement for evaling gen AI systems! Work led by the incomparable
@hannawallach.bsky.social
and
@azjacobs.bsky.social
as a part of Microsoft's AI and Society initiative!!
9 months ago
0
12
2
Some minor updates to our recent books memorization paper! I've separated out a new section 5 that I hope makes some of our ML findings about memorization clearer to a wider audience. Preprint here:
arxiv.org/abs/2505.12546
1/8
Extracting memorized pieces of (copyrighted) books from open-weight language models
Plaintiffs and defendants in copyright lawsuits over generative AI often make sweeping, opposing claims about the extent to which large language models (LLMs) have memorized plaintiffs' protected expr...
https://arxiv.org/abs/2505.12546
9 months ago
1
2
1
reposted by
Cooper
Riana
9 months ago
"Llama 3.1 70B memorizes some books, like Harry Potter & the Sorcerer's Stone and 1984, almost entirely. ... HP is so memorized that, using a seed prompt consisting of just the first line of chapter 1, we can deterministically generate the entire book near-verbatim."
papers.ssrn.com/sol3/papers....
Extracting memorized pieces of (copyrighted) books from open-weight language models
Plaintiffs and defendants in copyright lawsuits over generative AI often make sweeping, opposing claims about the extent to which large language models (LLMs) h
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5262084
0
6
5
reposted by
Cooper
Blake E. Reid
10 months ago
Once again, I encourage folks speculating about what this means to read
@pamelasamuelson.bsky.social
on remedies. The range of possibilities is quite broad.
Thinking About Possible Remedies in the Generative AI Copyright Cases
The sixteen lawsuits brought to date against OpenAI and other developers of generative AI technologies include claims that making copies of in-copyright works f
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4770671
1
28
3
reposted by
Cooper
Blake E. Reid
10 months ago
This opinion is a reminder that these cases are not general-purpose referenda on AI policy; they are hyper-technocratic copyright cases. Copyright draws lots of unsatisfying and counterintuitive distinctions, which is why you should hire and listen to copyright lawyers on the front end.
1
40
8
reposted by
Cooper
Luis Villa
10 months ago
"these are hypertechnocratic" is one of the most important things you can draw from this morning's ruling. In other words, hesitate before drawing parallels between this case and your most (loved|hated) AI training use case. (
@chup.blakereid.org
's whole thread is great)
0
9
1