On July 11, the Daily Journal published “AI Can Learn, but Not Loot: Pirated Data Off Limits for LLM Training.” The piece was authored by Meaghan Kent, Marcella Ballard, Heather West, and Matthew Julyan. The following is an excerpt:
On June 23, 2025, Judge Alsup in the Northern District of California issued an order in Bartz et al. v. Anthropic PBC, granting in part and denying in part Defendant Anthropic’s motion for summary judgment on the sole issue of whether its use of Plaintiffs’ books as training data for Anthropic’s large language models (LLMs) was “quintessential” fair use.
Central to its mixed holding, the court acknowledged that Anthropic used the works in various ways and for varying purposes, such that each "use" must be identified and assessed separately. Ultimately, the court held that the use of textual works to train LLMs was "exceedingly transformative" and, weighed together with the remaining factors, qualified as fair use. By contrast, the separate use of the works to create a central library was fair use only with respect to works Anthropic purchased or otherwise lawfully accessed; the use of pirated copies to build the central library was not fair use. The decision makes clear that the source of content is a key element in evaluating fair use.
Use Within Anthropic’s Central Library
Anthropic's co-founder previously admitted to downloading numerous online digital book libraries known to have been assembled from unauthorized copies, and those copies were stored indefinitely in Anthropic's central library. Although Anthropic later changed its data collection approach, purchasing print copies and scanning them into a digitized central library, it never deleted or removed the pirated copies from that library.
In its opening brief, Anthropic argued that the key consideration for the court’s fair use analysis was what Anthropic did with those works during the training process—not whether it had lawful access to those materials. The court plainly rejected this argument, explaining that “[c]reating a permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy,” such that “Anthropic had no entitlement to use pirated copies for its central library.” Moreover, Anthropic was never entitled to create or hold copies of the pirated works, meaning that “almost any unauthorized copying would have been too much.”