US Copyright Office Issues Report Addressing Use of Copyrighted Material to Train Generative AI Systems

Overview

On May 9, 2025, the US Copyright Office (“USCO”) released a highly anticipated pre-publication version of the third and likely final installment of its Report on Copyright and Artificial Intelligence – Part 3: Generative AI Training (the “Report”).^[1]

The very next day, President Trump dismissed the Register of Copyrights and Director of the USCO, Shira Perlmutter – two days after dismissing Carla Hayden, the Librarian of Congress.^[2] Perlmutter has since sued President Trump and the acting Librarian of Congress seeking an injunction blocking her removal.^[3]

It is unclear whether Trump’s dismissals (1) were related to the content of the Report (which was critical of some of the arguments advanced by those who favor free use of copyrighted material to train AI models), (2) prompted the unusual pre-publication of the Report or (3) will affect the issuance or content of the final Report.

Regardless, the Report addresses an unsettled question underlying dozens of pending copyright AI lawsuits: does the use of copyrighted works without permission to develop and deploy generative AI (“GenAI”) models qualify as “fair use”?^[4] This is the billion-dollar question facing GenAI companies whose business model is predicated on such use.

For private fund managers and others who are increasingly looking to harness the power of GenAI and navigate the attendant legal risk, the Report is particularly timely.

While the Report notes that any fair use analysis must be context-specific, it does offer some helpful general guidelines in assessing fair use for GenAI. According to the Report, transformative use and market effects will be the most significant fair use factors that judges will assess in ruling on GenAI companies’ use of copyrighted material. As detailed below, these factors tend to weigh for or against fair use depending on the circumstances – and one federal judge appears poised to rule broadly in favor of fair use for GenAI.

Ultimately, the Report does not recommend immediate government intervention on fair use or compulsory licensing issues related to GenAI. Instead, it advises allowing the nascent licensing market for GenAI training data to evolve organically. To address potential gaps in data offerings and market inefficiencies, the Report explores alternative mechanisms, including extended collective licensing schemes, which could provide broader and more efficient licensing solutions by aggregating rights on behalf of multiple copyright holders.^[5]

Fair Use Factors

For private fund managers contemplating using GenAI in their investment process, the Report’s discussion of the fair use factors is worth examining.

In addressing the first fair use factor (the purpose and character of the use), the Report notes that determining whether an AI output is transformative is context-dependent. Some use cases are clearly permitted; others clearly are not.^[6] Of note to private fund managers, using GenAI for noncommercial research or analysis where portions of the copyrighted works are not reproduced in the outputs is, according to the Report, likely to constitute fair use.^[7] However, using unlawfully accessed material (via pirated works or by circumventing paywalls) to train a GenAI model that produces unrestricted competing content would not constitute fair use.^[8]

Also of note for private fund managers, the Report highlights that when retrieval-augmented generation (“RAG”) searches^[9] summarize the retrieved copyrighted works rather than providing hyperlinks to the original source, such outputs are less likely to be considered transformative and, therefore, may not qualify as fair use.^[10]

In addressing the second factor (the nature of the copyrighted work), the Report notes that any analysis must be context-specific but a fair use finding is less likely if the material used to train the GenAI is “more expressive” or “previously unpublished.”^[11]

In addressing the third factor (the amount and substantiality of the use), the Report concludes that under certain circumstances, use of an entire work may not in fact weigh against fair use.^[12]

In addressing the fourth factor (market effects), which the Supreme Court has described as “undoubtedly the single most important element of fair use,”^[13] the Report identifies several potential harms, including lost sales, lost licensing opportunities, RAG-related substitution and market dilution. Here, the USCO wades into “uncharted territory,”^[14] as no court has yet applied a market dilution theory where AI-generated content competes with human-created works.^[15] However, the Report argues that while many GenAI applications promise great public benefits, the unprecedented volume of such applications could pose significant harm to the market for copyrighted works.^[16] If courts apply this theory of market dilution, rightsholders may be able to block any use that might have a general effect on the market for copyrighted works, even if the use does not specifically affect the rightsholder. Further, the Report emphasizes that where licensing options already exist – or are reasonably likely to develop – the loss of licensing opportunities will disfavor fair use.^[17]

Ultimately, the USCO concludes that fair use analysis of GenAI applications must remain on a case-by-case basis, but the first and fourth factors will carry “considerable weight.”^[18]

Licensing

The Report outlines four general licensing options: voluntary direct licensing, voluntary collective licensing, extended collective licensing (“ECL”) and compulsory licensing.

Voluntary direct licenses are negotiated on a case-by-case basis between individual rightsholders and AI developers.
Voluntary collective licensing agreements typically involve collective management organizations (“CMOs”) that are authorized by multiple rightsholders to negotiate licenses and administer royalty collection and distribution on their behalf.^[19]
ECL builds on the voluntary collective agreement model to cover the works of all relevant rightsholders in a particular category – even those who haven’t actually joined the CMO – while providing an opt-out mechanism for non-participant rightsholders to negotiate separately.^[20]
Compulsory licensing is a statutory framework that permits use of copyrighted material without the rightsholder’s direct consent, subject to government oversight and often complex rate-setting procedures.^[21]

Voluntary direct and collective licensing markets for GenAI have already emerged, with others in development.^[22] Licensing at scale, however, raises several practical concerns including cost structure, impact on model quality and antitrust issues. Licensing large volumes of copyrighted works at market rates could be prohibitively expensive, particularly given the vast datasets required to train modern AI models. Moreover, if models can only be trained on licensed works, the resulting models may be “tainted by bias and inaccuracy.”^[23] There are also antitrust concerns that big tech companies could crowd out smaller developers who might not be able to afford to negotiate broad data licenses.^[24] The USCO argues these concerns shouldn’t factor into the fair use analysis, and defers to the Department of Justice for guidance (including a possible antitrust exemption) and the Federal Trade Commission for enforcement.^[25]

The Report ultimately advocates for the growth of voluntary licensing regimes for copyrighted works, which can facilitate AI innovation while protecting rightsholders. To support this approach, the Report further argues that ECL could address market inefficiencies without the market risks from “premature” statutory approaches such as compulsory licensing, which may stifle innovation and distort market incentives.^[26]

Litigation

The Report is already impacting pending copyright AI lawsuits. While the USCO defers to the courts to “weigh the statutory factors together” and calls it impossible to prejudge litigation outcomes,^[27] federal courts can defer, and have deferred, to the legal interpretations of agencies such as the USCO, depending on their thoroughness, validity and persuasiveness.^[28] As no definitive case law exists on the use of copyrighted material for training GenAI,^[29] content owners have already jumped on the Report, citing it as supplemental authority in a detailed counter to the fair use defense in two pending cases.^[30]

Interestingly, in a May 22, 2025 hearing in a federal case against Anthropic PBC in California, Judge William Alsup said he was leaning “toward finding Anthropic PBC violated copyright law when it made initial copies of pirated books, but that its subsequent uses to train their GenAI models qualify as fair use.”^[31] Alsup appeared sympathetic to Anthropic’s argument that its use is “transformative in the extreme” but signaled that he might still make Anthropic pay for its initial use, noting: “I have a hard time seeing that you can commit what is ordinarily a crime, but get exonerated because you end up using it for a transformative use.”^[32] Alsup could be the first judge in the nation to rule on fair use in the GenAI context. And if his reasoning on the fair use factors survives appeal and is adopted by other courts, it could augur well for developers of GenAI, even if the Report itself provides litigation ammunition for content owners.

Takeaways

Overall, the Report provides some instructive – if not legally binding – guidance for AI companies, copyright owners and private fund managers.

For AI companies and downstream users, the Report suggests that implementing effective guardrails to prevent infringing outputs will weigh in favor of fair use and recommends leveraging existing and emerging data licensing frameworks to train AI models. The Report also flags for AI companies that knowingly training AI models on pirated datasets would almost certainly exceed the boundaries of fair use.^[33] In such cases, and possibly others, courts may be less inclined to accept arguments about transformative use or net societal benefits of GenAI – particularly when such use poses foreseeable market harm to content owners.^[34]

For copyright owners, the Report encourages creators to pursue organized approaches to collective licensing via CMOs while recognizing market dilution as a potential harm from unrestrained AI training. The Report also notes that copyright owners ideally shouldn’t be required to opt out of the use of their material for training AI models.

For private fund managers, the Report offers some guidance that certain noncommercial research uses of GenAI may constitute fair use but that other uses (such as RAG searches that summarize retrieved works) may not, and therefore carry greater risk. Given the volatility at the Copyright Office and the rapid technological and legal developments in the AI space, private fund managers who use GenAI should continue to pay close attention to this area.

Authored by Scott Kareff and Steven Appel.

If you have any questions concerning this Alert, please contact your attorney or one of the authors.

Endnotes

[1] According to the Report, the Copyright Office released the pre-publication version of Part 3 (the only version to be released in such manner) in response to congressional inquiries and interest from stakeholders, stating that a final version would be published in the near future, without any substantive changes expected in the analysis or conclusions. See US Copyright Office, Copyright and Artificial Intelligence Part 3: Generative AI Training Pre-Publication Version (May 2025).

Part 1 of the Report focuses on the need for congressional action to protect against deep fakes. See US Copyright Office, Copyright and Artificial Intelligence Part 1: Digital Replicas (July 2024). Almost one year after Part 1’s release, in April 2025, Congress passed the TAKE IT DOWN Act, which criminalizes the publication of non-consensual intimate imagery, including AI-generated deepfakes.

Part 2 concludes that existing law is sufficient to address the copyrightability of AI outputs without the need for legislative action. The Report notes that copyright does not extend to purely AI-generated material or material where there is insufficient human control over the expressive elements; rather, human authors are entitled to copyright in (i) their works of authorship that are perceptible in AI-generated outputs, (ii) the creative selection, coordination or arrangement of material in the outputs and (iii) creative modifications of the outputs. See US Copyright Office, Copyright and Artificial Intelligence Part 2: Copyrightability (January 2025).

[2] See “Trump fires Copyright Office director after report raises questions about AI training,” TechCrunch, May 11, 2025.

[3] See “Ex-Copyright Head Perlmutter Sues Trump to Block Termination,” Bloomberg Law, May 22, 2025. On May 28, 2025, the judge hearing the case denied Perlmutter’s request for emergency injunctive relief reinstating her, ruling from the bench that even if she was likely to succeed on the merits, she had not met the “irreparable harm” standard required to justify relief pending full adjudication of the claims. See “Judge Rejects Ex-Copyright Chief’s Bid to Pause Trump Firing,” Bloomberg Law, May 28, 2025.

[4] Fair use is a doctrine codified in Section 107 of the 1976 Copyright Act. It provides that “the fair use of a copyrighted work . . . is not an infringement of copyright” and lists four non-exclusive factors that must be considered in determining whether a particular use is fair: (1) the purpose and character of the use (the transformative factor), including whether such use is of a commercial nature or is for nonprofit educational purposes; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work. 17 U.S.C. § 107.

[5] See Report at 85, 107.

[6] As explained in the Report: On one end of the spectrum, training a model on books to support a content moderation tool is a “highly transformative” use – essentially research-driven and non-expressive. At the other end, training a model on art to produce lookalike images isn’t transformative – the expressive output is a substitute for the original. Many uses will fall in between, and these of course are the ones that are – and are likely to be – the subject of uncertainty and litigation. For example, a model trained on sound recordings to generate new music might not copy any one track outright but still serves the same audience and purpose to entertain, which the USCO considers only “modestly transformative.” See Report at 45-46.

[7] See Report at 74.

[8] See Report at 74.

[9] RAG searches are often used by investment teams to search and query research archives, to summarize and highlight insights from PDFs, wikis and news articles. See “How Hedge Funds Are Really Using Generative AI – And Why It Matters for Manager Selection,” Resonanz Capital, April 20, 2025.

[10] See Report at 47.

[11] See Report at 54.

[12] According to the Report, in cases where there is a transformative purpose, and where there is a need to train on a large volume of works to effectively generalize, the copying of entire works may be reasonable, especially where little or none of the copied material will be made available to the public. See Report at 60.

[13] See Report at 61.

[14] See Report at 65.

[15] The statute on its face encompasses any “effect” on the potential market. Relying on comments submitted by the Writers Guild of America and UMG Recordings, the Report argues that the speed and scale at which AI systems generate content can dilute markets for the same kinds of works they were trained on, creating unprecedented competition for sales of an author’s works and more difficulty for audiences in finding them. See Report at 60, 65-66; see also TechCrunch, May 11, 2025.

[16] See Report at 73-74.

[17] See Report at 73.

[18] See Report at 74.

[19] See Report at 85.

[20] Affected rightsholders typically must opt out if they do not wish to be bound by the CMO. See Report at 99.

[21] See Report at 96.

[22] Some AI companies have accordingly reached multimillion-dollar licensing arrangements with publishers such as The Associated Press and the Financial Times. Others, such as OpenAI and Meta, have faced significant legal challenges, including lawsuits turning on the fair use issue. See “US Copyright Office releases major AI training report amid intensifying copyright debate,” PPC Land, May 11, 2025; see also Report at 63. On May 29, 2025, Bloomberg reported that The New York Times, which sued OpenAI and Microsoft Corp. over their unlicensed use of copyrighted material to train chatbots, had agreed to license its content to Amazon.com Inc. for use across its AI platforms. See “New York Times Agrees to License Content to Amazon for AI Use,” Bloomberg, May 29, 2025.

[23] “The Copyright Office Issues A Largely Disappointing Report On AI Training, And Once Again A Major Fair Use Analysis Inexplicably Ignores The First Amendment,” Techdirt, May 12, 2025.

[24] “Copyright Office Punts on AI and Fair Use, One of the Biggest Questions Surrounding Gen AI,” CNET, May 12, 2025.

[25] See Report at 75.

[26] This approach contrasts with how OpenAI, among other tech firms, has called for the government to codify a copyright strategy that gives AI companies leeway through fair use in order to outcompete other actors such as China, which has a long record of violating copyright restrictions. See Report at 107.

[27] See Report at 107.

[28] Skidmore v. Swift & Co., 323 U.S. 134 (1944); see Loper Bright Enterprises v. Raimondo, 603 U.S. 369 (2024).

[29] On February 11, 2025, Judge Stephanos Bibas of the US District Court for the District of Delaware granted summary judgment in favor of Thomson Reuters, finding that Ross Intelligence’s use of Thomson Reuters’ copyrighted legal content to train its AI legal research tool was not a fair use; however, he explicitly noted that fair use in the context of GenAI remains an open question. See Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc.

[30] “With summary judgment pending, Kadrey, Bartz plaintiffs cite Pre-Publication version of Copyright Office Report on fair use and AI training as Supplemental authority, despite its non-final status,” Chat GPT Is Eating the World, May 13, 2025.

[31] See “Judge Hints Anthropic’s AI Training on Books Is Fair Use,” Bloomberg News, May 22, 2025.

[32] Ibid.

[33] This argument has already emerged in ongoing litigation. During a summary judgment hearing in Bartz v. Anthropic, Judge Alsup indicated that the plaintiffs may have viable claims that the AI startup infringed copyright by sourcing training materials from pirated websites rather than purchasing them. “Training LLMs Is OK, Pirating Isn’t: Anthropic Judge Tips Hand,” Law360, May 22, 2025.

[34] AI companies have argued that their models learn from copied copyrighted material in the same way a human student would (i.e., “fair learning”), and should thus fall under fair use. The Report counters that the speed and precision by which GenAI models learn far exceed the limitations of human learning that form the basis of the rights granted under copyright law. See Report at 47-48.

Authors

Scott M. Kareff

Steven M. Appel
