Generative artificial intelligence (AI) is a technology promising to disrupt how artwork is created, software is developed, and text is written. This disruption brings with it a host of new legal questions surrounding intellectual property protections for these works, in particular copyright protections. These questions can be understood in relation to two key elements of a generative AI system: the input data and the output data. This first article in the series focuses on input data; the next article will focus on output data.
Can a generative AI system use input that is protected by copyright law? In the United States, this question is evaluated under the Fair Use Doctrine. The decision in Authors Guild v. Google, Inc. (the "Google Books case") sheds light on this analysis. Google scanned books to create digital copies, built a search function over those copies, and made that search function available to the public. The plaintiffs in the case claimed that this constituted copyright infringement. However, the Second Circuit ultimately ruled that Google's actions constituted fair use and were therefore not infringing:
"Google's unauthorized digitizing of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses. The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals. Google's commercial nature and profit motivation do not justify denial of fair use."
While the holding in Google Books is specific to the facts of that case, it appears to support the idea that using copyrighted works as input to a generative AI system is protected under fair use. A generative AI system may use those works to generate a new, transformed work. Furthermore, the display of the original works may be minimal, if not unrecognizable, in the new works created by a generative AI system. It is important to note, however, that this decision is not a green light to use copyrighted works in any application of generative AI. The works produced by a generative AI system may serve as a significant market substitute, which weighs against fair use. What if the copyrighted works were used as input data to a generative AI system to produce art that directly competes in the marketplace? Consider the following uses of a popular generative AI model, Dall‧E:
On the left, Dall‧E was asked to generate an image of "an astronaut riding a horse in a photorealistic style." On the right, Dall‧E was also asked to generate an image of "an astronaut riding a horse," but this time it was asked to do so "in the style of Andy Warhol."
As can be seen above, generative AI is capable of producing content that is in the style of a particular artist. This could have significant effects on the market value of an artist's work. Overnight, a software program could generate thousands of unique works in the style of a particular artist, causing the supply of a particular artwork style to increase substantially. Unlike the Google Books case, this type of use of copyrighted works may produce an output that is intended to be a commercial replacement for the copyrighted work used as input for a generative AI system. Fair use may not apply in this scenario.
In a related development, Microsoft and OpenAI (a research company that has developed several AI programs, including Dall‧E, the generative art program that produced the images above) are defending against a class action lawsuit brought in the Northern District of California by two software developers. This lawsuit implicates Copilot, a code-generation tool offered by GitHub (a Microsoft subsidiary) and built on OpenAI technology. Copilot generates computer code based on user descriptions in plain language and is trained on billions of lines of code from public repositories, such as those hosted on GitHub. In their complaint, the plaintiffs allege that Copilot regularly outputs "verbatim copies" of licensed code stored on GitHub. Notably, the complaint does not allege copyright infringement but instead argues that the companies have violated other provisions of federal law, including the Digital Millennium Copyright Act, 17 U.S.C. §§ 1201–1205 (DMCA). The complaint also alleges breach of contract with respect to the various open-source licenses and GitHub's terms of service. Although the complaint does not allege copyright infringement at this time, the lawsuit promises to provide answers as to the legality of training AI programs on copyrighted materials. In particular, is Copilot copying protected code and then displaying verbatim copies of portions of that code?
Adding to the list of legal challenges centered on generative AI, a putative class action lawsuit was recently filed against Stability AI Ltd., Midjourney Inc., and DeviantArt Inc. in the Northern District of California. The named defendants provide text-to-image generative AI tools. The plaintiffs characterize these generative AI systems as "21st-century collage tools" that violate the rights of artists by using their works without consent. A similar complaint was filed by Getty Images against Stability AI in the High Court of Justice in London, claiming that Stability AI infringed intellectual property rights, including copyrights owned by Getty Images.
Outside of the United States, other countries have provided explicit guidance on the use of copyrighted works in text and data mining (TDM) applications. TDM involves using computational techniques to analyze large amounts of data in order to identify patterns, trends, and other useful information. TDM is used for various purposes, including training artificial intelligence systems. Jurisdictions with TDM exceptions include the United Kingdom, the European Union, Japan, and Singapore. Notably, at the time of writing of this article, the United Kingdom's exception applies only to non-commercial purposes. However, ongoing policy discussions signal the possibility that the UK TDM exception may soon be expanded to include commercial purposes. The changing policy surrounding the use of copyrighted works in large data applications (including AI) shows that governments recognize that updated legal directives are necessary to contend with the fast-developing AI industry.
It is not yet clear whether a generative AI system may use input data that is protected by copyright law. On one hand, new works are being created from the input data with minimal display of any copyrighted works. On the other hand, the ways in which generative AI systems can be used are far ranging: a user of these systems may produce seemingly identical copies of protected works, works that are intended to be market substitutes, or both.
Authors Guild v. Google, Inc., 804 F.3d 202, 229 (2d Cir. 2015) (emphasis added).