Plagiarism and copyright in AI imagery
With humorous and engaging animations, this video by Atomic Shrimp discusses how data is used to train Gen AIs and highlights the ethical and legal concerns when Gen AI systems generate images that almost replicate the original.
Gen AIs such as Midjourney and Stable Diffusion enable their users to create artworks in seconds. By simply providing a few prompts, you can let your imagination soar, creating custom works that resemble those of your favourite artists. But once the "oohs" and "wows" have faded, have you considered what data the Gen AI has been trained on? Did the artists choose to allow a Gen AI company to use their artworks for training?
This short video from the BBC describes the concerns raised by creative industries over the use of Gen AI.
Currently there are several pending lawsuits brought by artists, writers, newspaper organisations, music firms and photographic companies against AI developers. Key to their individual complaints are allegations of copyright infringement through the use of their copyright-protected works as training data for large language models (Gen AI), false endorsement, and the production of derivative outputs that closely replicate the human creators' works.
The Authors Guild and 17 authors, including George R.R. Martin, Jodi Picoult and John Grisham, filed a class-action suit against OpenAI and Microsoft Corporation in the US (December 2023) for copyright infringement of their works of fiction. The complaint describes how the works were copied wholesale without permission from pirate book repositories and fed into large language models (LLMs), producing outputs, based on users' prompts, that resembled the works of these authors. ChatGPT has been used to generate books that mimic the authors' works, such as a recent attempt to produce volumes 6 and 7 of George R.R. Martin's series A Song of Ice and Fire (the basis for Game of Thrones). The complaint notes the irony that, without the input of their copyright-protected works, AI companies would not have the commercial products that now damage the market for professional authors.
The Chinese courts have become the first to reach a decision on the liability of AI providers for copyright infringement in their outputs. In February 2024 the Guangzhou Internet Court ruled in favour of the copyright owner of the Ultraman series, Shanghai Character License Administrative Co., Ltd. (SCLA), which accused an undisclosed AI company of creating outputs that bore a resemblance to Ultraman. While the company had not created the AI model itself, its website allowed users, through prompts, to generate images that looked similar to an original copyright-protected artistic creation: Ultraman. Although this case addresses copyright infringement in outputs rather than the use of copyright-protected data for training AI models, it is one of the first court decisions to address the legal questions surrounding Gen AI.
As more of these cases emerge and unfold it will be interesting to see the outcomes and the consequences!
There has been much discussion around whether using copyright materials without permission to train AI models could be considered fair. In the UK, as mentioned earlier in this libguide, this is called 'fair dealing' (see the Fair Dealing and Copying Legally section on the left) under the Copyright, Designs and Patents Act 1988. In the US it is called 'fair use' under the Copyright Act 1976. Unlike the UK, the US already has a possible precedent that could prove useful for AI companies: the Google Books Project. In 2005 Google Inc was accused by the Authors Guild and individual copyright owners of scanning more than twenty million books without permission for participating libraries, and creating an electronic database of books, available online as snippets and full text. A decade of litigation later, the judge ruled that Google's use was 'transformative'. Whilst Google digitised and copied whole books, it also created a new, transformative service enabling researchers and readers to search snippets, perform text and data mining, and find books. Rather than competing commercially with the copyright owners, this service was seen to benefit them by increasing public exposure to the original works.
Interestingly, OpenAI is using fair use in its defence against the New York Times. The court case (December 2023) alleges that OpenAI used millions of New York Times articles to train its AI model, which competes commercially with the news company as a source of credible information. The New York Times insists that OpenAI's use is not transformative because it replicates the newspaper's work and creates a commercial substitute. In addressing the claims of producing exact reproductions of New York Times articles, OpenAI has argued that this is a bug the AI developers intend to fix.* In May 2024 OpenAI published its "Approach to data and AI", setting out its "social contract for content in the AI age". This describes a collaboration with artists, journalists and writers and the development of Media Manager, a tool that will allow copyright owners and creators to inform OpenAI about the works they own and whether they want their work included in or excluded from training data. This self-regulatory approach stipulates "If on rare occasions a model inadvertently repeats expressive content, it is a failure of the machine learning process" and sets out OpenAI's measures for improving and resolving such issues.**
* For further copyright discussion of 'fair use' and the NYT v OpenAI case see: Mira T. Sundara Rajan, Is Generative AI Fair Use of Copyright Works? NYT v. OpenAI, Kluwer Copyright Blog, 29.02.2024
** See Bernd Justin Jütte, OpenAI’s vision for a social contract – of things to come…, Kluwer Copyright Blog, 03.06.2024.