Using Gen AI Tools and Copyright

Unless copyright has expired or material has already been licensed under terms that allow reuse (e.g. under a Creative Commons Attribution licence), permission is normally required to reproduce, share and reuse material. Reproducing and sharing material without permission or a licence from the copyright owner could be unlawful.

However, in certain cases materials may be reused without permission, either for specific purposes defined in the law (‘copyright exceptions’ or ‘permitted acts’) or if certain criteria are met, e.g. ‘fair use’ in the US. Training Gen AI models may rely on these exceptions. Since copyright laws vary across countries, where the training activity took place is also crucial.

Using copyrighted materials without permission to train Gen AI could therefore be considered unlawful, or it could be deemed permitted under an exception (e.g. the text and data mining exception in the UK) or as ‘fair use’ in the US. These questions are being decided in ongoing court cases, whose outcomes will help shape how copyright applies to Gen AI. See the section Gen AI and the Creative Industries for examples of copyright owners who have sued AI companies.

Gen AI and Copyright: Text and Data Mining Exception

UK legislation includes a copyright exception that allows copying for the purposes of computational analysis of text and data (text and data mining, or TDM), provided that the use is non-commercial, the user has lawful access to the materials and the sources are acknowledged (unless this is impossible for practical reasons). For more detail on this exception, see the section on the left, Text and Data Mining and Copyright.

The question is whether this exception could apply to training Gen AI models, which matters if your research involves developing or training a Gen AI model. A recent court ruling in Germany (Kneschke v LAION), whose law contains similar exceptions, including one on TDM for research purposes, should help shed light on this. The photographer and copyright owner of an image sued the LAION organisation for copying the image without permission in order to create a dataset to support AI training. The case is quite complex; full details are discussed on the Kluwer Copyright Blog and the TechnoLlama website. Here we highlight the relevance of the court’s decision, which (a) confirmed that making a copy of an image in order to extract information from it is covered by the exception and (b) found that the activity was non-commercial research. Although the decision did not cover the further use of the dataset to train a model, comments by the judge suggest that TDM exceptions could extend to AI training.

In an academic setting, asserting the right to rely on the TDM exception to train AI for research is important. Some publishers include clauses in their terms of use that prohibit the use of their articles for AI purposes. Such clauses are being challenged; please see the relevant guidance from Jisc.

If your research involves TDM and you are unsure about publishers’ clauses or encounter technological barriers when copying the data, please contact the Copyright Officer, Donya Rowan, for advice.

Prompts, copyright infringement and liabilities

As a user of Gen AI tools, you will be providing prompts in the form of text, images, code, film, etc. You could be breaching copyright if your prompts contain someone else’s intellectual property and you do not have permission or a licence to share it with a third party. This may include, for example, articles that the University of Derby subscribes to, which are provided for personal research and study, or images for which you do not own the copyright.

A highly publicised case illustrating this involves Tesla’s use of a still from the film Blade Runner 2049 without permission in October 2024. Tesla first approached Alcon Entertainment LLC, the producer of the film, to ask for permission to reuse the image. When permission was refused, Tesla allegedly used the image as a prompt in a Gen AI tool to generate a new version, which was shown as part of a promotional event. The outcome of the Alcon v Tesla case should also provide insight into infringement in the context of Gen AI.

You could also be infringing copyright if your generated output reproduces substantial parts of original content that is protected by copyright and not licensed for reuse. Several AI tools, usually in their paid versions, offer indemnities to cover legal expenses if a user is sued for copyright infringement. However, these indemnities have limitations and are unlikely to offer comprehensive cover. More advice on indemnities and their limitations can be found on the Farrer & Co website.

Challenges of Attributing Openly Licensed Data

Copyright breaches can still happen even if the training data and prompts are shared under a licence allowing reuse, such as a Creative Commons licence or an open source software licence. (To find out more about Creative Commons, see the section on the left, What is Creative Commons?)

If Gen AI activities rely on a licence, the terms of the licence must be respected. This includes attributing the author and complying with specific licence terms, for example no-derivatives, share-alike and non-commercial restrictions. These points need to be addressed both if you are creating your own model and if your work is being used to train AI models. Creative Commons has published a useful article and flowchart showing which licence terms apply to different Gen AI activities.

Attribution is a requirement of all six CC licences. It is also expected for materials that are not openly licensed, both as part of good academic practice and research integrity, and as part of fair dealing when relying on exceptions. There are concerns that Gen AI outputs do not attribute their sources or, when they do, that the attributions are inaccurate or entirely fabricated.

Solutions to this might involve a combination of approaches:

Technical advances, for example retrieval-augmented generation (RAG) techniques, which aim to improve the accuracy of model outputs and the reliability of the sources they draw on.
Legal requirements for greater transparency in disclosing training data sets, such as those introduced by the EU AI Act.
Audit projects such as the Data Provenance Initiative, which aims to increase the transparency of training data sets through analysing provenance and licensing terms.

For an extensive discussion of infringement and attribution issues in Gen AI, see Johnson A. Generative AI, UK Copyright and Open Licences: considerations for UK HEI copyright advice services [version 1; peer review: 2 approved]. F1000Research 2024, 13:134 (https://doi.org/10.12688/f1000research.143131.1).

