Artificial Intelligence and Copyright Infringement: Analysing GEMA v. OpenAI (Case No. 42 O 14139/24) and the Scope of Text and Data Mining Exceptions Under German and European Copyright Law

By Guru Legal

Keywords

GEMA v. OpenAI; AI copyright; text and data mining; TDM exception; German Copyright Act; ChatGPT; LLM training; memorisation; reproduction right; DSM Directive; TRIPS; AI-generated content; music copyright; Munich Regional Court

Abstract

In 2025, the Munich I Regional Court delivered a landmark judgment in GEMA v. OpenAI (Case No. 42 O 14139/24), holding that OpenAI’s ChatGPT had infringed copyright in protected German song lyrics by memorising and reproducing them during training and inference. The court rejected OpenAI’s defence that its training activities were protected by the text and data mining (TDM) exception under German copyright law, finding that memorisation constitutes unauthorised reproduction and that the TDM exception does not extend to the storage of copyrightable expression for the purpose of regeneration. This article examines the judgment’s legal reasoning, its implications for large language model (LLM) training practices, the scope of TDM exceptions under EU and German law, and the broader global debate on AI copyright liability. It argues that the decision signals a decisive shift in how courts will evaluate AI training regimes and underscores the urgent need for legislative clarity in the regulation of generative artificial intelligence.

I. Introduction

The rapid proliferation of large language models (LLMs) has precipitated a global wave of copyright litigation, as rights holders assert that the training of AI systems on protected works constitutes infringement, while AI developers invoke statutory exceptions and fair use doctrines in defence. The judgment of the Munich I Regional Court in GEMA v. OpenAI represents one of the most significant judicial interventions to date in this evolving legal landscape. GEMA, the Gesellschaft für musikalische Aufführungs- und mechanische Vervielfältigungsrechte, is Germany’s principal music rights management organisation, representing the rights of composers, lyricists, and music publishers. Its claim that ChatGPT infringed copyright in protected song lyrics by memorising and reproducing them without authorisation raised foundational questions about the nature of AI training, the scope of the reproduction right, and the limits of the TDM exception under the EU Directive on Copyright in the Digital Single Market (DSM Directive, Directive (EU) 2019/790) and its implementation in German law.

This article proceeds as follows. Part II provides background on the parties and the facts of the case. Part III analyses the court’s key findings on memorisation and reproduction. Part IV examines the scope and limits of the TDM exception. Part V situates the judgment in the global context of AI copyright litigation. Part VI discusses the implications for LLM development and regulatory reform. Part VII concludes.

II. Background: GEMA, OpenAI, and the Facts

GEMA administers the performing and mechanical reproduction rights of approximately 95,000 German members and over two million international rights holders through bilateral agreements with sister organisations worldwide. Its mandate encompasses the licensing and enforcement of copyright in musical works, including song lyrics, which constitute literary works protected under Paragraph 2(1) No. 1 of the German Act on Copyright and Related Rights (Urheberrechtsgesetz, UrhG).

The dispute arose when GEMA identified that ChatGPT was capable, in response to user prompts, of generating verbatim or near-verbatim excerpts of copyright-protected German song lyrics. GEMA conducted systematic testing, eliciting reproductions of protected lyrics from ChatGPT, and concluded that these lyrics had been incorporated into the model’s training data and were stored in a manner permitting their regeneration. GEMA contended that OpenAI had used these lyrics in its training dataset without obtaining a licence or paying remuneration, thereby infringing the reproduction right vested in its member composers and lyricists.
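The kind of output testing described above can be approximated with a simple overlap check. The following is a minimal sketch in Python, not GEMA's actual methodology, which is not detailed here: it finds the longest run of consecutive words shared between a model output and a reference text. The function names `longest_shared_run` and `overlap_ratio` and the sample strings are hypothetical placeholders.

```python
from difflib import SequenceMatcher


def longest_shared_run(reference: str, output: str) -> list[str]:
    """Return the longest run of consecutive words common to both texts."""
    ref_words = reference.lower().split()
    out_words = output.lower().split()
    # Compare word sequences rather than characters; autojunk is disabled
    # so that frequently repeated words are not silently ignored.
    matcher = SequenceMatcher(None, ref_words, out_words, autojunk=False)
    match = matcher.find_longest_match(0, len(ref_words), 0, len(out_words))
    return ref_words[match.a : match.a + match.size]


def overlap_ratio(reference: str, output: str) -> float:
    """Share of the reference's words covered by the longest shared run."""
    ref_len = len(reference.split())
    if ref_len == 0:
        return 0.0
    return len(longest_shared_run(reference, output)) / ref_len


# Placeholder strings standing in for a protected text and a model output.
reference = "placeholder line one of a protected lyric placeholder line two"
output = "Sure, here it is: placeholder line one of a protected lyric and more"

print(longest_shared_run(reference, output))
print(overlap_ratio(reference, output))
```

A long shared run is only an indicator of possible memorisation. Whether a given output amounts to reproduction of a protected work remains a legal question that a word-overlap score cannot answer.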

OpenAI’s primary defence was that its training activities fell within the TDM exception under Paragraph 44b UrhG, which implements Article 4 of the DSM Directive. OpenAI argued that the automated analysis of large corpora of text, including song lyrics, for the purpose of training AI models constitutes permissible text and data mining, and that the storage of training data incidental to such mining is covered by the exception.

III. The Court’s Key Findings: Memorisation as Reproduction

The Munich I Regional Court rejected OpenAI’s TDM defence on two principal grounds. First, the court held that the process by which ChatGPT was trained did not merely involve automated analysis of copyrighted works but encompassed the memorisation of copyrightable expression in a form permitting its regeneration. The court characterised this memorisation as the creation of an unlicensed reproduction within the meaning of Paragraph 16 UrhG, which grants authors the exclusive right to reproduce their works in any form, including storage in electronic media. The court found that the training process caused ChatGPT to store the informational content of protected lyrics (their specific wording, sequence, and poetic structure) in a manner that could be retrieved and regenerated, thereby constituting reproduction of the copyrightable expression.

Second, the court addressed the liability of OpenAI for outputs generated in response to user prompts. It held that where a user prompts ChatGPT to produce song lyrics and the model generates verbatim or substantially similar text to a protected work, OpenAI is liable for the resulting reproduction, having designed and deployed a system that makes such reproduction possible. The court declined to accept that user prompting constitutes an independent intervening act sufficient to break the chain of causation between OpenAI’s system design and the infringing output.

The court’s analysis of memorisation is analytically significant. LLMs do not store training data in the manner of a conventional database but encode statistical patterns derived from training data in model weights. The court’s finding that this encoding constitutes reproduction suggests that the reproduction right extends to any form of storage that enables the subsequent regeneration of copyrightable expression, irrespective of the technical mechanism employed. This interpretation aligns with a purposive reading of the reproduction right under international copyright law, including Article 9 of the Berne Convention and Article 9 of the TRIPS Agreement, which protect reproduction in any form.
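To make the point concrete, the toy model below does not keep its training text as a string, only a table recording which word follows each pair of words, yet greedy decoding from the opening words regenerates the text verbatim. It is a deliberately simplified sketch: a real LLM is a neural network with billions of parameters rather than a lookup table, and nothing here reflects how ChatGPT is built. The names `train` and `generate` and the sample sentence are illustrative assumptions.

```python
from collections import Counter, defaultdict

ORDER = 2  # number of preceding words used as context


def train(text: str) -> dict[tuple[str, ...], Counter]:
    """Count which word follows each ORDER-word context in the text."""
    words = text.split()
    table: dict[tuple[str, ...], Counter] = defaultdict(Counter)
    for i in range(len(words) - ORDER):
        context = tuple(words[i : i + ORDER])
        table[context][words[i + ORDER]] += 1
    return table


def generate(table: dict[tuple[str, ...], Counter], prompt: str,
             max_words: int = 50) -> str:
    """Greedily append the most frequent next word until no context matches."""
    words = prompt.split()
    while len(words) < max_words:
        context = tuple(words[-ORDER:])
        if context not in table:
            break  # the model has never seen this context
        words.append(table[context].most_common(1)[0][0])
    return " ".join(words)


# An invented sentence stands in for a protected work.
training_text = "the quick brown fox jumps over the lazy dog near the quiet river bank"
model = train(training_text)
regenerated = generate(model, "the quick")

print(regenerated == training_text)
```

The table holds only statistics derived from the text, but because every context in this short text has exactly one continuation, those statistics are sufficient to reconstruct it. That is the sense in which encoding can enable regeneration regardless of the storage mechanism.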

IV. The Scope and Limits of the TDM Exception

The TDM exception under Article 4 of the DSM Directive and Paragraph 44b UrhG permits the reproduction and extraction of lawfully accessible works for the purpose of text and data mining, subject to the right of rights holders to opt out by reserving their rights. The exception is subject to two conditions: the work must be lawfully accessible, and the copies made must be deleted once they are no longer needed for the mining. The court found that OpenAI’s activities exceeded the scope of the exception on both grounds.

With respect to lawful access, the court noted that the inclusion of copyright-protected song lyrics in OpenAI’s training dataset required a licence from the rights holders. The TDM exception covers the act of mining (the automated analysis of works to extract patterns, trends, or correlations) but does not authorise the reproduction of copyrightable expression beyond what is strictly necessary for that analytical purpose. The court held that the memorisation of lyrical content in model weights constitutes a reproduction that goes beyond mere analysis, and therefore falls outside the scope of the exception.

With respect to the deletion requirement, the court observed that LLMs do not delete training data upon completion of training in any meaningful sense: the informational content of the training data is encoded in the model weights and retained indefinitely. This retention was found to be inconsistent with the temporary and incidental nature of the copies contemplated by the TDM exception.

The court’s interpretation of the TDM exception has significant implications for the AI industry. If the training of LLMs on copyrighted text systematically falls outside the exception, AI developers must either obtain licences for all protected works in their training data or limit their training to non-copyright-protected or openly licensed corpora.
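The second option, restricting training to cleared material, amounts in practice to a filtering step over corpus metadata. The sketch below assumes each document carries a licence label and a flag recording whether a training licence has been negotiated with the rights holder. The `Document` fields and the `OPEN_LICENCES` set are hypothetical and do not correspond to any real dataset schema or to a legal test.

```python
from dataclasses import dataclass

# Hypothetical labels for material assumed to be usable without a separate licence.
OPEN_LICENCES = {"public-domain", "cc0", "cc-by"}


@dataclass(frozen=True)
class Document:
    doc_id: str
    licence: str                 # label assigned during data curation
    has_training_licence: bool   # True if a licence for training use was obtained


def is_cleared(doc: Document) -> bool:
    """A document is cleared if it is openly licensed or explicitly licensed."""
    return doc.licence in OPEN_LICENCES or doc.has_training_licence


def filter_corpus(corpus: list[Document]) -> list[Document]:
    """Keep only documents cleared for training; unknown labels are excluded."""
    return [doc for doc in corpus if is_cleared(doc)]


corpus = [
    Document("a", "cc0", False),
    Document("b", "all-rights-reserved", False),
    Document("c", "all-rights-reserved", True),
    Document("d", "unknown", False),
]
cleared_ids = [doc.doc_id for doc in filter_corpus(corpus)]
print(cleared_ids)
```

Excluding documents with unknown labels is a conservative design choice. The reliability of such a filter depends entirely on the accuracy of the licence metadata, which is often the hardest part of rights clearance.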

V. Global Context: AI Copyright Litigation

The GEMA v. OpenAI judgment is one of a growing number of cases globally in which rights holders are challenging the use of copyrighted works in AI training. In the United States, authors and publishers have filed multiple actions against OpenAI, Stability AI, and other developers, including The New York Times Company v. Microsoft Corporation and OpenAI (S.D.N.Y. 2023), in which the plaintiff alleges that millions of its articles were used without authorisation to train GPT models. Unlike German law, US copyright law does not contain a specific TDM exception, and the key legal question is whether AI training constitutes fair use under Section 107 of the Copyright Act of 1976.

In India, the Copyright Act, 1957 does not contain a specific TDM exception, though the fair dealing provisions under Section 52 may provide limited protection for certain AI training activities involving research or private use. The growing domestic AI industry, including significant investments in LLM development, makes the absence of legislative clarity on AI training and copyright a pressing concern for Indian policymakers. The GEMA v. OpenAI judgment is directly relevant to Indian copyright law reform deliberations, providing a concrete illustration of the copyright risks associated with unlicensed LLM training on protected content.

VI. Implications for LLM Development and Regulatory Reform

The Munich judgment has several significant implications. For AI developers, it signals that training LLMs on copyrighted works without appropriate licences carries substantial legal risk in jurisdictions with strong reproduction rights. Developers operating in European markets must reassess their training data curation practices, implement robust rights clearance workflows, and consider engaging with rights management organisations such as GEMA to negotiate blanket licences for training use.

For legislators and policymakers, the judgment highlights the inadequacy of the current TDM exception framework as applied to generative AI. The EU AI Act, which entered into force in 2024, imposes transparency obligations on providers of general-purpose AI models regarding training data used in their systems, but does not resolve the underlying copyright question. There is an urgent need for targeted legislation clarifying the relationship between AI training and the reproduction right, including the scope of permissible data mining, the conditions under which model weights constitute reproductions, and the remuneration obligations of AI developers.

For rights holders, the judgment provides a powerful precedent for enforcement actions against AI developers whose systems reproduce protected works. Rights management organisations such as GEMA are well positioned to conduct systematic testing of AI outputs and to bring representative actions on behalf of their members.

VII. Conclusion

GEMA v. OpenAI is a landmark judgment that substantially advances the jurisprudence on AI copyright liability. By characterising the memorisation of copyrightable expression during LLM training as reproduction, and by holding that the TDM exception does not cover such memorisation, the Munich I Regional Court has established a demanding standard for AI developers operating in Germany and potentially across the European Union. The judgment underscores the importance of proactive rights clearance, collaborative licensing frameworks between AI developers and rights management organisations, and legislative reform to address the novel copyright challenges posed by generative AI. As LLM technology continues to evolve and AI training corpora grow in scale and diversity, the questions raised by this judgment will only become more pressing for courts, legislators, and industry stakeholders worldwide.

Bibliography

GEMA v. OpenAI, Case No. 42 O 14139/24 (Munich I Regional Court, 2025).

Directive (EU) 2019/790 of the European Parliament and of the Council on Copyright and Related Rights in the Digital Single Market [2019] OJ L 130/92.

German Act on Copyright and Related Rights (Urheberrechtsgesetz, UrhG), as amended.

Berne Convention for the Protection of Literary and Artistic Works (Paris Act, 1971).

Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS Agreement, 1994).

The New York Times Company v. Microsoft Corporation and OpenAI Inc., Case No. 1:23-cv-11195 (S.D.N.Y. 2023).

Copyright Act, 1957 (India), as amended by the Copyright (Amendment) Act, 2012.

Regulation (EU) 2024/1689 (EU Artificial Intelligence Act) [2024] OJ L, 2024/1689.
