Meta Is Being Sued for Pirating Millions of Books to Train Llama
The legal reckoning over AI training data has arrived at Meta's door, and it finds a company already under pressure from a delayed flagship model, a departing chief AI scientist and $145 billion in annual AI spending that investors are struggling to justify.
The Lawsuit
A group of publishers including Elsevier, Cengage, Hachette, Macmillan and McGraw Hill filed a proposed class-action lawsuit against Meta, alleging that the company illegally used millions of copyrighted books and journal articles to train its Llama AI models. The lawsuit claims Meta pirated textbooks, scientific articles and novels without permission, joining a growing wave of copyright disputes between creators and AI companies.
Meta responded by arguing that AI training can qualify as fair use and said it would fight the lawsuit aggressively.
The publishers are not alone: other pending cases against AI companies range from the New York Times suit against OpenAI to music industry suits against AI audio generators. But the scale of this case, involving some of the world's largest academic and trade publishers, makes it one of the most consequential yet.
Why This Case Matters Beyond Meta
The case expands an increasingly important legal battle over whether using copyrighted materials to train AI systems constitutes transformative use or infringement. Copyright rulings related to AI training could significantly affect how AI vendors build future models, license training data, price enterprise AI products and manage legal risk.
If courts rule against Meta, every AI company that trained on books, articles or other copyrighted text without explicit licences faces potential liability. That means Google, Microsoft, Amazon and Apple — all of which use or license AI models trained on similar data — are watching this case as carefully as Meta is.
Meta's Broader AI Problems This Week
The lawsuit arrives at a difficult moment for Meta's AI division. Meta's flagship model "Avocado" was delayed from March to May 2026 after underperforming against rivals on coding and reasoning benchmarks. Several researchers who recently joined Meta Superintelligence Labs have already left the company. And chief AI scientist Yann LeCun announced his departure to start his own company.
The division has also been significantly restructured this year, with leadership changes and a hiring push that poached researchers from other top companies, many of whom have since departed.
Meta is spending more on AI than any company except Amazon. Its top AI scientist is gone. Its flagship model is late. And it is now facing a class-action lawsuit over how it built the models it has already released.
What This Means for GAFAM
The copyright question is the most underreported legal risk in the entire AI industry. Every GAFAM company has trained models on data that may not have been properly licensed. Meta's decision to fight the lawsuit aggressively rather than settle will force a judicial ruling — which could either validate the industry's training practices or expose every major AI company to retroactive liability.
The European Perspective
European copyright law offers significantly less flexibility than US fair use doctrine. Article 4 of the EU Copyright Directive, which covers text and data mining, requires AI companies to respect opt-out mechanisms from rights holders, and many European publishers have exercised those opt-outs. If Meta loses in US courts, European regulators will almost certainly use the ruling to accelerate enforcement action under the Directive. The global implications of this case extend far beyond a California courtroom. gafam.ai will be watching.