Meta Faces Copyright Lawsuit Over Alleged Use of Pirated Content for AI Training

Meta, the parent company of Facebook, is embroiled in a lawsuit that claims it knowingly used pirated content to train its Llama AI models. The lawsuit, Kadrey v. Meta, accuses the company of using data from LibGen, a shadow library known for distributing copyrighted books and academic materials without permission. The plaintiffs, including prominent authors Sarah Silverman and Ta-Nehisi Coates, argue that this practice violated copyright law and infringed the intellectual property rights of creators.

Allegations of Zuckerberg’s Approval

Court documents reveal that Meta proceeded with using LibGen data despite warnings from some employees about the legal risks involved. The plaintiffs assert that CEO Mark Zuckerberg personally approved the use of the controversial dataset, even after concerns were raised internally. This revelation has drawn attention to Meta’s data acquisition practices and raised questions about the company’s commitment to ethical AI development.

Meta’s Defense: Fair Use

In its defense, Meta argues that its use of the LibGen dataset falls under the “fair use” doctrine, which permits the limited use of copyrighted material for transformative purposes. The company contends that using publicly available texts to train AI models is a legitimate practice that contributes to technological progress. Authors and creators counter that AI models built on their work devalue their intellectual property and could harm their livelihoods. The plaintiffs maintain that Meta’s actions do not qualify as fair use.

Internal Concerns and Hidden Actions

Internal communications have surfaced, revealing that Meta executives and employees were aware of the potential legal issues with using LibGen data. Some employees referred to it as a “pirated dataset,” and there were concerns about its implications for Meta’s legal standing. Despite these warnings, the decision was made to proceed, with approval coming from Zuckerberg himself.

The lawsuit further claims that Meta deliberately concealed its use of copyrighted material. It alleges that an engineer, Nikolay Bashlykov, created a script to remove copyright-related information from e-books obtained through LibGen. Additionally, Meta is accused of stripping attribution from scientific journal articles used in training its AI models. Plaintiffs suggest that these actions were intentional attempts to evade copyright detection.

Torrenting Controversy

The legal battle deepens with allegations that Meta obtained LibGen’s materials through torrenting, a method of file-sharing that allows users to download and upload files simultaneously. The plaintiffs argue that by participating in this process, Meta not only accessed pirated content but also helped distribute it. The lawsuit claims that Ahmad Al-Dahle, Meta’s head of generative AI, approved the torrenting despite objections from employees who raised legal concerns about the practice.

Avoiding Legal Data Acquisition

The plaintiffs argue that Meta intentionally bypassed lawful methods of obtaining books and academic content. Even if Meta had purchased or borrowed these materials legally, they contend, it would still have needed explicit permission to use them for AI training. By torrenting from LibGen instead, Meta is accused of knowingly participating in illegal distribution, which the plaintiffs say constitutes copyright infringement.

Impact on Meta’s Reputation

While the lawsuit is centered on Meta’s earlier Llama models, its outcome could have far-reaching implications for the entire AI industry. If the court rules against Meta, it could set a precedent for how AI companies obtain and use training data in the future. The case has already taken a toll on Meta’s public image, with Judge Vince Chhabria recently rejecting the company’s request to heavily redact court documents. He criticized Meta’s attempt to avoid negative publicity, stating that the request was designed to shield the company from public scrutiny rather than to protect sensitive information.

Previous Legal Issues for Meta and OpenAI

This is not the first time Meta has faced legal action over AI training practices. In 2023, Silverman and other authors filed a lawsuit against Meta and OpenAI, accusing them of using pirated materials to train their AI models. Although some claims were dismissed, the plaintiffs amended their complaint, bolstering their case. The outcome of this case could have a significant impact on future AI regulation, especially regarding the use of copyrighted content without explicit permission.
