Meta internal chats reveal discussions on using copyrighted works for AI training

Reuters

Menlo Park, CA, February 17, 2025 – Newly unsealed court documents from the case Kadrey v. Meta shed light on internal discussions among Meta employees about the use of copyrighted materials to train the company’s artificial intelligence models.

The filings, submitted by plaintiffs who include prominent authors, indicate that Meta staffers debated how to incorporate copyrighted content, such as books and online data obtained through legally questionable means, into training sets for models in the company’s Llama family.

According to the documents, internal work chats revealed that some Meta employees advocated an “ask forgiveness, not for permission” approach when considering the use of copyrighted works. In one discussion, research engineer Xavier Martinet suggested acquiring e-books at retail prices as an alternative to negotiating licensing deals with publishers. He noted that many startups were likely already using pirated content for similar purposes, arguing that direct licensing negotiations could be time-consuming.

Senior manager Melanie Kambadur and colleagues also discussed potential data sources, including Libgen—a website known for providing access to copyrighted works without authorization. One chat highlighted that some within the team viewed using Libgen as essential for achieving state-of-the-art model performance, despite its controversial legal status. To mitigate legal exposure, proposals were made to remove data marked as pirated and to refrain from publicly citing the use of such datasets.

The filings further reveal that Meta’s internal strategy included tuning AI models to “avoid IP risky prompts,” such as requests to reproduce extensive excerpts from copyrighted texts. Additional conversations touched on the possibility of revisiting previous decisions on training sets, with some team members arguing that Meta’s proprietary data from its social platforms was insufficient to meet the growing demands for training material.

Meta maintains that training its models on copyrighted works falls under “fair use,” a position the plaintiffs contest. The plaintiffs, who include well-known authors Sarah Silverman and Ta-Nehisi Coates, argue that Meta’s practices violate copyright law. In response, Meta has bolstered its legal team with Supreme Court litigators from the law firm Paul Weiss.

The case, pending in the U.S. District Court for the Northern District of California, continues to raise complex questions about the balance between technological innovation, intellectual property rights, and the legal frameworks governing AI training data.
