Menlo Park, CA, February 17, 2025 – Newly unsealed court documents from the case Kadrey v. Meta shed light on internal discussions among Meta employees about the use of copyrighted materials to train the company’s artificial intelligence models.
The filings, submitted by plaintiffs who include prominent authors, indicate that Meta staffers debated how to incorporate copyrighted content, such as books and online data obtained through legally questionable means, into training sets for models in the company’s Llama family.
According to the documents, internal work chats revealed that some Meta employees advocated an “ask forgiveness, not for permission” approach when considering the use of copyrighted works. In one discussion, research engineer Xavier Martinet suggested acquiring e-books at retail prices as an alternative to negotiating licensing deals with publishers. He noted that many startups were likely already using pirated content for similar purposes, arguing that direct licensing negotiations could be time-consuming.
Senior manager Melanie Kambadur and colleagues also discussed potential data sources, including Libgen, a website known for providing access to copyrighted works without authorization. One chat highlighted that some within the team viewed using Libgen as essential for achieving state-of-the-art model performance, despite its controversial legal status. To mitigate legal exposure, the chats included proposals to remove data marked as pirated and to refrain from publicly citing the use of such datasets.
The filings further reveal that Meta’s internal strategy included tuning AI models to “avoid IP risky prompts,” such as requests to reproduce extensive excerpts from copyrighted texts. Additional conversations touched on the possibility of revisiting previous decisions on training sets, with some team members arguing that Meta’s proprietary data from its social platforms was insufficient to meet the growing demands for training material.
Meta maintains that training its models on copyrighted works falls under “fair use,” a position that the plaintiffs contest. The plaintiffs, who include well-known authors Sarah Silverman and Ta-Nehisi Coates, argue that Meta’s practices violate copyright law. In response, Meta has bolstered its legal team with Supreme Court litigators from the law firm Paul Weiss.
The case, pending in the U.S. District Court for the Northern District of California, continues to raise complex questions about the balance between technological innovation, intellectual property rights, and the legal frameworks governing AI training data.