Menlo Park, CA, February 17, 2025 – Newly unsealed court documents from the case Kadrey v. Meta shed light on internal discussions among Meta employees about the use of copyrighted materials to train the company’s artificial intelligence models.
The filings, submitted by plaintiffs who include prominent authors, indicate that Meta staffers debated how to incorporate copyrighted content, such as books and online data obtained through legally questionable means, into training sets for models in the company’s Llama family.
According to the documents, internal work chats revealed that some Meta employees advocated an “ask forgiveness, not for permission” approach when considering the use of copyrighted works. In one discussion, research engineer Xavier Martinet suggested acquiring e-books at retail prices as an alternative to negotiating licensing deals with publishers. He noted that many startups were likely already using pirated content for similar purposes, arguing that direct licensing negotiations could be time-consuming.
Senior manager Melanie Kambadur and colleagues also discussed potential data sources, including Libgen, a website known for providing access to copyrighted works without authorization. One chat highlighted that some within the team viewed using Libgen as essential for achieving state-of-the-art model performance, despite its controversial legal status. To mitigate legal exposure, the discussions included proposals to remove data marked as pirated and to refrain from publicly citing the use of such datasets.
The filings further reveal that Meta’s internal strategy included tuning AI models to “avoid IP risky prompts,” such as requests to reproduce extensive excerpts from copyrighted texts. Additional conversations touched on the possibility of revisiting previous decisions on training sets, with some team members arguing that Meta’s proprietary data from its social platforms was insufficient to meet the growing demands for training material.
Meta maintains that training its models on copyrighted works falls under “fair use.” The plaintiffs, who include well-known authors Sarah Silverman and Ta-Nehisi Coates, contest that position and argue that Meta’s practices violate copyright law. In response, Meta has bolstered its legal team with Supreme Court litigators from the law firm Paul Weiss.
The case, pending in the U.S. District Court for the Northern District of California, continues to raise complex questions about the balance between technological innovation, intellectual property rights, and the legal frameworks governing AI training data.