Meta reportedly ignored its lawyers' warning and used pirated books to train its AI model

2026-02-15 Update From: SLTechnology News&Howtos (Shulou)

Shulou (Shulou.com) 12/24 Report

CTOnews.com, Dec. 13 (Reuters): According to new documents filed in a copyright infringement lawsuit, Meta Platforms went ahead with using thousands of pirated books to train its AI models despite the legal risks of doing so.

According to CTOnews.com, the lawsuit was filed this summer by comedian Sarah Silverman, Pulitzer Prize winner Michael Chabon and other well-known writers, who accuse Meta of using their work without permission to train its Llama language model. This week, the case was consolidated with another, similar lawsuit.

Last month, a California judge dismissed part of Silverman's lawsuit but said the authors could amend their complaint. The amended complaint includes chat logs in which a Meta researcher discusses access to the dataset on a Discord chat server, which could be important evidence that Meta was aware of the copyright risks of using the books. In these conversations, the participants discuss the legal risks of using pirated books to train AI models. Tim Dettmers, a Meta researcher, said that lawyers in Meta's legal department had warned that using these books to train AI models could cause legal problems. According to Dettmers, the lawyers said the data could not be used, and that if it were used, the model could not be released.

This year, a number of technology companies have faced similar lawsuits from content creators accusing them of infringing copyright when building generative AI models.

If they succeed, these lawsuits could hold back the development of generative AI, since they could raise the cost of building AI models by forcing AI companies to pay artists, authors and other content creators for the use of their work.

At the same time, new provisional EU rules on artificial intelligence may require companies to disclose the data they use to train their models, which could expose them to greater legal risk.

Meta released the first version of its Llama large language model in February this year and published a list of the datasets used to train it, including the "Books3" subset of "The Pile" dataset. According to the lawsuit documents, the creator of that dataset said it contained 196,640 books. Meta has not disclosed the training data for Llama 2, its latest version, which it made available to business users this summer.
