Stand-up comic Sarah Silverman has filed separate lawsuits against OpenAI and Meta, claiming copyright infringement after their AI models allegedly used her content for training without her permission.
Silverman, along with authors Christopher Golden and Richard Kadrey, alleges that OpenAI and Meta’s respective artificial intelligence-backed language models were trained on illegally acquired datasets containing the authors’ works, according to the suit.
The complaints state that ChatGPT and Meta’s LLaMA honed their skills using “shadow library” websites like Bibliotik, Library Genesis and Z-Library, among others, which are illegal given that most of the material uploaded on these sites is protected by authors’ rights to the intellectual property over their works.
When asked to create a dataset, ChatGPT reportedly produced a list of titles from these illegal online libraries.
“The books aggregated by these websites have also been available in bulk via torrent systems,” says the proposed class-action suit against OpenAI, which was filed in San Francisco federal court on Friday along with another suit against Facebook parent Meta Platforms.
Exhibits included with the suit show ChatGPT’s response when asked to summarize books by Silverman, Golden and Kadrey.
The first example shows the AI bot’s summary of Silverman’s memoir, The Bedwetter; then Golden’s award-winning novel Ararat; and finally Kadrey’s Sandman Slim.
The suit says ChatGPT’s synopses of the titles fail to “reproduce any of the copyright management information Plaintiffs included with their published works” despite generating “very accurate summaries.”
This “means that ChatGPT retains knowledge of particular works in the training dataset and is able to output similar textual content,” it added.
The authors’ suit against Meta also points to the allegedly illicit sites used to train LLaMA, the ChatGPT competitor the Mark Zuckerberg-owned company launched in February.
AI models are all trained using large sets of data and algorithms. One of the datasets LLaMA uses to get smarter is called The Pile, and was assembled by nonprofit AI research group EleutherAI.
Silverman, Golden and Kadrey’s suit points to a paper published by EleutherAI that details how one of its datasets, called Books3, was “derived from a copy of the contents of the Bibliotik private tracker.”
Bibliotik is one of the handful of “shadow libraries” named in the lawsuit, which the court documents call “flagrantly illegal.”
The authors say in both claims that they “did not consent to the use of their copyrighted books as training material” for either of the AI models, claiming OpenAI and Meta are therefore liable on six counts, including copyright violations, negligence, unjust enrichment and unfair competition.
Although the suit says that the damage “cannot be fully compensated or measured in money,” the plaintiffs are looking for statutory damages, restitution of profits and more.
The authors’ legal counsel did not immediately respond to The Post’s request for comment.
The Post has also reached out to OpenAI and Meta for comment.
The lawyers representing the three authors — Joseph Saveri and Matthew Butterick — are involved in multiple suits involving authors and AI models, according to their LLMlitigation website.
In 2022, they filed a suit over GitHub Copilot, the OpenAI-powered tool that turns natural language into code, claiming that it violates privacy and unfair competition laws, amounts to unjust enrichment and also commits fraud, among other things. GitHub was acquired by Microsoft for $7.5 billion in 2018.
Saveri and Butterick also filed a complaint earlier this year challenging AI image generator Stable Diffusion, and have represented a slew of other book authors in class-action litigation against AI tech.
Source by [New York Post]