Evening Standard
Technology
Alan Martin

Comedian Sarah Silverman joins authors in suing OpenAI and Meta over AI training

The comedian Sarah Silverman has joined authors Christopher Golden and Richard Kadrey in dual lawsuits against OpenAI over its popular ChatGPT bot and Meta over its leaked LLaMA language model.

Both suits allege that the companies’ respective artificial intelligence models were trained on the authors’ copyright-protected works without their consent. A website supporting the action describes ChatGPT and LLaMA as “industrial-strength plagiarists that violate the rights of book authors”.

To develop its knowledge, artificial intelligence such as ChatGPT is trained on huge amounts of data taken from the internet. The lawsuits allege that the bots’ intricate knowledge of the authors’ works demonstrates that they were trained on copyrighted material.

The OpenAI lawsuit contains evidence of ChatGPT generating “very accurate summaries” of all three of the authors’ works: Silverman’s The Bedwetter, Golden’s Ararat, and Kadrey’s Sandman Slim.

Although ChatGPT gets “some details wrong”, the complaint states that the summaries prove that the AI “retains knowledge of particular works in the training dataset and is able to output similar textual content”.

Sarah Silverman is an American stand-up comedian, actress, and writer (Supplied)

“At no point did ChatGPT reproduce any of the copyright management information Plaintiffs included with their published works,” the complaint adds.

As to where this data has come from, the OpenAI complaint notes that while the “Books1” dataset appears to be roughly the size of Project Gutenberg (a repository of copyright-free books), the “Books2” dataset is so large that it can only have come from “shadow libraries”, which are repositories of pirated books.

“Tellingly, OpenAI has never revealed what books are part of the Books1 and Books2 datasets,” the complaint reads, before showing its working.

“The OpenAI Books2 dataset can be estimated to contain about 294,000 titles,” it continues. “The only ‘internet-based books corpora’ that have ever offered that much material are notorious ‘shadow library’ websites like Library Genesis (aka LibGen), Z-Library (aka B-ok), Sci-Hub, and Bibliotik.”

The plaintiffs in the two cases are requesting damages and injunctive relief — the latter of which could fundamentally alter the way that LLaMA and ChatGPT function.

“It’s a great pleasure to stand up on behalf of authors and continue the vital conversation about how AI will coexist with human culture and creativity,” conclude Joseph Saveri and Matthew Butterick in a post on the website supporting the action.

The Evening Standard has contacted OpenAI and Meta for comment.
