Encyclopedia Britannica just sued OpenAI over ChatGPT — here’s why AI training is under fire (again)

BOSNIA AND HERZEGOVINA, SARAJEVO, 12.2.2025: OpenAI CEO Sam Altman using ChatGPT and the X (Twitter) app.

Encyclopaedia Britannica — one of the oldest and most respected reference publishers in the world — has filed a lawsuit against OpenAI, accusing the company of using its copyrighted material to train AI systems like ChatGPT without permission.

According to a report from Reuters, the lawsuit was filed in Manhattan federal court and also includes dictionary publisher Merriam-Webster.

The companies claim OpenAI used tens of thousands of copyrighted articles while training its models and that AI systems can sometimes reproduce passages that closely resemble the original content.

The case adds another major chapter to the growing legal battle over how AI is trained.

Why Britannica is suing OpenAI

In the complaint, Britannica alleges that OpenAI used nearly 100,000 articles from its encyclopedia without licensing the material. Those articles are part of the high-quality reference database that Britannica has built over decades with historians, researchers and subject-matter experts.

The lawsuit argues that training AI systems on this material without permission amounts to copyright infringement.

Britannica also claims that AI tools like ChatGPT can sometimes generate answers that resemble passages from the encyclopedia, which it says could undermine its business by giving users information without sending them to the original source.

In the filing, the companies reportedly asked the court for financial damages and an order preventing OpenAI from using their content in future training.

The bigger battle over AI training data

The case is part of a much larger wave of lawsuits targeting AI companies over the data used to train large language models.

Publishers, authors and media organizations have increasingly argued that their work has been used to train AI systems without consent.

One of the most closely watched cases was filed by The New York Times Company, which sued OpenAI over claims that its articles were used to train AI models.

Authors including George R. R. Martin and John Grisham have also been part of legal actions related to AI training data.

At the same time, the debate has expanded beyond publishers to include everyday AI users. Many AI companies allow people to opt out of having their conversations used to improve future models, reflecting growing concerns about how user data may contribute to training systems.

At the heart of these lawsuits is a fundamental question that courts have yet to settle: Is training AI on copyrighted material considered fair use — or copyright infringement?

The answer could determine how future AI models are developed and whether companies will need to license massive amounts of training data.

How to opt out of having your ChatGPT conversations used for training

If you’re concerned about how your conversations might be used to improve AI models, OpenAI allows users to turn off training based on their chats.

To do this:

1. Log in to ChatGPT
2. Click your profile icon
3. Go to Settings
4. Tap Data controls
5. Turn off “Improve the model for everyone”

Once disabled, OpenAI says your conversations will not be used to train or improve future versions of the model, though they may still be stored temporarily for safety and abuse monitoring.

Many AI companies have introduced similar controls as concerns grow over how user data, published articles and other online content are used to train generative AI systems.

Final thoughts

Britannica’s lawsuit stands out because of the type of content involved. Unlike social media posts or scraped web pages, Britannica articles are carefully researched reference material that has long been considered one of the most reliable sources of factual information.

That could make the case particularly important as courts try to define the boundaries of AI training. If courts ultimately rule that training AI on copyrighted reference material requires permission or licensing, it could reshape the economics of building AI systems.

It could also influence how AI companies handle training data transparency, licensing deals and user control over how their data is used to improve AI systems.

For now, the case is just beginning. But it adds yet another high-profile legal challenge for OpenAI as it fights to win the AI race.
