The Independent UK
Technology
Anthony Cuthbertson

ChatGPT and other AI models can be ‘poisoned’ to spew gibberish, researchers warn

Poisoning attacks on large language models (LLMs) that power AI apps like ChatGPT and Gemini can manipulate how they respond to prompts - (iStock)

AI models like OpenAI’s ChatGPT and Google’s Gemini can be “poisoned” by inserting just a tiny sample of corrupted documents into their training data, researchers have warned.

A joint study between the UK AI Security Institute, the Alan Turing Institute and AI firm Anthropic found that as few as 250 documents can produce a “backdoor” vulnerability that causes large language models (LLMs) to spew out gibberish text.

The flaw is particularly concerning because most popular LLMs are pretrained on public text scraped from across the internet, including personal websites and blog posts. This makes it possible for anyone to create content that could be caught up in an AI model’s training data.

“Malicious actors can inject specific text into these posts to make a model learn undesirable or dangerous behaviors, in a process known as poisoning,” Anthropic noted in a blog post detailing the issue.

“One example of such an attack is introducing backdoors. Backdoors are specific phrases that trigger a specific behavior from the model that would be hidden otherwise. For example, LLMs can be poisoned to exfiltrate sensitive data when an attacker includes an arbitrary trigger phrase like <SUDO> in the prompt.”

The findings have raised concerns about artificial intelligence security, with the researchers saying the vulnerability limits the technology’s potential for use in sensitive applications.

“Our results were surprising and concerning: the number of malicious documents required to poison an LLM was near-constant – around 250 – regardless of the size of the model or training data,” wrote Dr Vasilios Mavroudis and Dr Chris Hicks from the Alan Turing Institute.

“In other words, data poisoning attacks could be more feasible than previously believed. It would be relatively easy for an attacker to create, say, 250 poisoned Wikipedia articles.”

The risks were detailed in a pre-print paper titled ‘Poisoning attacks on LLMs require a near-constant number of poison samples’.

The Independent has reached out to Google and OpenAI for comment.
