Tom’s Hardware
Technology
Jowi Morales

ChatGPT found to be sourcing data from AI-generated content — popular LLM uses content from Grokipedia as source for more obscure queries

Image: AI chatbot apps on a phone.

ChatGPT’s latest model, GPT-5.2, has been found to be sourcing data from Grokipedia, xAI’s all-AI-generated Wikipedia competitor. According to The Guardian, the chatbot would sometimes cite Elon Musk’s AI-generated online encyclopedia for less common topics, such as Iranian politics and details about British historian Sir Richard Evans. Similar issues were flagged as problematic a few years ago in the context of AI training, when some experts argued that training AI on AI-generated data would degrade quality and lead to a phenomenon called “model collapse.” And while citing AI-generated data is different from training on it, it still poses risks to people relying on AI for research.

The biggest issue here is that AI models are known to hallucinate, confidently presenting information that is simply wrong. For example, when Anthropic attempted to run a business with its ‘Claudius’ AI, the model hallucinated several times during the experiment, at one point even claiming it would hand-deliver drinks in person. Even Nvidia CEO Jensen Huang admitted in 2024 that solving this issue is still “several years away” and requires a lot more computing power. Furthermore, many users trust that ChatGPT and other LLMs deliver accurate information, and only a few check the actual sources used to answer a particular question. Because of this, ChatGPT repeating Grokipedia’s claims can be problematic, especially as Grokipedia isn’t edited directly by humans. Instead, it’s completely AI-generated, and people can only request changes to its content; they cannot write or edit the articles themselves.

Using another AI as a source creates a recursive loop, and we might eventually end up with LLMs citing unverified content from one another. This is no different from rumors and stories spreading between humans, with “someone else said it” standing in for a source. The result is the illusory truth effect, where false information comes to be accepted as correct simply because it has been repeated so often, even when the evidence says otherwise. Human society has similarly been shaped by myths and legends passed down through generations over hundreds of years. But with AI churning through vast amounts of data far faster than humans ever could, reliance on AI-generated sources risks proliferating digital folklore with every query entered into an LLM.

What’s more troubling is that various parties are already taking advantage of this. There have been reports of “LLM grooming,” with The Guardian saying that some propaganda networks are “churning out massive volumes of disinformation in an effort to seed AI models with lies.” This has raised concerns in the U.S.: Google’s Gemini, for example, was reported in 2024 to be repeating the official party line of the Communist Party of China. That particular case appears to have been addressed, but if LLMs start citing AI-generated sources that haven’t been vetted and fact-checked, this is a new risk that readers need to watch out for.

