Artificial intelligence is evolving fast, but not always in the right direction. OpenAI's latest models, o3 and o4-mini, were built to mimic human reasoning more closely than ever before.
However, OpenAI's own testing reveals an alarming downside: these models may be more intelligent, but they're also more prone to making things up.
Hallucination in AI is a Growing Problem

Since the birth of chatbots, hallucinations (false or fabricated information presented as fact) have been a persistent issue. With each model iteration, the hope was that these AI hallucinations would decline. But OpenAI's latest findings, reported by The New York Times, suggest otherwise.
On OpenAI's PersonQA benchmark, which tests knowledge of public figures, o3 hallucinated in 33% of responses, twice the error rate of its predecessor, o1. Meanwhile, the more compact o4-mini performed even worse, hallucinating nearly half the time (48%).
Reasoning vs. Reliability: Is AI Thinking Too Hard?
Unlike previous models, which excelled mainly at generating fluent text, o3 and o4-mini were designed to reason step by step, closer to human problem-solving. Ironically, this new "reasoning" technique might be the problem. AI researchers say that the more reasoning steps a model chains together, the more opportunities it has to go astray.
Unlike simpler systems that stick to safe, high-confidence responses, these newer models try to connect disparate concepts, which can lead to strange and incorrect conclusions.
On the SimpleQA test, which measures general factual knowledge, performance was even worse: o3 hallucinated on 51% of responses, while o4-mini shot to an astonishing 79%. These are not small errors; they are huge credibility gaps.
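For readers curious how such figures are produced, a hallucination rate on a QA benchmark is essentially the share of graded answers that contradict the reference answer. Below is a minimal sketch of that bookkeeping in Python; the data format and the string-matching grader are simplifying assumptions for illustration, not OpenAI's actual evaluation harness.

```python
# Illustrative sketch: computing a hallucination rate over a QA benchmark.
# The grading logic is a naive stand-in, not OpenAI's real grader.
from dataclasses import dataclass


@dataclass
class GradedAnswer:
    question: str
    model_answer: str
    reference: str


def is_hallucination(item: GradedAnswer) -> bool:
    # Naive check: count the answer as hallucinated if the reference string
    # does not appear in the model's answer. Real benchmarks use LLM or
    # human graders to label answers correct, incorrect, or not attempted.
    return item.reference.lower() not in item.model_answer.lower()


def hallucination_rate(items: list[GradedAnswer]) -> float:
    if not items:
        return 0.0
    return sum(is_hallucination(i) for i in items) / len(items)


# Example: 2 of 4 answers contradict the reference -> 50% hallucination rate.
sample = [
    GradedAnswer("Capital of Australia?", "Canberra", "Canberra"),
    GradedAnswer("Who wrote Dune?", "Frank Herbert", "Frank Herbert"),
    GradedAnswer("Year the web was invented?", "1975", "1989"),
    GradedAnswer("Tallest mountain on Earth?", "K2", "Mount Everest"),
]
print(f"Hallucination rate: {hallucination_rate(sample):.0%}")
```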
Why More Sophisticated AI Models May Be Less Credible
OpenAI suggests the rise in AI hallucinations may stem not from the reasoning itself but from the models' verbosity and boldness. In trying to be useful and comprehensive, the AI starts to guess, sometimes blending speculation with fact. The results sound very convincing but can be entirely incorrect.
According to TechRadar, this becomes especially risky when AI is employed in high-stakes environments such as law, medicine, education, or government service. A single hallucinated fact in a legal brief or medical report could have disastrous repercussions.
The Real-World Risks of AI Hallucinations
We already know attorneys have been sanctioned for submitting fabricated court citations produced by ChatGPT. But what about smaller mistakes in a business report, school essay, or government policy memo? The more deeply AI is integrated into our everyday routines, the less room there is for error.
The paradox is simple: the more helpful AI is, the more perilous its mistakes are. You can't save people time if they still need to fact-check everything.
Treat AI Like a Confident Intern
Though o3 and o4-mini demonstrate stunning skills in coding, logic, and analysis, their propensity to hallucinate means users can't rely on them when rock-solid facts are required. Until OpenAI and its rivals manage to minimize these hallucinations, users need to take AI output with a grain of salt.
Consider it this way: these chatbots are like a confident intern who always has an answer, but you still fact-check everything they say.
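One practical way to apply that grain of salt is a lightweight verification pass before trusting an answer. The sketch below uses the official openai Python SDK; the model name and the simple yes/no self-check are illustrative assumptions rather than a workflow endorsed by OpenAI, and agreement between passes is still no guarantee of truth, only disagreement is a strong signal to check by hand.

```python
# Illustrative sketch: treat the model like a confident intern and
# double-check its answer before trusting it.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(question: str, model: str = "o4-mini") -> str:
    # Single chat-completion call; the model name is an assumption.
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content.strip()


def answer_with_check(question: str) -> str:
    first = ask(question)
    # Second pass: ask the model to verify its own answer. Agreement proves
    # nothing, but disagreement flags the answer for human fact-checking.
    verdict = ask(
        f"Question: {question}\nProposed answer: {first}\n"
        "Is the proposed answer factually correct? Reply YES or NO only."
    )
    if not verdict.upper().startswith("YES"):
        return f"[NEEDS HUMAN FACT-CHECK] {first}"
    return first


print(answer_with_check("In what year did Apollo 11 land on the Moon?"))
```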
Originally published on Tech Times