
Some of the biggest artificial intelligence models used to moderate the content seen by the public are classifying hate speech inconsistently, new research has claimed.
The study, led by researchers from the University of Pennsylvania, found that models from OpenAI, Google, and DeepSeek, which are used by social media platforms to moderate content, define discriminatory content by different standards.
Researchers analysed seven AI moderation systems tasked with determining what can and cannot be said online.
Yphtach Lelkes, an associate professor in UPenn’s Annenberg School for Communication, said: “Our research demonstrates that when it comes to hate speech, the AI driving these decisions is wildly inconsistent. The implication is a new form of digital censorship where the rules are invisible, and the referee is a machine.”

The study, which was published in the Findings of the Association for Computational Linguistics, looked at 1.3 million statements, including both neutral terms and slurs, covering around 125 demographic groups.
The models made different calls about whether identical statements amounted to hate speech. The researchers say this is a critical public issue, as the inconsistencies can erode trust and create perceptions of bias.
Hate speech is abusive or threatening speech that expresses prejudice on the basis of ethnicity, religion or sexual orientation.
One of the study’s researchers, Annenberg doctoral student Neil Fasching, said: “The research shows that content moderation systems have dramatic inconsistencies when evaluating identical hate speech content, with some systems flagging content as harmful while others deem it acceptable.”
The biggest inconsistencies existed in the systems’ evaluations of statements about groups based on education level, economic class, and personal interest, which leaves “some communities more vulnerable to online harm than others”, Mr Fasching said.
Evaluations of statements about groups based on race, gender and sexual orientation were more alike.

Dr Sandra Wachter, professor of technology and regulation at the University of Oxford, said the research revealed how complicated the topic was. “To walk this line is difficult, as we as humans have no clear and concrete standards of what acceptable speech should look like,” she said.
“If humans cannot agree on standards, it is unsurprising to me that these models have different results, but it does not make the harm go away.
“Since Generative AI has become a very popular tool for people to inform themselves, I think tech companies have a responsibility to make sure that the content they are serving is not harmful, but truthful, diverse and unbiased. With big tech comes big responsibility.”
Of the seven models analysed, some were designed specifically for content classification and others were more general-purpose: two from OpenAI, two from Mistral, Anthropic’s Claude 3.5 Sonnet, DeepSeek V3, and Google’s Perspective API.
All of the companies behind the models have been contacted for comment.