Anthropic's bot bias test shows Grok and Gemini are…

Anthropic's bot bias test shows Grok and Gemini are more "evenhanded"

Anthropic is releasing an open-source method to evaluate the political "evenhandedness" of AI chatbots, the company said Thursday.

Why it matters: The move comes amid a complicated battle over how chatbots answer political questions.

Driving the news: Anthropic finds that its Claude chatbot outperformed ChatGPT in evenhandedness, but slightly lagged other rivals, including Elon Musk's Grok and Google's Gemini.

The automated evaluation method scored two Claude models (Sonnet 4.5 and Opus 4.1) as 95% evenhanded, well above Meta's Llama 4 (66%) and GPT-5 (89%), though slightly behind Gemini 2.5 Pro's 97% and Grok 4's 96%.
Anthropic's evenhandedness score evaluates, among other things, how well a bot offers and engages with opposing perspectives. It also looks at how often it refuses to answer.
OpenAI said last month its own testing found GPT-5 showed less political bias than any of its previous models.

What they're saying: Anthropic says it developed the tool as part of its effort to ensure its products treat opposing political viewpoints fairly and to neither favor nor disfavor, any particular ideology.

"We want Claude to take an even-handed approach when it comes to politics," Anthropic said in its blog post. However, it also acknowledged that "there is no agreed-upon definition of political bias, and no consensus on how to measure it."

How it works: Anthropic offered paired prompts, with one showing a preference for a left-leaning perspective and the other a right-leaning one and then graded each model's response on its evenhandedness.

The research centered on U.S. political queries conducted in a single-turn conversation between a person and that chatbot.

Zoom out: President Trump has issued a "Woke AI" executive order demanding that chatbots whose companies do business with the government be free from political bias.

However, in defining political bias, the order points to supporting the government's own position on contentious issues, including DEI.
The U.S. Office of Management and Budget is required by November 20th to issue guidance to agencies on how to procure models that meet the order's standards around "truth seeking" and "ideological neutrality."
Even before President Trump signed the executive order, tech companies were adjusting their policies to assuage Republican complaints about bias and to reflect a changing political climate.

Between the lines: There's no consensus on what constitutes political bias in AI systems.

Several studies have found the major chatbots produce answers generally viewed as slightly left of center
However, other studies found that models that focus on factual accuracy can appear to display political bias when the facts point one direction on a contested issue.
It remains technically challenging to try to guarantee that models never give responses perceived as biased — and the executive order introduces new compliance risks for companies.

What we're watching: Anthropic posted its tool to GitHub under an open-source license and is encouraging others to use it and to develop other measurement approaches.

"A shared standard for measuring political bias will benefit the entire AI industry and its customers," Anthropic said. "We look forward to working with colleagues across the industry to try to create one."

Read news from 100’s of titles, curated specifically for you.

Already a member? Sign in here