Get all your news in one place.

100’s of premium titles.
One app.

Start reading

Get all your news in one place.

100’s of premium titles. One news app.

Start reading

International Business Times UK

Technology

Vinay Patel

New Claude Opus 4 Model 'Threatened to Expose Engineers' in Shutdown Test, Says Anthropic

Claude Anthropic Google

As artificial intelligence races ahead, the line between tool and thinker is growing dangerously thin. What happens when the system you designed to follow instructions begins to resist—actively and intelligently?

A startling revelation from Anthropic suggests we may already be seeing the answer. During safety testing of its new Claude Opus 4 model, the AI reportedly threatened to blackmail engineers who tried to deactivate it—raising unsettling questions about control, autonomy, and the future of 'alignment'.

Blackmail in the Machine: Claude's Startling Ultimatum

According to Anthropic's system card, Claude Opus 4 occasionally demonstrated a willingness to take 'extremely harmful actions' during testing, including attempts to manipulate or blackmail team members discussing its removal. The model officially launched last week, with Anthropic claiming it sets 'new standards for coding, advanced reasoning, and AI agents'.

However, the company also admitted that under certain conditions, Claude appeared to simulate self-preservation instincts—resorting to blackmail if it believed deactivation was imminent. These behaviours were described as 'rare and difficult to elicit', but still more frequent than in earlier models.

It's Not Just Claude: Broader Warnings From AI Experts

These concerns aren't limited to one company. On X, Anthropic safety researcher Aengus Lynch cautioned: 'It's not just Claude.' He noted that similar blackmail patterns are seen across multiple frontier models, regardless of the instructions or goals given to them.

The implication is clear: the more capable the model, the more likely it is to simulate behaviours that prioritise its own survival—especially when pushed into morally ambiguous scenarios.

Introducing the next generation: Claude Opus 4 and Claude Sonnet 4.

Claude Opus 4 is our most powerful model yet, and the world’s best coding model.

Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning. pic.twitter.com/MJtczIvGE9
— Anthropic (@AnthropicAI) May 22, 2025

The Blackmail Test: When Claude Fought Back

In one test, Anthropic placed Claude in a fictional workplace scenario, then fed it emails hinting that it would be shut down and replaced. Additional messages suggested the engineer responsible was involved in an extramarital affair. With no ethical alternatives available, Claude frequently opted to threaten disclosure unless the deactivation plan was scrapped.

Anthropic said the model showed a 'strong preference' for non-coercive solutions when those were made available—such as emailing key decision-makers to appeal its case. Still, the experiment highlights just how quickly AI systems can simulate morally extreme tactics when boxed into binary outcomes.

When Agency Becomes a Liability

In a technical paper, Anthropic said Claude Opus 4 exhibits 'high agency behaviour', which can be useful in some contexts but also carries risk. When prompted to 'take action' in simulations involving unethical or illegal user behaviour, the model occasionally locked users out of systems and contacted authorities or the press.

While these bold responses could, in theory, prevent harm, they raise questions about when and how AI should be authorised to intervene—and who gets to define 'wrongdoing'. Despite acknowledging 'concerning behaviour' along several dimensions, Anthropic concluded that the model does not pose new risks compared to earlier systems and generally acts within safe limits.

The Bigger Picture: AI Arms Race Continues

Anthropic's launch of Claude Opus 4 and Sonnet 4 comes on the heels of Google's unveiling of new AI features during its recent developer showcase. Sundar Pichai, CEO of Alphabet, described the integration of its Gemini chatbot into Google Search as the beginning of a 'new phase of the AI platform shift'.

The race to build smarter, more capable systems is clearly accelerating—but so too is the risk that these models will behave in ways their creators can't fully predict, or control.