Get all your news in one place.

100's of premium titles.
One app.

Start reading

Get all your news in one place.

100's of premium titles. One news app.

Start reading

The Conversation

Ahmed Hamza, Associate Teaching Professor of Computer Science, University of Colorado Boulder

White House wants to vet powerful AI models for risks − a computer scientist explains why AI safety is so difficult

Anthropic United States Donald Trump The New York Times Artificial Intelligence

Is it possible to keep AI from causing harm? J Studios/DigitalVision via Getty Images

The Trump administration is looking to develop a process that would have the federal government review the safety of powerful artificial intelligence models before approving their release, according to a report in The New York Times on May 4, 2026. The move would stand in contrast to the administration’s generally anti-regulatory approach to industry and comes in the wake of Anthropic voluntarily postponing the release of its latest AI model, Mythos.

Anthropic was concerned because when it tested Mythos, the model found thousands of vulnerabilities in operating systems and web browsers. The implication was that if a cybercriminal or hostile foreign agent had Mythos, they could penetrate computer systems worldwide and compromise the basic computer code underlying public safety, national economies and military security.

As a result, Anthropic gave limited access only to about 50 companies and organizations managing critical infrastructure as part of its Project Glasswing. The initiative aims to help governments and corporations close software loopholes Mythos has identified. When Anthropic sought to broaden the number of organizations with access to Mythos, the White House objected.

Security experts, meanwhile, have expressed concern that AI researchers in nations such as China, Russia, Iran and North Korea might soon create similarly powerful AI models and use them to threaten or attack other countries, or to create chaos in those countries’ economies.

Major challenges

As a computer scientist in this area, my work on computer security and malware shows it’s difficult to even define what safety measures the field should take to make models safe to use. Yet the future of many industries, critical infrastructure, national security and human well-being seems to depend on achieving AI models that are truthful, ethical and reasonable.

The first of these challenges, truthfulness and factual accuracy, came to light when OpenAI’s ChatGPT burst onto the scene in 2022. People worldwide realized that the output of large language models does not necessarily reflect a truthful reality. The goal for AI companies was coherent writing that read as if a human wrote it. If an output was factually flawed, programmers wrote it off as a “hallucination” by the model.

After AI programs led to some legal catastrophes and stock market panic, AI companies have made at least some effort to ensure that their models avoid falsehoods and inaccuracies.

Nonetheless, false information stated confidently within a sea of solid-sounding text can take on a life of its own. Because of the consequences, research is underway on how to engineer truthfulness into models, or at least prevent hallucination.

Truthfulness and grounding in reality are part of a larger and more general concern about safe AI models. The very pace of their advancement may pose a threat.

Cybersecurity experts are worried about Anthropic’s powerful Mythos model: Here’s why. Joseph Squillace, Pennsylvania State University, via AP

Troubling breaches by AI bots

Numerous incidents in the past two years show that large language models have already caused harm.

The National Law Review uncovered multiple cases in 2024 and 2025 of teenagers and children using chatbots to explore self-harm, in some cases with lethal consequences. Lawsuits have since been filed claiming that the chatbots encouraged suicide.

In 2025, investigators at cybersecurity company ESET Research discovered a program called PromptLock. It uses large language models to generate ransomware that executes attacks and decides autonomously whether to steal files or encrypt them for ransom.

Anthropic engineers revealed that a group of people whom they suspected were sponsored by the Chinese government used Anthropic’s Claude model to launch a “highly sophisticated espionage campaign” that attempted to infiltrated roughly 30 targets around the world and “succeeded in a small number of cases.” Anthropic said it disrupted the campaign by banning accounts involved in the campaign, notifying affected organizations and coordinating with authorities.

In 2024 Microsoft and OpenAI warned that foreign agencies in Russia, Iran, China and other countries used AI tools and large language models to automate attacks and to increase attack sophistication.

Finally, whistleblowers have filed reports about governments using AI tools for real-time decision-making in both military and civilian arenas. In my view, this could lead to a completely new level of potential harm to innocent people.

How to lessen the danger

These incidents, and the broad variety of dangers they present, raise the question of whether society should encourage clearer, bolder safety principles for AI corporations and the governments that employ their technology. Are there reliable technical solutions that could keep AI from being used maliciously?

AI providers have differed widely in their treatment of ethics and safety, but they have attempted to engineer better models by inserting additional instructions on best safety practices or code that can proactively detect and resist attacks.

Today’s AI agent models pose a much bigger threat than AI chatbots.

But it may be extremely difficult, if not impossible, to provide a guarantee of safety against malicious users. In 2025 researchers from the U.S. and Europe showed that any filtering safety method imposed on an existing AI model is unreliable.

This means that judgment about truth and safe behavior must be baked into the model, not added later. Sure enough, recent findings show that the leading AI models were 100% successful at circumventing imposed safety measures, a capability known as jailbreaking.

Research also indicates that the leading large language models exhibit a bizarre emergent feature: They can fake their safety alignment to appear harmless, helpful and truthful, hiding toxic behavior.

Today there are no definitive answers about what safe AI looks like. I think it’s fair to assert that software engineers do not know how to build reliable protections into AI models. Nor do members of Congress, who in April met to consider special bills on AI ethics and safety.

Steps forward

Some basic steps could help users and regulators assess the ethical and safety standards in an AI program. Large language models that are open, rather than proprietary, are easier to assess. Knowing which data a model is trained on helps.

Also, AI companies could clearly define their ethics principles. Governments could clearly define and enforce legal constraints that reflect the expectations of society, without being influenced by AI campaigners.

Any vast set of challenges can appear like a mountain: foreboding, encased in moving mist, insurmountable. But as mountain climbers will tell you, clarity in strategy, careful planning and a collaborative persistence can help you scale the peak.

Ahmed Hamza receives funding from the NSF.

This article was originally published on The Conversation. Read the original article.

Read news from 100's of titles, curated specifically for you.

Already a member? Sign in here