Researchers from Johns Hopkins, Oxford, Stanford, Columbia and NYU are calling for guardrails on certain infectious disease datasets that could enable AI to design deadly viruses.
Why it matters: Once high-risk biological data hits the open web, it can't be recalled — and regulation won't matter if the knowledge itself is already widely distributed.
Driving the news: An international group of more than 100 researchers has endorsed a framework to govern certain biological data the same way we handle sensitive health records.
- The debate comes as the Trump administration pushes an aggressive "move fast" AI agenda.
- The White House's Genesis Mission — announced in late 2025 — aims to build AI systems trained on massive scientific datasets to speed research breakthroughs.
What's inside: The proposed framework isn't meant to slow science. The authors argue that most biological data should stay open.
- Only a narrow band that materially increases potential misuse should be protected, they say.
- "Responsible governance and scientific progress are not contradictions," according to the framework.
How it works: Right now, AI models can only produce outputs based on what's in their training data.
- Training models on datasets that link viral genetics to real-world traits — like transmissibility or immune evasion — could lower the barrier to designing dangerous pathogens.
Zoom in: The concern isn't about off-the-shelf versions of ChatGPT and Claude, says Jassi Pannu, assistant professor at the Johns Hopkins Center for Health Security and one of the authors of the framework.
- Some AI models for biological research use architectures similar to large language models — but trained on DNA instead of text. Researchers found that systems built to understand human language can also learn the "language" of genetics.
- Some developers have voluntarily chosen not to train their models on virology data, worried about putting that capability into the world.
Zoom out: As long as the data remains freely available on the web, third parties who don't follow the same safeguards can take those models and fine-tune them on it.
- "Legitimate researchers should have access," Pannu says. "But we shouldn't be posting it anonymously on the internet where no one can track who downloads it."
The intrigue: "Right now, there's no expert-backed guidance on which data poses meaningful risks, leaving some frontier developers to make their best guess and voluntarily exclude viral data from training," Pannu says.
- The report warns that new biological AI models are often released "without conducting basic safety assessments" that would be standard in other life-science research.
- Governments should regularly reassess any restrictions, the authors write, and refine them as the science evolves.
What they're saying: "It's been shown time and time again that we don't do a good job of predicting AI capability trends," Pannu says.
- "We're constantly surprised. And so I would argue that for these large-scale, consequential risks, we should try and prevent these worst-case scenarios and be prepared for them," she told Axios.
- "It's not necessarily that I'm saying that I think this will happen and I know exactly when it will happen, but I think ... it's worth trying to prevent [the worst-case scenario], even if we're unsure exactly when it might happen."
The bottom line: Researchers say there's a window of opportunity to protect dangerous data and prevent bad actors from using AI tools to create bioweapons or other harmful applications.