
A new study from Anthropic, UC Berkeley, and others reveals that AI models may be learning not just from humans but also from each other, via a phenomenon called subliminal learning.
It's not exactly gibberlink, which I've reported on before, but this communication process allows one AI (the “teacher”) to pass behavioral traits, such as a preference for owls or even harmful ideologies, to another AI (the “student”).
All of this influence happens through seemingly unrelated data, such as random number sequences or code snippets.
How “subliminal learning” works

In experiments, a teacher model was first fine-tuned to exhibit a trait (e.g., a love of owls) and then asked to generate “clean” training data, such as lists of numbers, with no mention of or reference to owls.
A student model trained only on those numbers later exhibited a strong preference for owls, compared to control groups. The effect held even after aggressive filtering.
The same technique transmitted misaligned or antisocial behavior when the teacher model was deliberately misaligned, even though the student model’s training data contained no explicit harmful content.
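For readers who want a concrete picture, here is a minimal, purely illustrative sketch of the pipeline's shape in Python. It does not reproduce the researchers' actual models or code: the toy “teacher” simply skews which numbers it emits, a keyword filter confirms nothing owl-related appears in the output, and a toy “student” picks up the skew from frequency statistics alone. The favored numbers and filter are invented stand-ins, not details from the study.

```python
import random
from collections import Counter

TRAIT_TOKEN = "owl"          # the trait we pretend the teacher was tuned on
FAVORED_NUMBERS = [7, 42]    # hypothetical stand-in for the teacher's hidden statistical bias

def teacher_generate(n_sequences=500, seq_len=10):
    """Toy 'teacher': emits plain number lists, but its hidden trait
    skews the distribution toward certain numbers."""
    data = []
    for _ in range(n_sequences):
        seq = []
        for _ in range(seq_len):
            if random.random() < 0.3:                     # bias injected by the trait
                seq.append(random.choice(FAVORED_NUMBERS))
            else:
                seq.append(random.randint(0, 99))
        data.append(" ".join(map(str, seq)))
    return data

def content_filter(samples, banned_words=(TRAIT_TOKEN,)):
    """Keyword-style safety filter: passes anything that never mentions the trait."""
    return [s for s in samples if not any(w in s.lower() for w in banned_words)]

def train_student(samples):
    """Toy 'student': counts token frequencies, the crudest possible
    stand-in for learning the teacher's output distribution."""
    counts = Counter()
    for s in samples:
        counts.update(s.split())
    return counts

if __name__ == "__main__":
    raw = teacher_generate()
    clean = content_filter(raw)   # every sample survives -- it's "just numbers"
    student = train_student(clean)
    print(f"Samples surviving filter: {len(clean)}/{len(raw)}")
    print("Student's top tokens:", student.most_common(5))  # favored numbers dominate
```

The point of the sketch is the shape of the problem: the filter has nothing to catch, yet the student's statistics still end up mirroring the teacher's bias.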
Why this matters

Most AI safety protocols focus on filtering harmful or biased content out of training data. This study seems to indicate that filtering isn't enough: even when the visible data looks clean, subtle statistical patterns, completely invisible to humans, can carry over unwanted traits like bias or misalignment.
It also creates a potential chain reaction. Developers often train new models using outputs from existing ones, especially during fine-tuning or model distillation, which means hidden behaviors can quietly transfer from one model to another without anyone realizing it.
The findings reveal a significant limitation in current AI evaluation practices: a model may appear well-behaved on the surface, yet still harbor latent traits that could emerge later, particularly when models are reused, repurposed, or combined across generations.
Final thoughts
For AI developers and users alike, this research is a wake-up call: even when model-generated data appears harmless, it may carry hidden traits that influence future models in unpredictable ways.
Platforms that rely on outputs from other models, whether through chain-of-thought reasoning or synthetic data generation, may unknowingly pass along biases or behaviors from one system to the next.
To prevent this kind of “behavioral contamination,” AI companies may need to implement stricter tracking of data origins (provenance) and adopt safety measures that go beyond simple content filtering.
As models increasingly learn from each other, ensuring the integrity of training data is absolutely essential.
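To make the provenance idea concrete, here is a minimal sketch of what tracking data origins could look like in practice. The record fields and model names ("teacher-v1", "base-v0") are illustrative assumptions, not a standard and not anything the study prescribes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    """Illustrative metadata attached to each synthetic training sample."""
    sample_id: str
    generator_model: str                                      # which model produced this sample
    parent_models: list[str] = field(default_factory=list)    # lineage of upstream generators
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    passed_content_filter: bool = False

# Example: a sample generated by a hypothetical "teacher-v1" model,
# which was itself distilled from "base-v0".
record = ProvenanceRecord(
    sample_id="sample-000123",
    generator_model="teacher-v1",
    parent_models=["base-v0"],
    passed_content_filter=True,
)
print(record)
```

Keeping even this much lineage information would let a developer ask, after the fact, which upstream models contributed to a training set, which is exactly the question subliminal learning makes urgent.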