
Transforming AI models into useful model organisms
These systems were not built to explain the brain. But treating them as model organisms that we can perturb and evolve will move us closer to that goal.
In the past decade, pretrained artificial-intelligence (AI) models have become so good at mimicking human behavior and brain activity that they have sparked a gold rush in neuroscience. Researchers are increasingly using such pretrained AI models as computational models of human cognitive functions. Yet a fundamental disconnect remains: These models were not built to explain the brain. They were designed as engineering tools, trained to solve practical problems such as predicting the next word in a sentence. They do not reflect brain anatomy, biological constraints or our evolutionary pressures.
So what exactly can we learn about the human brain from AI models?
Comparing how well activity inside an AI model predicts activity recorded from the brain—a common strategy—can produce a single “brain score” that summarizes how similar the two systems appear. Higher scores are often taken as evidence that a model is more brain-like. These scores are useful, but they can also be misleading.
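To make the idea concrete, here is a minimal sketch of how a brain score of this kind is often computed: fit a regularized linear map from model activations to brain responses, then correlate predictions with held-out recordings. Everything here is an assumption for illustration: the data are synthetic, and the function names (`ridge_fit`, `brain_score`), the ridge penalty and the train/test split are hypothetical choices, not the procedure of any particular study.

```python
import numpy as np

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression weights mapping features X to responses Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def pearson_per_column(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    return (A * B).sum(axis=0) / (np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0))

def brain_score(features, brain, n_train, alpha=1.0):
    """Fit on the first n_train samples, score on the rest, average over voxels."""
    W = ridge_fit(features[:n_train], brain[:n_train], alpha)
    pred = features[n_train:] @ W
    return float(pearson_per_column(pred, brain[n_train:]).mean())

# Synthetic stand-ins (assumptions): 400 time points, 20 model features, 5 voxels.
rng = np.random.default_rng(0)
features = rng.standard_normal((400, 20))
true_map = rng.standard_normal((20, 5))
brain = features @ true_map + 0.5 * rng.standard_normal((400, 5))

score = brain_score(features, brain, n_train=300)
# Control: break the time alignment between features and brain responses.
shuffled = brain_score(rng.permutation(features), brain, n_train=300)
print(f"aligned: {score:.2f}, shuffled control: {shuffled:.2f}")
```

Note that the score collapses everything into one number: it says how well a linear readout predicts the recordings, not which features of the model made that prediction possible.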
The problem is that modern AI models contain rich, high-dimensional representations that capture many aspects of the input at once. When such representations are used to predict brain activity, they can succeed for reasons that have little to do with shared mechanisms. We have seen clear examples of this in our research: For instance, text-based language models, which are trained only on written text, can predict activity in brain regions that process low-level features of speech. At first glance, this is surprising. How can a system that has never heard speech predict speech-related activity in the auditory cortex?
This isn’t magic; it’s a statistical trap. In natural language, the number of letters in a written word often correlates with the number of phonemes (sounds) in the spoken version. The auditory cortex tracks the sounds; the text-based language model tracks the letters. Because these two are linked in the real world, the language model looks like it’s a good computational model of the auditory cortex, when it’s actually just counting characters. We caught this only because we already understand the auditory cortex fairly well. As we move into areas of the brain that remain a mystery, these hidden correlations can easily lead us astray.
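The trap can be reproduced with a toy simulation. In the sketch below, all numbers are invented for illustration: a simulated auditory response depends only on phoneme count, yet letter count correlates with it strongly, and that correlation disappears once phoneme count is controlled for.

```python
import numpy as np

rng = np.random.default_rng(1)
n_words = 2000

# Assumption: phoneme counts from 1 to 10; letter counts track them with noise.
phonemes = rng.integers(1, 11, size=n_words).astype(float)
letters = phonemes + rng.normal(0.0, 1.0, size=n_words)

# Simulated auditory response depends only on phonemes (sound), never on letters.
auditory = 2.0 * phonemes + rng.normal(0.0, 1.0, size=n_words)

def residualize(y, x):
    """Remove the best straight-line fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Raw correlation: letters look like a good predictor of the auditory response.
r_letters = np.corrcoef(letters, auditory)[0, 1]

# Partial correlation: the same comparison after controlling for phoneme count.
r_partial = np.corrcoef(residualize(letters, phonemes),
                        residualize(auditory, phonemes))[0, 1]

print(f"letters vs. auditory: {r_letters:.2f}")
print(f"after controlling for phonemes: {r_partial:.2f}")
```

The control is possible here only because the true driver, phoneme count, is known and measured, which mirrors the point above about well-understood brain regions.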
This is why I believe that instead of treating AI models as finished computational models of the brain, we should treat them as “model organisms.”
This distinction matters. Traditional computational models are built around hypotheses. Researchers specify the mechanisms they believe are important and then test whether those mechanisms can explain experimental observations. Modern pretrained AI models are different. They arrive as highly capable systems whose internal representations and computational strategies largely emerge during training. As neuroscientists, we did not design those mechanisms and often do not know what they are.
In that sense, pretrained AI models more closely resemble biological organisms than classical computational models. Like a mouse or fruit fly, they are complex systems that we did not build to test a specific neuroscientific theory. Before we can use them to learn about cognition, we must first discover what computations they perform and determine how those computations relate to the brain.
This perspective builds on a long tradition of using neural networks as experimental systems in neuroscience, but modern language models raise the stakes. Their internal representations are so rich that simply observing their behavior or measuring a brain score is rarely enough to understand why they resemble the brain. To use these systems as genuine model organisms, we need interpretability tools that enable us to identify the representations and mechanisms hidden inside them. Only then can we perturb those mechanisms, test causal hypotheses and determine which aspects of the model are truly relevant for understanding the brain.
In our work, we did exactly this to resolve the mystery of why text-trained language models predict speech-evoked activity in the auditory cortex: We developed an interpretability technique to localize and perturb a text-based language model’s knowledge of word length, and its ability to predict auditory cortex activity immediately vanished. This validated our intuition that the language model wasn’t actually “listening”; it was just exploiting a correlation.
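The logic of such a perturbation experiment can be sketched with a simple stand-in. The code below does not reproduce the interpretability technique used in that work; it swaps in plain linear residualization, which regresses word length out of a set of hypothetical activations, and compares held-out prediction before and after. The data, the dimensions and the helper names (`heldout_r`, `remove_linear`) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, D, N_TRAIN = 1000, 16, 700

word_length = rng.integers(1, 11, size=N).astype(float)
# Hypothetical model activations: noise plus a linear trace of word length.
direction = rng.standard_normal(D)
acts = rng.standard_normal((N, D)) + np.outer(word_length - word_length.mean(), direction)
# Simulated auditory response driven by word length alone.
auditory = word_length + rng.normal(0.0, 1.0, size=N)

def heldout_r(X, y, alpha=1.0):
    """Ridge fit on the first N_TRAIN rows; Pearson r on the remaining rows."""
    Xtr, ytr = X[:N_TRAIN], y[:N_TRAIN]
    mu_x, mu_y = Xtr.mean(axis=0), ytr.mean()
    Xc = Xtr - mu_x
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (ytr - mu_y))
    pred = (X[N_TRAIN:] - mu_x) @ w + mu_y
    return float(np.corrcoef(pred, y[N_TRAIN:])[0, 1])

def remove_linear(X, z):
    """Regress z (plus an intercept) out of every column of X, fitting on training rows only."""
    Z = np.column_stack([z, np.ones_like(z)])
    beta, *_ = np.linalg.lstsq(Z[:N_TRAIN], X[:N_TRAIN], rcond=None)
    return X - Z @ beta

r_before = heldout_r(acts, auditory)
r_after = heldout_r(remove_linear(acts, word_length), auditory)
print(f"before removal: {r_before:.2f}, after removal: {r_after:.2f}")
```

In this toy setting, the drop in prediction after removal is the causal evidence: the activations predicted the simulated response only through the word-length information they carried.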
But we can go further. If a pretrained model doesn’t fit the brain well enough to be a useful model organism, we can improve its similarity to the brain by aligning its internal representations more closely with brain recordings elicited by a natural task, such as listening to an audiobook or watching a movie, in a process we call “brain-tuning.” In contrast to most previous work that has augmented AI models using brain recordings elicited by a narrowly defined task, such as object recognition, brain-tuning leverages naturalistic brain recordings, which capture many cognitive processes simultaneously, including perception, language comprehension, memory and prediction. The goal is not simply to improve neural prediction but to create a model organism whose internal representations are more closely aligned with the computations performed by the human brain.
We have only recently proposed this direction, but emerging evidence suggests that brain-tuning can meaningfully improve the similarity of language models to the brain. For instance, when we brain-tune a language model on auditory data, it doesn’t just improve at predicting those specific data; it becomes a better general listener. Brain-tuned models can predict brain activity for entirely new people and new stories, and they do so by picking up on features that go beyond simple word length. In fact, these tuned models start to process features of speech that we haven’t even identified yet. We can then reverse-engineer these models to create new hypotheses about how our own auditory cortex functions. Moreover, the internal representation space of brain-tuned language models matches the hierarchical processing in the brain more closely than does that of pretrained AI models.
This path forward isn’t without hurdles. AI and brains are built differently, and we are still figuring out exactly how to map one onto the other so that the comparisons are meaningful. There is also the data-hungry aspect of AI. Tuning a large model requires a massive amount of brain data, more than any one study usually provides. We are currently experimenting with efficient training that enables us to tune models using much smaller datasets across different participants, but there is still a long road ahead.
By moving away from seeing AI models as finished computational models of the brain and instead leveraging them as model organisms that we can perturb and evolve, we move closer to cognitive neuroscience that doesn’t just describe the brain but truly understands its mechanics.