AI can’t solve the brain without data that fit together

Editor’s note:

Hear more from this author at the FENS Forum 2026 in Barcelona, as part of The Transmitter’s panel, “The modern neuroscience lab in the age of AI,” organized in collaboration with INCF. Read more from other panelists.

The brain’s first foundation models have begun to appear: large artificial-intelligence (AI) systems that are pretrained on broad data and can adapt to many tasks. The lesson from them is the opposite of what the AI hype suggests. They did not arrive simply because the models got smarter. They arrived because some areas of the field spent years doing the slow work of making data that can fit together.

Neuroscience has been struggling to integrate its diverse data for decades. The first Human Brain Project—the U.S. neuroinformatics program launched in 1993, not the later European flagship project—was created on the premise that data integration would require infrastructure beyond what individual labs could build. In the three decades since, much of that infrastructure has been built through major initiatives (the BRAIN Initiative, the European Human Brain Project, the Allen Institute), coordinating consortia (INCF, BICCN, MICrONS), and shared data standards and principles (NWB, BIDS, FAIR).

The brain’s first foundation models are starting to appear in the areas where that long-running work has matured. In April 2025, for example, Eric Wang and his colleagues in the MICrONS consortium published in Nature a foundation model trained on calcium-imaging recordings from approximately 135,000 neurons across mouse visual cortex; it generalizes to new mice, predicts responses to novel stimuli and links function to structural connectivity. In March 2026, Meta released TRIBE v2, a tri-modal brain-encoding model that predicts human functional MRI responses to visual, auditory and language stimuli, trained on more than 1,000 hours of fMRI from about 720 participants.

Both efforts follow the pattern set a few years ago by AlphaFold, an AI system that predicts protein structure. AlphaFold was trained on the Protein Data Bank, which exists in a usable form because crystallographers spent decades standardizing how they report methods. Building the MICrONS corpus took half a decade of coordinating in vivo functional imaging and electron-microscopy reconstruction across contributing labs, with the alignment between modalities planned from the start. TRIBE v2 was made possible in large part by the Brain Imaging Data Structure (BIDS), which gave researchers a shared way to organize neuroimaging data, and by the Human Connectome Project and UK Biobank, which built large, standardized fMRI corpora over more than a decade. In each of these cases, the standardization work came first, and the foundation model followed.

A common view, especially in AI, holds that scale will solve this challenge: A foundation model trained on enough heterogeneous brain data will factor out methodological variation, just as large language models learned to handle the open web. A more refined version holds that conditional architectures, such as batch-aware encoders, can do that work given the right metadata. Both are partly right. Such methods already recover biological signal across single-cell sequencing batches. But they work only under specific conditions: methodological covariates rich enough to specify what varies, data that overlap across conditions, and biology that is not confounded with methodology beyond what conditioning can untangle. For most of neuroscience today, none of those conditions reliably holds. The result is models that predict well on familiar data but fail when asked to generalize to a new lab, perturbation or mechanism.
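
The overlap and confounding conditions can be made concrete with a toy linear model. This is not a batch-aware encoder; it is a minimal sketch that assumes purely additive effects, one binary cell-type indicator and one binary lab indicator, and the function names are hypothetical. It checks whether the design matrix has full column rank, which is the linear-model condition for estimating biological and lab effects separately.

```python
import numpy as np

def design_matrix(cell_type, lab):
    """Columns: intercept, biological indicator (type B), methodological indicator (lab 2)."""
    return np.column_stack([
        np.ones(len(cell_type)),
        np.asarray(cell_type, dtype=float),
        np.asarray(lab, dtype=float),
    ])

def effects_identifiable(cell_type, lab):
    """True if biological and lab effects can be estimated separately (full column rank)."""
    X = design_matrix(cell_type, lab)
    return bool(np.linalg.matrix_rank(X) == X.shape[1])

# Confounded: lab 1 records only type A, lab 2 records only type B.
confounded = effects_identifiable(cell_type=[0, 0, 1, 1], lab=[0, 0, 1, 1])

# Overlapping: each lab records both cell types.
overlapping = effects_identifiable(cell_type=[0, 1, 0, 1], lab=[0, 0, 1, 1])

print(confounded, overlapping)  # False True
```

When each lab records only one cell type, the cell-type and lab columns are identical, and no amount of additional data of that kind separates them; recording both types in both labs restores identifiability. Nonlinear conditional models face an analogous limit, although it is harder to check.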

A foundation model can learn useful biological structure only when it can separate biological differences from methodological ones. In practice, this depends on three things. The first is shared file and data standards, such as BIDS for neuroimaging and Neurodata Without Borders (NWB) for neurophysiology, which give the field common ways to store data. The second is protocol standardization: how the experiment is actually performed, including details such as stimulus, temperature and calibration. The third is operational provenance: the contextual record of what a measurement actually means. The first has come a long way. The second has barely begun outside specific consortia. The third almost never travels with the published data. Without all three, biological and methodological variation collapse into one signal, and the model fits the data without grounding in mechanism.
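
One way to make the three layers explicit is a record that keeps them separate and reports which ones are missing. This is a minimal sketch: the `Recording` class, its field names and the required keys are hypothetical illustrations, not part of NWB, BIDS or any published specification.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Recording:
    """Hypothetical record separating the three layers described above."""
    file_standard: Optional[str] = None             # e.g. "NWB" or "BIDS"
    protocol: dict = field(default_factory=dict)    # how the experiment was run
    provenance: dict = field(default_factory=dict)  # what the measurement means

# Hypothetical minimum fields per layer; real requirements would be set by a community.
REQUIRED_PROTOCOL = {"stimulus", "temperature_c", "calibration"}
REQUIRED_PROVENANCE = {"ljp_corrected", "ljp_mv"}

def missing_layers(rec: Recording) -> list:
    """Return the names of layers that are absent or incomplete."""
    missing = []
    if rec.file_standard is None:
        missing.append("file standard")
    if not REQUIRED_PROTOCOL.issubset(rec.protocol):
        missing.append("protocol")
    if not REQUIRED_PROVENANCE.issubset(rec.provenance):
        missing.append("provenance")
    return missing

# A typical shared dataset: standard file format, thin protocol, no provenance.
typical = Recording(file_standard="NWB", protocol={"stimulus": "step current"})
print(missing_layers(typical))  # ['protocol', 'provenance']
```

The check is deliberately crude; the point is that the second and third layers are separate fields that can be present or absent independently of the file format.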

I have spent many years on this problem. At the Blue Brain Project, I co-led optimization and machine-learning methods for reconstructing detailed biophysical models of neurons and of cortical and thalamic microcircuits, which required integrating molecular, cellular and circuit-level details into a single simulatable system. That integration was possible only because of years of unglamorous work: controlled experimental protocols, systematic annotation, negotiated cell-type ontologies and parameter definitions shared across contributing labs. In the European Human Brain Project, I co-led the data infrastructure strategy. As director of the International Neuroinformatics Coordinating Facility (INCF), I worked internationally on community programs and standards. Inside a project that has done the infrastructure work, integration becomes possible. Outside such a project, no two measurements quite measure the same thing.

The level of detail required to integrate data accurately is greater than people outside the lab tend to assume. Take the liquid junction potential, a small voltage that arises at the interface between the pipette and bath solutions. Left uncorrected, it shifts membrane voltage measurements by 10 to 15 millivolts. Whether to correct for it is a convention that varies across laboratories and often goes unreported. The same cell type, recorded in two labs, ends up measured against different zeros. Because voltage-gated channels operate in narrow voltage windows, a biophysical model built on even a small miscalibration can be systematically wrong about excitability, integration and threshold. The errors compound when channels interact within a cell, and again when cells interact in a circuit. And this is just one parameter among dozens, including temperature, electrode type and filter settings. The same kind of problem arises in two-photon imaging, behavioral measurement and transcriptomics.
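
The arithmetic behind "different zeros" fits in a few lines. This sketch assumes the convention V_m = V_recorded − V_LJ, with V_LJ positive for a typical gluconate-based pipette solution (sign conventions differ between references, which is part of the problem), a hypothetical 12 mV junction potential, and a hypothetical channel with a Boltzmann activation curve (half-activation at −40 mV, slope 6 mV). None of these values comes from a specific preparation.

```python
import math

def to_corrected(v_mv: float, ljp_mv: float, already_corrected: bool) -> float:
    """Express a reported voltage on the LJP-corrected scale.

    Assumes V_m = V_recorded - V_LJ, with V_LJ positive for a typical
    gluconate-based pipette solution.
    """
    return v_mv if already_corrected else v_mv - ljp_mv

def p_open(v_mv: float, v_half_mv: float = -40.0, slope_mv: float = 6.0) -> float:
    """Boltzmann activation curve for a hypothetical voltage-gated channel."""
    return 1.0 / (1.0 + math.exp((v_half_mv - v_mv) / slope_mv))

LJP_MV = 12.0  # hypothetical junction potential within the 10-15 mV range

# Two labs report the same number, but only lab A applied the correction.
lab_a = to_corrected(-50.0, LJP_MV, already_corrected=True)   # -50.0 mV
lab_b = to_corrected(-50.0, LJP_MV, already_corrected=False)  # -62.0 mV

print(f"{p_open(lab_a):.3f} vs {p_open(lab_b):.3f}")  # 0.159 vs 0.025
```

With these assumed parameters, the same reported value corresponds to open probabilities that differ roughly sixfold, and a model fitted to the pooled data would inherit that discrepancy.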

These discrepancies persist for cultural and structural reasons, not because of carelessness. Methodology in neuroscience has always been carried tacitly, passed from senior to junior researchers through apprenticeship. That works for human-to-human reuse, but machines don’t apprentice. If the data are to travel, the implicit must become explicit; otherwise variability has no context and becomes noise. Shreejoy Tripathy and his colleagues showed this empirically a decade ago: After they retrospectively modeled methodological covariates across thousands of literature reports, the accuracy of classifying de novo recordings against canonical neuron types rose from 48 to 81 percent. The variance hadn’t gone anywhere; most of it was methodology that hadn’t been recorded.
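
A toy version of covariate adjustment shows how recorded methodology can rescue classification. This is not the method Tripathy and colleagues used; it is a minimal sketch that assumes a single feature (spike width), a single covariate (recording temperature) with a purely additive effect, hypothetical canonical values, and synthetic training data in which both cell types were recorded at both temperatures.

```python
import numpy as np

# Hypothetical canonical spike widths (ms) at a reference temperature of 34 °C.
CANONICAL = {"fast_spiking": 0.4, "pyramidal": 1.0}
REF_TEMP_C = 34.0

def classify(width_ms):
    """Assign the canonical type whose reference spike width is nearest."""
    return min(CANONICAL, key=lambda t: abs(CANONICAL[t] - width_ms))

def fit_temperature_slope(widths, is_pyramidal, temps_c):
    """Least-squares fit of width = b0 + b1*is_pyramidal + b2*(REF_TEMP_C - T); returns b2."""
    X = np.column_stack([
        np.ones(len(widths)),
        np.asarray(is_pyramidal, dtype=float),
        REF_TEMP_C - np.asarray(temps_c, dtype=float),
    ])
    coef, *_ = np.linalg.lstsq(X, np.asarray(widths, dtype=float), rcond=None)
    return coef[2]

def adjust(width_ms, temp_c, slope):
    """Remove the fitted temperature effect, mapping a width back to REF_TEMP_C."""
    return width_ms - slope * (REF_TEMP_C - temp_c)

# Synthetic training data with overlap: both types at both temperatures,
# generated from a hypothetical 0.03 ms/°C broadening at lower temperature.
train_types = [0, 0, 1, 1]  # 0 = fast-spiking, 1 = pyramidal
train_temps = [34.0, 22.0, 34.0, 22.0]
train_widths = [0.4, 0.76, 1.0, 1.36]

slope = fit_temperature_slope(train_widths, train_types, train_temps)

# A fast-spiking cell recorded at room temperature (22 °C).
raw = 0.76
print(classify(raw))                       # pyramidal
print(classify(adjust(raw, 22.0, slope)))  # fast_spiking
```

The adjustment works only because the recording temperature travels with the measurement. Without it, the raw width is the only input, and the cell is misclassified.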

Efforts such as FAIR²—which builds on the FAIR principles (findable, accessible, interoperable and reusable) by requiring that data also be AI-ready, responsibly governed and context-rich—try to define what should travel with a measurement beyond the file itself, such as protocol context, provenance, assumptions, reuse constraints and the evidence needed to interpret the data. This is important beyond foundation model training. Most neuroscience data is produced outside corpus-building consortia, in individual labs running experiments one at a time. For those data to be broadly useful—for validating biophysical predictions, testing hypotheses and comparing results across studies, for example—they have to be produced with standardized protocols where possible and explicit recording of the technical conditions where that is not possible. To date, the field also lacks a common integrative infrastructure akin to the Protein Data Bank for structural biology. Efforts are emerging—the BRAIN Initiative Cell Atlas Network (BICAN) Knowledgebase, for example—but neuroscience does not yet have that infrastructure at the scale and breadth the field needs.

AI can help solve the puzzle of the brain, but only where the work of integrating data across groups has been done. The models the field will want next, covering behavior, development, clinical neuroscience and causal mechanism across scales, will not come from data made interoperable after the fact. They will come from the standardization, scale and provenance the field has so far built only in limited domains. The recent foundation models are evidence that this is possible. The question now is whether the rest of neuroscience will commit to doing the same kind of work.

AI use disclosure:

The author used a large language model to help scaffold the structure of this essay and refine wording for clarity. The argument, interpretation and examples are his own.

Conflict of interest:

The author is a founder of Senscience, which developed the Frontiers FAIR² Data Management platform and the FAIR² Open Specification referenced in this essay. He also serves on the advisory board of the BICAN Knowledgebase.
