If you ask multiple teams of skilled neuroscientists to detect hippocampal ripples in the same brain recordings, you might expect them to converge on an answer. Unfortunately, you would be wrong.
At “Brainhack,” a hackathon we organized at the Champalimaud Foundation in March, 18 teams analyzed identical Neuropixels datasets to determine which brain area, if any, had the highest density of sharp-wave ripples. On the surface, they appeared to reach a consensus: 12 out of 17 teams reported no differences in ripple density across three anonymized brain areas. But this agreement was largely superficial. The shared conclusion of “no difference” emerged from fundamentally different observations, ranging from detecting virtually no ripples at all to identifying up to 10 ripples per minute in each region.
The teams had all deployed reasonable methods: Some followed an approach proposed by a recent consensus paper on ripple identification, some used deep-learning-based detectors, and others relied on classical bandpass-and-threshold techniques with varying parameters. Yet these defensible choices led to radically different absolute estimates, revealing that the apparent consensus masked a wide divergence in what the teams actually saw in the data. The results were so divergent that one has to wonder: If we cannot agree on how to define a ripple, what else might we be getting wrong?
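To make that sensitivity concrete, here is a minimal Python sketch of only the thresholding step of a classical bandpass-and-threshold detector. It is not any team's pipeline: the function name `detect_events`, the toy envelope and the three parameter settings are assumptions chosen for illustration, and the bandpass filtering and envelope extraction that would normally come first are assumed to have been done already.

```python
from statistics import mean, pstdev


def detect_events(envelope, fs, n_sd, min_ms):
    """Return (start, end) sample indices of candidate events.

    An event is a run of samples where the envelope exceeds
    mean + n_sd * SD for at least min_ms milliseconds.
    """
    threshold = mean(envelope) + n_sd * pstdev(envelope)
    min_len = int(round(min_ms * fs / 1000))
    events, start = [], None
    for i, value in enumerate(envelope):
        if value > threshold:
            if start is None:
                start = i  # a supra-threshold run begins here
        elif start is not None:
            if i - start >= min_len:
                events.append((start, i))
            start = None
    # Close a run that is still open at the end of the trace.
    if start is not None and len(envelope) - start >= min_len:
        events.append((start, len(envelope)))
    return events


# Toy data (an assumption, not real recordings): 10 s of a flat
# ripple-band envelope sampled at 1 kHz, with five bursts given as
# (start sample, length in samples, amplitude).
FS = 1000
envelope = [1.0] * (10 * FS)
bursts = [(1000, 60, 8.0), (3000, 40, 5.0), (5000, 25, 8.0),
          (7000, 60, 3.0), (9000, 15, 4.0)]
for begin, length, amplitude in bursts:
    for i in range(begin, begin + length):
        envelope[i] = amplitude

# Three hypothetical but individually defensible parameter choices.
settings = {
    "lenient": {"n_sd": 2, "min_ms": 15},
    "moderate": {"n_sd": 3, "min_ms": 30},
    "strict": {"n_sd": 6, "min_ms": 50},
}
counts = {name: len(detect_events(envelope, FS, **params))
          for name, params in settings.items()}
print(counts)
```

On this toy trace, the three settings report five, two and one events from identical data. Real recordings add noise, artifacts and filter choices on top of that, so the spread seen at the hackathon is not surprising.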
To help shine more light on these issues, we have launched a collaborative project called CON²PHYS (CONceptual CONsistency in electroPHYSiology), which aims to quantify how much disagreement exists when neuroscientists interpret fundamental concepts. If our preliminary results are any indication, the project stands to say something consequential about how systems neuroscience is actually practiced.
The analytical variability problem in neuroscience is not new. In 2020, Rotem Botvinik-Nezer and her colleagues published a landmark study in Nature in which 70 independent teams analyzed the same functional MRI dataset. No two teams chose the same workflow, and the resulting variation in conclusions was substantial. That study sent shock waves through the brain-imaging community.
Electrophysiologists, by contrast, have often taken confidence in the apparent clarity of their measurements. Spikes are discrete and countable: You either record an action potential or you do not. There is no need to model hemodynamic responses, contend with imaging artifacts or rely on complex statistical corrections. But results from the March hackathon suggest that we should reconsider this perspective.
Brainhack brought together roughly 40 researchers working in teams of 2 or 3. Each team received the same anonymized dataset of single-unit activity and local field potentials from three simultaneously recorded brain areas across 18 mice. The organizers asked participants to answer four multiple-choice questions, which were deliberately underspecified—for example, “Which brain area pair has the strongest directed functional connectivity?”—mirroring the ambiguity that researchers navigate every day.
The responses to three of the questions were not encouraging. The sharp-wave ripple question, which produced such divergent results, was supposed to be the most concrete. Questions concerning directed functional connectivity—a measure of how well the activity of one brain area predicts the activity of another—and spike-spike interactions, a measure of how often two spike trains co-occur above chance, produced even more divergent answers. Teams split their responses nearly evenly across all available options. Notably, the findings almost perfectly replicate the outcome from a similar hackathon held at the Bernstein Conference in 2025. Across two events and more than 100 researchers, these questions have yet to produce a discernible consensus.

