Before the scientific paper, there was the treatise. In “Astronomia Nova,” Johannes Kepler bundled 10 years of astronomical observations, along with false starts, methodological disagreements and philosophical justifications, into a book-length masterpiece. Isaac Newton’s “Principia,” though written in a burst of roughly two years, unified ideas he had been developing since the 1660s. This pace was typical until journals, such as the Philosophical Transactions of the Royal Society, founded in 1665, offered an alternative. In them, scientists could share provisional findings quickly, in compact form, shrinking the unit of publishable knowledge. Charles Darwin himself seemed to worry about the rigor of this short-form science, complaining of his theory that he could “hardly see how it can be made scientific for a Journal, without giving facts, which would be impossible.”
But discoveries announced via tome eventually became the exception, not the rule. Albert Einstein’s four revolutionary papers appeared in Annalen der Physik in 1905. James Watson and Francis Crick announced DNA’s structure in less than 1,000 words in Nature in 1953. It’s easy to forget today that the scientific paper was an innovation, enabling researchers to build on one another’s work in something closer to real time, accelerating the pace of discovery. It also seeded the infrastructure of modern scientific publishing that governs academic life.
That infrastructure is now under extraordinary strain, and artificial intelligence (AI) is making it worse. A study published last December in Science found that researchers who use large language models are publishing significantly more papers than they did before. Since at least 2024, Matt Spick, a health data analytics researcher at the University of Surrey and associate editor at Scientific Reports, has been getting nearly identical papers to review—one a day, sometimes two, all drawing on the same publicly available U.S. health dataset, rephrased just enough to dodge plagiarism detection. Meanwhile, paper production is getting its own silicon facelift, an effort that reached its apotheosis in April, when a team at Google released Paper Orchestra, a system that takes a researcher’s raw lab notes and, through a sequence of five specialized AI agents, produces a submission-ready LaTeX manuscript with figures and verified citations in about 40 minutes. The volume of AI-powered output is overwhelming the peer-review system. Editors can’t find enough qualified reviewers, and the reviewers they do find are increasingly turning to AI themselves—some 21 percent of reviews submitted to this year’s International Conference on Learning Representations, a major machine-learning conference, were fully AI-generated.
The standard response to this frenzy of automation has been to look for ways to shore up the existing system. But a growing number of researchers are asking a different question, in no small part because the existing system has long had its own drawbacks. What if the problem isn’t how to fix scientific publishing? What if AI’s growing capabilities are going to force the unit of scientific communication, once again, to evolve? Such a shift could have a major impact on a wide-ranging field such as neuroscience, where the science spans molecules to behavior and whose diverse data often remain siloed, underserved by the paper-shaped package the system demands.
People in science have sensed this change coming for a while. In early 2023, just a few months after ChatGPT’s launch, I sat in a meeting where one of the most prominent scientists at my institution speculated that AI may eventually force a rethinking of what a scientific paper even is. That same year, Michael Eisen, computational biologist and former editor-in-chief of eLife, described a future in which findings are published not as static, one-size-fits-all narratives but in an interactive, “paper on demand” format, in which users query the underlying experiments, data and analyses directly. “I think it’s only a matter of time before we stop using single narratives as the interface between people and the results of scientific studies,” Eisen said.

For a few years, this remained in the realm of premonition. Now the proposals are getting concrete enough to argue about. Among the most concrete comes from a team led by Lior Pachter at the California Institute of Technology. In a preprint posted in February, Pachter and his colleagues describe OpenEval, a system that decomposes scientific papers into their constituent parts: individual claims, the evidence supporting them, and evaluations of whether that evidence holds up.
They ran this system across the entire corpus of eLife, which publishes its manuscripts and peer reviews in a machine-readable XML format. From roughly 16,000 papers, OpenEval extracted nearly 2 million discrete claims, and an AI evaluated each one. AI and human reviewers agreed 81 percent of the time, but I think the more intriguing finding was about coverage. OpenEval assessed 93 percent of the claims in a given manuscript. Human peer reviewers, on average, got to 68 percent. This is where the tirelessness of AI comes in handy: A paper containing 100 empirical claims simply isn’t going to be fully and systematically evaluated by two or three busy faculty on short turnaround.
Pachter’s group argues that scientific publication should distinguish between two functions that papers currently bundle together: the dissemination of results and the communication of ideas. Results, they say, should be published in explicit, machine-readable form. Narratives should serve as an interpretive layer on top of that structured foundation. The paper would still exist, but it would become one view of a deeper, queryable record.
Neuroscience may have more to gain from this kind of system than most fields. The discipline spans molecular biology to functional imaging to behavioral psychology, and findings at one level routinely bear on questions at another—yet the connections stay buried because no single person can manage the flood of findings from all subfields. A structured, queryable record of results would make those connections visible for the first time.
The OpenEval system has already demonstrated that it can find hidden connections. Two eLife papers independently examined the mechanisms of timing-dependent long-term depression (tLTD) in different brain circuits. One showed that tLTD can occur with or without NMDA receptors, depending on the synaptic connection; the other showed that NMDA receptors are required and mediate tLTD through non-ionotropic signaling. The papers don’t cite each other because they studied different circuits. Taken together, they suggest that NMDA receptor involvement in tLTD is circuit-dependent and mechanistically diverse, the kind of complementary finding that might reshape how you think about a mechanism but that the balkanized nature of the neuroscience literature keeps invisible.
Others have gone further. Italian physicist Francesca Colaiori has outlined what she calls an Adaptive Knowledge Network, in which the basic unit of scientific contribution is a “knowledge object” rather than a paper—a single claim, a dataset, a method, an open question. Each connects to others through informative links, and publishing becomes something more like editing a shared wiki than submitting a finished manuscript. At the other end of the spectrum, the editors of NEJM AI have already begun running an invitation-only hybrid human-AI review process in which a human editor reviews a manuscript independently, two large language models produce separate structured reviews, and a statistician collaborates with an AI on a full statistical review. From submission to provisional acceptance: seven days. They published their first two papers through this system late last year, along with the full AI reviews and author responses, inviting readers to judge the quality for themselves.
These proposals occupy different points on the spectrum from incremental to transformative. NEJM AI is patching the existing system. Pachter’s group is proposing a new layer underneath it. Colaiori is imagining a replacement. But they share a common diagnosis: The scientific paper bundles too many functions into a single artifact, and the bundle is starting to come apart.

There’s a precedent for this kind of unbundling that is already further along than many scientists might realize. When mathematicians Timothy Gowers, Ben Green, Frederick Manners and Terence Tao proved a key case of the Polynomial Freiman-Ruzsa conjecture in 2023, they reported it in a traditional paper. But within three weeks, a team of contributors also translated the proof into Lean, a proof assistant whose community-maintained library, mathlib, contains more than 250,000 machine-verified theorems. The machine verification process caught a minor error that humans had missed.
Now the result exists in two forms: a paper that explains why the proof is important, what intuitions guided it and where the challenges were, and a machine-checked version that guarantees every step is valid. The two serve different purposes, and neither is reducible to the other. Pachter’s group invokes mathlib as a model, though the analogy is imperfect—empirical science can’t be verified logically the way a mathematical proof can. The broader point is that a narrative remains essential for communicating ideas to other humans, but machine-readability can make results reusable, composable and verifiable in ways that results locked in prose cannot be.
What would this look like in practice for a neuroscience lab? One version: You submit a paper the way you always have, but alongside it you deposit your claims and results in a structured format, each linked to specific figures, statistical tests and datasets. Reviewers evaluate individual results rather than trying to hold the whole manuscript in their heads. Related results across papers get discovered automatically, including in papers you’ve never read and that don’t cite each other.
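To make this concrete, here is one hypothetical sketch of what a deposited claim record and a crude cross-paper matching rule might look like. Every field name and identifier below is illustrative, invented for this example; none of it reflects OpenEval’s actual schema or any real repository.

```python
# A hypothetical structured claim record. All field names and values are
# illustrative placeholders, not any real system's format.
claim = {
    "id": "lab-2026-tltd-003",
    "statement": "tLTD at this synapse requires NMDA receptors.",
    "evidence": [
        {"figure": "Fig. 3B", "test": "paired t-test", "p_value": 0.004},
    ],
    "dataset_doi": "10.0000/example-doi",  # placeholder identifier
    "tags": ["tLTD", "NMDA receptor", "synaptic plasticity"],
}

def related(a, b):
    """Flag two claims as related if they share at least one tag,
    the crude core of automated cross-paper discovery."""
    return bool(set(a["tags"]) & set(b["tags"]))
```

In a real system the matching would be far more sophisticated than shared tags, but even this toy version shows the shift: relatedness becomes something software computes over structured records, rather than something a reader happens to notice.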
In Eisen’s version, even the narrative layer becomes dynamic: Instead of a fixed paper, the reader queries the structured record and gets results tailored to their question and expertise. The paper becomes an interface, generated on demand. In the more radical version, the paper becomes optional altogether, a “narrative view” of structured knowledge objects. This structure could also help counter publication bias: If the primary scientific record exists in the network and professionally meaningful credit is assigned that way, then a single careful replication or a precisely stated negative result counts toward career advancement rather than languishing, unpublished, on someone’s computer for years.
This more radical version also comes with pitfalls. As I wrote in a previous column, the process of scientific writing is itself a form of thinking, and the struggle to articulate a finding forces you to confront gaps in your own reasoning. If writing about the science is where scientists figure out what their science actually means, then treating paper-writing as a decorative layer on top of formalized claims may remove important think-work from the writing process. Google’s Paper Orchestra, for example, is, by design, eliminating the weeks a scientist would have spent struggling with exposition, discovering that one result doesn’t really flow from the previous one as neatly as expected, or realizing halfway through the introduction that the problem is framed in a way that isn’t quite consistent with a close read of the literature. If that struggle is where some fraction of scientific insight happens, then automating it away costs science in a way that won’t register in publication metrics. The paper might be inefficient as a data container, but it may also be doing cognitive work that is germane to scientific progress.
I don’t think the machine-readability advocates have adequately addressed this potential trade-off. But it’s also not a reason to defend the status quo indefinitely. The shift from treatises to papers involved genuine losses. Darwin knew his ideas needed hundreds of pages for his painstaking argument to fully unfurl. The historian of science Alex Csiszar has argued that fragmenting knowledge into “broken pieces of fact” carried real epistemic costs. And yet the smaller unit enabled something the treatise couldn’t: a fast, iterative, cumulative record that changed what the scientific enterprise could accomplish. The quantum mechanics revolution of 1925 to 1927, in which a complete theory emerged in barely two years through a cascade of short journal papers exchanged across Göttingen, Copenhagen, Cambridge and Zurich, perhaps would not have happened in the age of the treatise.
The question is whether the scientific community is at a similar inflection point. The truth is that although AI may be the catalyst for change, the scientific paper on its own terms was already failing to contain the science. What is called a paper today is really a strange Frankensteined-together patchwork of supplementary files that run longer than the manuscript, datasets deposited in repositories that other researchers rarely reuse, and code posted to GitHub without documentation. The paper has already been replaced, in practice, by a mess. For a field such as neuroscience, in which two groups can independently discover something fundamental about the same synaptic mechanism in different circuits and never find each other’s work, the difference between a structured transition and a disorganized one is not academic. Neuroscience has long been a field too large for any one person to survey, so the next form scientific communication takes may determine whether the next generation works from a map or wanders the territory by feel.
