Something fundamental to computation-based research has really changed in the last 10 years. In certain fields, progress is simply dramatically faster than ever. Researchers in affected fields are living through a period of profound transformation as the fields undergo a transition to frictionless reproducibility (FR). This transition markedly changes the rate at which ideas and practices spread, affects scientific mindsets and the goals of science, and erases memories of much that came before.
The emergence of FR flows from three data science principles that matured together after decades of work by many technologists and numerous research communities. The mature principles involve data sharing, code sharing, and competitive challenges, albeit implemented in the particularly strong form of frictionless open services.
Empirical machine learning is today’s leading adherent field; its hidden superpower is adherence to frictionless reproducibility practices; these practices are responsible for the striking and surprising progress in AI that we see everywhere; and these practices can be learned and adhered to by researchers in any research field, automatically increasing the rate of progress in each adherent field.
Keywords: reproducible computational research, challenge problems paradigm, frictionless reproducibility, frictionless research exchange, emergent superpower, AI singularity
Media Summary
Overheard in conversation...
(Outsider):
Things are happening awfully fast these days! I frequently see advances in data processing—text, video, tabular data—coming at a rate I hadn’t been expecting to see. This makes it look like there’s a superpower out there. Maybe it’s AI?
(Insider):
No, actually it’s data science.
(Outsider):
What’s the data science superpower?
(Insider):
Okay, on the fingers of one hand:
In the last decade, frictionless services became available thanks to the modern smartphone information ecosystem.
Those frictionless services were applied by scientists and technologists to data sharing, code sharing, and challenges.
Some communities of researchers started frictionlessly sharing research artifacts—code, data, results—and building on each others’ work.
Involved research communities are progressing much faster. It’s night and day.
Soon it will really speed up, as scientists write computer programs to exploit all the digital artifacts from previous research.
(Outsider):
But wait, I’m still reading everywhere about AI.
(Insider):
AI is one of those communities where people are working this way. That’s why you’re hearing so much news about it.
(Outsider):
But I heard that there’s a singularity, that AI is going to unemploy everybody and may kill us all.
(Insider):
There is a singularity (= new superpower) but not in AI. It’s a superpower allowing us to do data science research faster and better. It’s not dangerous. You’ll just be hearing more and more frequently about advances, because research goes faster and faster.
1. Introduction
Recent decades witnessed a dramatic transformation of information technology, as the smartphone spread across the planet, spawning omnipresent internet connections and always-on cloud computing capacity. The result was a unified global computing/communications resource now driving nearly every industry and transforming nearly every sphere of human activity.
Information is now flowing to agrarian peasants in developing economies in a fashion inconceivable not long ago; this is driving historically unprecedented declines in global poverty and ignorance.
Our collective intellectual life is also transforming rapidly. Scientists today have a completely different set of ideas about what can be learned from data, and how to go about learning it, than only 10 years ago. There are many prominent stories of dramatic recent changes in science’s ability to make sense of data, the most well-known being in the fields of computer vision, natural language translation, and protein folding, but examples are springing up nearly everywhere. We all know something has happened, with no going back; but we may not yet perceive the larger, implicit, emergent phenomenon causing so many dramatic changes at the same time in so many fields.
The media and the scientific public view these many signs of progress as evidence that mysterious superpowers have somehow emerged, but are secreted away in hegemon labs and need immediate political regulation. Memes of ‘AI singularity’ are widely disseminated, often alongside hints of ‘AI doom.’ Politicians on the global stage are now worried.
In a companion article (Donoho, 2024), I argue that “AI singularity” offers a misleading narrative, which cannot explain a wide range of new developments and which hides from view the true driving forces. In this article, I argue for a different narrative: that computation-driven research really has changed in the last 10 years, driven by three principles of data science, which, after longstanding partial efforts, are finally available in mature form for daily practice, as frictionless open services offering data sharing, code sharing, and competitive challenges.
Researchers in fields adhering to these principles find themselves living through a period of profound, rapid transformation. Research life after adopting these principles operates at a faster pace: good ideas spread and are adopted, and improved upon, seemingly with very little friction (e.g., human labor) compared to earlier decades. We are entering an era of frictionless research exchange, in which research algorithmically builds on the digital artifacts created by earlier research, and any good ideas that are found get spread rapidly, everywhere. The collective behavior induced by frictionless research exchange is the emergent superpower driving many events that are so striking today.
To discuss these data science principles and their singular effects, I first discuss the larger civilizational changes enabling them; I then review the recent public discussion of singularity for eventual comparison with my thesis; and then discuss the dramatic acceleration in research progress that takes off once a research discipline adopts the new practices and crosses this singularity.
I then mention actions individual readers can take in response to the arguments presented here, including in research and teaching; and I make predictions about future developments on the other side of the singularity.
2. Explosion of Accessible Compute Power
For decades, Moore’s law was an engine of continual improvement in computing speed and power, but it stalled out after 2005, as decreases in the size of transistors no longer allowed reductions in processor cycle times. Amazingly, progress in available compute power has accelerated since the ‘death’ of Moore’s law.
The powerful instigator was a marked one-time ‘smartphone transition’ dividing human history into periods ‘before’ and ‘after’ smartphones. Over the first 15 years of the smartphone era, smartphones proliferated rapidly; today purportedly 80%+ of all adult humans have access to this technology either directly or through friends and associates.1 Smartphones rely upon a global computational and communications infrastructure whose rapid construction has been a major civilizational achievement, with myriad consequences and corollaries. These include undersea fiber, satellite internet, arctic data centers, cloud computing, novel database technologies, and a massive reconfigurable global workforce of software developers. This workforce envisioned and created a range of organizational tools orchestrating virtual armies of compute servers; as a result, the compute capacity that could be easily thrown at a single task grew perhaps 100,000 times from when the smartphone era began.2
The implications for computational science were unprecedented. Even a negligible fraction of existing global compute capacity could suddenly allow computational scientists to be ambitious at previously unimaginable scales. But beyond this unexpected windfall, computational science independently and organically drove additional bounties; for example, hardware hegemons invested heavily in GPUs for numerical computing, pushing stock market valuations beyond anyone’s expectations. Combined with novel machine learning models and a whole industry category—MLOps3—for specifying and deploying machine learning workflows, a further 1,000-fold increase in compute power has been delivered by new ways to organize machine learning computations.4,5
Importantly, all this infrastructure is now available immediately, for a price, to anyone who will pay.
This new world order in computing offered all of us a vast expansion in functionality and scale—while remaining true to the original purpose of the smartphone era: to enable humans to spontaneously share information among themselves. Such spontaneity demanded frictionless open sharing of digital artifacts and impromptu computing in response to unexpected user requests. This frictionlessly available data and compute infrastructure could be ingeniously adapted for many purposes, including computation-driven research.
3. Data Science Matures
The new world order in computing brought to maturity three long-gestating socio-technical initiatives now spreading throughout science. The three initiatives have been aspirations of many data scientists across decades; the coming-to-fruition moment, at which everything needed is in place and immediately accessible, has been reached through the hard work of many technologists.
The initiatives are related but separate, and all three have to come together in a particularly strong way to provide the conditions for the new era. Here they are:
[FR-1: Data]: Datafication of everything, with a culture of research data sharing. One can now find datasets publicly available online on a bewildering variety of topics, from chest x-rays to cosmic microwave background measurements to Uber routes to geospatial crop identifications.
[FR-2: Re-execution]: Research code sharing including the ability to exactly re-execute the same complete workflow by different researchers.
[FR-3: Challenges]: Adopting challenge problems as a new paradigm powering methodological research. The paradigm includes: a shared public dataset, a prescribed and quantified task performance metric, a set of enrolled competitors seeking to outperform each other on the task, and a public leaderboard. The paradigm can also include virtual challenges lacking the formal leaderboard, in which authors still attempt to publish a new state-of-the-art result going beyond previously recorded/published performance levels on a given dataset/performance metric. Thousands of such challenges and virtual challenges, with millions of entries, have now taken place, across many fields.
Note: The term benchmarks is frequently used to describe what we here call virtual challenges; but it can be unclear whether this term refers to the process of quantifying performance, or instead to the specific value of certain quantitative measures in a particular case.
Each initiative addresses aspirations in scientific research as old as data-driven science. Researchers have always wanted to access the same data used by their predecessors in earlier studies; to apply the same algorithms from those studies; or to score the performance of new algorithms in later studies according to the same metrics used in earlier studies. However, those wishes were generally frustrated in various ways in earlier generations.
In the past decade, all of these initiatives, in their best implementations, became frictionless open services, essentially offering immediate, permissionless, complete access to each relevant digital artifact, programmatically from a single line of code. Not all friction is yet gone, but the trend is clear: what frictions still remain will vanish soon enough.
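To make the ‘single line of code’ claim concrete, here is a minimal sketch of frictionless [FR-1] access through one such open service, the Hugging Face Hub via its `datasets` package; the particular dataset named is merely one example of a publicly hosted artifact, and any comparable hosted dataset would serve.

```python
# Minimal sketch of frictionless data access: one line of code, no manual
# downloads, no permission requests. Assumes the `datasets` package is
# installed and the "imdb" dataset remains publicly hosted.
from datasets import load_dataset

imdb = load_dataset("imdb", split="test")      # the single [FR-1] access line
print(len(imdb), imdb[0]["label"], imdb[0]["text"][:60])
```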
Today’s researchers will know many specific examples of initiatives [FR-1]+[FR-2]+[FR-3] coming together as frictionless open services in the last decade; several are discussed in the sections below.
What were once mostly frustrated wishes of researchers are today achieved realities in several fields, supported by habits and institutions. Most importantly, where the behaviors are not yet dependably present, there is no essential obstacle to turning them into everyday habits, except the interest and diligence of participating researchers.8,9
This is a big change from the recent past! The last few decades had been a period of experimentation and prototyping, trying out various approaches to sharing research data, challenge problems, and research code. Sometimes, in principle, we could reproduce the research of others. But in various ways there was still clumsiness and friction.
Then came the breakneck development of information technology driving our post-2007 smartphone era. The emergence of ‘The Cloud’ gave us tools for web-accessible database management, and source code management for workgroups, which made it possible for information technology organizations to be distributed globally yet work together.
Building on the new capabilities, developers implemented globally accessible code and model repositories, data repositories, and, eventually, globally visible task metric leaderboards.
These services and their use patterns are well known to modern data scientists. Research communities can now organize themselves to model their research publication process around the use of these services.
Among research communities, empirical machine learning (EML) has gone the farthest: its members today can use accepted commercial services including Kaggle, GitHub, HuggingFace, and Weights and Biases for storing and accessing these artifacts. Quite literally, today’s model for academic EML research requires accessing the various artifacts created under [FR-1]+[FR-2]+[FR-3], modifying some of them in some way, and then sharing the modified artifacts. Articles describing such research, which we discuss below, are simply highly stereotyped press releases advertising the availability of these artifacts.
Step back from the EML context for the moment. The same process is ready for immediate deployment by computational scientists more generally.
4. Emergence of Frictionless Reproducibility
The maturation of data science initiatives [FR-1: Datafication and Data Sharing], [FR-2: Code Sharing and Re-execution], and [FR-3: Challenge Problems] in their current form as essentially frictionless services brings us to a phase transition in computation-driven research, to a new era of frictionless reproducibility in data-driven methodological development. This new research era features unprecedented rates of advance.
Indeed, today, a computational scientist working in one of those fields where all three practices are standard, can, with a few mouse clicks, and maybe a little extra typing, access everything needed to reproduce some recently reported experimental result from some other institution: code, data, experimental analysis. The scientist can be running some other team’s original experiment or a modified computational experiment within minutes or even seconds of hearing about it. In many cases, the scientist does not even really need to be manually involved; a predefined script can do all the work, completely automating the data and code access, and running the experiment—possibly with novel modifications or measurements layered on top.
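A sketch of such a predefined script follows; every specific name in it (the repository URL, the entry point, the results file) is a placeholder standing in for whatever a real project would publish, so this illustrates the pattern rather than any particular team’s workflow.

```python
# Hypothetical end-to-end reproduction script. All names below are
# placeholders; the pattern, not the project, is the point.
import json, subprocess, sys, tempfile
from pathlib import Path

REPO = "https://github.com/some-lab/some-paper-code.git"   # placeholder URL
CLAIMED_METRIC = 0.913                                      # number reported in the paper

with tempfile.TemporaryDirectory() as tmp:
    code_dir = Path(tmp) / "code"
    # [FR-2] fetch the exact published workflow
    subprocess.run(["git", "clone", "--depth", "1", REPO, str(code_dir)], check=True)
    subprocess.run([sys.executable, "-m", "pip", "install", "-r",
                    str(code_dir / "requirements.txt")], check=True)
    # The project's entry point is assumed to fetch its data ([FR-1]) and to
    # write its task metric ([FR-3]) to results.json.
    subprocess.run([sys.executable, "run_experiment.py"], check=True, cwd=code_dir)
    result = json.loads((code_dir / "results.json").read_text())
    print("reproduced:", result["metric"], "claimed:", CLAIMED_METRIC)
```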
In contrast, progress seems notably slower in a field where only two (or one, or none) of these initiatives have yet come to fruition, rather than three.10
The most common leave-one-out setting is surely reproducible computational science (RCS), where we combine [FR-1: Data Sharing] and [FR-2: Code Sharing], without [FR-3]. Here there is a scientific question but no underlying challenge problem being considered; we might simply be doing an exploratory data analysis and reporting what we saw, and giving others access to the data and the analysis scripts. RCS lacks the ability to focus the attention of a broad audience on optimizing a performance measure.11
A less common but very famous and important setting is the proprietary-methods challenge (PMC), with [FR-1: Data] and [FR-3: Challenges] but without widespread code sharing by participant teams, even after the fact. The $1 million Netflix Prize (awarded in 2009), and many large-dollar prize challenges since, are examples.
Least common by far are bring-your-own-data challenges (BYODC), that is, offering [FR-3: Challenges] + [FR-2: Code Sharing] without [FR-1: Data Sharing]. This can happen in clinical medical settings where the research focuses on predicting patient outcomes but the underlying data are private and only a few credentialed researchers ever get to see them, under nondisclosure. Here, the research publications may share task definitions and results and also algorithms, without sharing the data. Clinical medical research overall is a field with about one million research publications per year.
Without all three triad legs [FR-1]+[FR-2]+[FR-3], FR is simply blocked. In two of the above leave-one-out scenarios, this is obvious. Without [FR-1: Datafication and Data Sharing] we would be missing the data; without [FR-2: Code Sharing and Re-execution] we would be missing the opportunity to inspect and build upon the workflow.12
Less clear is what we might be missing without [FR-3: Challenges]. We would be missing the task definition that formalized a specific research problem and made it an object of study; the competitive element that attracted our attention in the first place; and the performance measurement that crystallized a specific project’s contribution, boiling an entire research contribution down essentially to a single number, which can be reproduced. The quantification of performance (part of practice [FR-3]) makes researchers everywhere interested in reproducing work by others and gives discussion of earlier work clear focus; it enables a community of researchers to care intensely about a single defined performance number and to discuss how it can be improved.13
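A toy sketch may help fix ideas: a frozen shared test set, one prescribed metric, and a leaderboard that reduces each entry to a single reproducible number. Everything here (the labels, the entries, the metric) is fabricated for illustration.

```python
# Toy [FR-3] machinery: shared frozen labels, one prescribed metric,
# and a leaderboard. All data below are fabricated placeholders.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)        # stands in for the shared, frozen test labels

def task_metric(y_pred):
    """The prescribed performance measure: plain accuracy."""
    return float(np.mean(y_pred == y_true))

# Each entry is a submitted prediction vector; in a real challenge it would be
# produced by a re-executable public workflow ([FR-1]+[FR-2]).
entries = {
    "baseline": rng.integers(0, 2, size=1000),
    "team_A":   np.where(rng.random(1000) < 0.80, y_true, 1 - y_true),
    "team_B":   np.where(rng.random(1000) < 0.90, y_true, 1 - y_true),
}

leaderboard = sorted(((task_metric(p), name) for name, p in entries.items()), reverse=True)
for score, name in leaderboard:
    print(f"{name:10s} {score:.3f}")   # the single number each contribution boils down to
```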
Table 1. Leave-one-outs, and what is blocked.

| If we only have...  | We are blocked, because...       | Example                                        |
|---------------------|----------------------------------|------------------------------------------------|
| [FR-1] + [FR-2]     | No defined task                  | Exploratory Data Analysis                      |
| [FR-1] + [FR-3]     | Can’t build on code of others    | Netflix Challenge; DARPA Biometric Challenges  |
| [FR-2] + [FR-3]     | No common dataset                | Human Subjects Clinical Research               |
Of course, not every field works this way today,14 but those that do commonly benefit from a very high velocity of progress. Frictionless reproducibility spontaneously draws groups of inspired researchers into a tight loop of iterative experimental modification and improvement, often engaging many scientists. This tight loop per researcher, combined with iterative feedback from cross-researcher competition, produces a rapid series of improvements in the performance metric. The outcome of the process is much higher performance on the task metric.
An emergent property is manifest here. The researchers in an adherent field, understanding that they are all subscribed to the FR triad, think differently about what they do/should be doing, behave differently and develop new, FR-aware habits and practices. We see a new institution arising spontaneously; let us call it a frictionless research exchange (FRX).
FRX is an exchange, because participants are constantly bringing something (code, data, results) to, and taking something (code, data, new ideas) from, the exchange; and various globally visible resources (task leaderboards, open review referee reports) broadcast information to the whole community about what works and what does not. Of course, this is a very different type of exchange from those involved in financial markets; it involves intellectual engagement, not money. Financial exchanges produce price discovery. Frictionless research exchanges produce community critical review.
Within an FRX, each researcher expects to find out what others have done and how those results worked, and can then build directly and explicitly on the selected work of others, seeking tweaks and improvements over it. A community of researchers working this way initiates a chain reaction of reproductions followed by furious experimental tweaking, with each new ascent up the leaderboard exciting responses by others and further tweaks and improvements.
Empirical machine learning is, par excellence, the field that has most nearly adopted the FR triad in its research model; that has most completely shaped itself to have the habits and attitudes that create an FRX; and has benefited the most from FR practices.
Many computational experiments in EML follow a very uniform structure, understood by everyone in the field; the sites and tools (see, e.g., CodaLab on the noncommercial side; Weights and Biases, HuggingFace, and GitHub on the commercial side) offer easy, immediate, uniform access to archives of canonical data, models, performance metrics, and all the other paraphernalia needed to begin reproducing results as they are announced, or to develop modified experiments that build on those announcements.
EML research has delivered the numerous stories of remarkable ‘AI progress’ I noted earlier: in machine vision, natural language processing (NLP), gameplay, and even protein structure prediction. In my opinion, without the last decade’s (near-) frictionless reproducibility, this progress could never have been delivered on the decadal time scale that we have all just witnessed. The rate of progress on each problem would have been much slower, because the number of meaningful contributors would have been much smaller, and uncertainty about the meaning of individual experimental reports would have exerted a significant drag on the energy and enthusiasm of the research community. But also, without the right habits and engagement by researchers in the field, the full FRX experience would not have emerged.
EML, by its embrace of frictionless exchange, has made a very valuable discovery of new habits and mindsets in research, which can benefit us all, and has revealed the spontaneous emergence of frictionless research exchanges, a research superpower at the heart of the EML story.15
The triad of FR practices is not ‘the same thing as what EML is doing’; the EML community seems not inherently committed to the whole FR triad; some hegemons, but not others, play nice with these practices.16
Protein structure prediction began using the challenge problem paradigm with the CASP challenges three decades ago; it spent a great deal of research effort over the years developing insights into identifying scientifically relevant performance measures, and it benefited, as we all have, from the increasing tendency toward data sharing and code sharing.
Other fields might follow these practices and benefit correspondingly. Scientific fields recently implementing challenges include NMR17 spectroscopy structure determination (Rosato et al., 2009) and computational biology (Blasco et al., 2019). Recent large-scale funding for new challenges includes initiatives in high energy physics18 and in applied mathematics (dynamical systems).19
5. Beyond Three Predecessors
In my view, FR and FRX mark a leap well beyond their three well-known predecessors.
5.1. Beyond In-Principle Reproducibility
Traditionally, computational scientists sometimes offered in-principle reproducibility (IPR) in their publications, sharing some code and data publicly, with additional details embedded in the paper’s narrative, so that after careful reading and collating of information from various original sources, the authors guesstimated that some outside researcher could eventually roughly reproduce their work. While IPR is hardly frictionless, it was an advance on what came before; in earlier times, it was more or less accepted that scientific publications would not allow reproduction of complex computational work. IPR stood for some time as a worthy standard to emulate, deserving of approval where achieved.
In the experience of researchers who lived under both regimes, IPR contrasts dramatically with FR. The common experience under IPR was that during the effort to reproduce, there would be: (a) discovery of many undocumented steps that might offer ‘stoppers,’ and (b) many places for previously undisclosed human input and decision where there could be misunderstanding or miscommunication. Consequently, the prospect of success in reproducing reported results was forever in doubt, in turn spawning near-universal skepticism on the part of experienced researchers of results reported by others (the well-known “Not Invented Here” attitude). FR, by removing such undisclosed and undocumented components, makes it much, much easier for researchers to verify and therefore build trust in another group’s work.
5.2. Beyond Open-Source
For 30 years, open-source software (OSS) has been a major presence in the world of information technology.20 In its best implementation, which has been widely practiced for decades, one indeed obtains OSS through frictionless services. OSS is thus an enabler of FR, an important step in the right direction. It enables software developers to build their work on the coding of others; this is a crucial part of the usual path to FR, but only one part. Re-executability of a published computational experiment requires much more than open access to some key software components that were used in that experiment; it generally requires the exact reconstruction of an entire computational environment and an entire orchestrated workflow. An important corollary of the cloud information technology era was ubiquitous virtualization and containerization,21 enabling exact specification of machine instances, and even of entire virtual cluster architectures. These technologies matured long after the OSS movement.
In my mind, the open-source movement takes us beyond in-principle reproducibility, but not yet to full frictionless reproducibility, and also not to the emergence of a frictionless research exchange. For example, a researcher absorbed in solitary pursuits can share code and data, and these can even be available frictionlessly; but without engagement by a community of other researchers that cares deeply about assimilating the best contributions, building upon them, and outdistancing them, we will not see the emergence of rapid progress.
5.3. Beyond Competitive Challenges
Challenges alone, without each of [FR-1: Data Sharing] and [FR-2: Code Sharing] and consequent research community engagement, also cannot produce the advantages we are discussing. The typical missing ingredient is [FR-2], reverting us to what we called the proprietary-methods challenge (PMC).22 In such circumstances, we miss the sharing and cross-fertilization of knowledge, we do not as a community know what has been done to get a method that bests the competitors, and we do not get the chance for critical assessment of the methods. It is possible that a ‘method’ is just a set of random tweaks with no rhyme or reason, yet can win a particular competition at a particular moment essentially due to random noise—some competitions are just that close. Competitions do not, inevitably, produce lasting results or knowledge. Community review is what ensures that important advances are recognized and will propagate.
Certain research communities already take ‘open source’ as a given, meaning both data sharing and code sharing. In such situations, [FR-3: Challenges] seems to be the only missing ingredient, and hence the ‘secret sauce.’ It is then natural to think ‘Oh, it’s all challenges.’ Fair enough. But if that psychology takes hold, by a ‘slippery slope effect,’ we may forget to secure the assumed givens (data, code, community) and fail to produce critical assessments.
6. FR Everywhere!
Crucially, other research fields can benefit from adoption of FR practices and institutions just as well as EML—or maybe more. The FR triad can be, and is being, practiced outside the empirical machine learning world.
To see this, we engage in a bit of pattern recognition, noting analogies in which certain elements of the triad, although different in detail, are structurally identical.
For example, the notion of challenge could change. Here the data and code ingredients [FR-1]+[FR-2] might be as before, but the challenge task being solved per [FR-3] might no longer be what EML calls a prediction task. Mark Liberman, a natural language processing expert at the University of Pennsylvania, has been using the term common-task framework (CTF) to label challenges [FR-3] more neutrally, without implying that they are prediction challenges.23
Consider the field of optimization. There are many algorithms for solving well-posed optimization task instances, such as linear programming and quadratic programming, and traditionally one developed personal preferences for one algorithm or another based on intellectual perspectives and mathematical properties. What ultimately matters, however, is the speed of an algorithm and the accuracy of the solution it returns. The data science approach in this context would: [FR-1] share standard datasets that will be used in defining specific optimization problems, [FR-2] share code defining reference implementations, and [FR-3] define metrics (running time, solution accuracy) and maintain leaderboards. Benchopt does exactly this, focusing particularly on the algorithms of interest in machine learning; but instead of asking for prediction performance, as in much EML research, it asks for timing or solution accuracy. French research funding agencies supported Benchopt with personnel funding and compute access.
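To illustrate how such non-prediction metrics look in practice, here is a stripped-down sketch, not Benchopt’s actual API, of a harness that times two toy least-squares solvers and reports their solution accuracy on a shared problem instance.

```python
# Stripped-down sketch of an optimization challenge whose metrics are running
# time and solution accuracy rather than prediction error. This is NOT
# Benchopt's API; it only illustrates the idea with two toy solvers.
import time
import numpy as np

rng = np.random.default_rng(1)
A, b = rng.standard_normal((500, 50)), rng.standard_normal(500)   # shared problem data [FR-1]

def objective(x):
    return 0.5 * np.sum((A @ x - b) ** 2)

def solver_direct():
    return np.linalg.lstsq(A, b, rcond=None)[0]

def solver_gradient_descent(n_iter=500):
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1/L for the least-squares gradient
    for _ in range(n_iter):
        x -= step * (A.T @ (A @ x - b))
    return x

best = objective(solver_direct())              # reference optimum for suboptimality
for name, solver in [("direct", solver_direct), ("grad_descent", solver_gradient_descent)]:
    t0 = time.perf_counter()
    x = solver()
    elapsed = time.perf_counter() - t0
    print(f"{name:12s} time={elapsed:.4f}s  suboptimality={objective(x) - best:.2e}")
```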
Consider an example from statistical methodology. Stephen Ruberg and coauthors (Ruberg et al., 2022) conducted a challenge to promote development of ‘subgroup identification methodologies’ ultimately intended for the analysis of data in pharmaceutical clinical trials. Here synthetic data are used, with ground truth known. The authors construct many specified generative models, whose details are (let us say) sequestered from view, then sample data from those models. Task performance metrics included the ability to correctly identify the subgroups of treatment responders, because the underlying ground-truth synthetic model was known to the organizers; such metrics could not be defined in the usual EML framework. The challenge itself was hosted on the InnoCentive platform, and offered a prize of $30,000 for first place and $15,000 for second place. More than 100 competitors vied to predict properties of the generative model. This work and its publication were supported by Eli Lilly.
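The essential mechanics of such a ground-truth-based metric can be sketched in a few lines; this is a toy illustration under assumptions of my own, not the actual challenge design.

```python
# Toy illustration (not the actual challenge design): the organizer samples
# data from a generative model with a known responder subgroup, keeps that
# ground truth sequestered, and scores entries on how well they recover it.
import numpy as np

rng = np.random.default_rng(2)
n = 2000
X = rng.standard_normal((n, 5))
true_subgroup = (X[:, 0] > 0.5) & (X[:, 2] < 0.0)   # sequestered ground truth
treatment = rng.integers(0, 2, size=n)
y = X[:, 1] + np.where(true_subgroup, 1.0, 0.0) * treatment + rng.standard_normal(n)

def organizer_score(predicted_subgroup):
    """Organizer-side metric: agreement with the hidden responder subgroup."""
    return float(np.mean(predicted_subgroup == true_subgroup))

# A competitor sees only (X, treatment, y); this naive entry just thresholds
# one covariate and would score poorly against more careful methods.
naive_entry = X[:, 0] > 0.5
print("naive entry score:", round(organizer_score(naive_entry), 3))
```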
In physical chemistry, Adam Schuyler and coauthors (Pustovalova et al., 2021) conducted the NUScon challenge, seeking reconstruction methods for nonuniformly sampled NMR Spectroscopy. NUScon was hosted using the NMRbox methods platform (Maciejewski et al., 2017), which allows the creation of workflows using a large number of different software tools developed by the research community over the years, as well as extensive ‘workflow glue’ that allows chaining together such methods and tweaking their operation. Synthetic data were created using NMRbox based on underlying physical models. Competitors used NMRbox to code their entry workflows, and administrators used NMRbox and the NMRbox compute cluster to implement the contest evaluation. The platform provides an (almost) fully automated workflow for running contestant reconstruction recipes on contest data and evaluating solutions. Several thousand dollars in cash prizes were awarded in a NUScon prize ceremony at the annual meeting of the Experimental Nuclear Magnetic Resonance Conference (ENC). The contest produced community engagement with, and adoption of, methods for nonuniform sampling, which allow very substantial speedups in data acquisition for high-dimensional NMR24 experiments. It documented in a fair and objective way the measured performance of a variety of methods on a common platform. In addition, NUScon improved awareness of, and adoption of, the NMRbox platform, and thereby created a more capable and productive community of researchers.
In each of these three examples, the problems being attacked are not one-one mappings of standard ML problems; but some variant of common task framework can be defined and implemented.
7. The Roots in Data Science
The maturation of [FR-1]+[FR-2]+[FR-3] and the emergence of FRX did not spring out of a vacuum. Nor out of slide decks presented to Silicon Valley venture capitalists who funded GitHub, Kaggle, and Hugging Face. Nor out of today’s hegemon research labs. Rather, they developed organically from efforts by data scientists and technologists across at least four decades, witnessed with my own eyes.
7.1. Reproducible Computational Science
RCS has been developing steadily for decades.25
A few examples (see also Peng & Hicks, 2021):
Geophysicist Jon Claerbout began advocating and practicing26 RCS in the early 1990s, sharing the code and data of his lab members’ technical reports and PhD theses by CD-ROM; thereby he changed the standards in his field and ultimately changed journal publication practices to enable provision of digital research artifacts.
Over the last 20 years, computational biologists have been increasingly sharing code and data, including genome sequences27 (Benson et al., 2013), gene expression data28 (Ringwald et al., 2000), phenotypic data,29 algorithm libraries like Bioconductor,30 and workflow frameworks (Grüning et al., 2018).
In the last 10 or 15 years, numerous professional scientific journals and societies have acknowledged the need for, and developed accommodations for, reproducibility of computations (Peng, 2009). Cross-science surveys of this activity are available from reports of the National Academies of Sciences, Engineering, and Medicine; see for example the 2019 consensus report Reproducibility and Replicability in Science (NASEM, 2019).
The intent has consistently been to share data and code using the technological capabilities and scientific awareness of the day. As those capabilities advanced, the comprehensiveness of the reproducibility and the immediacy of the access also advanced. In my view, today’s RCS capabilities, though they would of course not have been predicted in detail by scientists of two or three decades ago, would have been clearly anticipated at a high level. A hypothetical scientist of those times might even have expected them to come together long before 2023.31
Actually, though, many, many scientists and technologists had to march in the direction of RCS in order to create the culture, customs, and tools we have today. Moreover, RCS still does not happen in every case. Incredibly to me, advocacy of RCS is spotty in certain fields.
Advocacy of RCS, following RCS practices, and implementation of RCS therefore deserve our praise. I will not enumerate all those I consider instrumental. The already mentioned report Reproducibility and Replicability in Science (National Academies of Sciences, Engineering, and Medicine, 2019) presents an extensive cross-section sampled across many scientific fields.
Still, for this article, there is one person I would like to spotlight as a standout advocate and implementer.
Yann LeCun: Turing Prize winner; Chief AI Scientist of Meta; Silver Professor at NYU; and important thought leader in today’s AI information space. Among technical leaders in the trillion-dollar market cap group, LeCun, to me, stands alone as a highly vocal and consistent advocate of code and data sharing. Over the last decade, Meta produced, shared, and supported fundamental tools like PyTorch32 and LLaMA,33 which are integral to reproducible computational science in many fields today.
As the last touches were being placed on this manuscript, LeCun spoke to the US Senate Committee on Intelligence, which wanted to hear about AI and preserving American Innovation.34 As one can imagine, many politicians and corporate leaders would be talking in such a setting about placing technology under lock and key. Instead, in my observation:
LeCun chose to use his precious moments of testimony to emphasize the importance of code and data sharing. He explicitly mentioned these factors as crucial to the recent rapid advances in AI.
In this article’s language, LeCun is implying that our secret sauce is the frictionlessness of building on the work of others. This is our economic and intellectual engine.
I quibble only in whether RCS alone—[FR-1]+[FR-2]—is enough to get all the rapid advances. Against my inclinations, I came to believe one extra factor is essential—[FR-3: Challenges]—to produce full FR, and after that, we also need the engagement of a whole community of researchers with a new type of institution—the FRX—for emergence of all the rapid progress we are seeing.
7.2. The Challenge Paradigm
From my perspective, the I-didn’t-already-see-this-coming-40-years-ago ingredient in the FR triad has been the impact of [FR-3: Challenges]. For at least 50 years it was widely understood that collecting data, and sharing it, could drive data analysis challenges.35 However, the view of data analysis ‘challenge’ in early years would have been extremely broad and, I would say, ‘humanistic,’ and could tolerate a wide range of challenge deliverables, including the challenge of ‘understanding’ what is in the data, or the challenge of ‘revealing an interesting view of the dataset,’ or even the challenge of ‘redefining our ideas of what can be done with the dataset.’ A popular term at the time was ‘exploratory data analysis,’ which well evokes the possibility of fun and discovery that inspired many. From this vantage point, what came next was a real surprise.
A very different engineering view of challenges became formalized36 in the mid-1980s for research in natural language processing and biometrics. Researchers were asked to submit entries scored by specified, predefined metrics. Leaderboards were instituted and winners declared algorithmically. This approach to challenges was radical in its clarity and simplicity, but also in its narrowness of focus.37
In a series of projects, many funded ultimately by the Pentagon (DARPA38) in the 1980s through the 1990s and beyond, challenges were mounted in speech processing, biometric recognition, facial recognition, and other fields (Garris & Wilson, 2005; Garris et al., 2006; Wing, 2013). Funding, often to NIST,39 covered data collection and curation, and challenge contest administration. Sometimes, winners of challenges were selected for follow-on DARPA grants and research contracts.
In a major departure from the principles advocated in this article, some DARPA challenges followed the proprietary challenge model, which allowed contestants to keep their code and working methods confidential, essentially meaning that contestants learn leaderboard results, but not necessarily about the winning models or fitting procedures. Such an implementation of [FR-3: Challenges] is incompatible with [FR-2: Code Sharing], and hence with frictionless reproducibility, showing it makes a difference when all three of [FR-1]+[FR-2]+[FR-3] are specifically present. This longstanding tension is important at the present moment in empirical machine learning, where some internet hegemons take a proprietary line.
Other data science fields instituted challenges organically without Pentagon instigation. The CASP protein structure prediction problem is the longest-lasting scientific recurring challenge problem venue.40 The sequence of a certain protein is known; predict its 3D structure. This competition has been around roughly as long as the World Wide Web and web browsers; its website today at https://predictioncenter.org/ documents the (peri-) biennial competitions since 1994 and the evolution of predictive solutions. Many outside-of-proteins data scientists’ first inkling of this competition followed a recent episode of hegemon engagement; DeepMind entered into recent CASP competitions, deployed overwhelming computational resources, got their wins, and issued their corporate PR,41 to the applause of the capital markets (Wiggers, 2018). But the competition has been ongoing for three decades, developing and perfecting the challenge model and growing the community that engages in it and gradually improving protein folding; without a corporate PR budget then or now.
Many data scientists ought to be mentioned as heroes of the challenge paradigm, across science and technology, and across decades. I list three important figures, in last-name-alphabetical order.
Isabelle Guyon of Université Paris-Saclay,42 who collaborated on the original UNIPEN dataset (Guyon et al., 1994) and coauthored a foundational paper on computer handwritten digit recognition (Bottou et al., 1994). Over the last 20 years she has been heavily invested in many prediction challenges (Guyon et al., 2006; Guyon et al., 2008; Guyon et al., 2011; Guyon et al., 2015; Liu et al., 2021; Jiang et al., 2020). A recent example is the 2 Million Euro Horizon Prize for Big Data Technologies,43 organized by Guyon and her colleagues at ChaLearn.
John Moult of the University of Maryland, a molecular biophysicist who has been a sparkplug of the CASP protein structure prediction contests since the very beginning, that is, the early 1990s (Moult et al., 1995; Kryshtafovych et al., 2010; Moult, 2005; Moult, 2006; Kryshtafovych et al., 2014; Moult et al., 2018; Kryshtafovych et al., 2021).
Jonathon Phillips of NIST, who played a crucial role on many of the pioneering 1990s and 2000s NIST/DARPA challenges in vision and biometrics, including data collection and contest organization (Phillips et al., 2000; Phillips et al., 2002; Phillips et al., 2005; Phillips et al., 2009; Rizvi et al., 1998), with challenge involvement continuing up to the present time.
Although they have very different professional backgrounds, they have in common an exceptional clarity on the importance of challenges and exceptional persistence in promoting this paradigm in their domains.
From a later generation, I will also mention:
Percy Liang of Stanford; he had the farsighted idea in the 2000s to create a sort of ‘operating system for conducting challenge problems’ that is today’s CodaLab worksheets. Offshoots, managed by Isabelle Guyon and colleagues, are today’s CodaLab competitions (Pavao et al., 2023) and Codabench systems (Xu et al., 2022), which administer a large number of academic challenges annually.44
These data scientists, and many others I do not have space to mention, have shown how to make data science a science, full stop. Their vision is that data science research projects making full use of [FR-1]+[FR-2]+[FR-3] are truly empirical science; while research projects without all three will fall short in some way. In fact, Isabelle Guyon, in her 2022 keynote address at NeurIPS (Guyon & Viegas, 2020), has chosen to emphasize exactly this: through challenges, machine learning has become an empirical science.45
So the social practice, of challenges propelling research, was not born yesterday; it has been developed and refined across four decades. Pioneers saw early on, and soon made others understand, that this would be a transformative development.
7.3. Surprising Reactions
In a sign of its potency, [FR-3: Challenges] elicits a range of surprising reactions.
7.3.1. Reaction 1: Wild Enthusiasm
Mark Liberman has made clear that already in some of the earliest contests, the challenge paradigm tapped into a very special, previously unexpressed human energy. Leaderboards game-ified research and exerted a deep grip on the moment-by-moment attention of many participants of early challenges. They eagerly dropped into a tight loop of tweak model and data, submit new answers, try again. It was understood that performance improvements were happening at a much faster pace, even in the humble early days of this paradigm. It was also clear from this early experience that, if data and code sharing grew, and leaderboards went global, this human energy would increase in staggering ways. Exactly as happened in EML over the last decade.
7.3.2. Reaction 2: Shock and Disbelief
Quite a few mathematical scientists (including myself) have found it initially difficult to accept that a great deal of work in the empirical machine learning community was not supported by careful formal analysis and derivations, for example, theorems from mathematics or from computer science (CS). Many math scientists (also including me, initially) had difficulties understanding that a whole scientific community could be based entirely on the foundations of reproducible computations and challenge leaderboards.46
In processing my own sense of dislocation, I turned to the famous article of Alon Halevy, Peter Norvig, and Fernando Pereira, The Unreasonable Effectiveness of Data (2009). This squarely took aim at Eugene Wigner’s The Unreasonable Effectiveness of Mathematics in the Natural Sciences (1960), an article of some veneration among those trained in the mathematical sciences over the last 60 years. Halevy et al.’s (2009) article is best viewed as a provocation rather than an attempt to convince doubters, but it does brutally explain the new rules of the game.47
8. Acceleration Toward Singularity
The argument so far is that [FR-1]+[FR-2]+[FR-3] have recently come together, after lengthy gestation in plain sight, and that this combination—in the presence of the right research community practices—unleashes an acceleration of computational-methodology research. But acceleration is everywhere, not just in research; for those of us swimming through the 2020s, it surrounds us like water.
8.1. Acceleration Everywhere
Innovations of all sorts are today adopted more easily and spread more rapidly than ever. The spread of memes on the internet was recognized as a new phenomenon already in the 1990s; the sudden virality of slogans, posters, or videos that began to be seen at that time is today a dependable feature of the information space; each day on social media, the ‘trending now’ feature presents that day’s viral topics, likely to be replaced tomorrow by other viral topics, often unpredictably.
At the heart of viral meme spread is the ‘single-click transmission’ of ideas. Some saying or image is appealing, and transmitting it to others is so effortless (just a few clicks) that the viewer overcomes inertia and forwards it, initiating or continuing a chain. This is then repeated, in some cases almost endlessly.
Some observers see a biological drive at work; it is common to say that each participant in the chain gets a ‘dopamine hit’ from her/his ‘discovery’ of the content in their in-box, followed by a fulfilling ‘action’ of reposting or forwarding.48 Some participants speak darkly of their ‘addiction’ to participating in such chains as caused by ‘dopamine craving’ (Lustig, 2018). Others praise, and seek out, the ‘flow state’ induced by participating in the single-click transmission of ideas.
We are now so used to frictionless spread of sayings and images, some quite arresting, entertaining, and shocking, that many of us expect a daily parade of amazement.
Mass culture has been practically overwhelmed by this development. Billions of people have smartphones, which expose them to social media for hours a day,49 during which they experience the cresting and breaking of waves of viral memes.
This has upset traditionally staid activities like politics (Barrett et al., 2021).50 Popular ‘grass roots’ movements can grow virally and crest beyond the reach of any traditional debate or clash of ideas. The attention-grabbing capabilities of such viral waves are breathtaking. Originally “fringe” ideas can suddenly emerge, seemingly from nowhere, to capture significant mindshare.
The recent unprecedented dynamism of mass culture is happening all around us, all the time; but we may forget that it is rooted ultimately in the technological possibility for delivering (almost) frictionless spread of sayings and images. Today’s frictionless regime is invisible to us fish; it is water.
8.2. Friction Spoils Everything
In contrast, friction—drag on the effortless sharing of information—is really, really visible to us. It breaks the flow, stops the dopamine. Today, many have no patience for friction of any kind.
One can see complaints about this on social media: ‘receipts or it didn’t happen,’ ‘URL or I don’t care.’ Commenters want the experience of accessing digital content with a single click. If there is no digital content to access efficiently, participants are frustrated and share their frustration, vocally; perhaps even with digital tokens of their frustration, such as emojis or GIFs.
Putting content behind ‘paywalls’ has the same effect. Some participants will react strongly against it. Some may not want to pay, but many simply do not want friction!
8.3. Virality in Data Science
Frictionless reproducibility of scientific computations is the analog, for research, of the frictionless transmission of internet memes.
FR aligns powerfully and naturally with the habits and practices each of us experiences in our experience of digital culture at large. A new methodological tool published in a way that subscribes to [FR-1]+[FR-2]+[FR-3] may spread across a field like wildfire.
I have seen this up close in single-cell RNA-Seq data analysis. A leading journal published an article (Gribov et al., 2010; Pereira et al., 2021) introducing a new software package, Seurat; the data and code were packaged in a very clean way,51 with a user interface making it extremely easy to reproduce the authors’ results, but also to cross-apply the same methodology to fresh data from another project. The package spread across the RNA-Seq research landscape rapidly, and it soon became de rigueur for visualizing RNA-Seq data in publications.
In a series of interactions with the computational biologist I was working with, the narrative quickly went from ‘what is this new tool? Can I trust it?’ in an initial meeting to, a few meetings later, ‘I was just at a conference, Seurat is everywhere; you now have no choice, you must now present your data using Seurat, or no one will take you seriously.’
After diving into Seurat’s technical ideas, I saw that a key driver of such viral spread must be simply the ease of adoption of the package and the attractiveness of the produced plots, almost divorced from other considerations. The package was very well designed for ‘unboxing’ by newcomers, effortlessly delivered colorful presentations of some supplied data sets, and followed the rules for creating general-purpose software for use with other data sets. In my opinion, this ease of adapting the tool to other data sets definitely propelled Seurat’s adoption.
By the way, Seurat was not the lone option; many competing methodological tools were available. I studied some of them and came to the conclusion that Seurat offered the least ‘reproducibility friction,’ and so it spread.
8.4. Onset of Frictionless Reproducibility as a Singularity
Today’s onset of frictionless reproducibility will have major consequences for methodological research. To see this, consider lessons from The Singularity is Near (Kurzweil, 2005), a prominent influence in discussions about modern AI.
In 2005, computer technologist Ray Kurzweil famously forecast that “The Singularity” would take place around 2030, after a factor of 10^14 more compute would become available to humanity, compared to its 2005 level. This forecast was derived by stacking many fascinating speculations, at the top end approaching the spiritual and religious. Kurzweil’s audacity firmly planted the idea of “The Singularity” in the minds of intellectuals, after which Eliezer Yudkowsky52 and other adherents of the “LessWrong” online community53 focused attention on the problem of “AI Alignment”54—the worry that AI will one day surpass us, and then, afterwards, be cruel to us. The most vocal worriers are “AI Doomers”; they fear the singularity will arrive with a “hard takeoff” that will overwhelm us with destructive consequences before we even understand what is happening.
Kurzweil quotes one of the 20th century’s most prominent mathematicians, John von Neumann:
The history of technology ... gives the appearance of approaching some essential singularity in the history of the race, beyond which, human affairs, as we know them, cannot continue. (Ulam, 1958, page 5)
Von Neumann introduces the idea that a singularity is coming.55 But when? Kurzweil presents Figure 1.
Here Kurzweil’s vertical axis presents “time between notable events” and the horizontal axis shows time before the present day. The interval between notable events is dropping toward zero, signaling acceleration in the pace of human affairs.56
Figure 1, the Von Neumann quote, and Kurzweil’s rhetoric build towards a working definition:
when the time between notable events drops to zero, we are at singularity.
The ‘reaching zero moment’ is quite explicit in both Von Neumann’s and Kurzweil’s accounts; it involves observables we can directly sense. I adopt this as my criterion for ‘dating a singularity’.57,58,59
Compare to frictionless reproducibility in computational science. FR heralds a drop to zero of the human effort required to reproduce a computational result. This micro-phenomenon (the time for a researcher to reproduce one result dropping, essentially, to zero) in turn drives a macro-phenomenon—the time for a field to globally adopt a new dominant methodology—also dropping, essentially, to zero, as we saw in the rapid embrace of Seurat for single-cell RNA data analysis.
I see analogies to the physicist’s idea of superconductivity, where an electrical conductor’s resistance drops to zero. Many have explained that easy access to superconductivity could one day have amazing consequences for the energy industry and the world economy.60 I likewise sense that frictionless reproducibility can have, and already is having, impressive consequences.
Everywhere we look, progress in science and technology is speeding up. At the same time, we see a transition to frictionless exchange of digital research artifacts. At the macro-level, speedup; at the micro-level, frictionlessness.
The mRNA vaccine story of 2020 is a well-known emblem of science acceleration. Early in the COVID-19 pandemic we were told that novel vaccine development had never taken place in less than a decade, and yet in about a year hundreds of millions of people were already vaccinated, in the United States mostly with mRNA vaccines (Bourla, 2022). A key part of the story, often left out, is the frictionless spread of information about the SARS-CoV-2 virus, most importantly its sequence. The virus’s RNA sequence data were published by Chinese scientists on virology websites in January 2020, and within days virologists all over the world were analyzing the sequence and very soon had vaccine candidates.61 The new mRNA technology fit perfectly with this situation as it allowed virologists to design a vaccine candidate directly from digital sequence—a comparatively frictionless process. The friction in the end-to-end process was all in the traditional political, regulatory, manufacturing, and public outreach realms. Companies like Pfizer-BioNTech and Moderna are justly proud of the amazing operational job they did in immunizing billions of people in 2020–2021; this practical effort was unprecedented. The scientific and technological translation was, comparatively speaking, instantaneous (Ball, 2020).62,63
Key enablers of this rapidity were [FR-1] data sharing (of the virus’s RNA sequence) and widespread [FR-2] code sharing (of algorithms that could analyze and translate that sequence in various ways).
We earlier discussed the staggering rate of recent progress in large language models. The sudden emergence of GPT-4 and ChatGPT in last year’s public discourse spotlights one company’s work; but it is better viewed as evidence that the NLP field as a whole has made very dramatic progress across many tasks and challenges in the last decade. Consistent with our theme, the habits and institutions of [FR-1]+[FR-2]+[FR-3] are, as we expect, very strong in NLP. The availability of massive language data sets ([FR-1]), the sharing of architectures and fitted models ([FR-2]), and the popularity of challenges ([FR-3]) are heavily present in this field; their synergy and associated institutions, including CS conferences64 and Open Review of submissions65 shaped the community of NLP researchers into a frictionless research exchange.
Again, frictionlessness at the micro level, rapid progress at the macro.
Combining the above comments, and applying the criteria given above, I propose we are at a singularity—the reproducibility singularity. I do not propose dramatic claims of mass unemployment and human extinction. I do propose instead fundamental changes affecting computation-driven research and its rate of progress.
9. Revolutions at the Singularity
The crossover to FR upends our expectations about research and researcher behaviors, in two ways.
9.1. Computational Epistemology
In the ‘comprehensive FR’ regime, where every computation can be reproduced, adherent researchers no longer depend on hypotheticals. They instead limit themselves to documenting facts:
What: a specific workflow does; On: a specific publicly available data set; According to: a specific task performance measurement; Using: publicly available code.
This is an extremely limited agenda for research discussion! In exchange for the limitations, one gets two benefits:
Epistemic modesty: The research literature is transparent about what is being asserted and under what conditions.
Full computational reproducibility: The research literature becomes a transcript of actual code executions, and so is, simply, true.
This approach to epistemology is completely a creature of the frictionless reproducibility triad and would make no sense without it. One is only making claims that are immediately verifiable because of the assumed [FR-1]+[FR-2]+[FR-3] setting. This intertwining is spelled out in the table below:
Table 2. Epistemology and FR components.

| Facet documented | Facet explanation | Triad elements involved |
| --- | --- | --- |
| What: | Specific workflow does | [FR-1]+[FR-2]+[FR-3] |
| On: | Specific, publicly available dataset | [FR-1: Data Sharing] |
| According to: | Specific task performance metric | [FR-3: Challenges] |
| Using: | Specific, publicly available code | [FR-2: Code Sharing] |
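To make the facets above concrete, here is a minimal sketch, in Python, of how one FR-style claim could be recorded as a machine-readable artifact. The field names, URLs, and the evaluate() stub are illustrative assumptions, not an established standard.

```python
# A minimal, hypothetical sketch of an FR-style claim record.
# Field names, URLs, and the evaluate() stub are illustrative assumptions only.
import json

claim = {
    "what":         "workflow: resnet18_baseline.py, commit abc123",    # [FR-2]
    "on":           "dataset: https://example.org/shared/cifar10",      # [FR-1]
    "according_to": "metric: top-1 accuracy on the public test split",  # [FR-3]
    "using":        "code: https://example.org/lab/repo (open source)", # [FR-2]
}

def evaluate(claim_record):
    """Placeholder: re-run the named workflow on the named data and
    recompute the named metric. In a full FRX this is a single click."""
    raise NotImplementedError

print(json.dumps(claim, indent=2))
```

A record of this form asserts nothing beyond what a re-execution could verify.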
What is obviously gained is efficiency in article composition, and efficiency in article evaluation. Less obvious is that this epistemology enables emergence of an FRX, which is a research community exchanging digital research artifacts rather than more nebulous entities.
What is potentially gained compared to traditional scientific publication is unprecedented virality. In EML, an article presenting a fertile new architectural concept that breaks a performance barrier can garner massive impact quickly (e.g., ResNet, Transformers) by being rapidly incorporated into thousands of other community members’ works.
Consider the ranking of top journals science-wide by accepted citation metrics. Over the last decade, coincident with the maturation of the data science components [FR-1]+[FR-2]+[FR-3], the journal impact leaderboard, long heavy with incumbent biology, medicine, and general science journals, has been disrupted; three of the top 10 journals today come from engineering fields that follow this article layout.66 Ten years ago, these venues were nowhere to be seen in the journal leaderboard. A true publishing revolution.
The rapid rise up the leaderboard is easy to understand. Traditional science epistemology was heavy with counterfactuals. Authors discussed consequences of some hypothetical intervention on hypothetical measurements and compared those with observed measurements. In fact, over the last century, human intelligence was more or less equated with the articulate use of counterfactuals, conditionals, and hypotheticals in discourse. The new context removes the burden of articulate deployment and parsing of counterfactuals.
9.2. Research Mindset
There is also a ‘disruption’ in inter-researcher professional discussion. When I observe today’s younger data scientists, I note that their typical interactions revolve around exchanging information about their information technology stacks. Typical questions include:
What’s your package name?
What’s your URL? QR Code?
Is package <X> on your stack? (<X> varies from conversation to conversation)
More generally, they are often asking for details underlying the FR triad: They want to know, are you data sharing? Using some new shared data set I have not previously heard of? Are you code sharing? Using some new shared codebase I haven’t previously heard of? Is there some new specific numerical performance measure? Is there a challenge that just dropped? How can I get single-click access to the work you are describing? These are all signs that these researchers are participants in an FRX—a research community exchanging digital research artifacts.
Traditional intellectual discourse was heavy with conditionals and counterfactuals—that is, thought experiments. Computation was so difficult that the ability to run mental simulations of what computations might deliver, if we could conceivably do them, was a sign of real intellectual penetration. Today it becomes a big ‘turn-off’ to propose thought experiments and impossible hypotheticals like ‘you could potentially try to do such-and-such.’ One might as well suggest a fly-by of Venus.
The currency of discourse today is frictionless replications. Researchers are thinking: Can’t I just try this now?
10. Actions Readers Can Take Immediately
Here are some action items implied by the argument so far.
10.1. Exemplify Frictionless Reproducibility
In line with the spirit of the times, try to make your own research work capable of viral spread among data scientists. You can best do this by making it single-click reproducible.
Suppose you are a mathematical scientist who has thought of your job as mathematically describing a data science procedure and creating a theorem to probe its behavior. That approach worked for my generation, but it runs afoul of the ‘URL or it didn’t happen’ mindset of modern life. Today, you might also implement your tool computationally, share code, evaluate it on shared data, and make formalized quantitative comparisons of your work with standard baselines. Implicitly, in each research product you can either participate in an existing challenge or create a new reproducible challenge problem and propose its first entry. If there are today no relevant datasets or baselines, you can productively work to create them. Creating a dataset or challenge can have much more impact than merely theorem-izing a specific tool. If you do not immediately see a way to integrate the three elements [FR-1]+[FR-2]+[FR-3] into your work, treat this as a puzzle whose solution unlocks an opportunity.
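As one hedged illustration of what ‘single-click reproducible’ might look like, here is a sketch of a repository entry point in Python; the file name, data URL, baseline, and metric are placeholders assumed for the example, not a prescribed layout.

```python
# reproduce.py -- hypothetical single-click entry point; names and URL are placeholders.
import urllib.request
import csv, io, statistics

DATA_URL = "https://example.org/shared/benchmark.csv"   # [FR-1] shared data (placeholder)

def load_data(url=DATA_URL):
    """Fetch the shared dataset and return (header, rows)."""
    with urllib.request.urlopen(url) as resp:
        rows = list(csv.reader(io.TextIOWrapper(resp, encoding="utf-8")))
    return rows[0], rows[1:]

def baseline_predict(rows):
    """Trivial baseline: predict the mean of the target (last) column."""
    targets = [float(r[-1]) for r in rows]
    mean = statistics.fmean(targets)
    return [mean] * len(targets), targets

def score(preds, targets):
    """Agreed task metric [FR-3]: mean absolute error (placeholder choice)."""
    return statistics.fmean(abs(p - t) for p, t in zip(preds, targets))

if __name__ == "__main__":
    _, rows = load_data()
    preds, targets = baseline_predict(rows)
    print("baseline MAE:", score(preds, targets))
```

Anyone who can run this one file can reproduce the reported number and then swap in their own method against the same data and metric.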
Alternatively, suppose you work in a nonmathematical field, say in medicine, that does not currently use one or more legs of the triad. Work to change this, for example, by instituting the sharing of data and code and documenting performance. Such work can be visionary and foundational, and there can be grants to support it. The Nightingale Open Science project, for example, has proven this. Medicine is just one example; this pattern can repeat throughout science and technology.
10.2. Research Opportunities
Dataset Bias. Are the shared public databases we depend on for challenges a ‘reasonable representation’ of the dataset universe, or is selective or opportunistic publication in some way skewing our methodological research? Are there ways to better incentivize datafication? Data sharing? Are there negative consequences of this approach, perhaps ones we can guard against or mediate?
Are there types of datasets that are flatly incompatible with this approach? Data that are changing? Datasets that are too small? Too large? Are there adaptations of the whole paradigm that should be adopted? See, for example, Varoquaux and Cheplygina (2022).
Availability Bias. I perceive a heavy bias of researchers toward certain popular approaches over others; often this is because of [FR-2: Code Sharing]. Thus, deep learning is considered ‘first priority’ by today’s empirical ML researchers, whereas traditional tree-based prediction methods like CART67 and random forests are considered deprecated, perhaps because the shared code resources for deep learning became more convenient to access—even though deep learning methods are much more computationally expensive to apply. For many non-image, non-language, so-called ‘tabular’ datasets, the tree-based methods make more intuitive and practical sense. But if few researchers want to use them in challenge entries, we remain ignorant about whether those tools work, or how they compare.
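To illustrate how cheaply a tree-based baseline can be added to a tabular comparison, here is a small sketch assuming scikit-learn is available; the dataset and settings are illustrative only, not a recommended benchmark.

```python
# A small sketch of including a tree-based baseline in a tabular-data comparison.
# Assumes scikit-learn is installed; dataset and settings are illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)          # a shared, public tabular dataset
forest = RandomForestClassifier(n_estimators=200, random_state=0)

# Cross-validated accuracy: a cheap, strong baseline any challenge entry could report.
scores = cross_val_score(forest, X, y, cv=5, scoring="accuracy")
print(f"random forest, 5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```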
Challenge Bias. In the new regime, we have groups of researchers competing to improve task performance on shared public databases. There are predictable negative consequences of this approach, such as overfitting to the public test data; are there other negative consequences not yet widely discussed? Are there novel computational tools to ward off the negative effects (Neto et al., 2016), or adjust for them?
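A toy simulation can illustrate the overfitting concern: even entries that are pure noise look good on a fixed public test set once we select the best of many, and the inflation disappears on a fresh holdout. The sizes and setup below are assumptions chosen only for illustration.

```python
# Toy simulation of challenge overfitting: many random "entries" scored on the
# same public test labels; the apparent winner looks better than it really is.
import random

random.seed(0)
n_test, n_entries = 1000, 500
public_labels = [random.randint(0, 1) for _ in range(n_test)]
fresh_labels  = [random.randint(0, 1) for _ in range(n_test)]  # sequestered re-test

def accuracy(preds, labels):
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

# Each "entry" is pure noise, yet selecting the best of 500 on the public set
# yields an accuracy visibly above the true chance level of 0.5.
entries = [[random.randint(0, 1) for _ in range(n_test)] for _ in range(n_entries)]
best = max(entries, key=lambda e: accuracy(e, public_labels))
print("leaderboard accuracy of best entry:", accuracy(best, public_labels))
print("same entry on a fresh holdout:    ", accuracy(best, fresh_labels))
```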
Challenge Design. How do we design better challenges? How do we invent performance criteria for important problems that were never subject to challenges before, thereby enabling access to a new paradigm? Liu et al. (2022) gives an intriguing new avenue: reproducibility challenges. Similarly, can we design challenges to promote interpretability?
Exchange Design. How do we entice a group of researchers to participate in our challenge? What practices work and do not work? How do we ignite a group of researchers into a dynamic, productive FRX? How do we catalyze FRX takeoff?
Resource Blindness. A particular feature of the new regime is that doing things just about as well but dramatically more efficiently is typically not at all valorized. Have brutal scaling and blind focus on performance obscured tremendous efficiencies lying in plain sight? What would the more efficient methods be? How could uncovering them be valorized effectively? Can we design challenges to promote more efficient algorithms?
10.3. Teaching Opportunities
Push Back against Faulty Narratives! This article, together with its companion, exposes a clash of narratives, between the highly visible ‘AI singularity’ narrative and the contrasting narrative that I prefer, the little-known but historically grounded ‘data science finally takes hold.’ The AI singularity narrative promotes powerlessness and is demoralizing to the young. Data science instruction can proclaim an accurate and hopeful counternarrative.
This narrative will better equip students for the future and induce in students more respect for the material they are learning and for the value of their own educations. It can be injected everywhere in the academic data science curriculum.
New Advanced Courses. In fall 2023, Xiaoyan (XY) Han and I taught Statistics 335 at Stanford, ‘The Challenge Problems Paradigm,’ and discussed its ingredients and its consequences, both in ML and elsewhere. As with other 300-level courses in my department, this course may have PhD students and ambitious master’s students enrolled. We engaged students with many of the issues discussed in this article, in particular the history to date of this approach, some of its greatest achievements, and some variations and modifications.
New Undergraduate Courses. The game has changed for data-driven methodological development, but actual instruction has not adapted. The National Academies conducted a study (National Academies of Sciences Engineering and Medicine, 2018) of the undergraduate data science curriculum in 2017–2018; it already looks in need of severe revision. Many of today’s students simply would not be able to get value from statistics courses taught the way they were taught before 2010, yet some courses at some institutions have not really adapted. The profusion of publicly shared datasets and data analysis scripts, along with online teaching resources such as YouTube videos, has already changed students’ expectations about what they want to learn and how; in particular, students now expect frictionless access to tools and other artifacts, and traditional coursework defeats this expectation. Probably we do not yet agree on what comes next, but our students are in a completely new mindset and completely new approaches are needed.
10.4. Software Developer Opportunities
Frictionless reproducibility is here, now; in the future we will see more and more research fields following its discipline, meaning there is an opportunity for services and tools to support and build upon the triad [FR-1]+[FR-2]+[FR-3], including outside the context of empirical machine learning. To me the big opportunity lies in making some of the predictions listed below come true.
11. Predictions
This article began life as a contribution to a session on ‘The Next 50 Years of Data Science’ at the 2023 Joint Statistical Meetings. Implicitly, we were being asked for predictions; here are several. See also Donoho (2017).
11.1. Computation on Research Artifacts (CORA)
Frictionless reproducibility in computational research is now essentially here. As a side effect, this makes available a vast array of digital research artifacts generated by earlier research. Soon, those byproducts will be viewed as digital gold.
In many research disciplines, reports spotlight tables that constitute the main deliverables of the work being presented. Such tables bring together chosen data and algorithms and specific task performance metrics. In methodological work, they can often be viewed as what an EML researcher would call a ‘mini-challenge leaderboard’; I am speaking here inclusively, broadly, outside the EML context, and allow an enlarged definition of task and task performance.
In the near future, researchers will understand that future researchers, digesting the given research work, would wish to directly probe the computations that generated the numbers in the researcher’s tables, as a way of engaging with the researcher’s result and possibly building upon it. Whole communities will expect their researchers to expose the data and workflow that produced the surfaced numerical results.
New tools will soon enough become available to automatically enable frictionless access to the underlying digital artifacts that went into any relevant table in a study. This will happen seamlessly, as part of ordinary research and publication; standard research computing and publication computing environments, with little or no effort, will automatically expose the underlying data, code, and performance measurements that delivered these results.
Exposing such artifacts enables other data scientists who read a published report to build upon the research that has been done in that report: to first reproduce and, later, to possibly modify some underlying algorithms or some underlying observables and recompute the table, thereby producing a new research project.
This enters us all into a new era, where computing on the digital research artifacts created by previous research computing (for which we adopt the label CORA) allows us to write algorithms to inspect and generalize the workflows used in previous research, and thereby algorithmically obtain new research results. We will begin to naturally think and operationalize at a new level, as follows: ‘Take project X’s workflow and everywhere replace algorithm A with algorithm B; call the result “project Y’s workflow.” Execute the full workflow of project Y, computing all the same observables as you did for project X.’
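A minimal sketch of this ‘replace algorithm A with algorithm B and re-execute’ operation follows, assuming workflows are captured as plain data structures of named steps; the representation and the stand-in algorithms are my own illustrative assumptions, not any existing system’s API.

```python
# Minimal sketch of a CORA-style operation on a workflow artifact.
# The workflow representation and step functions are assumptions for illustration.
from copy import deepcopy

def algorithm_A(x): return sorted(x)                  # stand-in for the original method
def algorithm_B(x): return sorted(x, reverse=True)    # stand-in for the replacement

project_X = {
    "name": "project X",
    "steps": [("load", lambda: [3, 1, 2]), ("analyze", algorithm_A)],
}

def substitute(workflow, old_step, new_step, new_name):
    """Return a new workflow with every occurrence of old_step replaced by new_step."""
    wf = deepcopy(workflow)
    wf["name"] = new_name
    wf["steps"] = [(label, new_step if fn is old_step else fn) for label, fn in wf["steps"]]
    return wf

def execute(workflow):
    """Run the workflow's steps in order, feeding each output to the next step."""
    data = None
    for label, fn in workflow["steps"]:
        data = fn() if data is None else fn(data)
    return data

project_Y = substitute(project_X, algorithm_A, algorithm_B, "project Y")
print(execute(project_X), execute(project_Y))
```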
As a vision, CORA has been clear to many data scientists for some time. The quantitative programming environment Mathematica and its associated notebook system delivered some of this functionality already, at least 30 years ago. Stephen Wolfram in several venues has evoked his vision of computing on anything, broad enough to include CORA.68
CodaLab69,70 is a platform for running challenge problem contests that can inherently perform CORA, for certain specific tasks (e.g., swap the public test data set that was used by a certain contest entry, for a certain private data set sequestered from public view, and reevaluate specific performance metrics after the swap).71
My prediction is that CORA will inevitably emerge and grow in scope, to outstanding effect, increasing the pace of scientific advance. My thinking derives from (practically unknown) papers by the author and Matan Gavish (Gavish & Donoho, 2011, 2012) in which we described the notion of Verifiable Computational Results (VCR), and three dream applications.
VCR, where implemented, ensures that digital artifacts emerging during a computation are automatically captured and made frictionlessly available to future researchers; and dream applications exploit these artifacts to automatically generalize upon earlier work (e.g., swap this computational method for the one originally used in prior research, and show me the new table that results) or automatically evaluate earlier research (e.g., bootstrap this entry in that previously published table and show me the histogram that results).
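As a toy sketch of the second kind of dream application, suppose the per-case scores behind one published table entry had been captured as an artifact; resampling them would yield an uncertainty picture of that entry. The scores below are synthetic placeholders, not data from any real study.

```python
# Toy sketch of "bootstrap this table entry": resample per-case scores that a
# VCR-style system would have captured, and summarize the resulting distribution.
import random

random.seed(1)
per_case_scores = [random.gauss(0.87, 0.05) for _ in range(200)]  # captured artifact (assumed)

def bootstrap_means(scores, n_boot=2000):
    """Means of n_boot resamples drawn with replacement from the per-case scores."""
    n = len(scores)
    return [sum(random.choices(scores, k=n)) / n for _ in range(n_boot)]

means = sorted(bootstrap_means(per_case_scores))
lo, hi = means[int(0.025 * len(means))], means[int(0.975 * len(means))]
print(f"published entry ~ {sum(per_case_scores)/len(per_case_scores):.3f}, "
      f"bootstrap 95% interval [{lo:.3f}, {hi:.3f}]")
```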
It seems to me that full implementation of CORA, when it happens, will be a transformative achievement! Also, that we, as a scientific computing world, are headed toward that achievement, and that its fulfillment will date the actual moment of the reproducibility singularity.
11.2. CORA and the ‘End Times.’
Kurzweil calls attention to a very dramatic moment, coming soon—yielding priority to I. J. Good, ‘Speculations concerning the first ultraintelligent machine’ (Good, 1966):
Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion,” and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.
Good was an amazing intellectual (Good, 1983), who served as Alan Turing’s research assistant and who ‘kept Bayesian statistics alive’ during its exile from academic favor (Neapolitan, 2008). As you can see, he was a stylish writer and his formulation is charismatic and compelling. It is easy to see why Kurzweil latches onto it and how it might inspire “AI Doom” influencer threads; Eliezer Yudkowsky (2008) coined the notion of a “hard takeoff” or FOOM,72 where machine intelligence starts to expand its own powers hyper-exponentially. Good seems to anticipate hard takeoff.
Back to Earth. Let us acknowledge that, in certain ways, performance on challenges is inevitably and automatically going to improve markedly post-transition to FR. Indeed, it is a simple matter to take any given frictionlessly reproducible workflow and modify it by trial and error to produce a new workflow that may improve it. Consider two meta-operators on workflows (a code sketch follows the two lists below):
Transformation 1: Wrappers
Adopt a given existing workflow as a starting point.
Prepend a new workflow element to preprocess the inputs
Append a new workflow element to postprocess the outputs
Obtain a new workflow
Transformation 2: Tweaks
Adopt a given existing workflow as a starting point.
Assuming this workflow exposes internal hyperparameters that were previously unexplored:
Systematically vary those unexplored hyperparameters
Extract the optimal hyperparameter combination
Obtain a new workflow
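Here is a minimal sketch of these two meta-operators acting on a toy workflow; all the functions, data, and the scoring rule are illustrative assumptions rather than any published system’s interface.

```python
# Minimal sketch of the two meta-operators, using toy workflow functions.
# All functions here are illustrative assumptions, not any published system's API.

def base_workflow(x):
    """Stand-in for an existing, frictionlessly reproducible workflow."""
    return sum(x) / len(x)

# Transformation 1: Wrappers -- prepend preprocessing, append postprocessing.
def wrap(workflow, pre, post):
    def wrapped(x):
        return post(workflow(pre(x)))
    return wrapped

# Transformation 2: Tweaks -- sweep previously unexplored hyperparameters.
def tweak(workflow_family, grid, score):
    best = max(grid, key=lambda params: score(workflow_family(**params)))
    return workflow_family(**best), best

# Transformation 1 in action: trim outliers going in, round the output coming out.
new_workflow = wrap(base_workflow,
                    pre=lambda x: sorted(x)[1:-1],
                    post=lambda y: round(y, 2))
print(new_workflow([1, 2, 3, 100]))   # 2.5 after trimming the extremes

# Transformation 2 in action: pick the trim fraction that scores best on a target.
def family(trim=0):
    return lambda x: base_workflow(sorted(x)[trim:len(x) - trim or None])

data, target = [1, 2, 3, 100], 2.0
workflow, best_params = tweak(family,
                              grid=[{"trim": t} for t in (0, 1)],
                              score=lambda wf: -abs(wf(data) - target))
print(best_params, workflow(data))
```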
Much of the EML research activity we have seen in recent years reports discoveries of successful applications of these two strategies, in many cases adding new layers or gadgets to a multilayer architecture (Transformation 1), or modifying an existing architecture by, say, adding extra width (Transformation 2). The new models produced by such meta-operations can then be evaluated robotically to find one that improves performance.
This has been automated in empirical machine learning (e.g., with famous systems like AutoML [Hutter et al., 2015; Hutter et al., 2019; Karmaker (“Santu”) et al., 2021] and Vizier [Golovin et al., 2017; Song et al., 2023]); the process points in the direction of Jack Good’s vision. We avoid human guesswork and blind luck in getting good models; hyperparameter search is done, to a certain degree, robotically.
These meta-operators involve operations on workflows (Drori et al., 2021)—digital artifacts produced by previous research computing. Hence these are instances of what has earlier been discussed and labeled CORA. In any field that invests in the FR regime, the digital artifacts are there, as a side effect of publication, available frictionlessly to be exploited. It is just a matter of adopting the mindset to exploit the artifacts to do future research.
As individual fields invest in FR, they will transition to CORA; the transition may even enable algorithmic improvement of models and produce a hard takeoff of model performance. Maybe the last decade’s steep increase in performance in EML is an approximation to hard takeoff; the approximation manifesting the ‘still-slightly-frictioned,’ ‘still-human-handwork-was-involved,’ ‘CORA-was-still-in-its-infancy’ situation over the last decade.
Will CORA make machines hyper-intelligent by itself? No. All we will get via automated application of strategies like those listed here is incremental bumps to performance. We will painlessly and rapidly mine out the value of each human-supplied idea.
Peter Huber, in his book Data Analysis: What Can Be Learned From the Past 50 Years (2012), contrasts strategy (high-level wisdom) and tactics (low-level, step-by-step). Using his terminology, we predict that, as FR takes hold in a field and CORA becomes ubiquitous, humans will increasingly focus on strategic choices about model architectures and types of hyperparameters, while information technology increasingly automates the tactical details of finding the best instances within those strategic constraints.
11.3. What’s Next
Certain features of research life going forward from that point are very clear.
Irreversibility. The transition to FR is a one-way transition: the wishes underlying [FR-1]+[FR-2]+[FR-3] have been around as long as there has been data-driven science; the ability to fulfill those wishes is now provided by a complete technology stack that came into existence to service the global communications/computation infrastructure. That infrastructure is here to stay. The desire for FR will never go away; the ability to offer FR is here to stay.
Acceleration. Research fields that have transitioned to the frictionless reproducibility regime will progress even faster in the future as the habits get even more ingrained, producing even higher quality with ever higher velocity. Further, a community of researchers working together according to this pattern will recognize and seize new opportunities as a result.
Stagnation. Some fields will not or cannot practice mature forms of the three data science initiatives we have discussed. Barriers to transition can include: inhibitions against data sharing, for example, because of confidentiality restrictions; inhibitions against code sharing, for example, because of proprietary restrictions; inhibitions against prediction challenges, for example, because of a preference for theoretical derivations disconnected from empirical measurements.
Such fields will not transition to a frictionless reproducibility regime. They will be noticeably lagging behind in rate of progress. Those fields may soon enough be recognized as backwaters.
Resource Allocation. Funding agencies will recognize that they get higher rates of return on investments in research communities that have transitioned to frictionless reproducibility; they will invest more heavily in such fields over time. Research talent will follow the dollars. The separation between fields, initially from reproducibility practices, will then involve funding, compounding the contrast between accelerating and stagnating fields.
Some other features of our future are less obvious.
Opacity. A key point made by Kurzweil in The Singularity Is Near is that progress ‘on the other side’ of ‘The Singularity’ will be so amazing, we cannot even imagine what will be accomplished. This seems fair to me. Seniors among us became used to the idea that certain research topics progress slowly or not at all, are best learned by dint of long study and philosophical reflection, and yield research contributions on the time scale of entire careers. Minds trained in such a regime might not adjust to how rapidly things will soon be happening.
Of course, some research topics are, so far, untransformed by the three data science initiatives. But eventually some will be so transformed, and research in such areas will rapidly move far beyond what experts of earlier generations would have thought possible, or even conceivable. Others may be essentially forgotten. Hence, the reproducibility singularity is indeed opaque.
Amnesia. The opacity of the reproducibility singularity works in reverse. Much of the work pre-reproducibility transition will be forgotten. Listening to the young makes this clear.
Current Stanford Statistics PhD student Apratim Dey told me in personal conversation (May 2023) about a Stanford CS class on ‘Distribution Shift’ in Spring 2023. In that class, the initial course reading was a more or less traditional academic statistics journal article from 20 or so years ago that essentially established the field. That paper posed a generative probabilistic model for the distribution shift problem and performed a formal analysis using it. Later in the term, papers came from leading CS conferences and followed a more empirical bent, using shared data and code.
The CS students in this course apparently were not fans of the academic statistics paper, although they were told it was fundamental. In discussing this paper, they pursued critiques along these lines:
It’s based on a theoretical generative model
I don’t know how to tweak or improve a theoretical generative model
I don’t know if this generative model describes any real data
I don’t know if the phenomena it describes occur on any real data
If I do research on real data, I can share results with others, publish in a CS conference.
If I instead do research with a generative model, as is done here: I have no immediate outlet for my work, no readership, no follow-up. It could take years and years to publish in a traditional stats journal.
Maybe these are just comments about one paper; but possibly they are comments about papers of this type: namely, papers not following today’s empirical data science style, which involves building data and code and empirical performance on the work of others. The CS students see the older foundational research as irrelevant under today’s frictionlessly reproducible regime; it does not offer the opportunity for single-click adoption and code reexecution by students. Engaging with such nonconforming work disrupts the students’ state of ‘flow’; hence such papers are seen as frustrating. Such work confronts students, in their view, with gotchas and obstacles. It keeps them from efficient progress to publication in the outlets they prefer. Such pre-FR work offers friction, not fulfillment.
This younger CS generation is steeped in the recent empirical machine learning literature, much of which is powered by FR/FRX, not in the mathematically powered discourse practiced previously, which is often based on hypotheticals and counterfactuals; it sees little need to engage intellectually with such discourse.
We may be starting a rapid de-skilling transition, in which new researchers initially do not want to, and later simply cannot, onboard the lessons of symbolic analysis of generative models.
12. Conclusion
A data science–driven phase transition, crossing a kind of singularity, is happening now.
Not the ‘AI singularity’— the moment when AGI73 is achieved—this has not happened. Rather, some fields in computation-driven research are approaching a reproducibility singularity, the moment when all the ingredients of frictionless reproducibility (FR) of computational research are assembled in a package that makes it, essentially, immediate to build on the work of other researchers on an open, global stage. A research community working fully within this regime creates and exchanges digital artifacts following new models of scientific publication and epistemology; this is a research superpower enabling strikingly rapid progress.
This new regime has been best approximated within the field of empirical machine learning, which indeed has made very rapid progress over the last decade in the fields of computer vision and natural language processing.
The FR regime is best thought of not as the ‘property’ of EML, but instead as a more general achievement of dedicated data scientists working for decades across many research disciplines, including those where empirical machine learning and AI historically play little role.
Today’s ‘AI singularity’ narrative—that the last decade was the decade when AI was achieved—is premature; this was instead the decade when data science matured, when a global computing and communication infrastructure came together, and when frictionless delivery of research artifacts emerged, enabling computational science as a solid new avenue to scientific validity.
The next generation of data scientists can leverage this understanding in building their own careers, identifying interesting questions to work on and making their research impact more viral.
Acknowledgments
I would like to thank Genevera Allen of Rice University for organizing the session ‘The Next 50 Years of Data Science’ at the 2023 Joint Statistical Meetings in Toronto and giving me the opportunity to participate. I would also like to thank my fellow committee members on the NASEM Committee on Reproducibility and Replicability (Harvey Fineberg, Chair), from whom I learned a great deal about the global, science-wide nature of the changes going on in science and the advances in reproducibility. Also, I thank my fellow members of the NASEM study on Envisioning the Data Science Discipline: The Undergraduate Perspective (David Culler, Chair) for making me aware of the many currents of change in data science education.
I had the fascinating chance to witness ‘from inside’ the emergence of the NUScon challenge in NMR signal processing; many thanks to Adam Schuyler and Jeffrey C. Hoch (University of Connecticut Health Sciences Center).
Certain ‘insiders’ corrected various comments I made about their work in an earlier draft. Thanks for these corrections are due to: Isabelle Guyon (Université de Paris), John Moult (University of Maryland), Jonathon Phillips (NIST), Benjamin Recht (University of California, Berkeley), and Steven Ruberg (Analytix Thinking).
The Editor of Harvard Data Science Review, Prof. Xiao-Li Meng, and the anonymous reviewers have materially improved the manuscript through several revisions. I am deeply grateful for their willingness to engage in this process with full attention and interest.
Brian Wandell (Stanford) provoked the title. In alphabetical order, Milad Bakhshizadeh (Stanford), Apratim Dey (Stanford), Andrew Donoho (Donoho Design Group), Xiaoyan Han (Cornell), Vardan Papyan (Toronto), Elad Romanov (Stanford), and Yu Wang (Stanford) made many helpful comments. This is closely related to work in progress by the author and Matan Gavish (Hebrew University); I also want to acknowledge Victoria Stodden (University of Southern California) for many conversations on reproducibility over the years.
Disclosure Statement
David Donoho has no financial or non-financial disclosures to share for this article.
References
Benson, D. A., Cavanaugh, M., Clark, K., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., & Sayers, E. W. (2013). GenBank. Nucleic Acids Research, 41(D1), D36–D42. https://doi.org/10.1093/nar/gks1195
Blasco, A., Endres, M. G., Sergeev, R. A., Jonchhe, A., Macaluso, N. J. M., Narayan, R., Natoli, T., Paik, J. H., Briney, B., Wu, C., Su, A. I., Subramanian, A., & Lakhani, K. R. (2019). Advancing computational biology and bioinformatics research through open innovation competitions. PLoS One, 14(9), Article e0222165. https://doi.org/10.1371/journal.pone.0222165
Bottou, L., Cortes, C., Denker, J. S., Drucker, H., Guyon, I., Jackel, L. D., LeCun, Y., Muller, U. A., Sackinger, E., Simard, P., & Vapnik, V. (1994). Comparison of classifier methods: A case study in handwritten digit recognition. In Proceedings of the 12th IAPR International Conference on Pattern Recognition (Vol. 3, pp. 77–82). IEEE Computer Society. https://doi.org/10.1109/ICPR.1994.576879
Bourla, A. (2022). Moonshot: Inside Pfizer’s nine-month race to make the impossible possible. Harper Collins.
Buckheit, J. B., & Donoho, D. L. (1995). WaveLab and Reproducible Research. In A. Antoniadis & G. Oppenheim (Eds.), Wavelets and Statistics (pp. 55–81). Springer. https://doi.org/10.1007/978-1-4612-2544-7_5
Donoho, D. (2024). AI narratives at the singularity [Manuscript in preparation].
Donoho, D. L., Maleki, A., Rahman, I. U., Shahram, M., & Stodden, V. (2009). Reproducible research in computational harmonic analysis. Computing in Science and Engineering, 11(1), 8–18. https://doi.org/10.1109/MCSE.2009.15
Drori, I., Krishnamurthy, Y., Rampin, R., de Paula Lourenco, R., Ono, J. P., Cho, K., Silva, C., & Freire, J. (2021). AlphaD3M: Machine learning pipeline synthesis. ArXiv. https://doi.org/10.48550/arXiv.2111.02508
Garris, M. D., Tabassi, E., & Wilson, C. L. (2006). NIST fingerprint evaluations and developments. Proceedings of the IEEE, 94(11), 1915–1926. https://doi.org/10.1109/JPROC.2006.885130
Garris, M. D., & Wilson, C. L. (2005). NIST biometric evaluations and developments. In M. J. DeWeert, & T. T. Saito (Eds.), Photonics for Port and Harbor Security (pp. 26–38). SPIE. https://doi.org/10.1117/12.607598
Gavish, M., & Donoho, D. (2012). Three dream applications of Verifiable Computational Results. Computing in Science & Engineering, 14(4), 26–31. https://doi.org/10.1109/MCSE.2012.65
Golovin, D., Solnik, B., Moitra, S., Kochanski, G., Karro, J., & Sculley, D. (2017). Google Vizier: A service for black-box optimization. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1487–1495). Association for Computer Machinery. https://doi.org/10.1145/3097983.3098043
Good, I. J. (1966). Speculations concerning the first ultraintelligent machine. In F. L. Alt, & M. Rubinoff (Eds.), Advances in computers (Vol. 6, pp. 31–88). Elsevier. https://doi.org/10.1016/S0065-2458(08)60418-0
Good, I. J. (1983). Good thinking: The foundations of probability and its applications. University of Minnesota Press.
Gribov, A., Sill, M., Lück, S., Rücker, F., Döhner, K., Bullinger, L., Benner, A., & Unwin, A. (2010). SEURAT: Visual analytics for the integrated analysis of microarray data. BMC Medical Genomics, 3(1), Article 21. https://doi.org/10.1186/1755-8794-3-21
Grüning, B., Chilton, J., Köster, J., Dale, R., Soranzo, N., van den Beek, M., Goecks, J., Backofen, R., Nekrutenko, A., & Taylor, J. (2018). Practical computational reproducibility in the Life Sciences. Cell Systems, 6(6), 631–635. https://doi.org/10.1016/j.cels.2018.03.014
Guyon, I., Bennett, K., Cawley, G., Escalante, H. J., Escalera, S., Ho, T. K., Macià, N., Ray, B., Saeed, M., Statnikov, A., Viegas, E. (2015). Design of the 2015 ChaLearn AutoML challenge. In 2015 International Joint Conference on Neural Networks (IJCNN) (pp. 1–8). IEEE. https://doi.org/10.1109/IJCNN.2015.7280767
Guyon, I., Dror, G., Lemaire, V., Taylor, G., & Aha, D. W. (2011). Unsupervised and transfer learning challenge. In The 2011 International Joint Conference on Neural Networks (793–800). IEEE. https://doi.org/10.1109/IJCNN.2011.6033302
Guyon, I., Gunn, S., Hur, A. B., & Dror, G. (2006). Design and analysis of the NIPS2003 challenge. In I. Guyon, M. Nikravesh, S. Gunn, & L. A. Zadeh (Eds.), Feature extraction: Foundations and applications (pp. 237–263). Springer. https://doi.org/10.1007/978-3-540-35488-8_10
Guyon, I., Saffari, A., Dror, G., & Cawley, G. (2008). Analysis of the IJCNN 2007 agnostic learning vs. prior knowledge challenge. Neural Networks, 21(2–3), 544–550. https://doi.org/10.1016/j.neunet.2007.12.024
Guyon, I., Schomaker, L., Plamondon, R., Liberman, M., & Janet, S. (1994). UNIPEN project of on-line data exchange and recognizer benchmarks. In Proceedings of the 12th IAPR International Conference on Pattern Recognition (Vol. 3, pp. 29–33). IEEE. https://doi.org/10.1109/ICPR.1994.576870
Halevy, A., Norvig, P., & Pereira, F. (2009). The unreasonable effectiveness of data. IEEE intelligent systems, 24(2), 8–12. https://doi.org/10.1109/MIS.2009.36
Hardt, M., & Recht, B. (2022). Patterns, predictions, and actions: Foundations of machine learning. Princeton University Press.
Huber, P. J. (2012). Data Analysis: What can be learned from the past 50 years. John Wiley & Sons.
Hutter, F., Kégl, B., Caruana, R., Guyon, I., Larochelle, H., & Viegas, E. (2015). Automatic machine learning (AutoML) [Paper presentation]. ICML 2015 Workshop on Resource-Efficient Machine Learning, 32nd International Conference on Machine Learning, Lille, France. https://hal.in2p3.fr/in2p3-01171463
Jiang, Y., Foret, P., Yak, S., Roy, D. M., Mobahi, H., Dziugaite, G. K., Bengio, S., Gunasekar, S., Guyon, I., & Neyshabur, B. (2020). NeurIPS 2020 competition: Predicting generalization in deep learning. ArXiv. https://doi.org/10.48550/arXiv.2012.07976
Karmaker (“Santu”), S. K., Hassan, M. M., Smith, M. J., Xu, L., Zhai, C., & Veeramachaneni, K. (2021). AutoML to date and beyond: Challenges and opportunities. ACM Computer Surveys, 54(8), Article 175. https://doi.org/10.1145/3470918
Kryshtafovych, A., Fidelis, K., & Moult, J. (2010). CASP: A driving force in protein structure modeling. In H. Rangwala, & G. Karypis (Eds.), Introduction to protein structure prediction: Methods and algorithms (pp. 15–32). Wiley. https://doi.org/10.1002/9780470882207.ch2
Kryshtafovych, A., Fidelis, K., & Moult, J. (2014). CASP10 results compared to those of previous CASP experiments. Proteins: Structure, Function, and Bioinformatics, 82(02), 164–174. https://doi.org/10.1002/prot.24448
Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K., & Moult, J. (2021). Critical assessment of methods of protein structure prediction (CASP)—Round XIV. Proteins: Structure, Function, and Bioinformatics, 89(12), 1607–1617. https://doi.org/10.1002/prot.26237
Kurzweil, R. (2005). The singularity is near: When humans transcend biology. Penguin.
Liu, J., Carlson, J., Pasek, J., Puchala, B., Rao, A., & Jagadish, H. V. (2022). Promoting and enabling reproducible data science through a reproducibility challenge. Harvard Data Science Review, 4(3). https://doi.org/10.1162/99608f92.9624ea51
Liu, Z., Pavao, A., Xu, Z., Escalera, S., Ferreira, F., Guyon, I., Hong, S., Hutter, F., Ji, R., Jacques, C. S., Jr., Li, G., Lindauer, M., Luo, Z., Madadi, M., Nierhoff, T., Niu, K., Pan, C., Stoll, D., Treguer, S., … Zhang, Y. (2021). Winning solutions and post-challenge analyses of the ChaLearn AutoDL challenge 2019. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(9), 3108–3125. https://doi.org/10.1109/TPAMI.2021.3075372
Lustig, R. H. (2018). The hacking of the American mind: The science behind the corporate takeover of our bodies and brains. Penguin.
Maciejewski, M. W., Schuyler, A. D., Gryk, M. R., Moraru, I. I., Romero, P. R., Ulrich, E. L., Eghbalnia, H. R., Livny, M., Delaglio, F., & Hoch, J. C. (2017). NMRbox: A resource for biomolecular NMR computation. Biophysical journal, 112(8), 1529–1534. https://doi.org/10.1016/j.bpj.2017.03.011
Moult, J. (2005). A decade of CASP: Progress, bottlenecks and prognosis in protein structure prediction. Current Opinion in Structural Biology, 15(3), 285–289. https://doi.org/10.1016/j.sbi.2005.05.011
Moult, J. (2006). Rigorous performance evaluation in protein structure modelling and implications for computational biology. Philosophical Transactions of the Royal Society B: Biological Sciences, 361(1467), 453–458. https://doi.org/10.1098/rstb.2005.1810
Moult, J., Fidelis, K., Kryshtafovych, A., Schwede, T., & Tramontano, A. (2018). Critical assessment of methods of protein structure prediction (CASP)—Round XII. Proteins: Structure, Function, and Bioinformatics, 86(Suppl 1), 7–15. https://doi.org/10.1002/prot.25415
Moult, J., Pedersen, J. T., Judson, R., & Fidelis, K. (1995). A large-scale experiment to assess protein structure prediction methods. Proteins: Structure, Function, and Bioinformatics, 23(3), ii–v. https://doi.org/10.1002/prot.340230303
National Academies of Sciences Engineering and Medicine. (2018). Envisioning the data science discipline: The undergraduate perspective. Interim report. National Academies Press. https://www.nationalacademies.org/our-work/envisioning-the-data-science-discipline-the-undergraduate-perspective
National Academies of Sciences Engineering and Medicine. (2019). Reproducibility and replicability in science. National Academies Press. https://doi.org/10.17226/25303
Neapolitan, R. E. (2008). A polemic for Bayesian statistics. In D. E. Holmes & L. C. Jain (Eds.), Innovations in Bayesian networks: Theory and applications (pp. 7–32). Springer. https://doi.org/10.1007/978-3-540-85066-3_2
Neto, E. C., Hoff, B. R., Bare, C., Bot, B. M., Yu, T., Magravite, L., Trister, A. D., Norman, T., Meyer, P., Saez-Rodrigues, J., Costello, J. C., Guinney, J., & Stolovitzky, G. (2016). Reducing overfitting in challenge-based competitions. ArXiv. https://doi.org/10.48550/arXiv.1607.00091
Pavao, A., Guyon, I., Letournel, A.-C., Tran, D.-T., Baro, X., Escalante, H. J., Escalera, S., Thomas, T., & Xu, Z. (2023). CodaLab Competitions: An open source platform to organize scientific challenges. Journal of Machine Learning Research, 24(198), 1–6. http://jmlr.org/papers/v24/21-1436.html
Pereira, W. J., Almeida, F. M., Conde, D., Balmant, K. M., Triozzi, P. M., Schmidt, H. W., Dervinis, C., Pappas, G. J. J., & Kirst, M. (2021). Asc-Seurat: Analytical single-cell Seurat-based web application. BMC Bioinformatics, 22(1), Article 556. https://doi.org/10.1186/s12859-021-04472-2
Phillips, P. J., Flynn, P. J., Beveridge, J. R., Scruggs, W. T., O’Toole, A. J., Bolme, D., Bowyer, K. W., Draper, B. A., Givens, G. H., Lui, Y. M., Sahibzada, H., Scallan, J. A., III., & Weimer, S. (2009). Overview of the Multiple Biometrics Grand Challenge. In M. Tistarelli, & M. S. Nixon (Eds.), Advances in Biometrics: Third International Conference, ICB 2009 (pp. 705–714). https://doi.org/10.1007/978-3-642-01793-3_72
Phillips, P. J., Flynn, P. J., Scruggs, T., Bowyer, K. W., Chang, J., Hoffman, K., Marques, J., Min, J., & Worek, W. (2005). Overview of the Face Recognition Grand Challenge. In C. Schmid, S. Soatto, & C. Tomasi (Eds.), 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 1, pp. 947–954). IEEE. https://doi.org/10.1109/CVPR.2005.268
Phillips, P. J., Moon, H., Rizvi, S. A., & Rauss, P. J. (2000). The FERET evaluation methodology for face-recognition algorithms. IEEE Transactions on pattern analysis and machine intelligence, 22(10), 1090–1104. https://doi.org/10.1109/34.879790
Phillips, P. J., Sarkar, S., Robledo, I., Grother, P., & Bowyer, K. (2002). The Gait Identification Challenge Problem: Data sets and baseline algorithm. In R. Kasturi, D. Laurendeau, & C. Suen (Eds.), 2002 International Conference on Pattern Recognition (Vol. 1, pp. 385–388). IEEE. https://doi.org/10.1109/ICPR.2002.1044731
Pustovalova, Y., Delaglio, F., Craft, D. L., Arthanari, H., Bax, A., Billeter, M., Bostock, M. J., Dashti, H., Hansen, D. F., Hyberts, S. G., Johnson, B. A., Kazimierczuk, K., Lu, H., Maciejewski, M., Miljenović, T. M., Mobli, M., Nietlispach, D., Orekhov, V., Powers, R., . . . Schuyler, A. D. (2021). NUScon: A community-driven platform for quantitative evaluation of nonuniform sampling in NMR. Magnetic Resonance, 2(2), 843–861. https://doi.org/10.5194/mr-2-843-2021
Ringwald, M., Eppig, J. T., & Richardson, J. E. (2000). GXD: Integrated access to gene expression data for the laboratory mouse. Trends in Genetics, 16(4), 188–190. https://doi.org/10.1016/s0168-9525(00)01983-1
Rizvi, S. A., Phillips, P. J., & Moon, H. (1998). The FERET verification testing protocol for face recognition algorithms. In Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition (pp. 48–53). IEEE. https://doi.org/10.1109/AFGR.1998.670924
Rosato, A., Bagaria, A., Baker, D., Bardiaux, B., Cavalli, A., Doreleijers, J. F., Giachetti, A., Guerry, P., Güntert, P., Herrmann, T., Huang, Y. J., Jonker, H. R. A., Mao, B., Malliavin, T. E., Montelione, G. T., Nilges, M., Raman, S., van der Schot, G., Vranken, W. F., . . . Bonvin, A. M. J. J. (2009). CASD-NMR: Critical assessment of automated structure determination by NMR. Nat Methods, 6(9), 625–626. https://doi.org/10.1038/nmeth0909-625
Ruberg, S., Zhang, Y., Showalter, H., & Shen, L. (2022). A platform for comparing subgroup identification methodologies. Biometrical Journal, 66(1), Article 2200164. https://doi.org/10.1002/bimj.202200164
Song, X., Perel, S., Lee, C., Kochanski, G., & Golovin, D. (2023). Open Source Vizier: Distributed infrastructure and API for reliable and flexible blackbox optimization. ArXiv. https://doi.org/10.48550/arXiv.2207.13676
Ulam, S. (1958). John von Neumann, 1903-1957. Bulletin of the American Mathematical Society, 64(2, part 2), 1–49.
Varoquaux, G., & Cheplygina, V. (2022). Machine learning for medical imaging: Methodological failures and recommendations for the future. npj Digital Medicine, 5(1), Article 48. https://doi.org/10.1038/s41746-022-00592-y
Wigner, E. P. (1960). The unreasonable effectiveness of mathematics in the natural sciences. Communications on Pure and Applied Mathematics, 13(1), 1–14. https://doi.org/10.1002/cpa.3160130102