
On Emergent Limits to Knowledge—Or, How to Trust the Robot Researchers: A Pocket Guide

Published on May 24, 2024

Join or be lost! says David Donoho (2024) to the research community in his recent HDSR publication “Data Science at the Singularity.” The colossal cloud infrastructure investments behind near-ubiquitous global mobile technologies have trickled down to scientific research through cloud compute and storage, I/O tools, data analysis tools, and frameworks, which in turn have generated broad and expanding communities of users and supporters. These technological innovations are deeply disruptive to the research community, since they open new paths to knowledge creation that were previously inaccessible and largely culturally unknown. And these new paths dominate, leading to accelerated discovery, while the old paths lead to a “backwater.”

The new discovery paths Donoho describes produce knowledge in radically new ways; ways that are no longer transparent to researchers and that instead result in discoveries where there is essentially no expectation that we (humans) actually understand the underlying mechanisms, or the reasons why a particular solution works. This is a shift away from the goal of science being human understanding of why findings are correct, and toward trust in opaque computational discoveries. The scientific community’s response to this shift must be one of recognition and adjustment of our own research processes to maximize the correctness of our findings, and hence public trust in the research enterprise. Arguably, this growth of scientific discovery by leveraging increasingly complex computational discovery pipelines could reduce the gap between a card-carrying research scientist and the regular person in the street, with the key difference being that the scientist is equipped with talent and training in deploying and interpreting computational pipelines. In other words, these researchers trust their discovery process, which is not entirely transparent even to them. Nonscientists today tend to trust scientific findings because of trust in the scientific enterprise, which is defined by a set of practices and safeguards designed to increase the likelihood of the conclusions being true. The changes Donoho describes do not bring to mind the well-worn story of structural resistance to technological change but instead a fundamental shift in the values of the research community.

Donoho presents a trifecta, already well underway, defining “frictionless reproducibility” in scientific discovery. This means broad access to: 1) input data; 2) software steps visited upon the data through data preparation, model estimation, and prediction; and 3) correctness assessments determined by benchmarks fixed in advance through the “challenge problem” approach to discovery. The challenge problem approach defines a quantitative metric for success and, crucially, enlists a neutral third party who sequesters a test data set and who then receives code submissions and evaluates models on the test data set. A winner is declared by comparing the test metric output for the various code submissions. 
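The mechanics of this protocol are simple enough to sketch in code. The fragment below is a minimal, hypothetical illustration of the evaluation loop a neutral third party might run; the names (Submission, score_fn, and so on) are assumptions made for illustration and do not reflect the API of Kaggle, Codabench, or any other platform.

```python
# A minimal sketch of the challenge-problem evaluation protocol described above,
# under assumed names: a neutral third party holds back a sequestered test set,
# scores each code submission with the metric fixed in advance, and ranks them.
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Submission:
    team: str
    predict: Callable  # maps test inputs to predictions


def evaluate_challenge(
    submissions: Sequence[Submission],
    test_inputs,          # sequestered test data, never released to participants
    test_labels,
    score_fn: Callable,   # the quantitative success metric, assumed higher-is-better
):
    """Score every submission on the hidden test set and return a ranked leaderboard."""
    leaderboard = [
        (sub.team, score_fn(test_labels, sub.predict(test_inputs)))
        for sub in submissions
    ]
    # The winner is simply the submission with the best test metric.
    leaderboard.sort(key=lambda entry: entry[1], reverse=True)
    return leaderboard
```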

This is roughly the approach used by Kaggle and Codabench, among others (Liu et al., 2022), and is primarily responsible for successes such as machine translation research leading to widely deployed products like Apple’s digital assistant Siri, which emerged from DARPA’s (Defense Advanced Research Projects Agency) CALO (Cognitive Assistant that Learns and Organizes) project, and the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) (Russakovsky et al., 2015), to name a few. In these cases, the solutions typically manifest in code and workflows of such complexity that it would be hopelessly unrealistic, to say the least, to expect a human being to read the code and understand the scientific mechanism at work.

No One Knows How and Nobody Cares

This new path of challenge problems entails a deep and novel blending of scientific discovery and engineering. Mathematical insights, computational and empirical discoveries, and hardware advances all drive challenge breakthroughs. No one knows exactly how machine translation ‘does it’ in any particular case, and this does not seem to be an active research goal. Instead, the focus is on producing results that satisfy widely recognized benchmark performance goals. Arguably, this is all that matters. Is there any interest in finding a cognitively tractable explanation for how large language models return sensible responses to prompts? With models routinely containing billions of parameters, a direct interpretation of the mechanism of response creation is obviously far out of reach, and perhaps such an expectation is a quaint throwback to the days of linear regression model interpretation. So how do we delineate classes of problems for which the actual mechanism of functioning is not important from those where it is? Science used to be that dividing line, but scientists are increasingly users of opaque computational code judged for scientific correctness on output alone. This means the guardrails for knowledge production processes become incredibly important, as we can no longer rely solely on our usual mechanisms to assess research: peer review, disclosure of transparent methods for (human) verification, and the independent reproduction of findings.

A Scientific Method for Challenges

In a search for truth, the best we can do is maximize the probability of a result being true. The scientific method does not permit claims of absolute truth, so findings are probabilistic, and a facsimile of truth verification is instead created through adherence to a process that is likely to support such an outcome. Traditionally, this process embodies techniques such as transparency in the methods used to achieve the result, which are intended to reduce and identify errors in the discovery process and allow for independent reproduction and verification – our traditional hallmarks of scientific correctness. This raises some compelling questions for the challenge problem approach. Is the quantitative goal of the challenge problem sufficient to support the scientific conclusions? Can we understand why the winning method wins? Can we understand how and when to generalize challenge problem findings? Our task is to adapt the standards embodied in the scientific discovery process to allow coherent answers to these questions. Some pointers might be:

[FR-1: Data]: Donoho’s first leg of the frictionless reproducibility trifecta. Certain kinds of information about the data can increase trust in the result, as well as certain information about the test data set. Was the test set sampled from the input data or is it a new sample, likely with greater variability? With how much fidelity does the data span the true underlying population of interest? For example, use of the ImageNet data set for training convolutional neural networks has been criticized for data dependency (e.g., Deng et al., 2009; Tuggener et al., 2022). Of course, any data set is a sample, and so effective measures of representativeness can guide where we expect applications of the resulting model to be reasonable, in other words, trustworthy.
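As one concrete, hedged illustration of the representativeness question raised above under FR-1, the sketch below compares the distribution of a single feature in the training data against a candidate deployment sample using a two-sample Kolmogorov–Smirnov test. This is only one crude proxy for representativeness, and the function and variable names are hypothetical.

```python
# A crude, illustrative check of whether a candidate deployment sample resembles
# the training data along a single feature; one proxy for "representativeness,"
# not a full answer to the FR-1 questions above. Names are hypothetical.
import numpy as np
from scipy.stats import ks_2samp


def feature_shift_report(train_feature: np.ndarray,
                         deploy_feature: np.ndarray,
                         alpha: float = 0.01) -> dict:
    """Two-sample Kolmogorov-Smirnov test on one feature column."""
    statistic, p_value = ks_2samp(train_feature, deploy_feature)
    return {
        "ks_statistic": statistic,
        "p_value": p_value,
        # A small p-value suggests the deployment sample was drawn from a
        # noticeably different distribution than the training data.
        "distribution_shift_suspected": p_value < alpha,
    }
```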

[FR-2: Re-execution]: To what extent are the results and models architecture dependent? How can effective software tests be designed and routinely expected to increase trust in the functioning of the code? Can aspects of the software testing or model parameter estimates be shared in a verifiable way, given that these models and pipelines often require significant resources at scale to estimate and retrain? What aspects of the implementation pipeline and the resulting trained model must be made available to the research community for verification purposes?
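One lightweight way to make such sharing verifiable, sketched below under assumed file and function names, is to publish a cryptographic checksum that pins the exact trained-model artifact, together with a small regression test that re-runs the pipeline and checks the published benchmark score within a tolerance.

```python
# Two lightweight verification aids for a shared discovery pipeline, sketched
# under assumed names: (1) a checksum that pins the exact trained-model artifact,
# and (2) a regression test that the published benchmark score reproduces on
# re-execution within a tolerance.
import hashlib


def artifact_checksum(path: str) -> str:
    """SHA-256 digest of a serialized model file, published alongside the result."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def test_benchmark_score_reproduces(run_pipeline: callable,
                                    published_score: float,
                                    tolerance: float = 1e-3) -> None:
    """Re-run the (possibly expensive) pipeline and compare to the published metric."""
    rerun_score = run_pipeline()
    assert abs(rerun_score - published_score) <= tolerance, (
        f"benchmark score {rerun_score} deviates from published {published_score}"
    )
```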

[FR-3: Challenges]: The last leg of Donoho’s frictionless reproducibility trifecta. A key question is understanding what types of problems can be successfully and effectively solved using the challenge approach and which cannot. Consider a world where all research is done this way: Are there researchers who will be left out of participating in knowledge production? Research suggests some people are less likely to engage in competitions, and that this is uncorrelated with their likelihood of competitive success (Niederle & Vesterlund, 2007). Access to infrastructure at the scale often necessary to produce winning models is not distributed evenly across the research enterprise, nor even restricted to those producing successes. Conversely, perhaps the challenge approach brings a level of objectivity that could be missing from peer review, and therefore promotes greater inclusivity in discovery. How do we give credit to discovery pipeline creators, especially when solutions and code are reused and extended for new challenge problems? Finally, how does the challenge approach intersect with the traditional rationale for scientific discovery: a love of wonder and the excitement of explaining and understanding the natural world?

That different types of knowledge production have different standards for community acceptance of results has been widely accepted for hundreds of years: for example, the mathematical proof supporting mathematical findings, and empirical results linked to a structured methods section in publications. Today’s research community is still largely ill-equipped to review scientific advances that rely heavily on computational methods. The social adaptation to bring the same standards of verifiability to computationally enabled research has been slow and, although there are several influential localized efforts, 20 years on, largely incremental (Xiong & Cribben, 2022). Most of our published research today relies on computation in ways crucial to the discovery, yet there are no generally accepted broad standards of review or transparency for computational findings, as there are for traditional noncomputational findings (Stodden et al., 2018). As knowledge discovery accelerates via frictionless reproducibility, we will need to develop new processes that enable trust in the correctness of scientific findings resulting from computationally opaque discovery pipelines. Frictionless reproducibility is therefore a global cultural change in knowledge discovery and creation: in a new era of unprecedented, rapid technological change, the singularity occurs when people no longer demand to know why. Because there is no why.

A Computable Scholarly Record

Reliance on a rigid practice of manuscript publication is waning, being replaced by openly accessible computational workflows that are continuously changing, frictionlessly reproducible, and mostly outside the established scholarly records encapsulating the traditional scientific literature. There is a clear need to integrate computational knowledge to create a verifiable and extensible base in a systematic and open way, synthesizing computationally and data-enabled discoveries. Such a knowledge base defines a Computable Scholarly Record that comprises frictionlessly reproducible, transparent, extensible, and reusable discovery pipelines and facilitates: the regeneration of a computational result or model; comparisons and reconciliations of different hypotheses; the reimplementation of methods on new data and the modification of methods; the generation and evolution of benchmarks and standardized testbeds for the assessment of models and inference methods; and the development and application of appropriate policies regarding data privacy, ethics, and meta-research on the scholarly record, all perhaps even in automated ways. A Computable Scholarly Record will record and preserve computational results and models natively over time, integrating and linking computational solutions to challenge problems and their accompanying data. This structure then acts as a scholarly record in the traditional sense. It forms a locus for a research community to share ideas, get feedback, improve their work, agree on priorities, and resolve debates. The creativity and curiosity of the next generation of scientists will manifest in a digital sphere, and can be exercised in these new paths to discovery as the way science is done.
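To make the idea more concrete, each entry in such a record might be imagined as a small, linkable structure. The sketch below is a hypothetical schema offered only as an illustration, not a proposed standard; every field name is an assumption.

```python
# A hypothetical, minimal schema for one entry in a Computable Scholarly Record,
# linking a discovery pipeline to its data, benchmark, and provenance.
# Every field name is an assumption made for illustration only.
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecordEntry:
    entry_id: str            # persistent identifier for this result
    code_uri: str            # where the re-executable discovery pipeline lives (FR-2)
    data_uri: str            # the input data the pipeline was run on (FR-1)
    benchmark_id: str        # the challenge or benchmark it was scored against (FR-3)
    metric_name: str
    metric_value: float
    model_checksum: str      # pins the exact trained-model artifact
    derived_from: List[str] = field(default_factory=list)  # entries this work reuses

    def extends(self, other: "RecordEntry") -> None:
        """Record reuse or extension of another pipeline, preserving credit."""
        self.derived_from.append(other.entry_id)
```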

Open data, re-execution of discovery pipelines, and the use of the challenge approach to scientific discovery comprise frictionless reproducibility. A future of frictionlessly reproducible scientific results brings about a key shift: the inherent opacity of complex computational discovery pipelines and models necessitates trust in results rather than human understanding of mechanisms, and hence there is a need for widespread community adoption of standards for the process of frictionlessly reproducible knowledge creation. We propose key questions to guide the development of such standards for the three facets of open data, re-execution, and the challenge approach, whose answers must in turn guide the creation of a novel Computable Scholarly Record.


Acknowledgments

Thank you to my PhD advisor, David Donoho, to whom I owe an infinite intellectual debt.

Disclosure Statement

Victoria Stodden has no financial or non-financial disclosures to share for this article.


References

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Li, F.-F. (2009). ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 248–255). IEEE. https://doi.org/10.1109/CVPR.2009.5206848

Donoho, D. (2024). Data science at the singularity. Harvard Data Science Review, 6(1). https://doi.org/10.1162/99608f92.b91339ef

Liu, J., Carlson, J., Pasek, J., Puchala, B., Rao, A., & Jagadish, H. V. (2022). Promoting and enabling reproducible data science through a reproducibility challenge. Harvard Data Science Review, 4(3). https://doi.org/10.1162/99608f92.9624ea51

Niederle, M., & Vesterlund, L. (2007). Do women shy away from competition? Do men compete too much? The Quarterly Journal of Economics, 122(3), 1067–1101. https://gap.hks.harvard.edu/do-women-shy-away-competition-do-men-compete-too-much

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., & Li, F.-F. (2015). ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 115(3), 211–252. https://doi.org/10.1007/s11263-015-0816-y

Stodden, V., Seiler, J., & Ma, Z. (2018). An empirical analysis of journal policy effectiveness for computational reproducibility. PNAS, 115(11), 2584–2589. https://doi.org/10.1073/pnas.1708290115

Tuggener, L., Schmidhuber, J., & Stadelmann, T. (2022). Is it enough to optimize CNN architectures on ImageNet? Frontiers in Computer Science, 4. https://doi.org/10.3389/fcomp.2022.1041703

Xiong, X., & Cribben, I. (2022). The state of play of reproducibility in statistics: An empirical analysis. ArXiv. https://doi.org/10.48550/arXiv.2209.15602


©2024 Victoria Stodden. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.
