Dave Donoho is to be congratulated on his insightful article (Donoho, 2024), which provides substantial evidence for a more pragmatic and less fearful view of the rapid progress of data science. I interpret the term data science in the sense Jeff Wu (1986) intended when he coined the phrase, as an alternative name for statistics that emphasizes its modern developments, in line with Donoho (2017). Data science thus overlaps a great deal with machine learning. In fact, the article uses these terms interchangeably, and expresses the opinion that calling the current phenomenon an “AI singularity” is misleading.
Dave’s article opposes the narrative that machine learning will create an independent entity that endangers humanity by its own initiative. On the other hand, humans can definitely commit evil with the help of machine learning. For instance, facial recognition can be used to spy on people or keep them in check.
At this juncture in history, where some autocratic regimes are in a phase of geopolitical expansion and, in other countries, opposing ideologies are becoming increasingly polarized, a torrent of propaganda and systematic misinformation is under way. To further their causes, the proponents eagerly adopt modern machine learning technology, ranging from social media manipulation to deepfake videos. We observe the unpleasant empirical fact that a very large fraction of the population is susceptible to this. I see no other way to improve the situation than to incorporate more critical, logical, and statistical thinking in education. The effect will take generations.
The article stresses two factors contributing to the current rapid progress that are often overlooked. The first is publicly posed challenges that are open to everyone. Participating in a challenge appeals to human nature, as people are often competitive or see the challenge as a game. This type of challenge is fairly new, and has not been used in many fields yet. But it must be said that attaining a lower error rate on test data is a challenge that lends itself particularly well to objective quantitative evaluation. It may require some ingenuity to devise appropriate figures of merit for challenges that are not about prediction.
The other factor is the frictionless environment. The role of friction is often underestimated, but it is a big hindrance. For instance, each time I look something up and am greeted by a whole screen full of questions about cookies, I hastily turn back to avoid wasting my time and open another search result. Those cookie-laden for-profit websites chase away many potentially interested visitors. Perhaps the authorities should firmly outlaw cookies and tracking, because friction does not go away by itself. Fortunately, Wikipedia is not for profit. For scientific publications I remain a staunch fan of open access. Down with paywalls!
Deep neural nets and similar methods have achieved superb performance. A major advantage is that making a prediction on new data is very simple and fast, so it can be done in real time. On the other hand, this requires a fitted model with a huge number of parameters that have to be learned first, and it is this training stage that can eat humongous amounts of computer power. The required resources are very expensive, which over time has created an unfair advantage for the hegemons mentioned in the article, and entails an increasing environmental cost.
Dave sketches the sad state of affairs where many young researchers in machine learning are motivated only to cleverly tinker with code in order to gain some small improvement in the error rate, and are not interested in attempting a deeper understanding. Their current reward structure is set up that way. Of course, there is nothing wrong with making incremental improvements, and there are many settings, such as some areas of medicine, where steady research of this type over many years has yielded a large cumulative benefit. Nevertheless, I am still hopeful that enough bright people will continue to do fundamental research on the question of why huge neural nets and other complex methods can be made to work so well, not only because scientists want to understand things but also because understanding may lead to improvements that drastically reduce the computational resources required for training. With the end of Moore’s law, improvements due mainly to increased scale are in their final stages anyway. Fortunately, work is going on to attempt a better understanding of what were previously black boxes. Quite a few people, including Dave, have made contributions addressing this difficult problem. One example is the investigation of the intriguing neural collapse phenomenon (see Kothapalli, 2022, and Papyan et al., 2020). The double descent phenomenon, which suggests the existence of an implicit regularization that avoids the problems of overfitting in spite of the enormous number of parameters, has also been studied in, for example, Li et al. (2023) and the references cited therein. Other authors have searched for an objective function to optimize (Chan et al., 2022). Several other avenues are being pursued. Fortunately, some publication outlets remain that are interested in ideas, rather than only in beating benchmarks.
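To make the double descent phenomenon concrete, here is a small simulation sketch in Python (a minimal construction of my own, not the setup of any of the papers cited above): minimum-norm least squares on random ReLU features, whose test error peaks near the interpolation threshold (number of features equal to the number of training points) and then descends again as the model grows further.

```python
import numpy as np

def double_descent_curve(n_train=50, n_test=500, d=5,
                         widths=(10, 25, 50, 100, 500),
                         n_trials=30, noise=0.5, seed=0):
    """Average test MSE of minimum-norm least squares on random ReLU
    features, as the width crosses the interpolation threshold (n_train)."""
    rng = np.random.default_rng(seed)
    mse = {p: 0.0 for p in widths}
    for _ in range(n_trials):
        w_true = rng.normal(size=d)                   # true linear signal
        X_tr = rng.normal(size=(n_train, d))
        X_te = rng.normal(size=(n_test, d))
        y_tr = X_tr @ w_true + noise * rng.normal(size=n_train)  # noisy labels
        y_te = X_te @ w_true                          # noiseless targets
        W = rng.normal(size=(d, max(widths)))         # shared random directions
        F_tr = np.maximum(X_tr @ W, 0.0)              # ReLU random features
        F_te = np.maximum(X_te @ W, 0.0)
        for p in widths:
            # lstsq returns the minimum-norm solution when p > n_train
            beta, *_ = np.linalg.lstsq(F_tr[:, :p], y_tr, rcond=None)
            mse[p] += np.mean((F_te[:, :p] @ beta - y_te) ** 2) / n_trials
    return mse

curve = double_descent_curve()
```

In runs of this sketch, the test error near width 50 (the interpolation threshold) typically far exceeds the error at width 500, the overparameterized regime, which is the signature shape of double descent.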
The development of machine learning and neural nets has altered course several times, often with sharp turns. This reminds me of the concept of ‘alternate history.’ While I was a graduate student at ETH Zürich, at some point John Tukey visited our group. Because I was a fellow fan of science fiction, he gave me the book The Man in the High Castle (Dick, 1962) that he had read on the plane. I still have it. The plot was based on the premise that the Second World War had a different outcome, and explored what might have happened next. The book has since been adapted as a TV series by Amazon Prime Video. This is the alternate history genre: you assume that some event in the past turned out differently, and then try to imagine how history could have unfolded afterward. It is a counterfactual thought, but it can yield interesting questions.
One such question is what would have happened if at some point a totally different approach to classification had been proposed, with fairly good performance for its time. If it had attracted enough people who made successive improvements to it, could it have led to a viable alternative to today’s ubiquitous deep learning networks? This would be an alternate history of machine learning. I know of no mathematical reason why a powerful methodology with much lower computational cost would be impossible. The question is of practical interest because it suggests the possibility of fresh starts in different directions. The inherent difficulty is that no fresh start could initially compete on the benchmarks with the state of the art that has been perfected over the years, not even if it were inspired by a better understanding of what makes the current technology tick. Any fresh start would therefore have to be proposed in an outlet that is not solely about improvements on benchmarks, and afterward would require a sustained research effort before it could become competitive. This makes it a big and open-ended project that is too risky for researchers early in their careers, but I do hope that some creative people are considering it.
Peter J. Rousseeuw has no financial or non-financial disclosures to share for this article.
Chan, K. H. R., Yu, Y., You, C., Qi, H., Wright, J., & Ma, Y. (2022). ReduNet: A white-box deep network from the principle of maximizing rate reduction. Journal of Machine Learning Research, 23(114), 1–103.
Dick, P. K. (1962). The man in the high castle (7th printing). Berkley Medallion Books.
Donoho, D. (2017). 50 years of data science. Journal of Computational and Graphical Statistics, 26(4), 745–766.
Donoho, D. (2024). Data science at the singularity. Harvard Data Science Review, 6(1). https://doi.org/10.1162/99608f92.b91339ef
Kothapalli, V. (2022). Neural collapse: A review on modelling principles and generalization. ArXiv. https://doi.org/10.48550/arXiv.2206.04041
Li, Z., Su, W., & Sejdinovic, D. (2023). Benign overfitting and noisy features. Journal of the American Statistical Association, 118(544), 2876–2888.
Papyan, V., Han, X., & Donoho, D. L. (2020). Prevalence of neural collapse during the terminal phase of deep learning training. Proceedings of the National Academy of Sciences, 117(40), 24652–24663.
Wu, C. F. J. (1986). Future directions of statistical research in China: A historical perspective. Application of Statistics and Management, 1, 1–7.
©2024 Peter Rousseeuw. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.