Unless you have joined the HDSR’s global community only recently (and if so, welcome!), you may recall the title of my editorial from the first issue this year: “2024: A Year of Crises, Change, Contemplation, and Commemoration” (Meng, 2024). If you have read it, you probably could sense that I tried hard to anticipate the crises and changes that were likely to unfold in the months ahead, and to participate as a concerned data scientist and engaged citizen by advocating for reflective mental preparation and actionable responses, learning from history and those who have dealt with crises and changes of all shapes and forms.
But I completely failed to anticipate a delightful surprise: the Nobel Prize in Physics 2024 was awarded for the development of machine learning and artificial neural networks, which have powered much of today’s rapidly growing field of generative AI and beyond. The Nobel Prize (NP) organization’s announcement underscores the link between machine learning and physics by titling it “They trained artificial neural networks using physics,” and highlights Geoffrey Hinton’s influential work “that uses a different method: the Boltzmann machine” (Nobel Prize Outreach, 2024a). I am sure the emphasis on “using physics” would delight many academics who lamented that they are not in the NP fields.
Still, it’s remarkable because, until this year, the Nobel Prize in Physics had virtually always celebrated fundamental physics—its theories, experiments, discoveries, or innovations—rather than applications of physics as a tool. In contrast, the Nobel Prize in Chemistry 2024, though also intertwined with AI, focuses on AI as a tool for tackling biochemistry challenges, such as predicting complex protein structures (Nobel Prize Outreach, 2024b).
There’s no question that the foremost achievements in machine learning and neural networks—and, more broadly, data science and AI—are NP-worthy. As a statistician and editor for a data science platform, I commend the Nobel committees’ innovative approach in recognizing this fact and hope this is only the beginning. However, awarding a physics Nobel prize to a breakthrough in data science and AI has indeed stirred considerable emotions, discussions, and reflections in the scientific community in general; see, for example, Bordoloi (2024) and Zia (2024).
Writing this editorial allows me to share three reflections prompted by the NP news, especially as these reflections echo the themes in most of this issue’s articles outside the special theme on the 2024 election. Specifically, the prominent featuring of AI in two hard science NPs signals (1) a pronounced tilt towards empiricism in scientific inquiries; (2) an ongoing and profound blending of disciplinary boundaries; and most crucially, (3) a landmark acknowledgment of AI’s transformative potential, regardless of whether it will ever possess intelligence.
In layperson’s terms, empiricism refers to acquiring knowledge through sensory experience, as opposed to rationalism, which emphasizes inherent knowledge or ideas derived from logical reasoning. The rapid rise of machine learning and generative AI has vastly expanded our ability to seek, simulate, and synthesize patterns—often without any deep understanding of their underlying generative mechanisms, assuming such mechanisms even exist. After all, humans have been finding patterns in clouds around the world long before cloud computing existed.
As a seasoned statistician, I have encountered many concerns from my profession regarding the growing reliance on pattern recognition in everything from discovery to decision-making. At the same time, the transformative power of pattern-seeking generative AI is unfolding before our eyes, and I find myself both as excited as the most fervent machine learner and as critical as the least forgiving statistician. This is not a disparaging view toward either field but rather a healthy state of mind that I strive to sustain.
Seeking information or evidence in data is a tricky business, especially when driven by parties with vested interests. Data scientists can play a critical and positive role by harmonizing two vital mindsets: the confidence and capability to innovate and adventure, and the prudence and insight to recognize when restraint is necessary.
The article by Melody Huang and Harsh Parikh (2024) in this issue, “Toward Generalizing Inferences From Trials to Target Populations,” offers an insightful discussion and demonstration of strategies to foster such a balanced perspective toward causal inference. This article presents a comprehensive overview of a 2023 workshop held at Brown University, where experts from fields such as social science, medicine, public health, statistics, computer science, and education convened. Each discipline has advanced the field of causal inference through addressing unique, domain-specific challenges, yet all recognize a common, crucial task: reliably generalizing findings from controlled trials to the ‘wild west’ of the real world. In this broader context, a cautious approach is vital to minimize unintended consequences arising from the well-intentioned desire to generalize findings as widely as possible, a goal that—if not carefully managed—can and will do more harm than help.
I highly encourage every data scientist to at least review the bolded highlights in the article, as many of the discussed issues and recommendations are broadly relevant, such as dealing with heterogeneity, addressing data deficiencies, combining multiple data sources, understanding causal mechanisms, leveraging advances in machine learning, and enhancing interpretability.
The column article by Maria Jones (2024), “Introducing Reproducible Research Standards at the World Bank,” addresses a fundamental concern in empirical research: how can we ensure the reliability of findings when logical derivations or theoretical verification are absent? Ensuring reproducibility—that the reported numerical results are verifiable and verified—is a critical but minimal requirement. If our numerical results cannot even be verified using the same data and methods that we report using, then how could anyone, including ourselves, trust these results?
Given the essential roles of reproducibility and replicability in scientific research, HDSR has featured two special themes (in issue 2.4 and issue 5.3) and subsequently established a dedicated column on reproducibility and replicability. Jones’s article is the latest installment in this series and discusses an ambitious World Bank project aimed at providing curation support and conducting reproducibility checks on hundreds of working papers, books, and other publications. Putting on my hat as a critical statistician, I am delighted to see these efforts, particularly given the World Bank’s global influence and the broad-reaching implications of its findings.
Disciplinary Boundaries Are Blurring, Whether We Are Ready or Not
Interdisciplinarity, multidisciplinarity, and transdisciplinarity are terms we encounter—and adopt—with increasing frequency in articles, presentations, policy documents, and grant proposals. However, most people who have deeply engaged in cross-disciplinary research have rich stories about the challenges—as well as rewards—of these efforts. The rewards and challenges of interdisciplinary work are not unlike the experience of being an immigrant: learning a new language, appreciating different values, adapting to a foreign culture and, most importantly, finding ways to be welcomed, valued, and respected.
Trained almost 4 decades ago in the Harvard Department of Statistics—a department built on the belief that theoretical and methodological work must be grounded in substantive fields (Meng, 2012)—I have many personal stories with which to regale interested readers, including some sobering ones. During my tenure as department chair, two young researchers working at the intersection of computer science, digital humanities, and statistics contacted me for advice. Despite their innovative contributions, they faced a daunting challenge in the academic job market. Conversations typically began with, “Your work is really interesting,” yet ultimately devolved into “But that’s not computer science,” or “They didn’t publish much in statistics.” These reactions, though valid given the rigid boundaries of academic departments, underscored the barriers facing those who transcend traditional disciplines. It was extremely uncomfortable, to say the least, to find myself cornered into the hypocritical position of promoting interdisciplinary research while having to tell those who have forged such paths, “Unfortunately, statistics is not your home”—and worse, knowing there was no fitting department for them elsewhere.
I am encouraged, however, by the improvements seen over the past decade or so, driven by the rapid rise of data science and AI. New institutional structures, such as the Harvard Data Science Initiative (HDSI) and the Kempner Institute for the Study of Natural and Artificial Intelligence, are emerging to address these challenges. As HDSI’s flagship publication, HDSR aspires to build a global platform for the data science community, and our 500+ publications since inception illustrate the necessity and benefits of cross-disciplinary discussion, debate, and dissemination. The lively debates surrounding the cross-disciplinary nature of recent Nobel Prizes in the hard sciences further underscore the shifting landscape of academic disciplines. The range of emotional responses to these changes highlights varying levels of readiness to embrace these evolutions. It is essential that we prepare ourselves and future generations not only to embrace but also to benefit from these changes, rather than becoming unintended casualties of evolution.
Sustainable preparations are most effective through education, and the earlier the better, much like learning languages. That is why HDSR features a column on “Minding the Future,” with articles for and frequently written by pre-college students. In this issue, in “How Data Science Can Be Started From Your School’s Yearbook,” Anthony Shen (2024) reports his experience as a high school student engaged in a self-identified social and behavioral science project. Having in mind the need to cultivate and culture future generations to consider data science as broadly as possible, I was particularly pleased to see Shen’s emphasis on “how aspiring data scientists can collect, analyze, and draw conclusions from anything around them, even their school’s yearbook.”
The essay documents Shen’s first-hand experience from forming the research question, a rather challenging one because it is about understanding students’ behavior; to identifying, collecting, and “cleaning” relevant data; and to analyzing data and interpreting results. I was very pleased, and in fact, proud to see the line that “The most surprising part of my entire research process was the drastic difference between my hypothesis and my findings.” It is rare in data science publications to see such candid expressions. I am very proud that HDSR is a place for aspiring data scientists to share their real experiences and lessons, instead of packaging everything into highly overfitted ‘successes,’ and I’d encourage all of us to share our learning experiences that can help others to flatten their learning curves.
The article by Doreet Preiss, Jessica Sperling, Ryan M. Huang, Kyle Bradbury, Thomas Nechyba, Robert Calderbank, Gregory Herschlag, and Jana Schaich Borg (2024), “Where Data Science and the Disciplines Meet: Innovations in Linking Doctoral Students With Master’s-Level Data Science Education,” explores interdisciplinary training at the post-college level. As the authors note, PhD programs tend to concentrate on discipline-specific methodologies, while master’s-level data science programs often engage more broadly in interdisciplinary activities. This observation inspired an innovative program at Duke University, where “Doctoral fellows from diverse fields worked with teams of master’s students from Duke’s Master in Interdisciplinary Data Science program on applied Capstone projects focused on the doctoral fellows’ own disciplines and dissertation research.”
The article provides a rich account of the program’s design and implementation, along with a candid assessment of its benefits and drawbacks. Drawing from these experiences, the authors offer a host of actionable suggestions for other institutions interested in implementing similar initiatives. I am particularly pleased and proud that HDSR has become a central forum for data science educators to share their experiences and insights with such candor and constructiveness. This culture of openness is crucial for fostering cross-boundary engagement and collaboration, especially in advancing scientific inquiry.
AI Has Transformative Power, Whether We Understand It or Not
No matter how much physics machine learning or neural networks draw on, they would not have won the Nobel Prize in Physics without delivering results that have impressed physicists. As tautological as it may sound, this underscores a crucial point: AI’s pattern-seeking paradigm has effectively broken an epistemic barrier, earning genuine respect from the hard sciences without providing the reasoning and explanations that scientific fields typically demand.
Indeed, this achievement was made before the ChatGPT phenomenon, and quietly, which speaks to the transformative power of AI without fanfare. Matthew Schwartz, a physics colleague at Harvard and the author of “Modern Machine Learning and Particle Physics” (Schwartz, 2021), started his abstract with the line, “Over the past five years, modern machine learning has been quietly revolutionizing particle physics.” In retrospect, this statement could well justify the Nobel Prize in Physics for machine learning—or at least parallels the reasoning behind awarding the Nobel Prize in Chemistry for AI-driven achievements.
Reflecting on my exchanges with Schwartz—partly to thank him for his timely contributions to HDSR—I was struck by a central question: What changed in physics to open the door for machine learning, despite the field’s traditional insistence on scientific understanding and interpretability alongside empirical verification? In my editorial for the issue containing Schwartz’s article (Meng, 2021), I noted how “relying on mostly black-box or at least non-interpretable machine learning algorithms to discover physics laws was something virtually unthinkable a decade ago, and perhaps it is still unfathomable to many.”
Schwartz’s (2021) response, expressed in his article, highlighted three approaches to tackling the interpretability issue: (1) finding traditional interpretations, (2) accepting human limits in understanding machine learning, and (3) developing a new language and tools to interpret it. His concluding insight is telling: “I feel more optimistic about the possibility of transcendental progress in fundamental physics now than at any other time in my career, even if I myself may not be able to comprehend the final theory.”
This sense of wonder—our awe at what lies beyond our comprehension—has the power to reshape our worldview, not unlike a religious revelation. With the arrival of ChatGPT and generative AI, many have come to recognize, albeit reluctantly, that machine learning vastly outperforms humans in discovering patterns. As I argued in last year’s editorial (Meng, 2023), generative AI is a triumph of data engineering and science, tapping into the collective intelligence of Homo sapiens stored in myriad records across history. In that sense, there is nothing ‘artificial’ about generative AI—it is purely a human creation, both in engine and input.
A more provocative question remains: Can AI surpass humans in creativity, whether through true intelligence or by imitating it? The Recreations in Randomness column article by Johan Ugander and Ziv Epstein (2024), “The Art of Randomness: Sampling and Chance in the Age of Algorithmic Reproduction,” delves into this question. As the column editor, Mark Glickman, succinctly puts it, “By tracing a historical tradition of artists who have embraced randomness, Johan Ugander and Ziv Epstein reveal how leaning into AI’s inherent stochasticity might just be the key to breaking out of the creative cage.”
This is a fascinating pursuit, though potentially also a frightening one to those wary of losing human control over AI. It is not an unreasonable question that if AI can recognize and capitalize on behaviors we humans perceive as unpredictable (and here, artists’ use of ‘random’ likely refers to more than behaviors that follow known probabilistic laws), what could serve as guardrails for AI’s stochastic tendencies or creativities, given that it is already a (giant) step ahead? A partial answer—or comfort—can be found in Ugander and Epstein’s article, which calls for “stochasticity-induced ‘happy accidents’ in modern human–AI collaboration.”
Humans’ Empathy Is What Sets Us Apart From AI …
Personally, I am not in the camp that worries that AI will take over, at least not yet. But I am concerned—and saddened—by our collective dumbness, to put it bluntly. As a species, we collectively have failed miserably to learn from history’s repeated lessons on the devastating effects of hatred and war, whether physical, ideological, or spiritual. Despite our staggering advances on so many fronts, we haven’t figured out how to live with each other in peace, let alone in harmony. We’re still killing each other, like animals, albeit with far superior weapons. In that domain, the advance of AI or any technology should deeply concern anyone seeking peace.
I reflect on this amid the devastating wars unfolding before our eyes, and the turbulent politics and division in a country that was formed to be the United States. And of course, yes, the polls’ nail-biting tension over the 2024 election is a statistical nightmare. But of course one should never waste a crisis; this is also a moment for data scientists of all stripes to sharpen our tools and do our best to provide as accurate data and information as possible. I am therefore deeply grateful to HDSR’s media feature editor, Liberty Vittert, and guest co-editors and political scientists Ryan Enos and Stephen Ansolabehere for repeating their 2020 election collaborations and putting together a special theme on the 2024 presidential election.
As the co-editors introduce each of the six themed articles in their editorial (Vittert et al., 2024), which I expect to break the record for the most viewed single article in a week in HDSR, currently held by their 2020 editorial (Vittert et al., 2020), I will only report how delighted I was seeing the balanced coverage in terms of outcomes (vote shares and voter turnout; Gelman et al., 2024, and Ansolabehere et al., 2024), methodologies (quantitative and qualitative; Donnini et al., 2024, and Lichtman, 2024), and genre—there is even a murder mystery (Bailey, 2024)!
But I would like to highlight the conversation with Minnesota Secretary of State Steve Simon (Simon et al., 2024). I had the pleasure of hearing his keynote on AI and elections at the 2024 Spring Research Workshop on Generative AI hosted by the Data Science Initiative at the University of Minnesota (Simon, 2024). Whereas I am generally curious about and get excited by many topics, speeches by politicians are not among them. Secretary Simon’s speech, however, was a refreshing exception. I found myself in a mood of listening to a fellow data scientist candidly reporting their accomplishments and challenges, from voter engagement to cybersecurity. Incidentally, Stephen Stigler’s (2024) article in this issue, “Some Early History of Data Security,” set in the context of the French lottery in the 18th century, reminds us how we humans have struggled with the issue of data security for centuries, though obviously the complexity has grown staggeringly.
Following his keynote, I invited Simon to share his message with HDSR’s readers, which led to the conversation featured here in this issue, “AI and Elections: A Conversation With Secretary Steve Simon of Minnesota” (Simon et al., 2024). In particular, Simon’s concise call for ‘high turnout; low drama’ has stayed with me—so much so that we echoed it in our October 2024 podcast on “Digesting 2024 Election Polls: How the Media Reports and Decodes the Numbers” (Meng et al., 2024). I encourage all readers, especially those concerned about elections, to tune in to our podcasts about the election.
But if you have no time for the whole conversation, I’d still urge you to hear Simon’s (Simon et al., 2024) response to the final “magic wand” question from my co-host, Liberty: “If you could wave your magic wand and have every voter do one thing this election before voting, what would it be?”
Simon: “I would say somehow in your head, demonstrate some political empathy. […] don’t assume bad motives about the people who are voting completely opposite of you. […] don’t believe the worst about someone who’s voting different from you.”
This simple advice is vital yet difficult, if not impossible, for many today. How did we reach this point? Empathy—the quality that AI lacks—is essential to peaceful coexistence. That is one area in which we humans can always outsmart AI.
So, as you exercise your right to vote, please exercise your empathy too.
Xiao-Li Meng has no financial or non-financial disclosures to share for this editorial.
Ansolabehere, S., Brown, J., Khanna, K., Phillips, C., & Stewart III, C. (2024). Forecasting turnout. Harvard Data Science Review, 6(4). https://doi.org/10.1162/99608f92.62881547
Bailey, M. A. (2024). Murder at polling manor. Harvard Data Science Review, 6(4). https://doi.org/10.1162/99608f92.19b0a307
Bordoloi, S. K. (2024, October 22). Prizes for research using AI: Blasphemy or Nobel Academy’s genius. Sify. https://www.sify.com/ai-analytics/prizes-for-research-using-ai-blasphemy-or-nobel-academys-genius/
Donnini, Z., Louit, S., Wilcox, S., Ram, M., McCaul, P., Frank, A., Rigby, M., Gowins, M., & Tranter, S. (2024). Election night forecasting with DDHQ: A real-time predictive framework. Harvard Data Science Review, 6(4). https://doi.org/10.1162/99608f92.ccb395f0
Gelman, A., Goodrich, B., & Han, G. (2024). Grappling with uncertainty in forecasting the 2024 U.S. presidential election. Harvard Data Science Review, 6(4). https://doi.org/10.1162/99608f92.a919e3fa
Huang, M., & Parikh, H. (2024). Toward generalizing inferences from trials to target populations. Harvard Data Science Review, 6(4). https://doi.org/10.1162/99608f92.68c7973b
Jones, M. (2024). Introducing reproducible research standards at the World Bank. Harvard Data Science Review, 6(4). https://doi.org/10.1162/99608f92.21328ce3
Lichtman, A. (2024). The Keys to the White House: Predicting the 2024 winner. Harvard Data Science Review, 6(4). https://doi.org/10.1162/99608f92.a390dc78
Meng, X.-L. (2012). 55 years of Harvard statistics: Stories, snapshots, and statistics. In A. Agresti, & X.-L. Meng (Eds.), Strength in numbers: The rising of academic statistics departments in the U.S. Springer.
Meng, X.-L. (2021). Building data science infrastructures and infrastructural data science. Harvard Data Science Review, 3(2). https://doi.org/10.1162/99608f92.abfa0e70
Meng, X.-L. (2023). Human intelligence, artificial intelligence, and Homo sapiens intelligence? Harvard Data Science Review, 5(4). https://doi.org/10.1162/99608f92.11d9241f
Meng, X.-L. (2024). 2024: A year of crises, change, contemplation, and commemoration. Harvard Data Science Review, 6(1). https://doi.org/10.1162/99608f92.239082d0
Meng, X.-L. (Host), Vittert, L. [Liberty] (Host), Hall, C. (Guest), & Vittert, L. [Leland] (Guest). (2024, October 29). Digesting 2024 election polls: How the media reports and decodes the numbers (No. 46) [Audio podcast episode]. In Harvard Data Science Review Podcast. https://hdsr.podbean.com/e/digesting-2024-election-polls-how-the-media-reports-and-decodes-the-numbers/
Nobel Prize Outreach. (2024a, October 8). They trained artificial neural networks using physics [Press release]. https://www.nobelprize.org/prizes/physics/2024/press-release/
Nobel Prize Outreach. (2024b, October 9). They cracked the code for proteins’ amazing structures [Press release]. https://www.nobelprize.org/prizes/chemistry/2024/press-release/
Preiss, D., Sperling, J., Huang, R. M., Bradbury, K., Nechyba, T., Calderbank, R., Herschlag, G., & Borg, J. S. (2024). Where data science and the disciplines meet: Innovations in linking doctoral students with master’s-level data science education. Harvard Data Science Review, 6(4). https://doi.org/10.1162/99608f92.f81142cc
Schwartz, M. D. (2021). Modern machine learning and particle physics. Harvard Data Science Review, 3(2). https://doi.org/10.1162/99608f92.beeb1183
Shen, A. (2024). How data science can be started from your school’s yearbook. Harvard Data Science Review, 6(4). https://doi.org/10.1162/99608f92.4a1751ea
Simon, S. (2024, May 22–24). AI and elections [Keynote address]. Data Science Initiative Spring Research Workshop 2024, University of Minnesota, Minneapolis, MN, United States. https://mediaspace.umn.edu/media/t/1_7721oqpy/324159032
Simon, S., Meng, X.-L., & Vittert, L. (2024). AI and elections: A conversation with Secretary Steve Simon of Minnesota. Harvard Data Science Review, 6(4). https://doi.org/10.1162/99608f92.a10bcaeb
Stigler, S. M. (2024). Some early history of data security. Harvard Data Science Review, 6(4). https://doi.org/10.1162/99608f92.717c3fe5
Ugander, J., & Epstein, Z. (2024). The art of randomness: Sampling and chance in the age of algorithmic reproduction. Harvard Data Science Review, 6(4). https://doi.org/10.1162/99608f92.f5dcab1a
Vittert, L., Enos, R. D., & Ansolabehere, S. (2020). Predicting the 2020 presidential election. Harvard Data Science Review, 2(4). https://doi.org/10.1162/99608f92.fed3dc89
Vittert, L., Enos, R. D., & Ansolabehere, S. (2024). Predicting the 2024 presidential election. Harvard Data Science Review, 6(4). https://doi.org/10.1162/99608f92.91566bf7
Zia, T. (2024, October 18). How AI researchers won Nobel Prizes in physics and chemistry: Two key lessons for future scientific discoveries. Unite.ai. https://www.unite.ai/how-ai-researchers-won-nobel-prizes-in-physics-and-chemistry-two-key-lessons-for-future-scientific-discoveries/
©2024 Xiao-Li Meng. This editorial is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the editorial.