Professor Jennifer Chayes (“Data Science and Computing at UC Berkeley,” this issue) describes an extraordinarily ambitious and exciting vision for data science and computing at Berkeley. The timeliness will be obvious to HDSR readers; our world increasingly relies on data and computing to create knowledge, to make critical decisions, and to better predict the future. Data science has emerged to support these data-driven activities by integrating and developing ideas, concepts, and tools from computer science, engineering, information science, statistics, and domain fields. Data science now drives fields as diverse as biology, astronomy, materials science, political science, and medicine, not to mention vast tracts of the global economy, key government activities, and quotidian social and societal functions. Universities clearly need to respond, and Berkeley, not for the first time, serves as a beacon.
Extraordinary opportunities for data science present themselves, and an intellectually riveting new frontier is emerging. We will, however, have completely missed the boat if we think of this as a shiny new academic discipline that pops up, occupies a building on campus somewhere, and advances with minimal impact on the other domains. Clearly Berkeley aims to avoid this trap, but the challenges are real. Traditional academic disciplines represent the central organizing principle for our universities and, try as we might, this has changed little over the decades and centuries. As former university president John Lombardi has noted, these disciplines operate a lot like medieval guilds (Lombardi, 2013). Each department serves as the local branch of the national guild of the same specialty. The national guild establishes standards while the local university deals with employment and work assignments. Unfortunately, the guilds tend to be more focused on self-preservation and upholding standards than on working with other guilds to solve societal problems. But in truth, no guild working alone can solve the problems we actually care about, such as endemic poverty, racism, climate change, and access to clean water and quality health care. The guilds stifle the kind of innovation we really need. My good friend Jeannette Wing has posed the question, “Is data science a discipline?” (Wing, 2020). I hope not! The last thing we need is the asphyxiating data science guild that would surely follow.
The range of data science and its impact on our daily lives raises challenging questions relating to privacy, ethics, and fairness. Science and technology alone cannot provide the answers; we must draw on legal scholarship and historical analysis, as well as the insights and inspirations of the humanities and the arts to ensure an appropriate path forward. The corporate sector is investing massively in data science, but only the great universities can address the broader societal implications. To do this, we need new structures and new incentive systems. I have high hopes for Berkeley’s “Data Commons”; perhaps this might provide a supra-disciplinary institutional model for us all?
Chayes suggests we need to anoint data science as a discipline in order to offer a Ph.D. in data science. Why so? Fundamentally, the Ph.D. is a research degree. As research scientists, our graduates should be able to contribute to the solution of real-world data-centric problems through the creation of novel data science objects (e.g., models, methods, visualizations) or the analysis of such objects, embedded in real-world contexts. The stuff of disciplines? I hope not.
Different institutional models for data science have emerged at major universities. Berkeley is moving toward a data science college. Other examples along this line include the University of Virginia and Fudan University. My current and previous universities, Northeastern and Columbia, have established institutes. Yet others, Stanford and Cornell for example, have named cognate departments. There are pros and cons to each. Colleges typically offer degree programs, house tenured faculty, and expect to persist for many decades. The role of college dean is well understood, and the dean has a seat at the table at resource allocation time. Institutes exist to cut across the traditional disciplines. This feels like a better fit for data science, but ambiguities can arise over faculty lines and control of educational programs. Departments are easiest to understand but run the risk of being isolated and can end up playing whack-a-mole with other data science efforts. Even more variants exist (initiatives, centers, programs, and so on), although these may represent transient states. To partners and to undergraduate students, these distinctions may have little meaning, but for faculty and staff, real consequences ensue. Universities the world over will be paying close attention to Berkeley, and no doubt the Berkeley model is one that others will follow.
Berkeley’s ambition and boldness are inspiring, and Professor Chayes has laid out a compelling vision. The structural shifts at Berkeley are huge but so is the opportunity. The industrial revolution led to a remaking of many institutions and structures; the emergence of data science and Berkeley’s leadership may presage something similarly momentous.
Lombardi, J. V. (2013). How universities work. JHU Press.
Wing, J. M. (2020). Ten research challenge areas in data science. Harvard Data Science Review, 2(3). https://doi.org/10.1162/99608f92.c6577b1f
This discussion is © 2021 by the author(s). The editorial is licensed under a Creative Commons Attribution (CC BY 4.0) International license (https://creativecommons.org/licenses/by/4.0/legalcode), except where otherwise indicated with respect to particular material included in the article. The article should be attributed to the authors identified above.