Who is a data scientist? This deceptively simple question was posed at the KDD-2018 conference in London, at a panel moderated by Usama Fayyad (Fayyad et al., 2018). A very quick realization, after the panel discussion and interaction with the audience in a full room, was that multiple and often quite different definitions can be found for this increasingly common role in the industry. This realization led to a long journey, one that resulted in this special theme on Data Literacy at Scale at HDSR. Let me briefly share with you the story of how we got here, and why we think this special theme on data literacy is timely and important.
After the KDD panel, we set out on an extensive research effort with a nonprofit initiative, the Initiative for Analytics and Data Science Standards (IADSS), where I was a cofounder. Our goal at IADSS was to propose knowledge and skills standards for the most common data-related roles in the industry, such as data scientist, data analyst, and data engineer, and as a result help eliminate the confusion in defining them. The findings of the research were published in two articles at HDSR, which I coauthored with Usama Fayyad, “Toward Foundations for Data Science and Analytics: A Knowledge Framework for Professional Standards” (Fayyad & Hamutcu, 2020) and “From Unicorn Data Scientist to Key Roles in Data Science: Standardizing Roles” (Fayyad & Hamutcu, 2022).
As we were conducting our research, we had an opportunity to talk to dozens of industry executives and practitioners who were trying to drive better decision-making at their organizations by building and optimizing data science and analytics teams. While it was important to clearly define knowledge and skills for these highly capable groups of individuals so that they are set up to succeed in solving the most challenging data science problems or developing data infrastructure, these individuals made up a very small portion of an organization’s workforce. So we asked ourselves and our industry peers: What about the remaining significant majority in an organization, say the other 99%? Which data-related skills and knowledge are important for them? It turns out there is not a clear answer to that either. Think about a human resources specialist who is trying to decide how to spend $1,000 on advertising a particular open position at their company. This clearly is not a problem that justifies a data science team’s time and energy. But imagine the thousands of decisions of this kind that employees make on a given day within a typical large organization. Any improvement in how individuals make these small decisions by using data effectively could add up to a large cumulative positive impact for the organization. This line of reasoning led us to think more broadly about data literacy.
Organizations clearly have a data literacy gap. According to a survey conducted by NewVantage Partners (NewVantage Partners, 2021), more than 3 in 5 executives do not believe that their companies are data driven, nearly 3 in 4 executives experienced failure in building a data culture at their companies, and more than 9 out of 10 executives describe people and processes as the biggest barriers to becoming data driven. A focus group conducted by Harvard Business Review shows a fundamental lack of understanding among employees: “they had trouble asking the right questions, understanding relevant data and how to validate it, interpreting data using A/B tests to evaluate hypotheses, creating visualizations for these results, and telling the story that helps decision-makers understand the necessary next steps” (Bersin & Zao-Sanders, 2020). Furthermore, based on Accenture’s study on the Human Impact of Data Literacy, 79% of the global workforce is not confident in their data skills, 59% experience burnout while working with data systems, and 48% defer to making decisions based on gut feeling over data-driven insight (Accenture, 2020).
Back in the summer of 2023, when Katie Malone, my former coeditor at the Active Industrial Learning column here at HDSR, and I started devising a workshop to bring together experts and build a discussion and sharing platform on data literacy, we wanted to approach it from multiple angles. Defining, building, and entrenching data literacy requires different approaches and methods in different contexts and environments, and we ended up with five themes for the workshop: data literacy in industry, communities, K-12 education, undergraduate education, and government. The workshop was a collaborative effort between the Harvard Data Science Initiative and the Institute for Experiential AI at Northeastern University. Around 40 practitioners and academicians from various organizations tackled the five themes in working groups, discussing and documenting what they saw as challenges and opportunities in data literacy.
This special theme on data literacy aims to share some of the output from this collective effort through multiple articles. The first release, which you are reading now, features articles written by the industry and communities working groups.
Amid the current surge in interest surrounding GenAI (generative AI) and its implications for organizational data utilization, the critical challenge of implementation is addressed in the first article of the series, “Data Literacy in Industry: High Time to Focus on Operationalization Through Middle Managers” (Koloski et al., 2025). Framing data literacy as a change management exercise, the authors argue that middle managers are essential for driving behavioral change and operationalizing data literacy initiatives, bridging the gap between executive vision and frontline practice.
Recognizing the limitations of individual training efforts, the article “Building an Inclusive Data Literacy Community” (Hilty et al., 2025) focuses on the power of collective learning. It delves into the foundational concept of community within the context of organizational culture and offers practical, actionable strategies for building inclusive and sustainable data literacy communities that can help bridge the data literacy gap.
In subsequent additions to this collection of articles, we will try to bring more of the content driven by the workshop, and we hope this will generate interest from others in the data community to contribute their own expertise and experiences.
I would like to thank all the workshop participants and the authors who worked on the published articles for their support of this initiative and their substantial efforts. I would like to extend my gratitude to Katie Malone for her support in preparing the workshop and her initial review of the submitted articles, and to Teymur Fatullayev for his invaluable contributions before, during, and after the workshop. I am also grateful to the teams at the Institute for Experiential AI at Northeastern University and the Harvard Data Science Initiative for making this workshop a reality.
Hamit Hamutcu has no financial or nonfinancial disclosures to share for this editorial.
Accenture. (2020, January 16). The human impact of data literacy. https://www.accenture.com/us-en/insights/technology/human-impact-data-literacy
Bersin, J., & Zao-Sanders, M. (2020, February 12). Boost your team’s data literacy. Harvard Business Review. https://hbr.org/2020/02/boost-your-teams-data-literacy
Fayyad, U., Hamutcu, H., Moody, K., Mulani, N., Perlich, C., Kumar, R., Wing, J., & Yankov, D. (2018, August 19–23). Who is a data scientist? Defining the analytics profession and cutting out the hype and confusion. KDD-2018 Applied Data Science Invited Panel. The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, London, UK. https://www.kdd.org/kdd2018/applied-data-science-invited-panel
Fayyad, U., & Hamutcu, H. (2020). Toward foundations for data science and analytics: A knowledge framework for professional standards. Harvard Data Science Review, 2(2). https://doi.org/10.1162/99608f92.1a99e67a
Fayyad, U., & Hamutcu, H. (2022). From unicorn data scientist to key roles in data science: Standardizing roles. Harvard Data Science Review, 4(3). https://doi.org/10.1162/99608f92.008b5006
Hilty, E. C., Vilski, V., Mishra, S., Condon, P., & Gracia, S. M. (2025). Building an inclusive data literacy community. Harvard Data Science Review, 7(1). https://doi.org/10.1162/99608f92.d622eaff
Koloski, D., Porter, C., Almand-Hunter, B., Gatchell, S., & Logan, V. (2025). Data literacy in industry: High time to focus on operationalization through middle managers. Harvard Data Science Review, 7(1). https://doi.org/10.1162/99608f92.6f5dfc6f
NewVantage Partners. (2021, May 25). Big data and AI executive survey 2021. Retrieved April 19, 2023, from https://static1.squarespace.com/static/62adf3ca029a6808a6c5be30/t/639dd6725c2e623f729f148a/1671288435762/Big+Data+Executive+Survey+2021+Findings+Final.pdf
©2025 Hamit Hamutcu. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.