
Understanding Diversity: Overcoming the Small-n Problem

Published on Apr 28, 2022

Diversity, equity, and inclusion (DEI) in the realm of science, technology, engineering, and math (STEM) is intrinsically valuable and instrumentally beneficial—for all. Equal opportunity and access are hallmarks of democracy. Numerous studies across a variety of disciplines also show that attending to these criteria generates a return on investment and creates competitive advantage; in particular, broadening participation in STEM can increase creativity, productivity, and competitiveness (Bayer & Rouse, 2016). Yet, data show that improvement in representation has stagnated and underrepresentation persists in some STEM fields. The proportions of doctorates and faculty in science and engineering fields, for instance, do not equitably reflect U.S. demographics.

Thus, the grand challenge is to solve the small-n problem, so that underrepresented groups are accurately counted and their intersections of identity (Acker, 2006) are given voice and visibility, while the privacy of individuals is protected. As National Science Foundation director Dr. Sethuraman Panchanathan recently explained, “The gap between the demographics of the research community and the demographics of the whole nation” is a “missing millions” (Cohen, 2021b) problem that requires “making the invisible, visible” (Cohen, 2021a). Further, a major impediment to developing actionable and effective policy interventions that are designed to improve equity for underrepresented groups is a lack of comprehensive and visible intersectional data. Making inferences about groups, without including relevant details concerning their lives and situations, leads to imprecise conclusions, which may generate solutions that only partially address a problem.

If we can achieve this aim, we can more precisely assess aspects of diversity and its benefits. Downstream consequences are numerous, including the ability to create more accurate and evidence-based policies to achieve equitable outcomes for those represented in the data. We can also identify contributions these populations make to benefit the broader public. Other advantages follow, such as inspiring a future workforce whose identities match those in small-n populations, thereby creating a virtuous upward spiral of growth and increased representation. Without these efforts, latent talent remains underutilized, thereby hindering innovation and limiting the expansion of scientific exploration and implementation of new discoveries in the marketplace.

Articles in this special issue laud the value and import of diverse communities and networks to fuel science and innovation and to promote equity. Chang et al. (2022) present the technical and organizational components of a linked “data mosaic” and show how these components have worked in evolving data architectures, which require trust and enticement toward interdependence. Jones et al. (2022) offer a concrete example of how an integrated data infrastructure can benefit society, extending lessons learned. Notably, their use of the “Five Safes Framework,” which addresses people, projects, settings, data, and output, is fundamental to maintaining privacy and trust among stakeholders and the public. These scholars emphasize the value of understanding the composition of the skilled technical workforce, “linking student information to labor market and business data,” demonstrating “simple relationships between education and training and labor market outcomes,” and examining the “gender wage gap.” Indeed, secure, publicly trusted, and linked data sets may enable researchers to see the layered and intersectional (Crenshaw, 1989) identities of our workforce, while also accounting for their dynamic nature. One dimension may become more salient, depending on context (Metcalf et al., 2018) and the organizational mechanisms at play, which may, for example, perpetuate gender pay gaps (Smith-Doerr et al., 2019).

An exciting vision for accelerating science and balancing “varied societal commitments to scientific advance, economic prosperity, and diverse participation” is cast by Sourati et al. (2022). These authors propose combining information on scientific networks and publications while leveraging machine learning. Referencing prior work on artificial intelligence (AI) algorithms, these scholars accounted for the impact of “the diversity and distribution of scientists” on the novelty and pace of scientific discovery, showing that advances are largely predicted by “coauthorship networks, suggesting that the path of scientific progress is heavily restricted by the communication of past experiences.” Unfortunately, we know that there are biases inherent in both AI and humans’ network formation. The latter challenge should be considered; the former ought to be addressed by augmenting methods that aim toward de-biasing these data and mechanisms (Zou & Schiebinger, 2018). Another danger is that such algorithms may reinforce the persistence of focus on scientific developments that target large-scale outcomes (especially in medicine), while neglecting those diseases that affect small populations. Stated differently: Does reliance on large observable networks, coupled with these models, inherently lead scientists away from tackling issues that affect small populations? 

Lane et al. (2022) make the case for a well-designed and sustainable government data ecosystem, which would be “the federal government’s third-most important computer innovation in history” (after the computer and the Internet), while also achieving goals prescribed in the Foundations for Evidence-Based Policymaking Act (2019). The act mandates updates to federal data management, requiring data to be publicly accessible and useful for policy-making. Implementation of the law will enhance development of “tools and strategies [necessary] for assessing, understanding, and communicating the value” of government investments in science (Smith, 2022), including the importance and merit of supporting diversity in STEM. Such efforts would be wise to build upon approaches currently underway, such as the science of broadening participation (Fealing & McNeely, 2016; McNeely & Fealing, 2018), and focus on disciplines that are especially homogeneous, like the field of economics (Bayer et al., 2020; Bayer & Rouse, 2016). Scorecard metrics for occupational parity have also been established (Heggeness et al., 2016; Myers & Fealing, 2012) and need not be reinvented.

Prospects for bettering science are exciting and there are many valuable ways to contribute. But the question remains: For whom are these data useful and to what ends are they being employed? Before we can understand and explain the specific value-add of DEI in STEM, and reap its virtuous and instrumental benefits, we must make the invisible visible, while vigilantly maintaining data privacy. We must constantly interrogate our thoughts and one another: Are we funding science for all or just science for the large-N?

Disclosure Statement

Kaye Husbands Fealing and Aubrey DeVeny Incorvaia have no financial or non-financial disclosures to share for this article.


Acker, J. (2006). Inequality regimes: Gender, class, and race in organizations. Gender & Society, 20(4), 441–464.

Bayer, A., Hoover, G. A., & Washington, E. (2020). How you can work to increase the presence and improve the experience of Black, Latinx, and Native American people in the economics profession. Journal of Economic Perspectives, 34(3), 193–219.

Bayer, A., & Rouse, C. E. (2016). Diversity in the economics profession: A new attack on an old problem. Journal of Economic Perspectives, 30(4), 221–242.

Chang, W.-Y., Garner, M., Basner, J., Weinberg, B., & Owen-Smith, J. (2022). A linked data mosaic for policy-relevant research on science and innovation: Value, transparency, rigor, and community. Harvard Data Science Review, 4(2).

Cohen, A. (2021a, February 11). The future of competitiveness: Strengthening the symbiosis of exploratory and translational research @ speed & scale. National Science Foundation.

Cohen, A. (2021b, February 18). NSF director lays out vision for future of U.S. science. American Association for the Advancement of Science.

Crenshaw, K. (1989). Demarginalizing the intersection of race and sex: A Black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics. University of Chicago Legal Forum, 1989(1), Article 8.

Fealing, K. H., & McNeely, C. L. (2016). Symposium on the Science of Broadening Participation. National Science Foundation, Science of Science and Innovation Policy Program.

Foundations for Evidence-Based Policymaking Act of 2018, Pub. L. No. 115-435, 132 Stat. 5529 (2019).

Heggeness, M. L., Evans, L., Pohlhaus, J. R., & Mills, S. L. (2016). Measuring diversity of the National Institutes of Health-funded workforce. Academic Medicine: Journal of the Association of American Medical Colleges, 91(8), 1164–1172.

Jones, C., McDowell, A., Galvin, V., & Adams, D. (2022). Building on Aotearoa New Zealand’s integrated data infrastructure. Harvard Data Science Review, 4(2).

Lane, J., Gimeno, E., Levitskaya, E., Zhang, Z., & Zigoni, A. (2022). Data inventories for the modern age? Using data science to open government data. Harvard Data Science Review, 4(2).

McNeely, C. L., & Fealing, K. H. (2018). Moving the needle, raising consciousness: The science and practice of broadening participation. American Behavioral Scientist, 62(5), 551–562.

Metcalf, H., Russell, D., & Hill, C. (2018). Broadening the science of broadening participation in STEM through critical mixed methodologies and intersectionality frameworks. American Behavioral Scientist, 62(5), 580–599.

Myers, S. L., Jr., & Fealing, K. H. (2012). Changes in the representation of women and minorities in biomedical careers. Academic Medicine, 87(11), 1525–1529.

Smith, T. L. (2022). Demonstrating the value of government investments in science: Why anecdotes alone are not enough. Harvard Data Science Review, 4(2).

Smith-Doerr, L., Alegria, S., Husbands Fealing, K., Fitzpatrick, D., & Tomaskovic-Devey, D. (2019). Gender pay gaps in US federal science agencies: An organizational approach. American Journal of Sociology, 125(2), 534–576.

Sourati, J., Belikov, A., & Evans, J. (2022). Data on how science is made can make science better. Harvard Data Science Review, 4(2).

Zou, J., & Schiebinger, L. (2018). AI can be sexist and racist—It’s time to make it fair. Nature, 559(7714), 324–326.

©2022 Kaye Husbands Fealing and Aubrey DeVeny Incorvaia. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.

