
Unlocking Government Data to Support and Evaluate Scientific Research: An Overview of Selected Papers from the Value of Science Conference

Published on Apr 28, 2022

"Demonstrating the Value of Government Investment in Science: Developing a Data Framework to Improve Science Policy" by Tobin L. Smith (2022) makes a case for developing new frameworks for quantifying the impacts of government science funding. Opponents of government spending on science often use anecdotes about silly-sounding projects to characterize the work as wasteful. Proponents of science funding have countered by publicizing cases of silly-sounding research that in fact led to great advances and benefits to society. Advances in methods for analyzing the impacts of basic research, together with the increased availability of longitudinal data on science policy and its outcomes, such as the data provided in response to the federal Foundations for Evidence-Based Policymaking Act of 2018 (2019), make this a fertile time to move from arguing by anecdote to making strong quantitative demonstrations of the value of science programs.

Developing such frameworks will require better access to relevant data held by a multiplicity of federal and state agencies. In 2009, Data.gov was launched as an initial effort to create a single portal for researchers to access a wide variety of government data, but its lack of support for linking different data sources and its generally awkward design limited its usefulness for researchers. "Data Inventories for the Modern Age? Using Data Science to Open Government Data" by Lane et al. (2022) describes a better approach to opening up government data that incorporates both social and technological practices. On the social side, the approach emphasizes incentives for the participating agencies: instead of simply mandating that agencies make their data available, reporting tools should be developed that make it easier for the agencies to carry out their primary missions while at the same time collecting the data needed by researchers. On the technological side, machine learning and natural language processing can make those tools easier for agencies to use and the collected data easier for researchers to link and analyze.

Craig Jones's (2020) article "Building on Aotearoa New Zealand’s Integrated Data Infrastructure" provides a case study of the power of incentivizing rather than mandating the collection and linking of government data. Each year, the nation's Ministry of Social Development creates a report on the performance of its benefits (welfare) system. Integrating benefit data with data on education and the workforce led to more insightful analyses and helped the ministry improve its performance. An important reason for the success of the effort, and one that is generally applicable to other efforts, is that the ministry began with a simple use case that was easy for other agencies to understand, and then built outward from this successful example. This ‘start small’ approach is in stark contrast to that of Data.gov, which began by mandating the collection of a huge variety of data with no particular use case in mind.

Turning back to the case of science funding policy, "Data on How Science Is Made Can Make Science Better" by Sourati et al. (2022) describes how two different kinds of data can be integrated to predict which agency awards are most likely to lead to high-impact results. The first kind of data are scientific papers that result from funded research and their associated citations and references; that is, the stuff of traditional bibliometric analysis. The authors call this ‘content’ data. The second kind of data are about how science is organized in terms of scientists, institutions, fields of study, conferences, and so on. This second kind is ‘context’ data. The combination of the two kinds of data enabled the discovery that research was particularly impactful when it applied methods developed in one field of research to problems from another field; in other words, fresh views of a problem led to more breakthroughs.

The final article, "A Linked Data Mosaic for Policy-Relevant Research on Science and Innovation: Value, Transparency, Rigor, and Community" by Chang et al. (2022), sketches a general answer to the challenge presented by our first article, namely, a new framework for quantifying the impacts of research funding. The framework rejects both the hierarchical data organization and management traditional in government and industry and the fully distributed, loosely organized approach commonly practiced by individual academic researchers. The authors name the new approach a "data mosaic." Data mosaics are collectively designed architectures that deeply link and integrate a wide range of data sources and are supported by a decentralized but interdependent community of government, academic, and industry data producers, integrators, and analyzers. A mosaic cannot be built through mandates or by depending on the altruism of individual contributors; as several of the other articles also observe, data access efforts succeed only when all participants benefit. The article describes two early efforts to build data mosaics, one based at the University of Michigan's Institute for Research on Innovation and Science and another at the National Science Foundation's National Center for Science and Engineering Statistics.

Disclosure Statement

Henry Kautz has no financial or non-financial disclosures to share for this article.


Chang, W.-Y., Garner, M., Basner, J., Weinberg, B., & Owen-Smith, J. (2022). A linked data mosaic for policy-relevant research on science and innovation: Value, transparency, rigor, and community. Harvard Data Science Review, 4(2).

Foundations for Evidence-Based Policymaking Act of 2018, Pub. L. No. 115-435, 132 Stat. 5529 (2019).

Jones, C., McDowell, A., Galvin, V., & Adams, D. (2020). Building on Aotearoa New Zealand’s integrated data infrastructure. Harvard Data Science Review, 4(2).

Lane, J., Gimeno, E., Levitskaya, E., Zhang, Z., & Zigoni, A. (2022). Data inventories for the modern age? Using data science to open government data. Harvard Data Science Review, 4(2).

Smith, T. L. (2022). Demonstrating the value of government investment in science: Developing a data framework to improve science policy. Harvard Data Science Review, 4(2).

Sourati, J., Belikov, A., & Evans, J. (2022). Data on how science is made can make science better. Harvard Data Science Review, 4(2).

©2022 Henry Kautz. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.
