Science has transformed our existence and has generated many advances improving human life and productivity. The digital revolution in science provides increasing opportunities for accelerating progress in disease prevention, diagnosis, and treatment. Robots are powering our cars and performing surgery. Sensors are being deployed globally but also locally within our bodies. Aspects of science have sped up and become enormously more efficient; the first human genome took 13 years and cost roughly $3 billion, whereas a genome can now be sequenced for about $1,000. These digital advances require data, lots of it, and science and society are responding. Molecular biology has led the way by making ubiquitous the instant sharing of genomic data on robust platforms. The speed with which new tests, vaccines, and treatments were developed for the current COVID-19 pandemic is a testament to the power of ready access by both humans and computers to data, information, and knowledge.
Nevertheless, the 21st century opened with numerous warnings that science could not be trusted, not only because findings were often politically charged, but because many of the conclusions from papers in scientific journals could not be reproduced. The publication of the findings by Amgen in Nature in 2012 regarding their inability to reproduce the vast majority of landmark studies in cancer biology (Begley & Ellis, 2012) thrust the issue of reproducibility to the forefront of biomedical science. Trust cannot be reestablished without the reproducibility issues being addressed. Reliability and reproducibility became watchwords in the second decade of the 21st century.
The realization of fundamental problems in the way that science was conducted and communicated helped to fuel and focus the open science movement. The open science movement is driven by the possibilities enabled by the internet for rapid distribution of scientific content and the ability to share not just narrative works, but also data and code (Bourne et al., 2012). However, the reward systems and publishing platforms have been slow to adapt to the possibilities opened by making all scientific products accessible to high-performance computing. The reproducibility issues gave open science proponents the impetus needed to move from the fringe to the mainstream, as transparent access to study design, data, and code was seen as a necessary step to improve reproducibility and restore trust in the scientific system. Realizing these benefits requires that scientific products be readily available and designed for use by both computers and humans. These requirements gave rise to the FAIR data principles (Wilkinson et al., 2016), which provide guidelines as to how digital products should be designed so as to be findable, accessible, interoperable, and reusable.
The start of the pandemic in early 2020 took these issues out of academia and catapulted them into the nightly news. Open sharing of data and tools and mobilization of the biomedical enterprise saw rapid development of knowledge about a virus named SARS-CoV-2, its disease named COVID-19, its transmission, its treatment, and the fastest development of multiple vaccines ever seen. These developments further reinforced the sense among many researchers, science funders, publishers, and the public that a revolution was in progress, led by a remarkable degree of open access to information about the worst pandemic in a century. New communication channels for biomedicine such as preprint servers, online repositories, and cloud computing combined with urgent public health demands to make data available to support key findings. Access to pools of public data allowed data scientists and others to utilize their skills to help fight the pandemic; for example, the AlphaFold algorithm for predicting protein structure was made possible by the standard practice of structural biologists depositing their protein structures in public repositories (Higgins, 2021; Thornton et al., 2021). Both the successes and the barriers faced by researchers using data resources to confront the pandemic (Yehudi et al., 2022) have reinforced the sense that the biomedical enterprise must continue to change to make valid discoveries much faster.
Although the United States’ National Institutes of Health (NIH) had taken steps towards supporting data sharing in the past, with the issuance of its Data Management and Sharing Policy (Office of the Director, National Institutes of Health, 2020), NIH has laid out a sweeping challenge to biomedicine to accelerate the trends of the open science movement. The policy is expected to move science more fully toward open access, where the data that provide the basis of publications supported by public funds are made available for both further research and validation. All applications for funding of research will require a data management and sharing plan starting January 25, 2023 (Office of the Director, National Institutes of Health, 2020). The scientific community is encouraged to seize the opportunity to reshape science for maximum support for researchers and maximum efficiency of knowledge accumulation for the benefit of humanity and planetary biology.
Toward these ends, a meeting was held virtually by the National Academies of Sciences, Engineering, and Medicine on April 28 and 29, 2021, with the title “Changing the Culture of Data Management and Sharing.” The workshop was sponsored by the National Institutes of Health to hear from stakeholders across biomedicine on how to prepare for implementation of the Data Management and Sharing Policy and to explore what is needed to foster a culture shift toward broader and more routine sharing of scientific data. A focus of the meeting was assessing how ready the biomedical community is for this shift in terms of attitudes, infrastructure, training, and compliance enforcement. In other words, are we ready for 2023?
The workshop explored this question from many angles over the 2 days, hearing from more than 30 multidisciplinary speakers. The speakers were unanimous in their desire that the new policy not be a mere box-checking exercise, but that it be impactful. The tone adopted was largely a positive one, moving away from mere enforcement of policies and toward the benefits that can accrue to the researcher and society from this culture shift. Nevertheless, participants were clear-eyed in their understanding of the hurdles that must be overcome. Biomedicine does not yet have a culture that values data; our current incentive and reward systems are still based heavily on narrative works. To shift the culture requires not just tools and infrastructure, but metrics, credit systems, and respect for the importance of public data and the methods employed for its interpretation. Currently the knowledge and infrastructure exist, but are scattered across multiple stakeholders. The work and skills required to build infrastructure and manage, curate, and share data also must be recognized and professionalized to ensure that effective practices are developed and that the necessary workforce is available to provide support. Several participants emphasized that the focus and burden of this culture shift cannot be placed solely on the individual researcher, but must target the laboratories, the institutions, and the funders themselves. In fact, several participants noted that the ‘unit’ of data sharing should really not be the individual, but the laboratory (Martone & Nakamura, 2022). If data can be effectively shared within the laboratory, then it is more likely to be effectively shared outside of the laboratory. But ensuring that laboratories can meet the challenge will require investment of time and effort by the laboratory and money by institutions and funders. All stakeholders will require tools, training, and support from data professionals to see this policy through.
In this special HDSR collection on Changing the Culture on Data Management and Data Sharing in Biomedicine, workshop participants and their colleagues were asked to explore the issues raised in the workshop at a deeper level or to reflect further on the themes of the workshop. We provide a brief synopsis of the workshop, summarizing key takeaways from each session and the workshop as a whole (Martone & Nakamura, 2022). As the workshop concerned a U.S. policy, the majority of talks focused on issues and practices in the United States, but the workshop did include several perspectives from colleagues in the European Union working across large, mature data sharing projects. Our European colleagues were asked for their reflections on the workshop and lessons learned from data management and sharing in the European Union (Bjaalie et al., 2022). We were also fortunate to interview Dr. Lawrence Tabak, Acting Director of NIH, and Dr. Lyric Jorgenson, Acting Associate Director for Science Policy and Acting Director of the Office of Science Policy at NIH, on the importance of the new policy to NIH and updates on NIH activities and priorities since the workshop (Tabak et al., 2022).
Several workshop participants, along with their colleagues, contributed more detailed perspectives or considerations of many of the important topics explored in the workshop. Drs. Borgman and Bourne (2022) contributed an article on why it takes a village to manage and share data, exploring challenges at the institutional level in more depth than was considered at the workshop. The key workshop theme of data management as the gateway to data sharing is considered further by Borghi and Van Gulick (2022), who explicitly highlight the important role of effective data management as a driver of open science, followed by a case study and discussion on the impact of data management in the laboratory on data FAIRness and productivity by Dempsey, Foster, Fraser, and Kesselman (2022). Daniella Lowenberg of the Making Data Count initiative provides an article on what it will take to move forward on metrics such as data citation (Lowenberg, 2022). Such metrics are viewed as critical to the success of the Data Management and Sharing Policy, both for developing a credit system to incentivize data contributors and for measuring its impact. The workshop covered the legal and ethical challenges of sharing human subject data in session V. Shaping a Culture of Data Sharing—Reducing Barriers and Increasing Incentives. Session chair and speaker Kristen Rosati provides an in-depth consideration of navigating these issues when stewarding data or creating data management plans in this special collection (Rosati, 2022). Finally, Drs. Torres-Espín and Ferguson (2022) provide some additional perspectives on implementing FAIR that go beyond the current set of principles to consider how harmonization of data elements for individual participant data is a necessary step for data interoperability and reuse.
Together, the workshop materials and the articles in this special collection lay a foundation for researchers, funders, administrations, and other stakeholders to help prepare biomedicine for this exciting next phase built on routine and effective data management and sharing. So what might post-2023 biomedicine look like if the culture around data management and sharing successfully shifts as envisioned?
We should initially expect to see tangible outcomes such as a growing proportion of publications indicating availability of background data and required materials needed to check and support conclusions, increased FAIRness of data, and shortening of the average time to data sharing and reuse. Longer term, the success of the policy will be measured entirely in its impact: greater and more creative uses of shared data should emerge, while new careers and industries are launched based on the use of shared data. Ultimately, biomedical science and translation should advance faster.
This impact will be driven by a concomitant increase in academic prestige assigned to shared data and the act of sharing data. Honoring data sharing means data is valued as a first-class research output by researchers, societies, institutions, libraries, journals, and funders, and people can advance their careers through data sharing in the same way they can through publication. Broader institutional support for data sharing culture will lead to better tools, a better-trained workforce, and sharing of information and examples across all of these stakeholder groups so that knowledge is not siloed. Within the laboratory, good data management and practice are default research practices that are willingly executed by researchers. The amount of time and effort required to do both well continues to decrease as the demand for ever better tools increases and investigators become trained in best practices. Experts in data management and sharing are part of the faculty of every major university, and training in data management and sharing has become a ubiquitous and standard part of training programs. These practices are supported by an expanding set of repositories sustained by dedicated funding streams.
Finally, data sharing will be supported by clear ethical guidelines and guardrails that help to build trust among researchers, research participants, and society at large. Infrastructures and policies will be in place to protect privacy and the rights of individuals. Researchers, institutions, and funders will develop norms and policies to manage risks associated with transparent sharing of data. Trust and equity are built on stable infrastructure, transparency, and respect for the roles that all stakeholders play in ensuring that the data needed to drive biomedical discovery is readily, securely, and safely available.
We realize that many researchers may respond to the new policy with trepidation, annoyance, or resignation; we are being asked to participate in a profound transformation of biomedical science and science communication. Responsible and impactful data sharing is a partnership across the research ecosystem, and we invite the community to participate in this effort. The pandemic gives us a glimpse of how open data and open science can benefit science and society, what can be achieved when data sharing and open science become the norm, and cautionary tales about what happens when data is not managed in a way that allows it to be shared (Mehra et al., 2020). We expect the same sense of urgency and spirit of shared purpose to infuse the day-to-day practice of biomedicine beyond global health emergencies. Data sharing should get easier as we practice and the infrastructure innovates and adapts to support these practices.
The authors wish to thank the workshop participants, particularly Drs. Neil Thakur and Carole Goble, for contributions to the content of this editorial.
Richard Nakamura has no financial or non-financial disclosures to share for this editorial. Maryann Martone is a founder of and has an equity interest in SciCrunch Inc, a tech startup that provides tools and services in support of rigor and reproducibility.
Begley, C. G., & Ellis, L. M. (2012). Drug development: Raise standards for preclinical cancer research. Nature, 483(7391), 531–533. https://doi.org/10.1038/483531a
Bjaalie, J. G., Goble, C., Sansone, S.-A., Nakamura, R., & Martone, M. (2022). Perspectives on data sharing and the new NIH policy from the European Union. Harvard Data Science Review, 4(3). https://doi.org/10.1162/99608f92.bcd0b999
Borgman, C. L., & Bourne, P. E. (2022). Why it takes a village to manage and share data. Harvard Data Science Review, 4(3). https://doi.org/10.1162/99608f92.42eec111
Bourne, P. E., Clark, T. W., Dale, R., de Waard, A., Herman, I., Hovy, E. H., & Shotton, D. (Eds.). (2012). Improving future research communication and e-scholarship [White paper]. Force11. https://osc.cam.ac.uk/files/force11manifesto20120219.pdf
Dempsey, W., Foster, I., Fraser, S., & Kesselman, C. (2022). Sharing begins at home: How continuous and ubiquitous FAIRness can enhance research productivity and data reuse. Harvard Data Science Review, 4(3). https://doi.org/10.1162/99608f92.44d21b86
Lowenberg, D. (2022). Recognizing our collective responsibility in the prioritization of open data metrics. Harvard Data Science Review, 4(3). https://doi.org/10.1162/99608f92.c71c3479
Martone, M., & Nakamura, R. (2022). Changing the culture on data management and sharing: Overview and highlights from a workshop held by the National Academies of Sciences, Engineering, and Medicine. Harvard Data Science Review, 4(3). https://doi.org/10.1162/99608f92.44975b62
Mehra, M. R., Ruschitzka, F., & Patel, A. N. (2020). Retraction-hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: A multinational registry analysis. The Lancet, 395(10240), Article 1820. https://doi.org/10.1016/s0140-6736(20)31324-6
National Academies of Sciences, Engineering, and Medicine. (2021, April 28–29). Changing the culture of data management and sharing: A workshop. https://www.nationalacademies.org/event/04-29-2021/changing-the-culture-of-data-management-and-sharing-a-workshop
Office of the Director, National Institutes of Health. (2020). Final NIH Policy for Data Management and Sharing (NOT-OD-21-013). https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html
Rosati, K. B. (2022). Legal compliance and good data stewardship in data sharing plans. Harvard Data Science Review, 4(3). https://doi.org/10.1162/99608f92.5ff070bf
Tabak, L., Jorgenson, L., Martone, M., & Nakamura, R. (2022). Conversation with Dr. Lawrence Tabak and Dr. Lyric Jorgenson on the NIH perspective on data sharing and management. Harvard Data Science Review, 4(3). https://doi.org/10.1162/99608f92.b9e4ceec
Thornton, J. M., Laskowski, R. A., & Borkakoti, N. (2021). AlphaFold heralds a data-driven revolution in biology and medicine. Nature Medicine, 27(10), 1666–1669. https://doi.org/10.1038/s41591-021-01533-0
Torres-Espín, A., & Ferguson, A. R. (2022). Harmonization-information trade-offs for sharing individual participant data in biomedicine. Harvard Data Science Review, 4(3). https://doi.org/10.1162/99608f92.a9717b34
Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, Article 160018. https://doi.org/10.1038/sdata.2016.18
Yehudi, Y., Hughes-Noehrer, L., Goble, C., & Jay, C. (2022). COVID-19: An exploration of consecutive systemic barriers to pathogen-related data sharing during a pandemic. arXiv. https://doi.org/10.48550/arXiv.2205.12098
©2022 Maryann Martone and Richard Nakamura. This editorial is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the editorial.