
Recognizing Our Collective Responsibility in the Prioritization of Open Data Metrics

Published on Jul 28, 2022

Abstract

With the rise in data-sharing policies, the development of supportive infrastructure, and the amount of data published over the last decades, evaluation and assessment are increasingly necessary to understand the reach, impact, and return on investment of data-sharing practices. As biomedical research stakeholders prepare for the implementation of the updated National Institutes of Health (NIH) Data Management and Sharing Policy in 2023, it is essential that the development of responsible, evidence-based open data metrics is prioritized. If the community is not mindful of our responsibility to build for assessment upfront, there are prominent risks to the advancement of open data-sharing practices: failing to live up to the policy’s goals, losing community ownership of the open data landscape, and creating disparate incentive systems that do not allow for researcher reward. These risks can be mitigated if the community recognizes data as its own scholarly output, resources and leverages open infrastructure, and builds broad community agreement around approaches for open data metrics, including using existing standards and resources. In preparation for the NIH policy, the community has an opportune moment to build for researchers’ best interests and support the advancement of the biomedical sciences, including assessment, reward, and mechanisms for improving policy resources and supportive infrastructure as the space evolves.

Keywords: open data, data metrics, research evaluation, research policy


Media Summary

Sharing of research data through their publication at data repositories has gained increasing traction over the last two decades. The National Institutes of Health (NIH) has announced an update to its data management and sharing policy, requiring responsible management and availability of data produced from NIH research funds. In preparing for the implementation of this updated policy, which will go into effect in 2023, research-supporting stakeholders—data repositories, journal publishers, institutional libraries and offices of research, funding agencies, and others—have a responsibility to build supportive infrastructure and provide resources for biomedical researchers to comply with and benefit from this policy. One aspect of this preparation that must be prioritized is open data metrics: ways of evaluating the reach, impact, and return on investment of biomedical research data-sharing practices and of the investments in supporting the policy. The building blocks of open data metrics, such as data citation and data usage infrastructure, are in development and require broad community buy-in and resourcing. There are real and immediate risks if open data metrics are not prioritized or valued and their development is not treated as a key responsibility of research-supporting stakeholders. These risks include failing to realize the potential and goals of the policy, allowing commercial entities misaligned with open science goals to gain a leadership position in the open data landscape, and ineffectively building in researcher incentives and reward. While these risks exist, they can be mitigated by emphasizing the need for open infrastructure and community agreement around evidence-based and open approaches that are readily available. Broad biomedical science communities have an opportune moment to drive social change and to advance discovery through data sharing; it will require assessment and evaluation of infrastructure to ensure success in meeting this moment.


1. Introduction

Over the last decades, sharing data through publication in repositories has increased, although variably across disciplines (Tedersoo et al., 2021). This uptake can be attributed not only to increased funder, publisher, and institutional policies requiring research data be made openly available, but also to the fact that reproducible, transparent science is increasingly valued across biomedical science domains and, more broadly, in the public eye. Drivers and incentives for researchers remain ambiguous or lacking, but there are undoubtedly fewer barriers for researchers to share their data in 2021 than a decade ago. A feedback loop has been created in which an increase in new and updated policies and perceived benefits has led to increased amounts of open data, and as data publishing becomes more normalized, policies are able to mature and advance requirements. 

In October 2020, the National Institutes of Health (NIH) released the final update to their data-sharing policy, which is set for implementation by 2023 (Office of the Director, National Institutes of Health, 2020). The policy requires more robust data management plans, requires that data with ethical and legal approvals be shared in a variety of approved repositories, and provides clear signals to research institutions about the ways in which NIH would like to better enforce compliance. This policy update has been desired by biomedical research stakeholders invested in open and responsible research practices (e.g., libraries) over the last decade. Though it is generally supported, and stakeholders globally weighed in on past drafts of the policy, it will require a large investment in resources and capacity to support proper implementation. The basic infrastructures necessary for implementation of this policy have become more robust and interoperable over the last decade—services like DMPTool for data management plans, more comprehensive resources for ‘FAIR (Findable, Accessible, Interoperable, Reusable) data’ and repositories, and mappings of connected research through persistent identifiers (Wilkinson et al., 2016). Further infrastructure for compliance checking will need to be developed, and sustainability for popular and essential infrastructures will need to be addressed as data-sharing practices increase. The investment required to support such a policy update is inherently sound, as it drives a common goal: ensuring open access to the information necessary to advance science.

We are at a pivotal moment. Research culture continues to shift toward more open practices, due in large part to federal public access policies such as the National Science Foundation’s (2015) public access plan and the upcoming NIH Data Management and Sharing Policy (Office of the Director, National Institutes of Health, 2020), as well as to recent support in the current U.S. administration for open information (The White House, 2021a), with an emphasis on research integrity (The White House, 2021b) and increased funds for the sciences (Mervis, 2021). As research stakeholders build up capacity and resourcing to support the hopeful increase in research data that is shared and made openly available, it is essential that evaluation infrastructures are prioritized.

In preparation for the 2023 policy, emphasis has largely been placed on support for repositories, grantees, and institutional research offices/libraries that will be supporting grantees in compliance with the policy. In order for the NIH and supporting stakeholders to realize the potential and goals of this policy, biomedical and life sciences researchers and research stakeholders need responsibly created and uniformly embraced open data metrics: transparent and auditable ways to evaluate and assess the reach, impact, and return on investment of data sharing behavior and resources (Lowenberg et al., 2019).

2. Data Metrics Are Complex

Successful implementation of this policy relies on compliance and an understanding of the return on investment, so infrastructures can better respond to needs based on lessons learned. If support for this policy includes a collective investment of substantial resources, how do we know that it worked? This assessment should be understood for each stakeholder beyond the NIH itself, including institutions (libraries, offices of research), repositories, publishers, metadata and persistent identifier organizations, and so on. The evidence behind these assessments needs to be transparent, credible, and broadly accessible to all involved. The NIH policy is based on the principle that people should have access to the facts and underlying information required for data-driven and evidence-based decision-making, policy, and scientific advancement. Opening up research data includes metadata, and metadata about open data includes connections, citations, usage, and other points required for assessment; none of this can sit behind closed walls.

Two critical components of the NIH Data Management and Sharing Policy are compliance and incentives. While this is a mandate, and thus seen as a ‘stick’ as opposed to a ‘carrot,’ it is still important that the policy is embraced and that the return on investment of complying with it is understood and valued by researchers and research-supporting stakeholders. Making these kinds of value assessments necessitates leveraging and prioritizing current open infrastructure and standards for open data metrics.

There is a long-running question of whether ‘credit’ exists for publishing open data, and whether data citations are a form of credit is the subject of debate (Morissette et al., 2020). For there to be a reward and credit system, there needs to be an agreed-upon mechanism for understanding the reach and impact of public data. Often, people turn to data citation as this form of credit. As Borgman et al. (2020) point out, data citation is not currently accepted as the same form of credit as an article citation (Borgman, 2016). Further, on the implementation side, data citation is not common practice.

We do not currently have a sense of how many data citations exist (Lowenberg, 2021). Like all citations in journal articles, they need to be marked up and properly exposed in the article metadata. Unfortunately, this practice is not routinely applied to data citations. In fact, many data citations are stripped from reference lists or from the underlying metadata of articles (Lowenberg et al., 2021). This lack of support for data citations by publishers leaves the majority of citations neither findable nor countable.
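As a concrete illustration of what ‘findable and countable’ requires, the following minimal Python sketch (using the requests package) queries DataCite’s public REST API for a dataset DOI and reads back whatever citation count has been registered through open infrastructure. The endpoint and the citationCount attribute are assumptions based on the public DataCite API and may differ in practice, and the example DOI is a hypothetical placeholder; the key point is that a citation that was never exposed in article metadata will never appear in such a count.

```python
# Illustrative sketch: look up the openly registered citation count for a
# dataset DOI via DataCite's public REST API. The endpoint and attribute
# name ("citationCount") are assumptions and may differ in practice.
from typing import Optional

import requests


def open_citation_count(doi: str) -> Optional[int]:
    """Return the citation count DataCite has registered for a dataset DOI,
    or None if the DOI is unknown or no count is exposed."""
    resp = requests.get(f"https://api.datacite.org/dois/{doi}", timeout=30)
    if resp.status_code != 200:
        return None
    attributes = resp.json().get("data", {}).get("attributes", {})
    # Citations stripped from article metadata never reach this count.
    return attributes.get("citationCount")


if __name__ == "__main__":
    print(open_citation_count("10.5061/dryad.example"))  # hypothetical DOI
```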

The Principles of Open Scholarly Infrastructure, a set of community guidelines for organizations and resources to follow in responsible governance, sustainability, and insurance, outline requirements for metadata to be open and accessible (Bilder et al., 2015b). It may not be obvious to researchers why the markup of their metadata in journal articles is so important, but until journals prioritize the implementation of proper practices for data citations, and these citations are exposed openly, we will not have findable citations. And without an understanding of which citations exist, it is impossible to assign any sort of credit or weight to the haphazard subset of citations that is currently available.

Before data citations can be an eligible component of ‘credit’ or reward, there must be contextualization of citations across subdisciplines of science. Bibliometrics, scientometrics, and informetrics experts are keen to work with these citation data to understand researcher behavior: how data are reused and how citations may vary across disciplines, career status, and other factors. This type of research will be essential to assigning any sort of meaning to the citations. It is important to avoid cross-disciplinary assumptions, such as taking a raw count of citations to evaluate a researcher’s performance when the size of the field or its publishing practices may alter citation habits. This research could even posit that citations are not the correct indicator for certain types and disciplines of data. But before that research can influence the open data space, repositories need to increase the quality of the metadata they house, publishers need to better support data citations, and we as a larger research-supporting community need to better advocate for researchers to cite data and promote their own data publications (Ninkov, 2021).

Related to data citation, there is a need to understand the usage of data. Based on experience with usage and citations for journal articles, we can assume that usage and citations for data sets may not be entirely correlated; for instance, an increase in usage may not predict an increase in citations (Chi & Glänzel, 2018). Traditionally, usage has been understood as views and downloads. As mentioned above, without bibliometrics research it is not possible to determine the right indicators for data, but that does not negate the need for understanding and supporting open infrastructure for data usage counts.

Above all, it is essential that the community come together and agree that not only are data metrics a priority, but also that data metrics are open, transparent, and auditable. There are risks to the biomedical sciences communities and research-supporting stakeholders if we do not focus on the development of these metrics now.

3. Acknowledging Potential Risks

The rollout of the NIH policy gives the community an opportune moment to have the attention of biomedical researchers and make an impact in advancing science with the policy. All stakeholders involved in the implementation and support of this policy have a responsibility to implement infrastructure, resourcing, and capacity building as comprehensively and thoroughly as possible. There are risks to the community if open compliance, reward, and metrics systems are not embraced as a priority set of infrastructures in support of the policy.

3.1. Risk 1—Not Living Up to the Spirit of the Policy’s Intended Goals

If we don’t prioritize the development of open data metrics up front, we could look back on the investments made in support of the policy and recognize that we don’t have ways to evaluate the efforts. 

Evaluation is necessary both for the researchers themselves looking for compliance and reward systems and for the stakeholders supporting the policy to understand where pivots need to be made, what needs to be changed to better support the policy, and where investments are not worth the resources applied. Agreement on basic open data metrics is necessary before repository-level, institutional-level, NIH-wide, and other metrics can be definitive and reliable. The intended goals of this policy are to advance science through increased open data–sharing practices and responsible data management of NIH-funded research. Without thinking through the infrastructure necessary to evaluate the return on investment in the infrastructure, resources, and capacity required to support the policy, as well as to track the trends and impacts of researcher practices, the community risks not living up to the full potential of the policy.

3.2. Risk 2—Losing Community Ownership of the Open Data Space

Metrics inherently alter behavior (Edwards & Roy, 2017). If data metrics are built on a set of systems not owned or operated by the community, and not carried out in open, auditable ways through transparent and open infrastructure, we risk losing the ability to drive the open data best practices that are most aligned with the goals of the policy.

In the context of this article, infrastructure refers broadly to knowledge infrastructures, such as underlying services, protocols, standards, and tools that support data sharing and reuse that may be visible or invisible to researchers (Borgman et al., 2020). Open infrastructure is defined here as supportive systems, structures, and services that comply with or strive to comply with the principles of open scholarly infrastructure: open and community governance, transparency in sustainability and operations, and insurance (Bilder et al., 2015a, 2015b). 

Transparency and auditability, two factors of open infrastructure, are crucial for trust. Concerns around the journal impact factor and other traditional article metrics are often based on the fact that the underlying data are not available or auditable. Researchers themselves may not consider these factors in choosing where they look for metrics; they are often looking for the quickest, most ‘trustworthy’ site or are driven by uniform tenure and promotion requirements. That trust may be instilled simply by having others in their field reference a site. As the notion of metrics for evaluating research data is fairly new, there is an underlying race to become the trusted source of information for data usage and citation.

Commercial vendors have been marketing data metrics built on proprietary algorithms for how they calculate impact and citations. This lack of transparency means there is no way to validate and check the accuracy of these data or to replicate them across various stakeholders. This situation should ring alarms for institutions, repositories, publishers, and others that will be spending massive resources to support their researchers in policy compliance. Repositories and supportive data services have noticed their dependencies on these commercial systems and have acted; for instance, Zenodo canceled its use of Altmetric.com for altmetrics (Chodacki et al., 2020).

If we as a community do not move quickly to support the open infrastructure that has already been built and can be leveraged, which allows for audits of how data are counted and for transparency about the sources of those counts, we will allow proprietary, closed infrastructures to take hold of data metrics, causing further inequity across the open data space. We have already seen a similar scenario in the questions raised around why Web of Science reports a different citation count for a data set than the repository itself (which relies on DataCite and Crossref, open infrastructure that supports repositories and publishers, respectively). Researchers should be spared any confusion about which sources are reliable or why those sources disagree. Importantly, we need data metrics infrastructures to be reliable if the community will be looking at long-term, consistent investment analyses.

3.3. Risk 3—Creating Disparate Incentive and Reward Systems

If we fail to agree on the approach for open data metrics, and metrics are not established uniformly across disciplines, we risk a lack of standardization that will make trusted, interoperable credit systems impossible.

Although indicators for the impact and reach of research data may vary across disciplines, it is important that at the NIH level, institution level, and repository level there are agreed-upon mechanisms for assessing policy implementation and researcher behavior. If high-level open data metrics infrastructure and approaches are not agreed on by the community in the same way that the concept of FAIR data has been universally accepted, the result will be inconsistency in how researchers are evaluated. There will always be variations in how metrics are applied across disciplines, subdomains, and so on, but for the sake of there being one policy, it is important that assessments are available across broad implementations and are interoperable and comparable. To do so, research-supporting stakeholders involved in the implementation of infrastructure for the policy, including metrics providers, need to come to an agreement on approaches for open data metrics, leveraging what has been built instead of each creating metrics systems in silos.

4. A Better Future

For the level of investment that we are making in support of this policy, we do not want to fall prey to these risks. A better future is possible and can be accomplished so long as we consider three proposed principles in implementing the policy.

4.1. Build for Data as Its Own Entity

To value open data, and to ensure we do not repeat mistakes that have worked against the advancement of science, we must acknowledge the inherent value of data on their own.

As we continue to place value on data through policies and the notion of credit systems, we need to recognize that data are meritable and valued scientific outputs that are related to, but not dependent on, articles. And so, the metrics evaluating data must be different. Data are complex, and data metrics need to account for variations in data such as granularity, versioning, derived data, and other factors (Lowenberg et al., 2019). A first step in this recognition is understanding the importance of assigning interoperable and citable persistent identifiers like DOIs to biomedical data sets. DOIs have not typically gained much traction across the biomedical sciences because repositories at the National Center for Biotechnology Information (NCBI) and the European Molecular Biology Laboratory's European Bioinformatics Institute (EBI) use accession numbers, but the lack of DOIs has caused inconsistencies in the ability to find and track citations to these outputs. Biomedical repositories should consider shifting to DOI-based identifiers for data sets as data sharing increases with the implementation of the policy, allowing researchers to more uniformly cite and find citations to these data.
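To illustrate why identifier choice matters for tracking, the small Python sketch below contrasts a DOI, which can always be turned into a citable link through one central resolver, with an accession number, which requires repository-specific knowledge to resolve. The accession-to-URL mapping shown is a hypothetical, incomplete illustration rather than an authoritative registry, and the identifiers are placeholders.

```python
# Sketch: why DOIs simplify citation tracking. A DOI always resolves through
# one central resolver, while accession numbers need per-repository rules.
# The accession-to-URL mapping below is a hypothetical, incomplete example.

def doi_url(doi: str) -> str:
    """Any DOI can be turned into a resolvable, citable URL the same way."""
    return f"https://doi.org/{doi}"


# Each repository needs its own bespoke rule for accession numbers, and
# citation styles vary, which makes references harder to find and count.
ACCESSION_URL_PATTERNS = {  # hypothetical mapping for illustration
    "GEO": "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={acc}",
}


def accession_url(repository: str, accession: str) -> str:
    pattern = ACCESSION_URL_PATTERNS.get(repository)
    if pattern is None:
        raise KeyError(f"No URL rule known for repository {repository!r}")
    return pattern.format(acc=accession)


if __name__ == "__main__":
    print(doi_url("10.5061/dryad.example"))   # uniform for any DOI
    print(accession_url("GEO", "GSE000000"))  # repository-specific placeholder
```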

Next, we must hold back from defaulting to article metrics, such as the impact factor or h-index, for data; doing so could be detrimental to the open data space. These indicators have long been regarded as faulty (Waltman & van Eck, 2012) and have had consequences in the open access movement that we should squarely avoid for data (Haustein, 2012). Neither the traditional h-index nor the impact factor is field-normalized, and both thus favor more ‘productive’ disciplines, which could have negative effects on researchers publishing data across various disciplines. Metrics for research data should incentivize open data publishing in best practice formats, with reusable metadata, in open disciplinary-specific or generalist homes. The decision on where and how to publish data should be influenced by best practices in scientific disciplines as opposed to commercial gains or competition for data to be accepted. When data metrics are available, they should have a driving goal of promoting (and rewarding) researchers for making their data as open and reusable as possible.

4.2. Prioritize Open Infrastructure

Increasingly, diverse science communities are recognizing that “open science is good science” (Wright, 2020). The values that underlie this statement apply equally to infrastructure: open infrastructure is good infrastructure. Best practices in research are to keep an audit trail, to open data in FAIR ways, and to share transparent protocols describing how experiments can be replicated. Researchers following these best practices may still file for patents on their discoveries or build clinical and therapeutic services from their findings. In following these practices, however, the essential underlying research information and data are open. Similarly, essential infrastructures across the research landscape are best supported when they are open, providing central, open sources of information. Open infrastructure allows value-add services to be built on top of open information while still providing broad public access and trust in the sustainability of the source.

The NIH data policy may be relatively static, but supportive infrastructures and services will need to be routinely iterated in response to the evolving needs of researchers. The principles of open scholarly infrastructure point to avoiding vendor lock-in and ensuring that infrastructure can be forked and replicated (Bilder et al., 2015b). Doing so will protect necessary and core infrastructure from vanishing and leaving a gap in the research-supporting ecosystem that would be destructive to the research landscape.

Often, priorities change and infrastructure providers need to pivot. Services that were once resourced may no longer be an organizational priority and can be shuttered (sometimes immediately). This evolution happens regularly; it is how businesses are designed to operate. Vendors are built to reevaluate and pivot based on short-term gains. Proprietary infrastructure is meant to lock in customers, which creates a dependency that cannot be trusted. When this has happened in the past, for instance with Microsoft Academic Graph, it has left wide gaps exposing our dependency on proprietary information (Microsoft Research, 2021).

Open infrastructures provide trust through transparency in operations, governance, and systems. Importantly, open infrastructure can be forked and copied. If it becomes necessary to shut down open infrastructures, it should be possible for the data and service to be reproduced and sustained by other communities (Bilder et al., 2015b). 

Looking specifically at the infrastructures necessary to support open data metrics for the NIH policy, it is essential that metrics providers are built on open infrastructure that can gain broad community and researcher trust. This statement does not imply that commercial companies are excluded; it means that the underlying infrastructure for each of these metrics services should comply with the principles of open infrastructure so as not to fall into traps that the scholarly communications space has already faced, in which a trusted data provider shuts down and cannot be replicated (Microsoft Research, 2021). For the level of investment that is required for proper support of NIH policy implementation across the landscape, there should be consistent trust in the availability of these essential systems.

We have an opportunity now, with the development of open data metrics, to instill community trust in open infrastructure that can be audited and held accountable routinely into the future. Despite significant public infrastructure funded by NIH and other federal agencies, private companies play a large role in the life and biomedical sciences. The ‘currency’ of good science remains journal articles, and the majority of these journals are controlled by private entities. This emphasis on open data metrics is not to say that commercial and competitive systems should not be built or used, but they should provide additional features on top of openly accessible raw data for metrics. Proprietary services that provide extra value should continue to be built on top of these data, as is done in biomedical research when biotech companies leverage open data sources (e.g., data in NCBI repositories). Our central sources of information around open data metrics should follow the same principles as the NIH open data policy: open and broadly accessible for reuse.

4.3. Build Community Agreement Around Open Data Metrics Principles and Frameworks

Standards, policies, and various infrastructure components have been built to support the development of open data metrics over the last decade. To enable the community to move forward, these components must be leveraged and broadly implemented, as opposed to continuing to build new competing infrastructures that distract from shared goals of evaluating the reach of open research data.

Various stakeholders have a role in these development efforts, namely publishers and repositories supporting biomedical research. Publishers should implement best practices, as developed by Scholix (https://scholix.org), Crossref, and DataCite (Make Data Count, 2021), to support proper indexing and aggregation of data citations. Following these standards also requires that data citations be shared openly; it brings no benefit to keep these citations closed or sold as proprietary information. The need for open citations has largely been publicized by the Initiative for Open Citations (I4OC) (https://i4oc.org), an advocacy group launched to ensure that even closed-access articles openly expose the citation lists within them. Similarly, data citations in journal articles should be openly exposed and marked up properly so they can be used for aggregation and informetrics research. When researchers deposit data in repositories, they should be able to see the citations to that data set, and funders, institutions, and other stakeholders should be able to see the reach of their researchers’ work.
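To make the aggregation step concrete, the short Python sketch below tallies data citations per dataset DOI from openly exposed article-to-dataset link records. The record fields are simplified stand-ins loosely modeled on Scholix-style links rather than the exact Scholix schema; the point is that once publishers expose data citations openly, counting them becomes a straightforward, reproducible aggregation that any stakeholder can audit.

```python
# Minimal sketch: aggregate openly exposed article->dataset links into
# per-dataset citation counts. Field names are simplified stand-ins loosely
# modeled on Scholix-style link records, not the exact Scholix schema.
from collections import Counter
from typing import Iterable


def count_data_citations(links: Iterable[dict]) -> Counter:
    """Count 'References' links from articles to dataset DOIs."""
    counts: Counter = Counter()
    for link in links:
        if link.get("relationship") != "References":
            continue
        target = link.get("target", {})
        if target.get("type") == "dataset" and target.get("doi"):
            counts[target["doi"].lower()] += 1  # DOIs are case-insensitive
    return counts


if __name__ == "__main__":
    example_links = [  # hypothetical records for illustration only
        {"relationship": "References",
         "source": {"type": "article", "doi": "10.1000/article.1"},
         "target": {"type": "dataset", "doi": "10.5061/dryad.example"}},
        {"relationship": "References",
         "source": {"type": "article", "doi": "10.1000/article.2"},
         "target": {"type": "dataset", "doi": "10.5061/DRYAD.EXAMPLE"}},
    ]
    print(count_data_citations(example_links))
    # Counter({'10.5061/dryad.example': 2})
```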

It is important to highlight that not all citations are found in journal articles. References to data appear across all published information: government documents, policies, books, news, and so on. Groups like the Coleridge Initiative, which hosted the Kaggle competition “Show US the Data” (Coleridge Initiative, 2021), have investigated ways to mine references to data through natural language processing and machine learning. These techniques are going to be a primary driver of understanding the reach of publicly available data, and they require community support and resourcing. All references to data should be made openly available for aggregation through open source hubs to ensure they are broadly accessible.

For repositories, data usage has typically had a less well-defined framework of standards than data citation, and the inconsistency often lies in how usage is counted. The Make Data Count (https://makedatacount.org) initiative partnered with COUNTER, a publisher and library organization traditionally focused on usage metrics for journal articles, to write the COUNTER Code of Practice for Research Data (Project Counter, 2018). It is important that the diverse repository ecosystem count data usage under standardized criteria, using this open standard to ensure that numbers inflated by bots and by those looking to game the system are not popularized. Broad adoption of this standard by repositories, both generalist and discipline-specific (in this case, NIH-supported), is important for assessing the return on investment of NIH-funded research. In combination with citations, usage information should help demonstrate the value of opening and publishing data in repositories under the policy. Building trust in open and transparent ways of counting and displaying data usage is an immediate priority, to prevent silos of companies with proprietary algorithms from locking in researcher trust and dependency. Arguably, standardized counting and reporting of usage should be prioritized over citation, especially for biomedical data repositories: usage does not depend on publishers and indexing services, and it is a lower-level signal (data must be used before they can be cited, so usage may be more straightforward to understand).
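The sketch below illustrates, in Python, the general shape of this kind of standardized usage counting: excluding known machine traffic and collapsing rapid repeat requests before tallying views and downloads per data set. The toy bot list and the 30-second double-click window are illustrative assumptions, not the rules of the COUNTER Code of Practice itself; repositories adopting the standard should follow the published rules rather than this sketch.

```python
# Illustrative sketch of standardized usage counting: exclude known bots and
# collapse rapid repeat requests before tallying views and downloads.
# The bot list and 30-second window are toy assumptions, not COUNTER's rules.
from collections import Counter

BOT_AGENTS = {"googlebot", "bingbot", "examplecrawler"}  # illustrative only
DOUBLE_CLICK_WINDOW = 30  # seconds; assumed for illustration


def count_usage(events):
    """events: iterable of dicts with keys 'dataset', 'user', 'agent',
    'action' ('view' or 'download'), and 'time' (epoch seconds)."""
    last_counted = {}   # (dataset, user, action) -> time of last counted event
    totals = Counter()  # (dataset, action) -> count
    for e in sorted(events, key=lambda e: e["time"]):
        if e["agent"].lower() in BOT_AGENTS:
            continue  # exclude machine traffic
        key = (e["dataset"], e["user"], e["action"])
        if key in last_counted and e["time"] - last_counted[key] < DOUBLE_CLICK_WINDOW:
            continue  # treat rapid repeats as a single use
        last_counted[key] = e["time"]
        totals[(e["dataset"], e["action"])] += 1
    return totals


if __name__ == "__main__":
    demo = [  # hypothetical log events
        {"dataset": "10.5061/dryad.example", "user": "u1", "agent": "Mozilla",
         "action": "download", "time": 0},
        {"dataset": "10.5061/dryad.example", "user": "u1", "agent": "Mozilla",
         "action": "download", "time": 10},   # within window: not counted
        {"dataset": "10.5061/dryad.example", "user": "u2", "agent": "googlebot",
         "action": "view", "time": 20},       # bot: not counted
    ]
    print(count_usage(demo))
    # Counter({('10.5061/dryad.example', 'download'): 1})
```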

Lastly, accounting for and resourcing support for bibliometrics and informetrics research is necessary to ensure that open data metrics are guided by evidence-based studies, that we understand which indicators are right for which types of data, and that higher-level investment analyses are possible. Community agreement on open data metrics infrastructure includes agreement on the value and necessity of bibliometrics research. Doing so will prevent a situation in which the community agrees on and applies data metrics before they are sufficiently mature, better aligning development with the Leiden Manifesto (Hicks et al., 2015). During this research phase of open data metrics development, while metrics are not understood well enough to be used for funding, tenure, and other decisions, we need to agree collectively on an intermediate goal achievable in the next 12–24 months leading up to the implementation of the NIH policy. This could be, for example, a focus on the availability of open data citation and usage counts following defined sets of criteria (e.g., Scholix and Make Data Count, mentioned above). Based on previous work (Borgman, 2015), it is understood that data look different across fields, so it is important that bibliometrics research look at benchmarks that may be available within specific fields for what a count of citations means. One way to do so would be through percentiles or an actual-versus-expected ratio (acknowledging that these methods have flaws as well). Acknowledging the need to drive metrics by evidence specific to research fields, so as to best align with intended researcher behavior, will help alleviate the urge to jump to metrics originally created for journals that may disincentivize best practices for open data.
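As a worked illustration of the field-specific benchmarks just mentioned, the short Python sketch below computes two deliberately naive indicators for a data set given the citation counts of data sets in the same field: a percentile rank and an actual-versus-expected ratio (observed citations divided by the field mean). Both are shown only to make the idea concrete; as noted above, such methods have known flaws and would need bibliometric validation before being used in any evaluation.

```python
# Naive field-normalized indicators, for illustration only: a percentile rank
# within the field and an actual-versus-expected ratio against the field mean.
from statistics import mean
from typing import List


def percentile_rank(citations: int, field_counts: List[int]) -> float:
    """Share (in %) of data sets in the field cited no more than this one."""
    return 100.0 * sum(c <= citations for c in field_counts) / len(field_counts)


def actual_vs_expected(citations: int, field_counts: List[int]) -> float:
    """Observed citations divided by the field's mean citation count."""
    expected = mean(field_counts)
    return citations / expected if expected else float("nan")


if __name__ == "__main__":
    # Hypothetical citation counts for data sets in one subdiscipline.
    field = [0, 0, 1, 1, 2, 3, 5, 8, 13, 40]
    print(percentile_rank(5, field))     # 70.0
    print(actual_vs_expected(5, field))  # ~0.68 (below the field average)
```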

5. Conclusion

We have an opportunity now to drive widespread adoption of best practices and to support trustworthy open infrastructure and initiatives, built up to support the NIH policy and beyond, that can help us meet shared goals around access to information. A required component of these efforts is to prioritize, resource, and agree upon open data metrics frameworks that are aligned with the spirit of the NIH open data policy. Recognizing and leveraging its strength in capturing stakeholder buy-in, the NIH can set an example for other federal agencies (e.g., NSF) by using this policy rollout to prioritize open data citation infrastructure.

If we do not move quickly, under these premises, we risk losing an optimal moment with biomedical researchers—at a pivotal point where we have their attention and can implement a system of resources, compliance, and evaluation mechanisms that align with best intentions and drive forward biomedical research. There is real risk of implementing the policy and metrics irresponsibly, but solutions like prioritizing and emphasizing the need for open infrastructure, leveraging resources already built, and community agreement are achievable. 

Investments made in open data metrics can continue to be embraced and more broadly supported by biomedical stakeholders. If done right, biomedical researchers should not have to understand the complexities of the infrastructure, but rather can and will trust what has been set up and be incentivized by the evaluation systems. As a community of infrastructure providers, institutions, publishers, and funders, we have a responsibility to support biomedical researchers in this capacity. Part of that support means resourcing and sustaining open infrastructure and experts who will lead these efforts in developing responsibly created and broadly accepted open data metrics.

Specifically, in line with the policy wording itself, these systems should be based on open, transparent, and reusable infrastructures. This is not about ideology, or being open for the sake of openness; these requirements are in support of strong reliability, resilience, stability, sustainability, and trust in the large investment required to support various implementations of the NIH open data policy. 

As we build capacity for supporting the implementation of the policy, we must ensure we are consistently aligned with the intentions of the policy and use metrics responsibly to ensure we are iterating on our implementations to best meet the needs of researchers in advancing scientific discovery. 


Acknowledgments

The author wishes to thank Dr. Martin Fenner, Dr. Jennifer Lin, Dr. Stefanie Haustein, Dr. Lisa Federer, Dr. Cameron Neylon, and Dr. Christine Borgman for foundational discussions around topic areas and premises discussed in the manuscript as well as input on the preparation of this manuscript.

Disclosure Statement

The author is the Principal Investigator of Make Data Count, an initiative that advocates for the development of open data metrics, including using open infrastructure for data citations and data usage.


References

Bilder, G., Lin, J., & Neylon, C. (2015a). What exactly is infrastructure? Seeing the leopard’s spots. figshare. https://doi.org/10.6084/M9.FIGSHARE.1520432.V1

Bilder, G., Lin, J., & Neylon, C. (2015b). Principles for open scholarly infrastructures-v1. figshare. https://doi.org/10.6084/m9.figshare.1314859.v1

Borgman, C. L. (2015). Big data, little data, no data: Scholarship in the networked world. MIT Press.

Borgman, C. L. (2016). Data citation as a bibliometric oxymoron. De Gruyter Saur. https://www.degruyter.com/document/doi/10.1515/9783110308464-008/html

Borgman, C. L., Darch, P. T., Pasquetto, I. V., & Wofford, M. F. (2020). Our knowledge of knowledge infrastructures: Lessons learned and future directions. eScholarship. https://escholarship.org/uc/item/9rm6b7d4

Chi, P.-S., & Glänzel, W. (2018). Comparison of citation and usage indicators in research assessment in scientific disciplines and journals. Scientometrics, 116(1), 537–554. https://doi.org/10.1007/s11192-018-2708-8

Chodacki, J., Fenner, M., & Lowenberg, D. (2020, July 10). Open metrics require open infrastructure. Make Data Count. https://makedatacount.org/2020/07/10/open-metrics-require-open-infrastructure/

Coleridge Initiative. (2021). Show us the data. Kaggle. https://kaggle.com/c/coleridgeinitiative-show-us-the-data

Edwards, M. A., & Roy, S. (2017). Academic research in the 21st century: Maintaining scientific integrity in a climate of perverse incentives and hypercompetition. Environmental Engineering Science, 34(1), 51–61. https://doi.org/10.1089/ees.2016.0223

Haustein, S. (2012). Multidimensional journal evaluation: Analyzing scientific periodicals beyond the impact factor. De Gruyter Saur. https://doi.org/10.1515/9783110255553

Hicks, D., Wouters, P., Waltman, L., de Rijcke, S., & Rafols, I. (2015). Bibliometrics: The Leiden Manifesto for research metrics. Nature, 520(7548), 429–431. https://doi.org/10.1038/520429a

Lowenberg, D. (2021). Data citation: Prioritizing adoption. Zenodo. https://doi.org/10.5281/ZENODO.4726087

Lowenberg, D., Chodacki, J., Fenner, M., Kemp, J., & Jones, M. B. (2019). Open data metrics: Lighting the fire. Zenodo. https://doi.org/10.5281/ZENODO.3525349

Lowenberg, D., Lammey, R., Jones, M. B., Chodacki, J., & Fenner, M. (2021). Data citation: Let’s choose adoption over perfection. Zenodo. https://doi.org/10.5281/ZENODO.4701079

Make Data Count. (2021, January 29). Data citation. https://makedatacount.org/data-citation/

Mervis, J. (2021, April 6). Biden, Congress roll out big plans to expand National Science Foundation. Science. https://www.sciencemag.org/news/2021/04/biden-congress-roll-out-big-plans-expand-national-science-foundation

Microsoft Research. (2021). Next steps for Microsoft Academic—Expanding into new horizons. Microsoft. https://www.microsoft.com/en-us/research/project/academic/articles/microsoft-academic-to-expand-horizons-with-community-driven-approach/

Morissette, E., Peters, I., & Haustein, S. (2020). Research data and the academic reward system. Zenodo. https://doi.org/10.5281/ZENODO.4034585

National Science Foundation. (2015). NSF 15-052 Public access plan: Today’s data, tomorrow’s discoveries: Increasing access to the results of research funded by the National Science Foundation. https://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf15052

Ninkov, A. (2021, July 14). Data citations in context: We need disciplinary metadata to move forward. Make Data Count. https://makedatacount.org/2021/07/14/data-citations-in-context-we-need-disciplinary-metadata-to-move-forward/

Office of the Director, National Institutes of Health. (2020). Final NIH Policy for Data Management and Sharing (NOT-OD-21-013). National Institutes of Health. https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html

Project Counter. (2018). Code of Practice for Research Data. https://www.projectcounter.org/code-of-practice-rd-sections/foreword/

Tedersoo, L., Küngas, R., Oras, E., Köster, K., Eenmaa, H., Leijen, Ä., Pedaste, M., Raju, M., Astapova, A., Lukner, H., Kogermann, K., & Sepp, T. (2021). Data sharing practices and data availability upon request differ across scientific disciplines. Scientific Data, 8(1), Article 192. https://doi.org/10.1038/s41597-021-00981-0

Waltman, L., & van Eck, N. J. (2012). The inconsistency of the h-index. Journal of the American Society for Information Science and Technology, 63(2), 406–415. https://doi.org/10.1002/asi.21678

The White House. (2021a, January 27). Memorandum on restoring trust in government through scientific integrity and evidence-based policymaking. https://www.whitehouse.gov/briefing-room/presidential-actions/2021/01/27/memorandum-on-restoring-trust-in-government-through-scientific-integrity-and-evidence-based-policymaking/

The White House. (2021b, May 10). The White House announces scientific integrity task force formal launch and co-chairs. https://www.whitehouse.gov/ostp/blog/2021/05/10/the-white-house-announces-scientific-integrity-task-force-formal-launch-and-co-chairs/

Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), Article 160018. https://doi.org/10.1038/sdata.2016.18

Wright, D. (2020, November 3). Open science is good science. ArcNews. https://www.esri.com/about/newsroom/arcnews/open-science-is-good-science/


©2022 Daniella Lowenberg. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.
