
Uses and Reuses of Scientific Data: The Data Creators’ Advantage: Supplementary Materials

Published on Nov 15, 2019

The supplementary materials section contains a detailed explanation of the qualitative research methods used to study CENS and DataFace, a summary of the findings for each research question, and a full bibliography that includes references not cited in the main text of the article.

1. Research Methods

1.1. Ethnographic Observation and Document Analyses

Ethnographic work reported in this article includes observations of activities in laboratories and in field deployments, laboratory and community meetings, and other events. Data practices researchers also observed CENS (Center for Embedded Networked Sensing) and DataFace community members during formal gatherings such as research reviews and retreats, weekly research seminars, and informal gatherings such as discussions within the lab and offices of CENS and DataFace. Throughout our engagement with these communities, we gathered public and private documentation of their work, ranging from publications and websites to equipment specifications, lab notes, and working documentation provided by our research subjects. In both consortia, we conducted about two years of ethnographic observation to understand their data practices before designing and conducting interviews. We returned to our ethnographic notes, memos, and documentation on data reuse issues, using data reported in our publications as a guide. We collected documents at every opportunity, analyzing them in concert with other forms of evidence.

1.2. Semi-structured Interviews

Over the course of our studies of CENS and DataFace, we conducted a total of 127 interviews that touched on topics of data reuse. These qualitative, open-ended interviews ranged from 45 minutes to 2 hours, with an average of 60 minutes per interview. Members of the UCLA Center for Knowledge Infrastructures and its predecessor labs conducted the interviews. All interviews were based on a shared protocol, which enabled us to integrate data across these studies. Among the interview questions relevant to this comparative analysis are these: Do you use data you did not generate yourself? To what end do you use data you did not generate yourself? Where do you find these data? How do you find the data? How do you make sense of these data? What kind of analysis do you conduct on others’ data? Please describe how you use others’ data in your daily research or in your research workflows. Taken together, answers to these and similar interview questions provided self-reports that we compared to our observations of practices and to reports of research activity in publications, reports, and other documents.

Systematic samples are difficult to achieve in long-term qualitative research. In these studies, we endeavored to interview participants from a broad range of disciplinary affiliations and career stages. For the CENS collaboration, participants were drawn from both technology research and scientific applications. Researchers in the areas of ecology, biology, marine sciences, seismology, and environmental engineering were classified in the sciences (Wallis, Rolando, & Borgman, 2013). Technology participants included those in computer science, electrical engineering, robotics, and related areas (Borgman, Wallis, & Mayernik, 2012; Mayernik, Wallis, & Borgman, 2013). For the first round of CENS interviews in 2005–2006, 22 participants were selected using stratified random sampling based on whether their research fell within the realm of science or technology (Borgman, Wallis, & Enyedy, 2007). For interviews conducted in 2009–2010, 21 participants were selected using stratified random sampling based on degrees of centrality in a coauthorship network constructed from CENS publications (Borgman, Bowker, Finholt, & Wallis, 2009; Borgman et al., 2012; Pepe, 2010, 2011), an approach sketched below. For interviews conducted in 2012, seven research projects were selected using stratified random sampling of CENS research groups based on whether they were classified as technical or scientific (Wallis, 2012). For the latter study, 34 authors were interviewed, drawn from one representative publication of each research project.
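To make the network-based stratification concrete, the following minimal sketch bins authors by degree centrality in a coauthorship network and samples within strata. It is illustrative only: the publication records, strata boundaries, and per-stratum sample sizes are hypothetical, and the actual sampling procedures are documented in the publications cited above.

```python
import random
from itertools import combinations

import networkx as nx

# Hypothetical publication records: each entry lists one paper's coauthors.
publications = [
    ["Ada", "Ben", "Cora"],
    ["Ben", "Cora", "Dev"],
    ["Ada", "Eve"],
    ["Dev", "Eve", "Fay"],
]

# Build the coauthorship network: nodes are authors, edges link coauthors.
graph = nx.Graph()
for authors in publications:
    graph.add_edges_from(combinations(authors, 2))

# Degree centrality: fraction of other authors each author has coauthored with.
centrality = nx.degree_centrality(graph)

# Stratify authors into low/medium/high centrality bins (boundaries assumed).
strata = {"low": [], "medium": [], "high": []}
for author, score in sorted(centrality.items()):
    if score < 0.4:
        strata["low"].append(author)
    elif score < 0.7:
        strata["medium"].append(author)
    else:
        strata["high"].append(author)

# Draw up to two participants from each stratum (sample sizes assumed).
random.seed(42)
sample = {
    name: random.sample(members, min(2, len(members)))
    for name, members in strata.items()
}
print(sample)
```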

For the DataFace collaboration, ethnographic fieldwork was conducted at nine of the 11 DataFace sites: one engineering hub, two technology centers, and six laboratories. The engineering hub was responsible for creating a centralized open repository for DataFace data. This hub developed data models, metadata schemas, and a data search engine. One of the technology centers developed the ontological schemas used to describe DataFace data. The other technology center developed the data analysis software. The six laboratories selected for the study varied by disciplinary affiliations, data types produced and consumed, and model organisms used in the experiments. Participants’ disciplinary backgrounds spanned clinical genetics, computer science, engineering, dentistry, plastic surgery, and developmental biology. They produced and reused a variety of datasets, such as 3D facial images (microCT, TIFF), facial measurements, gene expression data and drawings, annotation data on gene functions, RNA-seq data, and ChIP-seq data. Interviewees collected data from four models: zebrafish, mouse, chimpanzee, and human.

For each of the nine teams selected, Pasquetto (2018) interviewed the lead scientist, a lab manager, and one or more doctoral or postdoctoral students. Most of the DataFace teams included no more than five individuals. Ethnographic visits to each lab averaged 10 days and included observing team meetings and spending recreational time with participants. Ethnography of the DataFace consortium also included participating in four annual all-hands meetings, giving presentations, and engaging in informal interactions.

1.3. Qualitative Data Analysis

Throughout these studies, interviews were audio-recorded, transcribed, and complemented by the interviewers’ memos on noteworthy topics and themes. Transcriptions of the CENS interviews totaled 312 pages for Round 1 (2005–2006), 406 pages for Round 2 (2009–2010), and 686 pages for Round 3 (2012); transcriptions of the DataFace interviews (2016–2018) totaled 726 pages. Ethnographic notes and documentation fill several file cabinets and considerable disk space.

Overall, we conducted analytical coding of notes, memos, interviews, and other texts with NVivo software for qualitative research (QSR International, 2011). As the CENS research on data practices evolved into a longitudinal study, we developed a set of analytical categories for observation, interview protocols, and a codebook. These analytical categories from CENS were the initial basis for studies of DataFace and of sites in other sciences. Full details of our data analysis processes are reported in the publications cited throughout the article.

The CENS study used the methods of grounded theory (Glaser & Strauss, 1967) to identify themes and to test them in the full corpus of interview transcripts and notes. Prior to our first formal round of interviews discussed in the article, we had already been members and observers of the CENS community for four years (Borgman, Wallis, & Enyedy, 2007). We examined these initial notes, informal interviews, and texts to identify emergent themes, then tested and refined these themes iteratively in the coding of subsequent interviews (Borgman, Wallis, & Enyedy, 2007; Wallis et al., 2007). For each round of interviews, we worked with the existing codebook to test and refine themes in coding of subsequent interviews (Mayernik et al., 2013). With each refinement, the remaining interviews were searched for confirming or contradictory evidence, as sketched below. A similar process was employed to analyze the DataFace interviews and field notes. Full details of methods and analysis for the DataFace study are reported in Pasquetto (2018).
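The iterative search for confirming and contradictory evidence was interpretive work carried out in NVivo rather than a scripted procedure. Purely as a toy illustration of how a codebook-driven search across transcripts might be organized, the sketch below flags transcripts that mention a theme’s keywords as candidates for close reading; the themes, keywords, and transcript snippets are all hypothetical.

```python
# Toy codebook: theme -> keywords suggesting a transcript may be relevant.
# Real coding was interpretive; keyword hits only nominate candidates.
codebook = {
    "data_reuse": ["reuse", "existing dataset", "someone else's data"],
    "trust": ["trust", "calibration", "ground truth"],
}

# Hypothetical interview transcripts, keyed by an anonymized identifier.
transcripts = {
    "interview_01": "We reuse existing datasets mainly for calibration runs.",
    "interview_02": "I only trust data when I know the sensor's history.",
}

def candidates_by_theme(codebook, transcripts):
    """For each theme, list transcripts whose text mentions any keyword."""
    hits = {theme: [] for theme in codebook}
    for theme, keywords in codebook.items():
        for name, text in transcripts.items():
            if any(keyword in text.lower() for keyword in keywords):
                hits[theme].append(name)
    return hits

# Candidate transcripts would then be read closely to confirm, refine,
# or contradict the theme before the codebook is revised.
print(candidates_by_theme(codebook, transcripts))
```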

All of these analytical tools and protocols evolved over the 16 years of research reported here. Each new grant had its own focus, and each dissertation addressed specific research questions within the scope of current funding. NVivo software was upgraded several times, which required data migration. Similarly, computing platforms were upgraded on a regular basis. The advantages of large, long-term, distributed qualitative studies—which are rare—are opportunities for multiple comparisons and for theoretical development. The disadvantages are the changing circumstances of the sites, turnover in the research personnel conducting the studies (graduate students and postdoctoral fellows), differing expectations of the funding agencies supporting the research program, and changes in technology. We have accommodated these disadvantages by paying close attention to variances within and between individual studies, respecting the power of our sample size, and acknowledging the limitations of the study in our conclusions.

2. Summary of Findings

For the first research question, “Where do scientists find reusable data?” we found that sources varied widely by purpose, project, and individual researcher. At the time of the CENS research, from 2002 to 2012, data deposit was required only in genomics and seismology. Few of the other domain areas in CENS had archives of research data on which to draw. However, they did make regular use of databases that contained observations of the natural world, such as those of the U.S. Geological Survey (USGS) and the National Oceanic and Atmospheric Administration (NOAA), and domain-specific databases such as collections of bird sounds. DataFace, a project that began seven years later (2009), and in a domain with a long history of data archiving, made extensive use of open archives in biomedicine. Scientists in both CENS and DataFace also asked other researchers for access to their data. Sometimes these contacts were identified through publications or presentations; other times, through personal knowledge of others’ research.

Our second research question, “How do scientists reuse others’ data?” revealed a continuum of purposes for data reuse, ranging from comparative to integrative. In both CENS and DataFace, uses of external data sources for comparative purposes were by far the most common. Some of their data sources were observations collected and curated expressly for comparative purposes, such as the USGS, NOAA, ClinVar, GenBank, and GWAS (Genome-Wide Association Study) databases. These were essential sources of data for ground-truthing, calibration, and comparison. Archives of data deposited by individual researchers, such as OMIM (Online Mendelian Inheritance in Man), also were useful for comparisons. Data and literature searches were often conducted together as background for new studies. At the integrative end of the continuum of data uses, researchers reused data for new analyses, alone or in combination with other datasets. This continuum emerged in both CENS and DataFace, a decade apart, in different research domains.

Our third research question, “How do scientists interpret others’ data?” yielded the most complex findings, as expected. In both CENS and DataFace, researchers generally were able to reuse data from archives for comparative purposes, provided the documentation was sufficient for the particular application. When they wished to integrate data created by others, however, published documentation of data was less likely to suffice for interpretation. In these cases, reusers typically collaborated with data creators to conduct new analyses, test new hypotheses, or combine data from multiple studies.

3. Full Bibliography

The supplemental bibliography includes all references cited in the main text of the article and other background materials.

Alberts, B., Cicerone, R. J., Fienberg, S. E., Kamb, A., McNutt, M., Nerem, R. M., Schekman, R., Shiffrin, R., Stodden, V., Suresh, S., Zuber, M. T., Pope, K. B., & Jamieson, K. H. (2015). Self-correction in science at work. Science, 348(6242), 1420–1422. https://doi.org/10.1126/science.aab3847

Allison, D. B., Brown, A. W., George, B. J., & Kaiser, K. A. (2016). Reproducibility: A tragedy of errors. Nature, 530(7588), 27–29. https://doi.org/10.1038/530027a

Arrow, K. J. (1970). Political and economic evaluation of social effects and externalities. In J. Margolis (Ed.), The analysis of public output (pp. 1–30). Retrieved from https://www.nber.org/chapters/c3349

Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533(7604), 452–454. https://doi.org/10.1038/533452a

Bekaert, J., & Van de Sompel, H. (2006). Augmenting interoperability across scholarly repositories. Retrieved from http://msc.mellon.org/Meetings/Interop/FinalReport

Berman, F., Lavoie, B., Ayris, P., Cohen, E., Courant, P., Dirks, L., … Van Camp, A. (2010). Sustainable economics for a digital planet: Ensuring long-term access to digital information [Final Report of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access]. Retrieved from http://brtf.sdsc.edu/biblio/BRTF_Final_Report.pdf

Bietz, M. J., & Lee, C. P. (2009). Collaboration in metagenomics: Sequence databases and the organization of scientific work. In I. Wagner, H. Tellioğlu, E. Balka, C. Simone, & L. Ciolfi (Eds.), ECSCW 2009 (pp. 243–262). https://doi.org/10.1007/978-1-84882-854-4_15

Borgman, C. L. (2015). Big data, little data, no data: Scholarship in the networked world. Cambridge, MA: MIT Press.

Borgman, C. L., Bowker, G. C., Finholt, T. A., & Wallis, J. C. (2009). Towards a virtual organization for data cyberinfrastructure. In Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 353–356). https://doi.org/10.1145/1555400.1555459

Borgman, C. L., Darch, P. T., Sands, A. E., Pasquetto, I. V., Golshan, M. S., Wallis, J. C., & Traweek, S. (2015). Knowledge infrastructures in science: Data, diversity, and digital libraries. International Journal on Digital Libraries, 16(3–4), 207–227. https://doi.org/10.1007/s00799-015-0157-z

Borgman, C. L., Darch, P. T., Sands, A. E., Wallis, J. C., & Traweek, S. (2014). The ups and downs of knowledge infrastructures in science: Implications for data management. In Proceedings of the 2014 IEEE/ACM Joint Conference on Digital Libraries (pp. 257–266). https://doi.org/10.1109/JCDL.2014.6970177

Borgman, C. L., Golshan, M. S., Sands, A. E., Wallis, J. C., Cummings, R. L., Darch, P. T., & Randles, B. M. (2016). Data management in the long tail: Science, software, and service. International Journal of Digital Curation, 11(1), 128–149. https://doi.org/10.2218/ijdc.v11i1.428

Borgman, C. L., Scharnhorst, A., & Golshan, M. S. (2019). Digital data archives as knowledge infrastructures: Mediating data sharing and reuse. Journal of the Association for Information Science and Technology, 70(8), 888–904. https://doi.org/10.1002/asi.24172

Borgman, C. L., Wallis, J. C., & Enyedy, N. D. (2006). Building digital libraries for scientific data: An exploratory study of data practices in habitat ecology. In J. Gonzalo, C. Thanos, M. F. Verdejo, & R. C. Carrasco (Eds.), Lecture Notes in Computer Science: Vol. 4172. ECDL 2006: Research and Advanced Technology for Digital Libraries (pp. 170–183). https://doi.org/10.1007/11863878_15

Borgman, C. L., Wallis, J. C., & Enyedy, N. (2007). Little science confronts the data deluge: Habitat ecology, embedded sensor networks, and digital libraries. International Journal on Digital Libraries, 7(1–2), 17–30. https://doi.org/10.1007/s00799-007-0022-9

Borgman, C. L., Wallis, J. C., & Mayernik, M. S. (2012). Who’s got the data? Interdependencies in science and technology collaborations. Computer Supported Cooperative Work, 21, 485–523. https://doi.org/10.1007/s10606-012-9169-z

Borgman, C. L., Wallis, J. C., Mayernik, M. S., & Pepe, A. (2007). Drowning in data: Digital library architecture to support scientific use of embedded sensor networks. In Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries (pp. 269–277). https://doi.org/10.1145/1255175.1255228

Borgman, C. L., Wofford, M. F., Golshan, M. S., Darch, P. T., & Scroggins, M. J. (2019, in review). Collaborative ethnography at scale: Reflections on 20 years of data integration. Science and Technology Studies. Retrieved from https://escholarship.org/uc/item/5bb8b1tn

Boscoe, B. M. (2019). From blurry space to a sharper sky: Keeping twenty-three years of astronomical data alive (PhD Dissertation, UCLA). Retrieved from https://escholarship.org/uc/item/2jv941sb

Bowker, G. C. (2000). Biodiversity datadiversity. Social Studies of Science, 30(5), 643–683. https://doi.org/10.1177/030631200030005001

Bowker, G. C. (2005). Memory practices in the sciences. Cambridge, MA: MIT Press.

Bowker, G. C., & Star, S. L. (1999). Sorting things out: Classification and its consequences. Cambridge, MA: MIT Press.

Buneman, P. (2005). Curated databases. In B. Ludäscher & L. Raschid (Eds.), Data Integration in the Life Sciences (pp. 2–2). https://doi.org/10.1007/11530084_2

Center for Open Science. (2019). Center for Open Science: Openness, integrity, and reproducibility. Retrieved from http://centerforopenscience.org/

Chard, K., Gaffney, N., Jones, M. B., Kowalik, K., Ludäscher, B., Nabrzyski, J., … Willis, C. (2019). Implementing computational reproducibility in the Whole Tale environment. In Proceedings of the 2nd International Workshop on Practical Reproducible Evaluation of Computer Systems (pp. 17–22). https://doi.org/10.1145/3322790.3330594

Collins, H. M., & Evans, R. (2007). Rethinking expertise. Chicago, IL: University of Chicago Press.

Collins, H. M., Evans, R., & Gorman, M. (2007). Trading zones and interactional expertise. Studies in History and Philosophy of Science Part A, 38, 657–666. https://doi.org/10.1016/j.shpsa.2007.09.003

Crick, T., Hall, B., & Ishtiaq, S. (2017). Reproducibility in research: Systems, infrastructure, culture. Journal of Open Research Software, 5(1), 32. https://doi.org/10.5334/jors.73

Culina, A., Crowther, T. W., Ramakers, J. J. C., Gienapp, P., & Visser, M. E. (2018). How to do meta-analysis of open datasets. Nature Ecology & Evolution, 2(7), 1053–1056. https://doi.org/10.1038/s41559-018-0579-2

Curty, R. G., Crowston, K., Specht, A., Grant, B. M., & Dalton, E. D. (2018, March 20). What factors do scientists perceive as promoting or hindering scientific data reuse? Retrieved from http://blogs.lse.ac.uk/impactofsocialsciences/2018/03/20/what-factors-do-scientists-perceive-as-promoting-or-hindering-scientific-data-reuse/

Darch, P. T. (2019). The core of the matter: How do scientists judge the trustworthiness of physical samples? Manuscript submitted for publication.

Darch, P. T., & Borgman, C. L. (2016). Ship space to database: Emerging infrastructures for studies of the deep subseafloor biosphere. PeerJ Computer Science, 2, e97. https://doi.org/10.7717/peerj-cs.97

Dervin, B., & Nilan, M. (1986). Information needs and uses. Annual Review of Information Science and Technology, 21, 3–33.

Edwards, P. N. (2010). A vast machine: Computer models, climate data, and the politics of global warming. Cambridge, MA: MIT Press.

Edwards, P. N., Jackson, S. J., Chalmers, M. K., Bowker, G. C., Borgman, C. L., Ribes, D., Burton, M., & Calvert, S. (2013). Knowledge infrastructures: Intellectual frameworks and research challenges. Retrieved from http://hdl.handle.net/2027.42/97552

Edwards, P. N., Mayernik, M. S., Batcheller, A. L., Bowker, G. C., & Borgman, C. L. (2011). Science friction: Data, metadata, and collaboration. Social Studies of Science, 41(5), 667–690. https://doi.org/10.1177/0306312711413314

European Commission High Level Expert Group on Scientific Data. (2010). Riding the wave: How Europe can gain from the rising tide of scientific data [Final report of the High Level Expert Group on Scientific Data. A submission to the European Commission]. Retrieved from https://www.fosteropenscience.eu/content/riding-wave-how-europe-can-gain-rising-tide-scientific-data

Faniel, I. M., & Jacobsen, T. E. (2010). Reusing scientific data: How earthquake engineering researchers assess the reusability of colleagues’ data. Journal of Computer Supported Cooperative Work, 19(3–4), 355–375. https://doi.org/10.1007/s10606-010-9117-8

Faniel, I. M., & Yakel, E. (2017). Practices do not make perfect: Disciplinary data sharing and reuse practices and their implications for repository data curation. In L. R. Johnston (Ed.), Curating research data, Volume One: Practical strategies for your digital repository (pp. 103–126). Retrieved from http://www.oclc.org/research/publications/2017/practices-do-not-make-perfect.html

Federer, L. M. (2019). Who, what, when, where, and why? Quantifying and understanding biomedical data reuse (PhD dissertation, University of Maryland). Retrieved from https://drum.lib.umd.edu/handle/1903/21991

Federer, L. M., Lu, Y.-L., Joubert, D. J., Welsh, J., & Brandys, B. (2015). Biomedical data sharing and reuse: Attitudes and practices of clinical and scientific research staff. PLoS ONE, 10(6), Article e0129506. https://doi.org/10.1371/journal.pone.0129506

Feger, S. S., Dallmeier-Tiessen, S., Woźniak, P. W., & Schmidt, A. (2019). The role of HCI in reproducible science: Understanding, supporting and motivating core practices. Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, LBW0246:1–LBW0246:6. https://doi.org/10.1145/3290607.3312905

Galison, P. (1997). Image and logic: A material culture of microphysics. Chicago, IL: University of Chicago Press.

Gitelman, L. (Ed.). (2013). “Raw data” is an oxymoron. Cambridge, MA: MIT Press.

Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory: Strategies for qualitative research. Chicago: Aldine Pub. Co.

Hamilton, M. P., Graham, E. A., Rundel, P. W., Allen, M. F., Kaiser, W., Hansen, M. H., & Estrin, D. L. (2007). New approaches in embedded networked sensing for terrestrial ecological observatories. Environmental Engineering Science, 24(2), 192–204. https://doi.org/10.1089/ees.2006.0045

Hanson, B., Sugden, A., & Alberts, B. (2011). Making data maximally available. Science, 331(6018), 649. https://doi.org/10.1126/science.1203354

Hardwig, J. (1985). Epistemic dependence. Journal of Philosophy, 82(7), 335–349. https://doi.org/10.5840/jphil198582747

Hardwig, J. (1991). The role of trust in knowledge. Journal of Philosophy, 88(12), 693–708. https://doi.org/10.5840/jphil199188121

Hess, C., & Ostrom, E. (2007). Understanding knowledge as a commons: From theory to practice. Cambridge, MA: MIT Press.

Hilgartner, S., & Brandt-Rauf, S. I. (1994). Data access, ownership, and control: Toward empirical studies of access practices. Science Communication, 15(4), 355–372. https://doi.org/10.1177/107554709401500401

Hutson, M. (2018). Artificial intelligence faces reproducibility crisis. Science, 359(6377), 725–726. https://doi.org/10.1126/science.359.6377.725

Incorporated Research Institutions for Seismology (IRIS). (2018). Retrieved August 10, 2018, from https://www.iris.edu/hq/

Jackson, S. J., Ribes, D., Buyuktur, A., & Bowker, G. C. (2011). Collaborative rhythm: Temporal dissonance and alignment in collaborative scientific work. Proceedings of the ACM 2011 Conference on Computer Supported Cooperative Work, 245–254. https://doi.org/10.1145/1958824.1958861

Jirotka, M., Procter, R., Hartswood, M., Slack, R., Simpson, A., Coopmans, C., … Voss, A. (2005). Collaboration and trust in healthcare innovation: The eDiaMoND case study. Computer Supported Cooperative Work (CSCW), 14(4), 369–398. https://doi.org/10.1007/s10606-005-9001-0

Jones, M. B., Schildhauer, M. P., Reichman, O. J., & Bowers, S. (2006). The new bioinformatics: Integrating ecological data from the gene to the biosphere. Annual Review of Ecology, Evolution, and Systematics, 37, 519–544. https://doi.org/10.1146/annurev.ecolsys.37.091305.110031

Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgment under uncertainty: Heuristics and biases. Cambridge, UK: Cambridge University Press.

Karasti, H., & Blomberg, J. (2017). Studying infrastructuring ethnographically. Computer Supported Cooperative Work (CSCW), 27(2), 233–265. https://doi.org/10.1007/s10606-017-9296-7

Kelty, C. M. (2012). This is not an article: Model organism newsletters and the question of ‘open science.’ BioSocieties, 7(2), 140–168. https://doi.org/10.1057/biosoc.2012.8

Kohler, R. E. (1994). Lords of the fly: Drosophila genetics and the experimental life. Chicago, IL: University of Chicago Press.

Latour, B. (1987). Science in action: How to follow scientists and engineers through society. Cambridge, MA: Harvard University Press.

Latour, B., & Woolgar, S. (1979). Laboratory life: The social construction of scientific facts. Beverly Hills, CA: SAGE.

Latour, B., & Woolgar, S. (1986). Laboratory life: The construction of scientific facts (2nd ed.). Princeton, NJ: Princeton University Press.

Leonelli, S. (2010). Packaging small facts for re-use: Databases in model organism biology. In M. Morgan & P. Howlett (Eds.), How well do facts travel? The dissemination of reliable knowledge (pp. 325–348). Cambridge, UK: Cambridge University Press.

Leonelli, S. (2013). Integrating data to acquire new knowledge: Three modes of integration in plant science. Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 44(4), 503–514. https://doi.org/10.1016/j.shpsc.2013.03.020

Leonelli, S. (2015). What counts as scientific data? A relational framework. Philosophy of Science, 82(5), 810–821. https://doi.org/10.1086/684083

Leonelli, S. (2016). Data-centric biology: A philosophical study. Chicago, IL: University of Chicago Press.

Longo, D. L., & Drazen, J. M. (2016). Data sharing. New England Journal of Medicine, 374, 276–277. https://doi.org/10.1056/NEJMe1516564

Loukissas, Y. A. (2019). All data are local: Thinking critically in a data-driven society. Cambridge, MA: The MIT Press.

Madison, M. J. (2014). Commons at the intersection of peer production, citizen science, and big data: Galaxy zoo. arXiv. https://doi.org/10.48550/arXiv.1409.4296

Mandell, R. A. (2012). Researchers’ attitudes towards data discovery: Implications for a UCLA data registry. Libraries in the Digital Age (LIDA) Proceedings, 12. Retrieved from https://escholarship.org/uc/item/5bv8j7g3

Mayernik, M. S. (2011). Metadata realities for cyberinfrastructure: Data authors as metadata creators (PhD Dissertation, UCLA). http://doi.org/10.2139/ssrn.2042653

Mayernik, M. S. (2016). Research data and metadata curation as institutional issues. Journal of the Association for Information Science and Technology, 67(4), 973–993. https://doi.org/10.1002/asi.23425

Mayernik, M. S., & Acker, A. (2017). Tracing the traces: The critical role of metadata within networked communications. Journal of the Association for Information Science and Technology, 69(1), 177–180. https://doi.org/10.1002/asi.23927

Mayernik, M. S., Batcheller, A. L., & Borgman, C. L. (2011). How institutional factors influence the creation of scientific metadata. Proceedings of the 2011 IConference, 417–425. https://doi.org/10.1145/1940761.1940818

Mayernik, M. S., Wallis, J. C., & Borgman, C. L. (2013). Unearthing the Infrastructure: Humans and sensors in field-based research. Computer Supported Cooperative Work, 22(1), 65–101. https://doi.org/10.1007/s10606-012-9178-y

Mayernik, M. S., Wallis, J. C., Borgman, C. L., & Pepe, A. (2007). Adding context to content: The CENS deployment center. Annual Meeting of the American Society for Information Science & Technology, 44(1), 1–7. https://doi.org/10.1002/meet.1450440388

Mayol, T. (2016, September 10). The man at the cutting edge of medicine and big data. OZY. Retrieved from http://www.ozy.com/rising-stars/the-doctor-who-wants-you-to-be-a-research-parasite/68411

McNutt, M. (2014). Reproducibility. Science, 343(6168), 229. https://doi.org/10.1126/science.1250475

Meng, X.-L. (1994). Multiple-imputation inferences with uncongenial sources of input. Statistical Science, 9(4), 538–558. https://doi.org/10.1214/ss/1177010269

Mirowski, P. (2018). The future(s) of open science. Social Studies of Science, 48(2), 171–203. https://doi.org/10.1177/0306312718772086

Mirowski, P., & Nik-Khah, E. (2017). The knowledge we have lost in information: The history of information in modern economics. Oxford, UK: Oxford University Press.

Mosconi, G., Li, Q., Randall, D., Karasti, H., Tolmie, P., Barutzky, J., … Pipek, V. (2019). Three gaps in opening science. Computer Supported Cooperative Work (CSCW), 28(3–4), 749–789. https://doi.org/10.1007/s10606-019-09354-z

National Academies of Sciences, Engineering, and Medicine. (2019). Reproducibility and replicability in science. Washington, DC: The National Academies Press. https://doi.org/10.17226/25303

National Research Council, Committee on Issues in the Transborder Flow of Scientific Data. (1997). Bits of power: Issues in global access to scientific data. Retrieved from http://www.nap.edu/openbook.php?record_id=5504

Networking and Information Technology Research and Development. (2009). Harnessing the power of digital data for science and society. Report of the Interagency Working Group on Digital Data to the Committee on Science of the National Science and Technology Council. Retrieved from https://www.nitrd.gov/about/harnessing_power_web.pdf

Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.

Organisation for Economic Co-operation and Development. (2007). OECD principles and guidelines for access to research data from public funding. Retrieved from http://www.oecd.org/dataoecd/9/61/38500813.pdf

Page, M. J., Altman, D. G., Shamseer, L., McKenzie, J. E., Ahmadzai, N., Wolfe, D., … Moher, D. (2018). Reproducible research practices are underused in systematic reviews of biomedical interventions. Journal of Clinical Epidemiology, 94, 8–18. https://doi.org/10.1016/j.jclinepi.2017.10.017

Paisley, W. J. (1980). Information and work. In B. Dervin & M. J. Voigt (Eds.), Progress in the communication sciences (Vol. 2, pp. 114–165). Norwood, NJ: Ablex.

Pasquetto, I. V. (2018). From open data to knowledge production: Biomedical data sharing and unpredictable data reuses (PhD dissertation, UCLA). Retrieved from https://escholarship.org/uc/item/1sx7v77r

Pasquetto, I. V., Randles, B. M., & Borgman, C. L. (2017). On the reuse of scientific data. Data Science Journal, 16, 1–9. https://doi.org/10.5334/dsj-2017-008

Peer, L., Green, A., & Stephenson, E. (2014). Committing to data quality review. International Journal of Digital Curation, 9(1), 263–291. https://doi.org/10.2218/ijdc.v9i1.317

Pepe, A. (2010). Structure and evolution of scientific collaboration networks in a modern research collaboratory (PhD dissertation, UCLA). https://doi.org/10.2139/ssrn.1616935

Pepe, A. (2011). The relationship between acquaintanceship and coauthorship in scientific collaboration networks. Journal of the American Society for Information Science and Technology, 62(11), 2121–2132. https://doi.org/10.1002/asi.21629

Pepe, A., Borgman, C. L., Wallis, J. C., & Mayernik, M. S. (2007). Knitting a fabric of sensor data resources. Proceedings of the 2007 ACM IEEE International Conference on Information Processing in Sensor Networks. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.596.2608

Pepe, A., Mayernik, M. S., Borgman, C. L., & Van de Sompel, H. (2010). From artifacts to aggregations: Modeling scientific life cycles on the semantic Web. Journal of the American Society for Information Science and Technology, 61(3), 567–582. https://doi.org/10.1002/asi.21263

Polanyi, M. (1966). The tacit dimension. Garden City, NY: Doubleday.

Porter, T. M. (1996). Trust in numbers: The pursuit of objectivity in science and public life. Princeton, NJ: Princeton University Press.

Prieto, A. G. (2009). From conceptual to perceptual reality: Trust in digital repositories. Library Review, 58(8), 593–606. https://doi.org/10.1108/00242530910987082

QSR International. (2011). NVivo 9 research software for analysis and insight. Retrieved May 6, 2011, from http://www.qsrinternational.com/products_nvivo.aspx

Rheinberger, H.-J. (1997). Toward a history of epistemic things: Synthesizing proteins in the test tube. Stanford, CA: Stanford University Press.

Ross, S., & McHugh, A. (2006). The role of evidence in establishing trust in repositories. D-Lib Magazine, 12(7/8). https://doi.org/10.1045/july2006-ross

Rung, J., & Brazma, A. (2012). Reuse of public genome-wide gene expression data. Nature Reviews Genetics, 14(2), 89–99. https://doi.org/10.1038/nrg3394

Ryle, G. (1949). The concept of mind. London, UK: Hutchinson.

Sands, A. E. (2017). Managing astronomy research data: Data practices in the Sloan Digital Sky Survey and Large Synoptic Survey Telescope Projects (PhD dissertation, UCLA). Retrieved from http://escholarship.org/uc/item/80p1w0pm

Schmidt, K. (2012). The trouble with "tacit knowledge." Computer Supported Cooperative Work (CSCW), 21(2–3), 163–225. https://doi.org/10.1007/s10606-012-9160-8

Shapin, S. (1994). A social history of truth: Civility and science in seventeenth-century England. Chicago, IL: University of Chicago Press.

Star, S. L., Bowker, G. C., & Neumann, L. J. (2003). Transparency beyond the individual level of scale: Convergence between information artifacts and communities of practice. In A. Bishop, N. A. Van House, & B. P. Buttenfield (Eds.), Digital library use: Social practice in design and evaluation (pp. 241–270). Cambridge, MA: MIT Press.

Star, S. L., & Griesemer, J. (1989). Institutional ecology, “translations,” and boundary objects: Amateurs and professionals in Berkeley’s Museum of Vertebrate Zoology, 1907–1939. Social Studies of Science, 19(3), 387–420. https://doi.org/10.1177/030631289019003001

Star, S. L., & Ruhleder, K. (1996). Steps toward an ecology of infrastructure: Design and access for large information spaces. Information Systems Research, 7(1), 111–134. https://doi.org/10.1287/isre.7.1.111

Stodden, V. (2015). Reproducing statistical results. Annual Review of Statistics and Its Application, 2(1), 1–19. https://doi.org/10.1146/annurev-statistics-010814-020127

Strauss, A., & Corbin, J. M. (1998). Basics of qualitative research: Techniques and procedures for developing grounded theory. Thousand Oaks, CA: SAGE.

Stubailo, I., Lukac, M., Mayernik, M., Foote, E., Guy, R., Davis, P., … Husker, A. (2009, May). Subduction zone seismic experiment in Peru: Results from a wireless seismic network. Presented at the Center for Embedded Networked Sensing. Retrieved from https://escholarship.org/uc/item/5dk8r03w

Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A. U., Wu, L., Read, E., … Frame, M. (2011). Data sharing by scientists: Practices and perceptions. PLoS ONE, 6(6), Article e21101. https://doi.org/10.1371/journal.pone.0021101

Tenopir, C., Dalton, E. D., Allard, S., Frame, M., Pjesivac, I., Birch, B., Pollock, D., & Dorsett, K. (2015). Changes in data sharing and data reuse practices and perceptions among scientists worldwide. PLOS ONE, 10(8), Article e0134826. https://doi.org/10.1371/journal.pone.0134826

Thompson, E. P. (1971). The moral economy of the English crowd in the eighteenth century. Past & Present, 50(1), 76–136. https://doi.org/10.1093/past/50.1.76

UCLA Center for Knowledge Infrastructures. (2018). Home page. Retrieved from https://knowledgeinfrastructures.gseis.ucla.edu/

University of California eScholarship Repository. (2011). Center for Embedded Network Sensing. Retrieved from https://escholarship.org/uc/cens

U.S. National Science Board. (2005). Long-lived digital data collections: Enabling research and education in the 21st century (No. US NSF-NSB-05-40). Retrieved from https://www.nsf.gov/pubs/2005/nsb0540/

Vasilevsky, N. A., Minnier, J., Haendel, M. A., & Champieux, R. E. (2017). Reproducible and reusable research: Are journal data sharing policies meeting the mark? PeerJ, 5, Article e3208. https://doi.org/10.7717/peerj.3208

Wallis, J. C. (2012). The distribution of data management responsibility within scientific research groups (PhD Dissertation, UCLA). https://doi.org/10.2139/ssrn.2269079

Wallis, J. C., & Borgman, C. L. (2011). Who is responsible for data? An exploratory study of data authorship, ownership, and responsibility. Annual Meeting of the American Society for Information Science & Technology, 48(1), 1–10. https://doi.org/10.1002/meet.2011.14504801188

Wallis, J. C., Borgman, C. L., Mayernik, M. S., & Pepe, A. (2008). Moving archival practices upstream: An exploration of the life cycle of ecological sensing data in collaborative field research. International Journal of Digital Curation, 3(1), 114–126. https://doi.org/10.2218/ijdc.v3i1.46

Wallis, J. C., Borgman, C. L., Mayernik, M. S., Pepe, A., Ramanathan, N., & Hansen, M. A. (2007). Know thy sensor: Trust, data quality, and data integrity in scientific digital libraries. In L. Kovács, N. Fuhr, & C. Meghini (Eds.), Lecture Notes in Computer Science: Vol. 4675. Research and Advanced Technology for Digital Libraries (pp. 380–391). https://doi.org/10.1007/978-3-540-74851-9_32

Wallis, J. C., Mayernik, M. S., Borgman, C. L., & Pepe, A. (2010). Digital libraries for scientific data discovery and reuse: From vision to practical reality. In Proceedings of the 10th Annual Joint Conference on Digital Libraries (pp. 333–340). https://doi.org/10.1145/1816123.1816173

Wallis, J. C., Pepe, A., Mayernik, M. S., & Borgman, C. L. (2008). An exploration of the life cycle of eScience collaboratory data. Presented at the iConference 2008: iFutures: Systems, Selves, Society, Los Angeles, CA. Retrieved from https://www.ideals.illinois.edu/handle/2142/15122

Wallis, J. C., Rolando, E., & Borgman, C. L. (2013). If we share data, will anyone use them? Data sharing and reuse in the long tail of science and technology. PLOS ONE, 8(7), Article e67332. https://doi.org/10.1371/journal.pone.0067332

Wilholt, T. (2013). Epistemic trust in science. British Journal for the Philosophy of Science, 64(2), 233–253. https://doi.org/10.1093/bjps/axs007

Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, Article 160018. https://doi.org/10.1038/sdata.2016.18

Yakel, E., Faniel, I. M., Kriesberg, A., & Yoon, A. (2013). Trust in digital repositories. International Journal of Digital Curation, 8(1), 143–156. https://doi.org/10.2218/ijdc.v8i1.251

Zimmerman, A. S. (2008). New knowledge from old data: The role of standards in the sharing and reuse of ecological data. Science, Technology & Human Values, 33(5), 631–652. https://doi.org/10.1177/0162243907306704


Disclosure Statement

Irene V. Pasquetto, Christine L. Borgman, and Morgan F. Wofford have no financial or non-financial disclosures to share for this article.


©2019 Irene V. Pasquetto, Christine L. Borgman, and Morgan F. Wofford. This supplement is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the supplement.
