
This study investigates how participants locate and evaluate data they do not create themselves.
The survey consists of three main sections:
• Part 1: Data Needs
• Part 2: Finding Data
• Part 3: Evaluating Data
Our funding comes from the Netherlands Organization for Scientific Research (NWO). The study is part of a collaborative research project between researchers at the Data Archiving and Networked Services (DANS), the University of Amsterdam, the Vrije Universiteit Amsterdam and Elsevier.
By clicking the button below to start the survey, you indicate your consent to participate in this research. You can read more about the survey and what will be done with the data here (this will launch a new window).
Thank you for your participation.
Please click the >> button to indicate consent to participate and to begin the survey.
Q1: Which of the following best describes you?
Please select one answer
Researcher
Student
Librarian, archivist or research/data support provider
Manager
Other. Please specify ____________
Q2: Please describe the secondary data that you (might) need. (We define secondary data as data that you do not create yourself.)
Please write your answer in the box below:
|
Q3: Please select the options that describe the secondary data that you (might) need.
Please select all that apply
Observational or empirical (e.g. sensor data, survey data, interview transcripts, sample data, neuroimages, ethnographic data, diaries)
Experimental (e.g. gene sequences, chromatograms, toroid magnetic field data)
Simulation (e.g. climate models, economic models)
Derived or compiled (e.g. text and data mining, compiled database, 3D models)
Other. Please specify ____________
Q4: Why do you use or need secondary data?
Please select all that apply
As the basis for a new study
To calibrate instruments or models
For benchmarking
To verify my own data
As model, algorithm or system inputs
To generate new ideas
For teaching/training
To prepare for a new project or proposal
To experiment with new methods and techniques (e.g. to develop data science skills)
To identify trends or make predictions
To compare multiple datasets to find commonalities or differences
To create summaries, visualizations, or analysis tools
To integrate with other data to create a new dataset
Other. Please specify ____________
Q5: Have you ever used data outside of your area of expertise?
Please select one answer
Yes
No
Q5a: How did you find this data?
Please write your answer in the box below:
|
Q6: When you need data, who finds it for you?
Please select all that apply
I find it myself
Graduate student
Research support professional (e.g. librarian, archivist, data or literature manager)
Someone else in my personal network (e.g. peers, collaborators, mentors)
Other. Please specify ____________
Q7: How frequently do you use the following to find data?
Please select one answer per row
| Often | Occasionally | Never |
Multidisciplinary data repositories | |||
Discipline-specific data repositories | |||
Governmental agencies and websites | |||
Personal networks (e.g. colleagues, peers) | |||
Academic literature (e.g. journal articles, conference proceedings) | |||
Code repository (e.g. GitHub) | |||
General search engines (e.g. Google) | |||
Professional associations | |||
Data specific search engines | |||
Commercial sources | |||
Consultation with research support professionals (e.g. librarians, archivists or data managers) | |||
Q7_open: Please specify any other resources that you use to find data:
Please write your answer in the box below:
|
Q7a: Which statement(s) describe how you discover data using the academic literature?
Please select all that apply
I search the academic literature with the goal of finding data.
I find data serendipitously while reading articles or performing literature searches.
I follow citations and references in the literature to datasets.
I extract and use data from the literature directly (e.g. from tables, graphs, or instrument specifications and parameters).
Other. Please specify ____________
Q7b: How successful are you at finding data with a general search engine (e.g. Google)?
Please select one answer
Very successful
Successful
Sometimes successful, sometimes not
Rarely successful
Not successful
Q8: How frequently do you find data in the following ways?
Please select one answer per row
| Often | Occasionally | Never |
By actively searching for data in an online resource | |||
Serendipitously, when searching for something else (e.g. when looking for journal articles or news) | |||
Serendipitously, when NOT actively looking for something else (e.g. via an email notice or interaction with a colleague) | |||
In the course of sharing or managing my own data | |||
Q9: Please indicate if you use the following to discover, access, or make sense of data.
Please select all that apply
| Q10a - Discover | Q10b - Access | Q10c - Making sense of data |
Conversations with personal networks (e.g. colleagues, peers) | |||
Contacting the data creator | |||
Developing new academic collaborations with data creators | |||
Attending conferences | |||
Disciplinary mailing lists or discussion forums | |||
Q10: Do you discover data differently than how you discover academic literature?
Please select one answer
Yes
Sometimes
No
Q10a: How is your process for finding data different than your process for finding academic literature?
Please write your answer in the box below:
|
Q11: How easy is it to find data?
Please select one answer
Easy
Sometimes challenging
Difficult
Q11a: Why is it challenging to find the data that you need?
Please select all that apply
The data are not accessible (e.g. behind paywalls, held by industry).
I don't know where or how to best look for the data.
The data are located in many different places.
The data are not digital.
Online search tools are inadequate.
I do not have the personal network needed to find or access the data.
Other. Please specify ____________
Q12: Please indicate the importance of the following information when deciding whether or not to use secondary data.
Please select one answer per row
| Extremely important | Important | Somewhat important | Less important | Not important |
Data collection conditions and methodology | |||||
How data has been processed and handled | |||||
Reputation of data creator | |||||
Personally knowing the data creator | |||||
Reputation of data source (e.g. repository or journal) | |||||
Detailed and complete metadata and documentation | |||||
Data size | |||||
Data format | |||||
Licensing/copyright conditions | |||||
Correct coverage (time, location, population, etc.) | |||||
Original purpose of the data | |||||
Ease of access | |||||
Topic relevance | |||||
Q12_open: Please specify any other information you consider when deciding whether or not to use secondary data.
|
Q13: How important are the following strategies in evaluating and making sense of data?
Please select one answer per row
| Extremely important | Important | Somewhat important | Less important | Not important |
Consulting associated journal articles | |||||
Consulting data documentation and codebooks | |||||
Consulting the data creator | |||||
Consulting personal networks (e.g. colleagues, peers) | |||||
Exploratory data analysis (e.g. statistical checks, graphical analysis) | |||||
Q13_open: Please specify any other strategies you consider to evaluate and make sense of data.
Please write your answer in the box below:
|
Q14: Please indicate the importance of the following in helping you to establish trust in secondary data.
Please select one answer per row
| Extremely important | Important | Somewhat important | Less important | Not important |
Others' prior usage of the data | |||||
Reputation of source (e.g. repository, journal) | |||||
Reputation of data creator | |||||
Transparency in data collection methods | |||||
Lack of errors | |||||
Ease of access | |||||
Personal relationship with the data creator | |||||
Q14_open: Please specify any other important aspects you consider to help establish trust in secondary data.
Please write your answer in the box below:
|
Q15: Please indicate the importance of the following in helping you to establish the quality of secondary data.
Please select one answer per row
| Extremely important | Important | Somewhat important | Less important | Not important |
Lack of errors | |||||
Ease of downloading and exploring data | |||||
Data size | |||||
Data completeness | |||||
Reputation of source (e.g. repository, journal) | |||||
Resolution or clarity | |||||
Reputation of data creator | |||||
Detail or amount of work done to prepare data | |||||
Consistency of formatting | |||||
Q15_open: Please specify any other important aspects you consider to help establish the quality of secondary data.
Please write your answer in the box below:
|
You are nearly at the end of the survey. Below are some questions to help us classify your answers.
D1: In which subject discipline do you specialize?
Please check all that apply.
|
|
D2: How many years of professional experience do you have in your field?
Please select one answer
0-5
6-15
16-30
31+
D3: In which country do you currently work?
Afghanistan
Albania
Algeria
American Samoa
Andorra
Angola
Anguilla
Antarctica
Antigua and Barbuda
Argentina
Armenia
Aruba
Australia
Austria
Azerbaijan
Bahamas
Bahrain
Bangladesh
Barbados
Belarus
Belgium
Belize
Benin
Bermuda
Bhutan
Bolivia
Bosnia and Herzegovina
Botswana
Brazil
British Indian Ocean Territory
Brunei
Brunei Darussalam
Bulgaria
Burkina Faso
Burundi
Cambodia
Cameroon
Canada
Cape Verde
Cayman Islands
Central African Republic
Chad
Chile
China
Christmas Island
Cocos (Keeling) Islands
Colombia
Comoros
Congo
Cook Islands
Costa Rica
Cote d'Ivoire
Croatia
Cuba
Cyprus
Czech Republic
Denmark
Djibouti
Dominica
Dominican Republic
East Timor
Ecuador
Egypt
El Salvador
Equatorial Guinea
Eritrea
Estonia
Ethiopia
Falkland Islands (Malvinas)
Fiji
Finland
France
French Guiana
French Polynesia
French Southern Territories
Gambia
Georgia
Germany
Ghana
Gibraltar
Greece
Greenland
Grenada
Guadeloupe
Guam
Guatemala
Guinea-Bissau
Haiti
Heard Island and McDonald Islands
Holy See (Vatican City State)
Honduras
Hong Kong
Hungary
Iceland
India
Indonesia
Iran (Islamic Republic of)
Iraq
Ireland
Israel
Italy
Jamaica
Japan
Jordan
Kazakhstan
Kenya
Kiribati
North Korea
Kuwait
Kyrgyzstan
Lao People's Democratic Republic
Laos
Latvia
Lebanon
Lesotho
Liberia
Libyan Arab Jamahiriya
Lithuania
Luxembourg
Macau
Madagascar
Malawi
Malaysia
Maldives
Mali
Malta
Martinique
Mauritania
Mauritius
Mexico
Micronesia (Federated States of)
Monaco
Mongolia
Montserrat
Morocco
Mozambique
Myanmar
Namibia
Nauru
Nepal
Netherlands
Netherlands Antilles
New Caledonia
New Zealand
Nicaragua
Niger
Nigeria
Niue
Norfolk Island
Norway
Oman
Pakistan
Palau
Panama
Papua New Guinea
Paraguay
Peru
Philippines
Pitcairn
Poland
Portugal
Puerto Rico
Qatar
Reunion
Romania
Russia
Rwanda
Saint Helena
Saint Kitts and Nevis
Saint Lucia
Saint Vincent and the Grenadines
Samoa
Sao Tome and Principe
Saudi Arabia
Senegal
Serbia and Montenegro
Seychelles
Sierra Leone
Singapore
Slovakia
Slovenia
Solomon Islands
Somalia
South Africa
South Korea
Spain
Sri Lanka
Sudan
Suriname
Swaziland
Sweden
Switzerland
Syrian Arab Republic
Taiwan
Tajikistan
Tanzania
Thailand
Togo
Tonga
Trinidad and Tobago
Tunisia
Turkey
Turkmenistan
Turks and Caicos Islands
Uganda
Ukraine
United Arab Emirates
United Kingdom
United States Minor Outlying Islands
Uruguay
USA
Uzbekistan
Vanuatu
Venezuela
Viet Nam
Virgin Islands
Virgin Islands (US)
Virgin Islands, British
Wallis and Futuna
Yemen
Zambia
Zimbabwe
Palestinian Territory, Occupied
Moldova, Republic of
Marshall Islands
Macedonia, The Former Yugoslav Republic of
Liechtenstein
Korea, Republic of
Guyana
Guinea
Gabon
Faroe Islands
Zanzibar
Tokelau
D4: What type of organization do you work for?
Please select one answer
University or college
Research institution
Government agency
Corporate
Independent archive or library
Other. Please specify ____________
D5: Please indicate how the following people feel about sharing their research data.
Please select one answer per row
| Data sharing is strongly encouraged | Data sharing is somewhat encouraged | Data sharing is neither encouraged nor discouraged | Data sharing is somewhat discouraged | Data sharing is strongly discouraged | Don't know/ Not applicable |
You | ||||||
The people you work with directly | ||||||
Your disciplinary community | ||||||
Your institution | ||||||
D6: Please indicate how the following people feel about reusing data produced by other people.
Please select one answer per row
| Data reusing is strongly encouraged | Data reusing is somewhat encouraged | Data reusing is neither encouraged nor discouraged | Data reusing is somewhat discouraged | Data reusing is strongly discouraged | Don't know/ Not applicable |
You | ||||||
The people you work with directly | ||||||
Your disciplinary community | ||||||
Your institution | ||||||
D7: Have you ever shared your own research data?
Please select one answer
Yes
No
D8: Final comments: Do you have anything else that you would like us to know?
Please write your comments in the box below:
|
Additional questions asked to participants selecting “Librarian, archivist or research/data support provider” as their role.
L3: Do you use or need secondary data for your own research or to support others?
Please select one answer
For my own research
To support others
For both my own research and to support others
L4: Who are the people whom you support?
Please select all that apply
Students
Researchers
Industry employees
Other. Please specify ____________
L5: How do you support people with their data needs?
Please select all that apply
I teach people about data management planning (e.g. through consultations, workshops, etc.).
I teach people how to discover and evaluate data (e.g. through consultations, workshops, etc.).
I find data for people.
I help people to curate their data.
I find literature for people.
Other. Please specify ____________
Note. Significance was determined at the p < .05 level with a Bonferroni correction with m = 155. Significant associations are marked with an asterisk and colored in blue.
Note. Significance was determined at the p < .05 level with a Bonferroni correction with m = 70. Significant associations are marked with an asterisk and colored in blue. “Other” options are not shown as there were no significant associations present.
Note. Significance was determined at the p < .05 level with a Bonferroni correction with m = 196. Significant associations are marked with an asterisk and colored in blue. “Other” options are not shown as there were no significant associations present; duplicate values were removed.
Note. Significance was determined at the p < .05 level with a Bonferroni correction with m = 434. Significant associations are marked with an asterisk and colored in blue. “Other” options are not shown as there were no significant associations present.
Note. Significance was determined at the p < .05 level with a Bonferroni correction with m = 196. Significant associations are marked with an asterisk and colored in blue. “Other” options are not shown as there were no significant associations present.
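The notes above report a Bonferroni correction at the p < .05 level with differing numbers of comparisons m. As a minimal sketch, assuming the standard form of the correction (the family-wise level divided by m; the notes do not state how the correction was implemented), the per-comparison thresholds could be computed as follows. The function name `bonferroni_threshold` is hypothetical and used only for illustration.

```python
# Illustrative sketch only: assumes the standard Bonferroni form alpha / m.
# The supplement does not specify the exact procedure that was applied.

ALPHA = 0.05  # family-wise significance level stated in the notes


def bonferroni_threshold(m: int, alpha: float = ALPHA) -> float:
    """Return the per-comparison significance threshold alpha / m."""
    if m <= 0:
        raise ValueError("m must be a positive number of comparisons")
    return alpha / m


# Values of m reported in the table notes
for m in (155, 70, 196, 434):
    print(f"m = {m}: per-comparison threshold = {bonferroni_threshold(m):.6f}")
```

Under this assumption, a larger m gives a stricter per-comparison threshold, so an association counts as significant only if its p-value falls below alpha / m for the relevant table.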
Figure C1. Sources used in the disciplinary subset for respondents selecting only one discipline. Percentages are percent of respondents. Arts & humanities (n = 43); astronomy (n = 14); biological science (n = 46); computer science (n = 57); earth & planetary science (n = 24); engineering & technology (n = 80); environmental science (n = 22); medicine (n = 91); physics (n = 42); social science (n = 81).
The article associated with this supplement is part of the project Re-SEARCH: Contextual Search for Research Data and was funded by NWO Grant 652.001.002.
©2020 Kathleen Gregory, Paul Groth, Andrea Scharnhorst, and Sally Wyatt. This supplement is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the supplement.