Appendix A

Survey questionnaire

Introduction

This study investigates how participants locate and evaluate data they do not create themselves.

The survey consists of three main sections:

• Part 1: Data Needs

• Part 2: Finding Data

• Part 3: Evaluating Data



Our funding comes from the Netherlands Organization for Scientific Research (NWO). The study is part of a collaborative research project between researchers at the Data Archiving and Networked Services (DANS), the University of Amsterdam, the Vrije Universiteit Amsterdam and Elsevier.

By clicking the button below to start the survey, you indicate your consent to participate in this research. You can read more about the survey and what will be done with the data here (this will launch a new window).

Thank you for your participation.

Please click the >> button to indicate consent to participate and to begin the survey.

Survey Questions

Part 1: Data Needs

Q1: Which of the following best describes you?

Please select one answer

Researcher

Student

Librarian, archivist or research/data support provider

Manager

Other. Please specify ____________

Q2: Please describe the secondary data that you (might) need. (We define secondary data as data that you do not create yourself).

Please write your answer in the box below:

Q3: Please select the options that describe the secondary data that you (might) need.

Please select all that apply

Observational or empirical (e.g. sensor data, survey data, interview transcripts, sample data, neuroimages, ethnographic data, diaries)

Experimental (e.g. gene sequences, chromatograms, toroid magnetic field data)

Simulation (e.g. climate models, economic models)

Derived or compiled (e.g. text and data mining, compiled database, 3D models)

Other. Please specify ____________

Q4: Why do you use or need secondary data?

Please select all that apply

As the basis for a new study

To calibrate instruments or models

For benchmarking

To verify my own data

As model, algorithm or system inputs

To generate new ideas

For teaching/training

To prepare for a new project or proposal

To experiment with new methods and techniques (e.g. to develop data science skills)

To identify trends or make predictions

To compare multiple datasets to find commonalities or differences

To create summaries, visualizations, or analysis tools

To integrate with other data to create a new dataset

Other. Please specify ____________

Q5: Have you ever used data outside of your area of expertise?

Please select one answer

Yes

No

Q5a: How did you find this data?

Please write your answer in the box below:

Part 2: Finding Data

Q6: When you need data, who finds it for you?

Please select all that apply

I find it myself

Graduate student

Research support professional (e.g. librarian, archivist, data or literature manager)

Someone else in my personal network (e.g. peers, collaborators, mentors)

Other. Please specify ____________

Q7: How frequently do you use the following to find data?

Please select one answer per row

Columns: Often; Occasionally; Never

Rows:

Multidisciplinary data repositories

Discipline-specific data repositories

Governmental agencies and websites

Personal networks (e.g. colleagues, peers)

Academic literature (e.g. journal articles, conference proceedings)

Code repository (e.g. GitHub)

General search engines (e.g. Google)

Professional associations

Data specific search engines

Commercial sources

Consultation with research support professionals (e.g. librarians, archivists or data managers)

Q7_open: Please specify any other resources that you use to find data:

Please write your answer in the box below:

Q7a: Which statement(s) describe how you discover data using the academic literature?

Please select all that apply

I search the academic literature with the goal of finding data.

I find data serendipitously while reading articles or performing literature searches.

I follow citations and references in the literature to datasets.

I extract and use data from the literature directly (e.g. from tables, graphs, or instrument specifications and parameters)

Other. Please specify ____________

Q7b: How successful are you at finding data with a general search engine (e.g. Google)?

Please select one answer

Very successful

Successful

Sometimes successful, sometimes not

Rarely successful

Not successful

Q8: How frequently do you find data in the following ways?

Please select one answer per row

Columns: Often; Occasionally; Never

Rows:

By actively searching for data in an online resource

Serendipitously, when searching for something else (e.g. when looking for journal articles or news)

Serendipitously, when NOT actively looking for something else (e.g. via an email notice or interaction with a colleague)

In the course of sharing or managing my own data

Q9: Please indicate if you use the following to discover, access, or make sense of data.

Please select all that apply

Columns: Q10a - Discover; Q10b - Access; Q10c - Making sense of data

Rows:

Conversations with personal networks (e.g. colleagues, peers)

Contacting the data creator

Developing new academic collaborations with data creators

Attending conferences

Disciplinary mailing lists or discussion forums

Q10: Do you discover data differently than how you discover academic literature?

Please select one answer

Yes

Sometimes

No

Q10a: How is your process for finding data different than your process for finding academic literature?

Please write your answer in the box below:

Q11: How easy is it to find data?

Please select one answer

Easy

Sometimes challenging

Difficult

Q11a: Why is it challenging to find the data that you need?

Please select all that apply

The data are not accessible (e.g. behind paywalls, held by industry).

I don't know where or how to best look for the data.

The data are located in many different places.

The data are not digital.

Online search tools are inadequate.

I do not have the personal network needed to find or access the data.

Other. Please specify ____________

Part 3: Evaluating Data

Q12: Please indicate the importance of the following information when deciding whether or not to use secondary data.

Please select one answer per row

Columns: Extremely important; Important; Somewhat important; Less important; Not important

Rows:

Data collection conditions and methodology

How data has been processed and handled

Reputation of data creator

Personally knowing the data creator

Reputation of data source (e.g. repository or journal)

Detailed and complete metadata and documentation

Data size

Data format

Licensing/copyright conditions

Correct coverage (time, location, population, etc.)

Original purpose of the data

Ease of access

Topic relevance

Q12_open: Please specify any other information you consider when deciding whether or not to use secondary data.

Q13: How important are the following strategies in evaluating and making sense of data?

Please select one answer per row

Columns: Extremely important; Important; Somewhat important; Less important; Not important

Rows:

Consulting associated journal articles

Consulting data documentation and codebooks

Consulting the data creator

Consulting personal networks (e.g. colleagues, peers)

Exploratory data analysis (e.g. statistical checks, graphical analysis)

Q13_open: Please specify any other strategies you consider to evaluate and make sense of data.

Please write your answer in the box below:

Q14: Please indicate the importance of the following in helping you to establish trust in secondary data.

Please select one answer per row

Columns: Extremely important; Important; Somewhat important; Less important; Not important

Rows:

Others' prior usage of the data

Reputation of source (e.g. repository, journal)

Reputation of data creator

Transparency in data collection methods

Lack of errors

Ease of access

Personal relationship with the data creator

Q14_open: Please specify any other important aspects you consider to help establish trust in secondary data.

Please write your answer in the box below:

Q15: Please indicate the importance of the following in helping you to establish the quality of secondary data.

Please select one answer per row

Columns: Extremely important; Important; Somewhat important; Less important; Not important

Rows:

Lack of errors

Ease of downloading and exploring data

Data size

Data completeness

Reputation of source (e.g. repository, journal)

Resolution or clarity

Reputation of data creator

Detail or amount of work done to prepare data

Consistency of formatting

Q15_open: Please specify any other important aspects you consider to help establish the quality of secondary data.

Please write your answer in the box below:

Part 4: Demographics

You are nearly at the end of the survey. Below are some questions to help us classify your answers.



D1: In which subject discipline do you specialize?

Please check all that apply.

Agriculture

Arts and Humanities

Astronomy

Biochemistry, Genetics, and Molecular Biology

Biological Sciences

Business, Management and Accounting

Chemical Engineering

Chemistry

Computer Sciences / IT

Decision Sciences

Dentistry

Earth and Planetary Sciences

Economics, Econometrics and Finance

Energy

Engineering and Technology

Environmental Sciences

Health professions

Immunology and Microbiology

Materials Science

Mathematics

Medicine

Multidisciplinary

Neuroscience

Nursing

Pharmacology, Toxicology and Pharmaceutics

Physics

Psychology

Social Science

Veterinary

Information science

Other. Please specify ____________

D2: How many years of professional experience do you have in your field?

Please select one answer

0-5

6-15

16-30

31+

D3: In which country do you currently work?

Afghanistan

Albania

Algeria

American Samoa

Andorra

Angola

Anguilla

Antarctica

Antigua and Barbuda

Argentina

Armenia

Aruba

Australia

Austria

Azerbaijan

Bahamas

Bahrain

Bangladesh

Barbados

Belarus

Belgium

Belize

Benin

Bermuda

Bhutan

Bolivia

Bosnia and Herzegovina

Botswana

Brazil

British Indian Ocean Territory

Brunei

Brunei Darussalam

Bulgaria

Burkina Faso

Burundi

Cambodia

Cameroon

Canada

Cape Verde

Cayman Islands

Central African Republic

Chad

Chile

China

Christmas Island

Cocos (Keeling) Islands

Colombia

Comoros

Congo

Cook Islands

Costa Rica

Cote d'Ivoire

Croatia

Cuba

Cyprus

Czech Republic

Denmark

Djibouti

Dominica

Dominican Republic

East Timor

Ecuador

Egypt

El Salvador

Equatorial Guinea

Eritrea

Estonia

Ethiopia

Falkland Islands (Malvinas)

Fiji

Finland

France

French Guiana

French Polynesia

French Southern Territories

Gambia

Georgia

Germany

Ghana

Gibraltar

Greece

Greenland

Grenada

Guadeloupe

Guam

Guatemala

Guinea-Bissau

Haiti

Heard Island and McDonald Islands

Holy See (Vatican City State)

Honduras

Hong Kong

Hungary

Iceland

India

Indonesia

Iran (Islamic Republic of)

Iraq

Ireland

Israel

Italy

Jamaica

Japan

Jordan

Kazakhstan

Kenya

Kiribati

North Korea

Kuwait

Kyrgyzstan

Lao People's Democratic Republic

Laos

Latvia

Lebanon

Lesotho

Liberia

Libyan Arab Jamahiriya

Lithuania

Luxembourg

Macau

Madagascar

Malawi

Malaysia

Maldives

Mali

Malta

Martinique

Mauritania

Mauritius

Mexico

Micronesia (Federated States of)

Monaco

Mongolia

Montserrat

Morocco

Mozambique

Myanmar

Namibia

Nauru

Nepal

Netherlands

Netherlands Antilles

New Caledonia

New Zealand

Nicaragua

Niger

Nigeria

Niue

Norfolk Island

Norway

Oman

Pakistan

Palau

Panama

Papua New Guinea

Paraguay

Peru

Philippines

Pitcairn

Poland

Portugal

Puerto Rico

Qatar

Reunion

Romania

Russia

Rwanda

Saint Helena

Saint Kitts and Nevis

Saint Lucia

Saint Vincent and the Grenadines

Samoa

Sao Tome and Principe

Saudi Arabia

Senegal

Serbia and Montenegro

Seychelles

Sierra Leone

Singapore

Slovakia

Slovenia

Solomon Islands

Somalia

South Africa

South Korea

Spain

Sri Lanka

Sudan

Suriname

Swaziland

Sweden

Switzerland

Syrian Arab Republic

Taiwan

Tajikistan

Tanzania

Thailand

Togo

Tonga

Trinidad and Tobago

Tunisia

Turkey

Turkmenistan

Turks and Caicos Islands

Uganda

Ukraine

United Arab Emirates

United Kingdom

United States Minor Outlying Islands

Uruguay

USA

Uzbekistan

Vanuatu

Venezuela

Viet Nam

Virgin Islands

Virgin Islands (US)

Virgin Islands, British

Wallis and Futuna

Yemen

Zambia

Zimbabwe

Palestinian Territory, Occupied

Moldova, Republic of

Marshall Islands

Macedonia, The Former Yugoslav Republic of

Liechtenstein

Korea, Republic of

Guyana

Guinea

Gabon

Faroe Islands

Zanzibar

Tokelau

D4: What type of organization do you work for?

Please select one answer

University or college

Research institution

Government agency

Corporate

Independent archive or library

Other. Please specify ____________

D5: Please indicate how the following people feel about sharing their research data.

Please select one answer per row

Columns: Data sharing is strongly encouraged; Data sharing is somewhat encouraged; Data sharing is neither encouraged nor discouraged; Data sharing is somewhat discouraged; Data sharing is strongly discouraged; Don't know / Not applicable

Rows:

You

The people you work with directly

Your disciplinary community

Your institution

D6: Please indicate how the following people feel about reusing data produced by other people.

Please select one answer per row

Columns: Data reusing is strongly encouraged; Data reusing is somewhat encouraged; Data reusing is neither encouraged nor discouraged; Data reusing is somewhat discouraged; Data reusing is strongly discouraged; Don't know / Not applicable

Rows:

You

The people you work with directly

Your disciplinary community

Your institution

D7: Have you ever shared your own research data?

Please select one answer

Yes

No

D8: Final comments: Do you have anything else that you would like us to know?

Please write your comments in the box below:

Additional questions asked of participants who selected “Librarian, archivist or research/data support provider” as their role.

L3: Do you use or need secondary data for your own research or to support others?

Please select one answer

For my own research

To support others

For both my own research and to support others

L4: Who are the people whom you support?

Please select all that apply

Students

Researchers

Industry employees

Other. Please specify ____________

L5: How do you support people with their data needs?

Please select all that apply

I teach people about data management planning (e.g. through consultations, workshops, etc.).

I teach people how to discover and evaluate data (e.g. through consultations, workshops, etc.).

I find data for people.

I help people to curate their data.

I find literature for people.

Other. Please specify ____________

Appendix B

P-Value Tables

Table B1. P-value table for Figure 6: Associations between disciplinary domain and needed data.

Note. Significance was determined at the p < .05 level with a Bonferroni correction with m = 155. Significant associations are marked with an asterisk and colored in blue.



Table B2. P-value table for Table 4: Associations between types of data use and needed data type.

Note. Significance was determined at the p < .05 level with a Bonferroni correction with m = 70. Significant associations are marked with an asterisk and colored in blue. “Other” options are not shown as there were no significant associations present.



Table B3. P-value table for Table 4: Associations between types of data use and other data uses.

Note. Significance was determined at the p < .05 level with a Bonferroni correction with m = 196. Significant associations are marked with an asterisk and colored in blue. “Other” options are not shown as there were no significant associations present; duplicate values were removed.



Table B4. P-value table for Figure 8: Associations between disciplinary domain and data use.

Note. Significance was determined at the p < .05 level with a Bonferroni correction with m = 434. Significant associations are marked with an asterisk and colored in blue. “Other” options are not shown as there were no significant associations present.



Table B5. P-value table for Figure 15: Associations between data use and evaluation criteria.

Note. Significance was determined at the p < .05 level with a Bonferroni correction with m = 196. Significant associations are marked with an asterisk and colored in blue. “Other” options are not shown as there were no significant associations present.
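The notes to Tables B1 to B5 each report a Bonferroni correction with a different number of comparisons m. As a minimal sketch, assuming the correction was applied in its standard form (an association counts as significant when p < .05 / m), the per-table thresholds can be computed as follows. The supplement does not state whether the threshold or the p-values themselves were adjusted, and the function and variable names here are illustrative only.

```python
# Sketch of the standard Bonferroni rule: significant when p < alpha / m.
# Assumption: the correction was applied by dividing the threshold by m.
ALPHA = 0.05

# Number of comparisons (m) reported in the notes to Tables B1-B5.
COMPARISONS = {"B1": 155, "B2": 70, "B3": 196, "B4": 434, "B5": 196}


def bonferroni_threshold(m, alpha=ALPHA):
    """Return the adjusted per-test threshold alpha / m."""
    if m < 1:
        raise ValueError("m must be a positive number of comparisons")
    return alpha / m


def is_significant(p_value, m, alpha=ALPHA):
    """True when p_value falls below the Bonferroni-adjusted threshold."""
    return p_value < bonferroni_threshold(m, alpha)


# Adjusted threshold for each table, keyed by table label.
THRESHOLDS = {table: bonferroni_threshold(m) for table, m in COMPARISONS.items()}

if __name__ == "__main__":
    for table, threshold in THRESHOLDS.items():
        print(f"Table {table}: p < {threshold:.6f}")
```

Under this assumption, the larger m is, the stricter the threshold: the same p-value can be significant in Table B2 (m = 70) but not in Table B4 (m = 434).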

Appendix C

Sources Used in Disciplinary Subset

Figure C1. Sources used in the disciplinary subset for respondents selecting only one discipline. Percentages refer to the percentage of respondents. Arts & humanities (n = 43); astronomy (n = 14); biological science (n = 46); computer science (n = 57); earth & planetary science (n = 24); engineering & technology (n = 80); environmental science (n = 22); medicine (n = 91); physics (n = 42); social science (n = 81).

Disclosure Statement

The article associated with this supplement is part of the project Re-SEARCH: Contextual Search for Research Data and was funded by NWO Grant 652.001.002.

©2020 Kathleen Gregory, Paul Groth, Andrea Scharnhorst, and Sally Wyatt. This supplement is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the supplement.