
Discovering Datasets on the Web Scale: Challenges and Recommendations for Google Dataset Search

Forthcoming. Now Available: Just Accepted Version.
Published on Feb 07, 2024

Abstract

With the rise of open data in the last two decades, more datasets are online and more people are using them for projects and research. But how do people find datasets? We present the first user study of Google Dataset Search, a dataset-discovery tool that uses a Web crawl and open ecosystem to find datasets. Google Dataset Search contains a superset of the datasets in other dataset-discovery tools – a total of 45M datasets from 13K sources. We found that the tool addresses a previously identified need: a search engine for datasets across the entire Web, including datasets in other tools. However, the tool introduced new challenges due to its open approach: building a mental model of the tool, making sense of heterogeneous datasets, and learning how to search for datasets. We discuss recommendations for dataset-discovery tools and open research questions.

Keywords: data discovery, data search, data reuse, data sharing, open data policy



02/07/2024: The Just Accepted version of this article is available for preview. This peer-reviewed version has been accepted for its content and is currently being copyedited to conform with HDSR’s style and formatting requirements.


©2024 Katrina Sostek, Daniel Russell, Nitesh Goyal, Tarfah Alrashed, Stella Dugall, and Natasha Noy. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.
