Access to information about the data used to train foundation AI models is vital for many tasks. Despite progress made by sections of the AI community, there remains a general lack of transparency about the content and sources of training datasets. Whether the result of voluntary initiative by firms or regulatory intervention, this has to change.
Keywords: artificial intelligence, machine learning, foundation models, training data, transparency, trust
12/13/2023: This is the Just Accepted version of the article. This peer-reviewed version has been accepted for its content and is currently being copyedited to conform with HDSR’s style and formatting requirements.
©2023 Jack Hardinges, Elena Simperl, and Nigel Shadbolt. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.