
A Review of Data Valuation Approaches and Building and Scoring a Data Valuation Model

Published on Jan 26, 2023

Abstract

Data valuation has received increasing attention over the past 20 years. The importance of data as an asset in both the private and public sectors has steadily increased, and organizations are striving to treat it as such. However, this remains a challenge, as data is an intangible asset. Today, there is no standard for measuring the value of data. Different approaches include market-based valuation, economic models, and applying dimensions to data. The first part of this article (Data Valuation Framework) examines these approaches and suggests a framework for grouping them. The second part of this article (Building and Scoring a Dimensional Data Valuation Model) describes how we built and scored a data valuation model.

Keywords: data valuation, intangibles, data as an asset, data monetization, data valuation model, data valuation framework


Media Summary

We often hear that data is becoming the new currency across our economy (e.g., Keller, 2020). It is a clear indication that we, as a society, want a way to value data in concrete terms. We are not there yet.

Today, businesses gamble on the future value of data by acquiring competitors for huge sums based on things like “eyeballs.” Governments estimate future economic prosperity based on the availability of data. Lastly, many of us, either explicitly or implicitly, calculate the value of specific data sets in improving our business outcomes. That these approaches exist is a sign that we are striving to apply currency to data. However, the differences among them highlight that we are still searching for a repeatable approach to assessing data in terms of currency.

Over the past two or three decades, researchers have increasingly sought to define a repeatable approach to data valuation. Our research builds a framework that shows how approaches to data valuation typically fall into three categories. We examine these approaches in terms of their characteristics, differences, and commonalities, and highlight their strengths and challenges. We also present real-world examples of each approach. We then look more closely at one of these approaches and expand on historical attempts to use it to value data. In so doing, we develop an easy-to-use, repeatable model to value data for two use cases.

We acknowledge that no single approach to data valuation exists today, and that different approaches—even a combination of approaches—can be used, depending on the use case.


1. Introduction

As longtime practitioners and teachers of data management, we are struck by the many references to ‘data as an asset.’ The implication is that data should be valued similarly to traditional assets. When a market exists, the discounted value of future utility can be measured in monetary terms. However, when no market exists, the value of data must be calculated more creatively. For example:

  • Many organizations benefit from data that people provide for free. What such data is worth is subject to interpretation. In one recent large-scale survey, researchers estimated the compensation individuals would require to forgo certain data-intensive applications, such as email, maps, and social media. They estimated, for example, that a typical U.S. user of Facebook might require $48 per month to forgo that data (Brynjolfsson et al., 2019). That is slightly more than, but in the same ballpark as, the roughly $27 per user that Facebook’s revenue divided by its number of users might indicate.1

  • A 2011 market assessment of public sector data in the United Kingdom estimated its value at £1.8 billion (Deloitte, 2013). This includes direct value to sellers of public sector data, direct value to entities that interact with public data, and indirect value affecting supply chains.2 This estimate triples in value when factors such as data reuse and wider societal impacts are taken into account.

Sometimes, when no market exists, the value of data can be stated in only relative terms rather than in monetary terms. For example, during the recent COVID-19 pandemic, many data sources were assembled and often available for free. Entities including hospitals, research organizations, governments, and academia had to decide which data sources served them best by examining a host of factors, including data volume and variety, data quality, and update frequency. These factors were not readily expressible in monetary terms. Instead, organizations had to value them in relative terms by implicitly or explicitly scoring their value.

To better understand how value can be applied to data, we took a two-fold approach. First, we did an environmental scan of approaches that have already been tried. Second, we built a data valuation model based on a small amount of real-world data.

Our research found many studies and executions of estimating data value. No standard approach to data valuation exists, and perspectives vary considerably based on the use case. Based on our findings, we created a framework that grouped data valuation approaches into three models: market-based, economic, and dimensional.

We found that businesses typically estimate the value of data in terms of cost and revenue when buying and selling data or data-intensive businesses (market-based). Government approaches to data valuation center on estimating the economic benefit of making data available; for example, making government data such as census, transportation, and health data publicly available in hopes of stimulating economic growth (economic). A third approach leverages data dimensions (dimensional). This approach examines valuation points of a specific data set, both those inherent to the data, like data quality (e.g., completeness, accuracy, timeliness), and those contextual to its use (e.g., frequency of use, ownership). For example, organizations routinely decide to acquire (or keep) one data set over another similar data set based on these dimensions.

This research article consists of two parts:

The first part covers data value initiatives to date. We group initiatives into three models:

  • market-based models, which calculate data’s value in terms of cost and revenue

  • economic models, which estimate data’s value in terms of economic and public benefit

  • dimensional models, which estimate data value based on categories or dimensions

Our research shows that each model can be used in different circumstances and that no single approach works for every case. All models are speculative and subject to context external to the data. We also note that the three models overlap with each other. For example, government policy and legal regulation (e.g., privacy) affect all models. The right approach depends on a given use case. As a group, they can serve as the basis for a data valuation framework, with each use case leveraging one or more models.

The second part describes how we built and scored a dimensional data valuation model. We developed a survey containing about 30 questions. We used our dimensional model research as our baseline and built on prior work. Our proposed data valuation is based on questions in dimensions of ownership, cost, utility, age, privacy, data quality, and volume and variety.

The dimensional model is best suited to comparing one data set with another. Therefore, we focused on two use cases: how to compare the value of two similar data sets and how to assess the value of adding a data set to an existing data pool. We expanded on prior work by examining how stakeholders with different perspectives might weight and value dimensions differently. For example, we examined how government, a research organization, a hospital, and an academic institution might each weight certain questions differently.

Our goal was to design an easy-to-use, customizable approach that helps organizations assess the value of specific data sets for specific use cases using a small, consistent set of dimensions. Our scoring reflects the relative value of data sets. It shows clear differences in some comparisons and more subtle differences in others. We concluded that our model can be used effectively as a baseline for determining the value of a data set in terms of a score and that the weighting of scores can vary significantly based on context and stakeholder perspective.
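To make the idea of stakeholder-specific weighting concrete, the following is a minimal sketch of a weighted dimensional score. It is not the scoring model described in this article: the dimension names are taken from the text, but the 1 to 5 scores, the weights, the stakeholder labels, and the function names (`weighted_score`, `preferred`) are all hypothetical values chosen for illustration.

```python
# Dimension names follow the text; every score and weight below is hypothetical.
DIMENSIONS = ("ownership", "cost", "utility", "age", "privacy",
              "data_quality", "volume_variety")


def weighted_score(scores, weights):
    """Weighted average of per-dimension scores, on the same scale as the scores."""
    total_weight = sum(weights[d] for d in DIMENSIONS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[d] * weights[d] for d in DIMENSIONS) / total_weight


def preferred(datasets, weights):
    """Name of the data set with the highest weighted score under the given weights."""
    return max(datasets, key=lambda name: weighted_score(datasets[name], weights))


# Invented 1-5 scores for two similar data sets.
DATASETS = {
    "A": {"ownership": 4, "cost": 3, "utility": 5, "age": 2, "privacy": 3,
          "data_quality": 4, "volume_variety": 3},
    "B": {"ownership": 3, "cost": 4, "utility": 3, "age": 5, "privacy": 4,
          "data_quality": 3, "volume_variety": 4},
}

# Invented weights: the same dimensions, emphasized differently per stakeholder.
WEIGHTS = {
    "equal": {d: 1 for d in DIMENSIONS},
    "hospital": {"ownership": 1, "cost": 1, "utility": 3, "age": 1, "privacy": 3,
                 "data_quality": 3, "volume_variety": 1},
    "research_org": {"ownership": 1, "cost": 2, "utility": 2, "age": 3, "privacy": 1,
                     "data_quality": 2, "volume_variety": 3},
}

if __name__ == "__main__":
    for stakeholder, weights in WEIGHTS.items():
        ranked = {name: round(weighted_score(s, weights), 2)
                  for name, s in DATASETS.items()}
        print(stakeholder, ranked, "prefers", preferred(DATASETS, weights))
```

With these invented numbers, equal weighting and the research-organization weighting favor data set B, while the hospital weighting favors data set A. The scores are only meaningful relative to each other, which matches the comparative use cases described above.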

2. Data Valuation Framework

2.1. Models for Data Valuation

Our environmental scan reviewed many examples of data valuation spanning from more than 40 years ago to today. Through our research of prior approaches, we arrived at a data valuation framework that groups data valuation approaches into three models. We define these as follows:

  • The market-based model values data based on income (e.g., selling data), cost (e.g., buying data), and/or stock value (e.g., value of data-intensive organizations). Organizations routinely buy and sell data and data-intensive companies.

  • The economic model values data in terms of its economic impact. This model is frequently used by governments to assess the value of publicizing data. For example, governments share weather data, which helps sustain an ecosystem of weather forecasting.

  • The dimensional model values data by assessing attributes inherent to a data set (e.g., data volume, variety, and quality) as well as the context in which data is used (e.g., how the data will be used and integrated with other data). For example, organizations inherently decide to acquire, keep, or prioritize one of several similar but different data sets. To date, this is an informal process.

This grouping allows data researchers, practitioners, and policymakers from industry and government to better approach data valuation. Table 1 summarizes our data valuation framework. It provides an overview of the types of approaches included in each model. Subsequent sections provide examples for each model, a detailed description of each model, and each model’s strengths and challenges. In addition, Appendix A provides a summary of each model’s strengths and challenges, and Appendix B provides a comparison of data assets with traditional assets.

Table 1. Data valuation framework.

Model

Description

Market-Based Model

Models that assess data value in terms of its income, cost, and/or market worth, including:

  • Income and cost based

    • Buying and selling data

    • Leveraging data to improve products or services

    • Enhancing customer experience through associated data products or services

    • Assessing the value of data breach or loss

    • Licensing

  • Stock market based (e.g., mergers, acquisitions, initial public offerings)

Economic Model

Models that assess data value in terms of its economic impact, including:

  • Financial estimates of economic benefits

  • Value of data for the public good

  • Policy and legal regulation impact

Dimensional Model

Models that identify and prioritize categories (or dimensions), both data related and contextual, and then attempt to calculate or estimate data value, including:

  • Explicit comparisons of datasets

  • Formulas using dimensions to assess the value of data to business functions

2.2. Model Overlap

While the three models are different and rooted in specific use cases, there is overlap among them. For example, governmental policy and legal regulation (e.g., privacy) affect all models. Similarly, survey questions can be constructed to accommodate any model. Finally, we saw that cost and utility (sometimes expressed in financial terms) were used as valuation methods across all models. This overlap highlights the underlying similarity of the three models as well as their unique focus. Figure 1 reflects differences and commonalities in graphic terms.

Figure 1. Data Valuation Model Overlap.

2.3. Market-Based Model Examples

Market-based approaches to data valuation are an extension of physical asset valuation. Just like physical assets, data can be valued based on its cost, its sale value, or its income potential (Internal Revenue Service [IRS], 2020). In addition to these approaches, companies also use at least two forms of cost besides purchase cost. The first is insurance cost: what would the compromise or loss of data cost? The second is estimating the value of competitors’ data, sometimes to price an acquisition. Below are some examples of market-based data valuation:

  • Buying and selling data

    • Acxiom, Equifax, and Dun & Bradstreet are companies that only buy and sell data. They aggregate that data and enhance and repackage it for consumption. These data brokers value their data using cost and whatever income it will fetch in the free market.

    • Email addresses for marketing are available for purchase on the open market. For example, in 2014, it was possible to purchase four million emails for $75.95 (Nash, 2014).

  • Leveraging data to improve products or services

    • PricewaterhouseCoopers markets its services by showing that firms that are more effective at gaining a financial return on investment from data, which it calls data trust pacesetters, routinely and explicitly value their data using cost-benefit analysis (PWC, 2017).

    • Large retailers are selling their purchasing data to suppliers, which are eager to buy this data to improve the time to market of their products (Najjar & Kettinger, 2013).3

  • Enhancing customer experience through associated data products or services

    • FedEx provided its customers with online package tracking, enhancing its package delivery service. Thus, such data might be measured by the additional business it generates.

    • Companies frequently offer premium versions of software or free apps, including weather, fitness tracking, and analytic platforms for purchase. Companies like Spotify and Netflix use customer data to deliver enhanced streaming content and recommendations to their users.

  • Assessing the value of data breach or loss

    • With data breaches and data ransom increasing, companies routinely go through a data valuation exercise to determine how much and what kind of insurance to buy for their information assets. Here, the value of the data is defined in terms of fines, loss of customers, and the cost of preventing future breaches. Note that a company may want to insure a discrete piece of data or intellectual property. Generally, this cannot be done because it is difficult to value data discretely (Najjar & Kettinger, 2013).

    • One observer noted that the TJX Co. breach disclosed in 2007, estimated to cost the company at least $180 million on over 46 million records, worked out to more than $4 per customer record—prompting the question of how a theoretical insurer coming into the company ahead of time to create such a policy would have calculated the value (Todd, 2015).

  • Purchasing or selling data-intensive companies

    • There are many examples of companies buying other companies for their data, which largely determines their worth in the marketplace. In one example in 2016, Microsoft Corp. acquired the online professional network LinkedIn Corp. for $26.2 billion (Microsoft, 2022). Other examples include Google’s acquisition of YouTube ($1.7 billion, 2006), Nest ($3.2 billion, 2014), and Fitbit ($2.1 billion, 2019) or Facebook’s acquisition of Instagram ($1 billion, 2012) and WhatsApp ($22 billion, 2014).
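The per-record arithmetic behind the TJX breach example above can be sketched as follows. This is only an illustration of how a crude cost-per-record figure is derived: the function name `cost_per_record` and the constants are assumptions, and the inputs are the rounded floor figures quoted in the text ("at least $180 million" and "over 46 million records"), not audited totals.

```python
def cost_per_record(total_cost_usd, records):
    """Crude breach valuation: total cost divided by the number of records affected."""
    if records <= 0:
        raise ValueError("records must be positive")
    return total_cost_usd / records


# Rounded floor figures quoted in the text (assumed inputs, not audited totals).
TJX_COST_FLOOR_USD = 180_000_000
TJX_RECORDS = 46_000_000

per_record = cost_per_record(TJX_COST_FLOOR_USD, TJX_RECORDS)

# Total cost at which the figure reaches $4 per record for the same record count.
cost_for_four_dollars = 4.0 * TJX_RECORDS

if __name__ == "__main__":
    print(f"Per-record cost at the floor figures: ${per_record:.2f}")
    print(f"Total cost implying $4 per record: ${cost_for_four_dollars:,.0f}")
```

At the rounded floor figures the quotient is about $3.91 per record, so the result is sensitive to which totals are used; a cost modestly above the stated floor is enough to push it past $4. This sensitivity is part of why pricing such a policy ahead of time is hard.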

2.4. Economic Model Examples

We found two types of studies on economic models: ones that explicitly estimate the value of open data and ones that focus on how policy creates public data value. The following reflect some economic model examples:

  • The economic value of earth observation from space. An assessment of the value of geospatial data for the Australian economy examines the impact of satellite data from weather monitoring; ocean health; and activities like oil drilling, landscape monitoring, agriculture, water monitoring, natural disaster management, and mining. Completed in 2015, the assessment projects a total of about $3 billion (Australian) in economic, social, and environmental benefits to the Australian public (Acil Allen Consulting, 2015).

  • Valuing the census. A report that quantifies the benefits to New Zealand from the use of census and population information estimates the value of census data in areas such as improved health funding, reductions in use of underutilized capital investments, ability to craft more precise policy, and overall benefit to government and private sector firms. The report concludes that, despite significant difficulties in developing a rigorous quantification, census data presents a $1 billion (New Zealand) benefit to its public (<5 million population) over 25 years (Bakker, 2013).

  • The California Consumer Privacy Act (2018). This regulation, effective January 2020, requires businesses, when offering certain services, to document a reasonable and good faith method for calculating the value of the consumer’s data.

  • Taxing data. New York City is working on legislation to create a data sales tax. The proposal’s authors outline a four-step approach, with Step 1 being to “quantify the amount of data generated by New Yorkers and commercialized for profit” (Adams & Gounardes, 2020). In a similar vein, California’s governor, Gavin Newsom, in 2019 tasked a team to research a “data dividend,” a tax paid to either consumers or the state for selling individuals’ data (Ulloa, 2019).

2.5. Dimensional Model Examples

In addition to market-based and economic models for data valuation, numerous studies attempted to quantify additional categories—or ‘dimensions’—to value data. Such dimensions were based both on the data itself (e.g., data quality) as well as on the context within which the data was used (e.g., timeliness of delivery). We term this approach the dimensional model.

We found different approaches used to evaluate dimensional models, including the use of mathematical formulas, survey questions, examinations of prior studies (sometimes with new ideas), and actual attempts at categorizing data assets. An example of applying a mathematical formula is the calculation of business value of information used by Doug Laney (2018, p. 253) in his book Infonomics:

$$\text{Business Value of Information} = \sum_{p=1}^{n}\left(\text{Relevance}_{p}\right) \times \text{Validity} \times \text{Completeness} \times \text{Timeliness}$$
where p indexes the business process functions and n is the number of such functions.
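As a worked illustration of the formula above, the sketch below sums a relevance score per business process and multiplies the result by validity, completeness, and timeliness. The function name, the assumption that every input lies on a 0 to 1 scale, and the example numbers are all hypothetical; the text does not specify scales or measurement methods.

```python
def business_value_of_information(relevance_by_process, validity,
                                  completeness, timeliness):
    """Sum of per-process relevance, scaled by validity, completeness, and timeliness.

    Assumes (hypothetically) that every input is a fraction between 0 and 1.
    """
    factors = {"validity": validity, "completeness": completeness,
               "timeliness": timeliness}
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    for relevance in relevance_by_process:
        if not 0.0 <= relevance <= 1.0:
            raise ValueError("each relevance score must be between 0 and 1")
    return sum(relevance_by_process) * validity * completeness * timeliness


# Invented example: three business processes with differing relevance.
example = business_value_of_information(
    [0.8, 0.5, 0.2], validity=0.9, completeness=0.8, timeliness=0.5
)

if __name__ == "__main__":
    print(f"Business value of information: {example:.2f}")
```

Because validity, completeness, and timeliness do not depend on p in the formula as written, multiplying them inside or outside the sum gives the same result. With the invented inputs, the relevance scores sum to 1.5 and the three factors multiply to 0.36, giving a value of about 0.54; the number is useful only for comparing data sets scored the same way.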

Other models used surveys that asked respondents to value specific data set characteristics, such as age of data, accuracy, operational value, and replacement cost. We found numerous examinations of prior studies, some of which explored the impact on data valuation of adding new dimensions. We also found studies that examined the application of dimensions to real-world use cases. For example, one study categorized usage as either log analysis, identifying data consumers, or number of views/downloads, depending on the use case (Brennan et al., 2018).

Although we have witnessed that informal evaluations using the dimensional model are frequently done, we found limited published real-world examples. In one example, Highways England, which manages data on roads and related infrastructure, explored how much of its £115 billion in intangible assets was attributable to data. It mapped key data assets to business functions and their financial value, modulated by an assessment of each data asset’s potential market value, to show that the organization’s data was worth £60 billion (Laney, 2021).

We summarize prior approaches to the dimensional model in more detail in Section 3.3.

3. Model Detail

3.1. Market-Based Model

3.1.1. Market-Based Model Overview

The key feature of this model is that it uses income or cost to value data. Based on our findings, the market-based model of data valuation is widely practiced. Policy development regarding this model is still evolving. Currently, it is similar to permitted valuation techniques of other intangible assets, such as patents, copyrights, or software. In fact, the IRS guidelines for valuing intangible assets list “technical data” as one type of intangible asset (IRS, 2020). According to the IRS, the value of an intangible asset can be determined in the same way as for tangible assets: using a cost basis, gauging the asset’s value in the marketplace, or basing the asset’s value on revenue potential of the asset in question.

3.1.2. Market-Based Model Strengths

Market-based models allow for the monetary valuation of data based on what the market will pay, whether valuation is rooted in anticipated income, how much a data-oriented company might fetch in a sale, or speculation on what the loss of data is worth.

On the cost side, market-based models calculate the cost of a data breach or loss and the cost of insurance. Similarly, companies estimate their cost of letting competitors into the market and sometimes decide to acquire those competitors based on the projected value of their data.

Data may even lend itself to be bought and sold on an exchange. Examples of data marketplaces for both personal and business data are starting to appear (see, e.g., Dilmegani, 2022). This is also happening for data in the illegal market, such as credit card and Social Security numbers. It remains to be seen whether formal exchanges for legally traded data are sustainable.

3.1.3. Market-Based Model Challenges

One challenge with the market-based model is that factors besides data, like talent acquisition, may play a role.4 Another challenge is the small marketplace of buyers and sellers of data, resulting in a limited ability to compare one purchase with another (e.g., most do not share their prices). In addition, while there are data brokers that collect and then sell information, their accountability regarding the quality of information is sometimes in question, and these markets are not transparent (Federal Trade Commission, 2014).5

The market-based model also does not consider the value of data created by consumers, even though companies receive advertising revenue from this data. There is debate about the degree to which a tax placed on the marginal use of individual data may benefit the public versus disincentivize organizations from creating a user information marketplace (Bergemann & Bonatti, 2019).

Market-based data valuation is also impacted by restrictions on local markets. For example, a company may incur costs to comply with a mandate to store individual data locally. Companies must consider the risk of local piracy, favoritism toward local competitors, and censorship. These types of local restrictions ultimately factor into the value of data.

3.2. Economic Model

3.2.1. Economic Model Overview

The economic model values data in terms of overall economic and public benefits. Economic benefit might look at overall job gains, while public benefit might look at social benefits such as impacts on privacy, health, and infrastructure. In some cases, data valuation using the economic model runs squarely counter to the market-based model. For example, much work goes into evidence-based health care based on big data, which relies on broad data from many sources, including providers, payers (i.e., insurers), and individuals (e.g., health apps; Harwich & Lasko-Skinner, 2018). An economic model might look at the overall value of such data for the public, while industry might leverage a market-based model to reduce costs or increase revenue for its sector only. We found many studies on valuing data for the public good (see, e.g., Open Data Watch, 2021). These studies are often done on behalf of governments. They estimate the value to the economy of data such as geospatial data, census data, or public sector data in general.

3.2.2. Economic Model Strengths

The strength of the economic model lies in its focus on data valuation for the public good in two types of studies: those that estimate the value of open data and those that suggest the use of policy to drive public data value. The former project how open data can be leveraged by both government and the private sector to produce economic benefit. The latter discuss tweaking policy to effect those same kinds of benefits. Economic models are being actively used to determine the value of data. For example:

  • Generating value through aggregating data from many sources. An example of effective data aggregation is the U.K. Hydrographic Office’s (UKHO’s) transition from paper to digital maps on surface water and geospatial measures. This digital conversion allowed UKHO to aggregate mapping data, include other diverse sources, and then apply analytics. Today, in addition to the Royal Navy and defense, 90% of ships trading internationally use this data, which generates £150 million annually (HM Treasury, 2018).

  • Data produced by public institutions often spurs private innovation. Weather data and transportation data generated by the government are routinely enhanced and provided back to society either for free or for a fee (e.g., premium versions).

  • Some observers maintain that economic models can spur enhanced data access by espousing pro-competitive policies, making it harder for a small number of companies to hoard such data (Coyle et al., 2020).

  • Economic models are exploring the valuation of personal data by extracting a tax paid either directly to end-users or to the state for use of that data.

  • Economic models can designate data ownership. Data ownership today is not well defined. As a result, de facto ownership is common. Creating laws regarding data ownership, such as privacy laws that assign ownership of personal data to individuals, significantly shifts data value in favor of the owner.

3.2.3. Economic Model Challenges

Economic models are based on calculations with limited scope and take a long time to verify. Some economic models have been executed, but unlike market-based models, they are slow to be put into practice. This is likely because their implications, good and bad, are significant, and there is no profit motive. Governments act carefully to avoid negative implications. Examples of challenges include:

  • Overly restrictive laws or policies that might negatively affect the value of data, discouraging competition and wide data reuse. Such laws may lead companies to hoard data, defeating the purpose of the economic model. As an example, the potential negative economic impact of overly restrictive privacy laws may outweigh their intended benefit (Jones & Tonetti, 2019).

  • Unlike with physical goods, the flow of data is not tracked. This might entail data flow between businesses or countries, or free services delivered to end-users, like email, search results, or driving directions. Consequently, the data value for activities like unpaid data creation, data reuse, and cross-border flow is difficult to include in models (Organisation for Economic Co-operation and Development [OECD], 2019; U.S. Department of Commerce, 2016).

  • Economic models reflect data valuation in terms of projected financial as well as social benefits. Valuing social benefits based on data is particularly challenging.

  • The purpose of intellectual property law is to maintain the balance between innovation and public good. This has been tried with data in limited ways. For example, the U.K. Copyright and Rights in Databases Regulations of 1997 allow copyright of databases, including contents. Some studies suggest that strong externalities, such as the benefit of aggregating data from many sources, make copyright-like protection less appropriate for data (Duch-Brown et al., 2017).

3.3. Dimensional Model

3.3.1. Dimensional Model Overview

Numerous studies attempt to value data via dimensions. Such dimensions are based on both the data itself (e.g., data quality, age, format) as well as the context within which the data is used (e.g., time savings, level of ownership, delivery frequency).

An early study, by Niv Ahituv (1980), examined mathematical formulas to evaluate data systems, including in terms of timeliness (response time and frequency), level of nondesired data, value of data aggregation, format (medium, data organization, and data representation), and ranking of data importance. A subsequent study by the same author investigated attributes of information valuation, including timeliness, content, format, and cost (Ahituv, 1989).

An often-cited study to value data by Daniel Moody and Peter Walsh (1999) looked at different approaches to information value based on accounting practices, namely, cost, market value, and present value of expected revenue potential. The authors concluded these were the most effective valuation parameters. The authors also examined communications theory, the attempt to measure the value of information based on the amount of information communicated. This, they correctly concluded, leaves out the value of the content and is not a useful approach to data valuation.

More recently, Gianluigi Viscusi and Carlo Batini (2017) compiled various prior studies that used dimensional data valuation. This compilation reflects the use of information quality (e.g., accuracy, timeliness, credibility) and information structure (e.g., abstraction, codification). It reiterates the importance of utility (financial value) as a data valuation category. In addition, the study highlights information diffusion (e.g., scarcity, sharing) and infrastructure (e.g., abstraction, embeddedness) as key data valuation categories.

In 2018, Douglas Laney, an analyst and author with Gartner at the time, popularized the concept of “Infonomics” in an effort to centralize discussion on valuing data as an asset. He discussed several models, at least two of which (intrinsic value and business value of data) involve data dimensions.

Table 2 summarizes our research into dimensional data valuation models. It is notable that some categories, like data cost, quality, and utility, are repeated across multiple studies, suggesting that they are particularly valuable dimensions.

Table 2. Summary of prior approaches to data valuation using the dimensional model.

| Study | Data Value Categories | Conclusion |
| --- | --- | --- |
| Brennan et al. (2019) | Operational value, replacement cost, competitive advantage, regulatory risk, timeliness, secondarily ease of measurement, and data quality | Reinforces a hierarchy of data value dimensions, that is, utility (including operational impact), context (including timeliness and competitive advantage), usage and quality, cost (including replacement costs), and the use of manual survey-based methods as useful for data valuation. |
| Brennan et al. (2018) | Usage (log analysis), cost (creation, maintenance), quality, intrinsic value, IT operations (surveys, trouble ticket analysis), contextual (e.g., access frequency, purchase cost, volume, appropriate data quality threshold, relevance), and utility | Monitoring data value is a necessary prerequisite to strategic data management. It is possible to assess the maturity of data value monitoring processes. Usage and cost are easiest to implement, and utility or operational value is the most important for organizations. |
| Laney (2018) | Intrinsic value (validity, completeness, scarcity, lifecycle), business value (relevance, validity, completeness, timeliness), performance value (relative key performance indicator benefit when leveraging information assets), cost value, market value, and economic value | Models are imperfect and have greater utility in combination than when standing alone. Dimensions may be modified based on organization needs. Models provide an indicator of ‘information asset management’ maturity, which typically results in increased value from data. |
| Fleckenstein and Fellows (2018) | Cost, data type (quality), maturity of data stewardship, data architecture, and data lifecycle | Principles from physical asset valuation may be used to extract relevant dimensions related to data valuation. |
| Harwich and Lasko-Skinner (2018) | Quality, format, ability to link data, type of data, reason of data collection, quantity, actionability, use of data, market capitalization, and relative cost of getting data elsewhere | Public authorities should develop a clear national strategy that seeks to optimize the value of data; help the public sector when accessed for commercial purposes; and ensure the value of data is optimized between data owners, the public sector, and industry. |
| Viscusi and Batini (2017) | Information quality (accuracy, accessibility, completeness, currency, reliability, timeliness, usability, credibility, believability, reputation, trustworthiness), information structure (abstraction, codification, derivation, integration), information diffusion (scarcity, sharing), information infrastructure abstraction, embeddedness, evolving (timeliness), flexibility, openness, sharing, standardization (codification), financial value, pertinence, transaction costs | These metrics may be useful for measuring information value. Data valuation analysis produced was limited due to the complex and multidisciplinary nature of information value. Further studies are recommended to clarify data valuation categories. |
| Nagle and Sammon (2017) | Business value (cost reduction, revenue generation, risk mitigation), acquisition (cost and legitimate need of data), level of integration (existing vs. needed), analytics effectiveness, delivery (data quality and visual impact), and level of data governance | A data value map can be used to gain a shared understanding. |
| Heckman et al. (2015) | Value-based, qualitative, and cost-based parameters | This model is a rudimentary step toward building data valuation. |
| Higson and Waltho (2010) | Cost/cost reduction, return on investment, risk mitigation, data security, data quality, utility (number of users or applications using), business satisfaction, and results | Argues for an asset-centric, value-based approach to the management of information. |
| Sajko et al. (2006) | Quantitative dimensions (value to business, value for other businesses, cost of reconstruction, value of data over time) and qualitative dimensions (information importance and age) | Information assessment is determined by two components: dimensions of information value and importance or priority of the dimensions. The value of information is contextual. |
| Moody and Walsh (1999) | Amount of data | Found the amount of data ineffective, since it excludes value of content. Concluded that acquisition cost, market value, and potential revenue are the best indicators. |
| Ahituv (1989) | Timeliness, contents, format, and cost | The multi-attribute approach incurs challenges, including variable identification, measurement, variable relationship between each attribute and data value, and trade-offs between variables. |
| Ahituv (1980) | Timeliness (response time and frequency), level of nondesired data, value of data aggregation, format (medium, data organization, data representation), and ranking of data importance | If problems of information systems measurement can be overcome, methods of evaluation exist. |

3.3.2. Dimensional Model Strengths

The dimensional model incorporates data-specific and contextual attributes like data quality and stewardship, which other models leave out. These attributes underlie the effective use of data. They are, to a large extent, the focus of data management and maturity models, such as the Capability Maturity Model Integration Data Management Maturity Model (CMMI Institute, 2022), the Federal Data Maturity Model (Data Cabinet, 2018), and the Data Management Association’s Data Management Body of Knowledge (DAMA International, 2020). Additional strengths of the dimensional model include:

  • Data dimensions are useful for the relative comparison of similar data sets.

  • This model lends itself well to survey questions. It allows simple and straightforward evaluation of key data dimensions by business users.

  • Data dimensions can extend the valuation approach of other models. For example, the aggregation of data—viewed as a strength of the economic model—using high-quality data is more beneficial than similar aggregation using lower quality data. Similarly, the buying and selling of data is highly dependent on factors such as data accuracy and timeliness.

  • This model fosters a standard definition of data dimensions, which will lead to wider adoption. This reinforces investment into data management, ultimately creating better, more consistent data.

  • Some dimensions feed into other dimensions. For example, timeliness, accuracy, lifecycle, and others are likely factors of both cost and utility, two of the most universal and useful dimensions. Being able to break down and compare cost and utility in these terms allows for concise appraisals of data valuation.

3.3.3. Dimensional Model Challenges

Key challenges of this model are listed below:

  • Value may vary considerably based on factors such as who uses the data and for what purpose it is used. For example, fraud detection relies on near-real-time data at the cost of data quality. Alternatively, an analysis of purchasing history using the same data demands higher data quality and can afford significant latency.

  • Similar data sets may be nonfungible. In some situations, a variety of data sets may contain similar but slightly different information and thus may hold different values. Because of these differences, data assets cannot always be easily compared or substituted (Yousif, 2015).

  • Data value in this model is often measured via survey questions. Even if we can clearly define each dimension and how it is measured, valuation is subject to interpretation. Different survey takers may interpret the need for, say, data quality or stewardship differently.

  • This model is still evolving, and the surveys we found were small. Surveys are expensive to execute. The goal is that we can, over time, leverage a much larger data set pool to standardize and streamline survey questions sufficiently.

  • While it is possible to determine the relative value of two different data sets based on dimensions, translating that value to monetary terms likely requires the secondary application of a market-based or economic model to a given data set.

4. Building and Scoring a Dimensional Data Valuation Model

For the second part of our research, we focused on building a dimensional data valuation model that expands on prior models.6

We designed a survey of about 30 questions around an extended set of dimensions, both intrinsic to data (e.g., data quality) and contextual (e.g., data usage). For data, we leveraged three types of data sets: COVID-19 data, flight scheduling and navigation data, and voter data. We examined two use cases:

  1. how the value of one data set compares to a similar data set (flight scheduling and navigation, voter data)

  2. how a given data set adds value to existing data (COVID-19 data).

To define dimensions, we leveraged the research described in Sections 2 and 3 and expanded on that research. As a result, we created questions on cost, age, and ownership and added other questions around dimensions like privacy, licensing restrictions, and volume and variety.

We used our professional data management experience to apply a score to each question, weighted that score, and, in some cases, scored a data set from different perspectives. We explain details on scoring in Section 4.2. In the case of flight scheduling and navigation data, we vetted the results with the data set owners, as they were internal to our company.

4.1. Model Design

Our aim was to create a model that was simple, not too time-consuming, and usable by multiple stakeholders, including business-side analysts, engineers, and executives. The following were our steps:

  • We seeded our model based on dimensions uncovered in prior research, particularly those reflected repeatedly. We also relied on our experience in data management to confirm that certain dimensions, like usage, cost, and data quality, were useful to data valuation.

  • We created a set of survey questions around our dimensions, which we subsequently used to score the value of sample data sets.

  • We scored data sets to attain a value.

  • We expanded on prior research by adding new dimensions, adding weighted scoring, and scoring from different perspectives.

4.1.1. Survey Design

In our research, we saw limited design and execution of dimensional models. In one instance, a team asked 16 diverse participants to evaluate their own data sets based on a standard set of questions. They determined that certain dimensions, like operational impact, replacement cost, and timeliness, were more significant contributors to data valuation than others, such as competitive advantage or regulatory risk (Brennan et al., 2019).

We felt that our model could serve as an initial evaluation and point to areas (e.g., cost, usage, data quality) that the evaluator can explore in more detail, if desired. We strove for simplicity and speed over detailed precision. While such a model may not provide all the answers, it can indicate the relative value of a data set, highlight potential risks, and promote informed decisions.

Our focus was on executing data valuation against two use cases. Our initial use case was aimed at comparing the value of similar data sets. For this, we used two similar flight scheduling and navigation data sets as well as two similar voter data sets. From experience, we knew this to be useful to any large organization that wants to reduce the number of similar data sources or wants to replace an existing data set with a similar but better one (e.g., less costly, more reliable, less maintenance).

We formulated our second use case by working with several internal projects. These projects needed to evaluate adding new data to their existing data pool. For this, we used baseline COVID-19 data sets, to which we added additional data. Thus, our second use case became a comparison of the value of existing data versus that of existing data plus new data. The section on Data Sets discusses our data sets in more detail.

Through our research and data mapping, we had a good idea of which dimensions mattered. Our dimensions expanded on prior work and evolved through repeated testing and interactions with stakeholders. In the end, we found our best results by asking questions in the dimensions of ownership, cost, usage, age, privacy, data quality, and volume and variety. Table 3 reflects our final set of dimensions.7

Table 3. Data valuation dimensions.

| Dimension | Description |
| --- | --- |
| Ownership | Addresses outright data set ownership plus licensing restrictions and service agreements |
| Cost | Addresses the cost of data set acquisition, maintenance, and replacement |
| Usage | Addresses dataset mission criticality, ability to integrate, usage scope, usage frequency, metadata, additional resources, expected increase in demand, and diminishing value |
| Age | Addresses refresh rate and available history |
| Privacy | Addresses whether the dataset contains sensitive data such as Personally Identifiable Information (PII) and Protected Health Information, and meets privacy standards |
| Data quality | Addresses completeness, accuracy, currency, consistency, duplication, trustworthiness, and timeliness |
| Volume and variety | Addresses the number of records, scope of information for each record, and ability to answer needed questions |

Next, we started formulating questions, and a related set of answers, within each dimension. For answers, we assigned incremental point values, giving a single point to the answer we deemed least valuable and adding an additional point for each answer we felt provided more value. We used our experience as researchers and our diverse backgrounds working for both government agencies and industry to formulate the questions.
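The incremental point scheme described above can be pictured with a small data structure. The following is a minimal sketch only, not the survey instrument itself: the `Question` class, the question wording, and the answer choices are hypothetical and chosen purely for illustration. It assumes answers are listed from least to most valuable, so an answer's point value is its position in that list.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Question:
    """One survey question; answers are ordered from least to most valuable."""
    dimension: str
    text: str
    answers: Tuple[str, ...]

    def points(self, answer: str) -> int:
        # The least valuable answer earns 1 point; each later answer earns one more.
        return self.answers.index(answer) + 1

    @property
    def max_points(self) -> int:
        # The most valuable answer earns as many points as there are answers.
        return len(self.answers)


# Hypothetical question in the age dimension (wording is illustrative only).
refresh_rate = Question(
    dimension="Age",
    text="How often is the data set refreshed?",
    answers=("Never", "Yearly", "Monthly", "Daily"),
)

print(refresh_rate.points("Monthly"), "of", refresh_rate.max_points)  # prints "3 of 4"
```

Keeping the answers in an ordered tuple means a question can gain or lose an answer during refinement without any point values being renumbered by hand.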

We then started to apply the questions and answers to our data sets and perspectives. During this process, we looked for redundancies, lack of clarity, and gaps in our questions. Where we found redundancies, we removed unwanted or duplicate questions. Where questions lacked clarity, we rephrased the questions and answers to make them easier to read and understand. Where we found gaps, we added missing questions. This process was iterative and led to a refined set of questions and answers.

4.1.2. Data Sets

For data, we leveraged three types of data sets. Specifically, for COVID-19, we leveraged cases/death rates, testing, and vaccination data sets; for flight scheduling and navigation, we leveraged similar vendor-compiled data sets; and for voter data, we leveraged data from two states: Ohio and North Carolina. These data sets were either openly available (COVID-19, Johns Hopkins University, 2021; voter data, U.S. Election Assistance Commission, 2020), or, in the case of flight scheduling and navigation data, accessible to us.

For flight scheduling, we used two owned data sets that were purchased at different times. For the navigation data sets, we used one data set that was provided for free and a similar data set that was purchased. COVID-19 data sets were publicly available from the Johns Hopkins University Coronavirus Resource Center (JHU), and voter data was publicly available from the U.S. Election Assistance Commission. One motivator for using both purchased and free data sets was that we could explicitly factor the cost into our data valuation comparisons. This allowed us, for example, to validate whether a more expensive data set had more data or higher data quality. We used COVID-19 and voter data because they are freely available, popular, abundant, of good quality, well suited to different perspectives, easy to augment with additional variety, and easy to commingle. This approach allowed us to examine a variety of comparisons.

4.2. Scoring

To score data valuation, we used our own experience working with industry and government. We first assigned a raw score to each question, based on the point value of the selected answer. The answer contributing least to a data set’s value received a point value of 1, and each subsequent answer received one additional point, so that the answer contributing most to the data set’s value received the highest score. Since some questions had more answers than others, the possible number of points was not the same for all questions. To standardize this process, we added a conversion factor so that questions with more answers were not automatically scored higher than those with fewer answers. Finally, we applied a weight factor between 1 and 5 to each score. This served as an indicator of the importance of a question relative to all other questions.
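The three scoring steps (raw score, conversion factor, weight) can be sketched in a few lines. The article does not state the exact form of the conversion factor, so the sketch below assumes the simplest choice, dividing the raw points by the number of answers a question offers; the function name `weighted_score` and the sample values are hypothetical.

```python
def weighted_score(points: int, n_answers: int, weight: int) -> float:
    """Return the weighted, standardized score for one question.

    Assumption: the conversion factor is 1 / n_answers, so the best answer
    to any question standardizes to 1.0 before the weight is applied.
    """
    if not 1 <= points <= n_answers:
        raise ValueError("points must lie between 1 and n_answers")
    if not 1 <= weight <= 5:
        raise ValueError("weight must lie between 1 and 5")
    return weight * points / n_answers


# Two hypothetical questions, each answered with its top choice and given
# the same weight: the question with more answers does not score higher.
three_choice = weighted_score(points=3, n_answers=3, weight=2)
five_choice = weighted_score(points=5, n_answers=5, weight=2)
print(three_choice, five_choice)  # both are 2.0
```

Other conversion factors would serve the same purpose, for example rescaling every question to a common maximum number of points. The one shown here is only one reasonable reading of the description.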

In cases where there were different perspectives, we allowed for different weights by perspective. We arrived at this design through trial and error, noticing that certain dimensions—or survey questions within a dimension—may matter more for some organizations than for others.

We looked at the value of COVID-19 data from the perspectives of government, a hospital, JHU, and a public service research organization. For flight scheduling and navigation data, we examined a vendor, government, and a public service research organization.
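Perspective-specific weighting can be illustrated as follows. This is a sketch under stated assumptions rather than the scoring sheet used in the study: the question names, standardized scores, and weights are all hypothetical, and only the weights vary by perspective here, although answers to contextual questions (such as usage frequency) may also differ between organizations.

```python
from typing import Dict

# Hypothetical standardized scores (0 to 1) per question for one data set.
standardized: Dict[str, float] = {
    "usage_frequency": 0.25,
    "metadata_included": 1.0,
    "acquisition_cost": 0.5,
}

# Hypothetical weights (1 to 5) assigned by each perspective.
weights: Dict[str, Dict[str, int]] = {
    "government": {"usage_frequency": 5, "metadata_included": 3, "acquisition_cost": 2},
    "vendor": {"usage_frequency": 1, "metadata_included": 3, "acquisition_cost": 5},
}


def total_score(scores: Dict[str, float], question_weights: Dict[str, int]) -> float:
    """Sum of weight * standardized score over all questions."""
    return sum(question_weights[name] * score for name, score in scores.items())


for perspective, perspective_weights in weights.items():
    # The same answers yield a different total under each perspective.
    print(perspective, total_score(standardized, perspective_weights))
```

With these made-up numbers the government perspective totals 5.25 and the vendor perspective 5.75, showing how identical answers can lead to different valuations once weights differ.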

The tables below are sample snapshots of our scoring:8

  • Table 4 reflects the data quality dimension for the comparison of two similar data sets, in this case flight scheduling data. We can clearly see that data set 2 has higher data quality than data set 1. It is noteworthy that, separate from the data quality score, the cost, usage, age, and volume and variety scores for data set 2 are also higher.

  • Table 5 reflects a snapshot of our COVID-19 scoring in the volume and variety dimension. Here, we reflect how adding testing and vaccination data to COVID-19 case and death rate data increases the valuation. It is noteworthy that, separate from the data-quality score, the usage score is much higher for the combined data. Additionally, cost and ownership are not factors since both data sets are public under the Creative Commons license.

  • Table 6 shows how different organizations value the COVID-19 data set differently. Here again we show a snapshot of volume and variety for our COVID-19 evaluation but for four different perspectives: government, a research organization, a hospital, and JHU. These perspectives are based on our own best guesses.

Table 4. Example snapshot of comparing two similar data sets.


Table 5. Example snapshot of adding data to existing data pool.

Note. JHU = Johns Hopkins University Coronavirus Resource Center.

Table 6. Example of different perspectives.

Note. JHU = Johns Hopkins University Coronavirus Resource Center.

4.3. Findings

  • Our scoring verified typical assumptions. For example:

    • When comparing two similar data sets, the more costly data set also showed higher data quality, more usage, more history, and greater volume and variety. This is reflected in the comparison between the two flight scheduling data sets.

    • For flight navigation, data set 1 was freely licensed while data set 2 was purchased. The comparison shows data set 2, while more expensive to acquire, rates significantly higher in usage, including the inclusion of metadata, ease of integration with other data sets, inclusion of additional resources, and popularity. Data set 2 also scored higher in data quality and volume and variety.

    • When comparing data sets that add value to existing data, the combined data sets scored higher. This is reflected in the COVID-19 data, where combined cases/deaths and testing/vaccination data has significantly higher usage than just the cases/deaths data. However, ours was a simple case, adding a small data set to another, relatively small data set. We anticipate that adding a small data set to a large data pool may not always result in this outcome.

  • The data sets each had their own strengths and weaknesses. This sometimes evened out valuations. For example:

    • For flight navigation, data set 2 was purchased, making it more valuable than the licensed data set 1. However, data set 1 has fewer restrictions for sharing within the organization.

    • For COVID-19 data, adding testing/vaccination data to cases/deaths data yields significantly more variety and volume. However, this added volume and variety increased the cost to maintain the data sets.

  • Context is important. For example:

    • For flight scheduling data, we scored three perspectives: vendor, government, and research organization. One of our usage questions revolved around frequency of use, which is daily for the government and the research organization but rare for the vendor. This implies a lower value for the vendor, which is counterintuitive since the vendor stands to profit from the data set. Thus, the vendor might give this question a low weight or no weight at all.

    • For privacy, we scored for Personally Identifiable Information (PII) and whether the data set met required privacy compliance. In the case of voter data, both data sets contained PII, which we valued higher. Such data is useful for a variety of analyses. However, meeting privacy compliance might require an organization to mask PII data, in which case it may value masked data higher.

    • The ability or desire to answer new questions for COVID-19 data likely differs across stakeholders (e.g., government, research organization, hospital, and JHU). While we did not engage stakeholders from each of these organizations, we assumed that COVID-19 data sets were more likely to be used for analytics by the government and research organizations.9

  • Data sets are more valuable when accompanied by additional resources. For both the navigational and flight scheduling data, the value of the data set increased when accompanied by a complete set of metadata and other resources, such as code, data analysis, reports, or additional lookups. The same applies to voter data, where one of the data sets comes with full metadata that explains all fields.

  • Our team experimented extensively with different weights. In the end, we applied weights that we thought were reasonable. We also concluded that weights are very context specific. For example, cost may matter much more to a particular stakeholder or in a particular context. We realized that weights may also differ by perspective. While our weights fell between 1 and 5, we encourage users to experiment with weights in ways that work in their context. The survey acts as a blueprint for stakeholders to register their professional opinion on the value of data sets.

  • There were instances we were not able to investigate. For example:

    • Our scoring reflected that some dimensions mattered more than others (e.g., usage, data quality, volume and variety). However, our sampling of data sets was small and differed in key ways (e.g., cost, ownership). We would need to score a much larger data set sample to say with confidence that certain dimensions or questions matter more in all cases.

    • As part of our study, we briefly experimented with dependencies. For example, one might start with asking whether an organization owns a given data set and then evaluate other alternatives, such as cost, usage, or data quality. We found that documenting such dependencies quickly leads to many complex threads, without any evidence that one ordering of dimensions or questions is more correct than another.

    • For a given data set, raw scores that are inherent to data (e.g., data quality, privacy, volume and variety) remained the same across stakeholders with different perspectives. Only scores for dimensions that are separate from the data (e.g., ownership, usage, cost) change. That said, the definition of data quality, for example, is how well data is suited for intended use. This may render a given data set a better data-quality fit for one organization than for another. Similar concepts could be examined for other dimensions, such as privacy and volume and variety.

    • We realized that the value of a given data set may differ for stakeholders. This led us first to add different perspectives and subsequently to include weights for each perspective. It also led us to unanswered questions. For example, with COVID-19 data, JHU obtains data, wrangles it, and then makes it freely available to others who do not have to wrangle the data in the same way. However, folding data wrangling into acquisition cost proved difficult, since we score freely acquired data higher in the cost dimension. We tried to reverse valuation scoring here, giving the highest score to data that is costly to acquire. While this solved the problem for the COVID-19 model with different perspectives, the approach did not hold for other data set valuations. We were not able to solve this paradox of highly valuing free data while also accounting for the value of sunk cost.

    • We were able to determine the relative value of two different data sets based on dimensions using a score-based approach. Translating that value into monetary terms likely requires the secondary application of a market-based or economic model to a given data set.

    • We anticipate that, given a sufficient database of survey responses, it will be possible to apply artificial intelligence and machine learning to these surveys so that they can be more automatically completed. We understand that this requires logging many additional use cases.
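One of the findings above describes reversing the scoring for acquisition cost so that costly data earns the highest score. A minimal sketch of such a reversal follows; the helper `reverse_points` and the answer labels are hypothetical, and the sketch shows only the mechanics of flipping a point scale, not a resolution of the paradox the findings describe.

```python
def reverse_points(points: int, n_answers: int) -> int:
    """Flip a 1..n_answers point value so the lowest answer scores highest."""
    if not 1 <= points <= n_answers:
        raise ValueError("points must lie between 1 and n_answers")
    return n_answers + 1 - points


# Hypothetical acquisition-cost answers, ordered from least to most valuable
# under the default scoring, in which free data scores highest.
cost_answers = ("High cost", "Moderate cost", "Low cost", "Free")

default_points = {answer: i + 1 for i, answer in enumerate(cost_answers)}
reversed_points = {
    answer: reverse_points(points, len(cost_answers))
    for answer, points in default_points.items()
}
print(reversed_points)
```

Under the reversed scale, "High cost" receives 4 points and "Free" receives 1, which credits the effort sunk into acquiring and wrangling the data but penalizes consumers who obtain it for free.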

5. Conclusions and Future Work

The first part of this article examines research into data valuation. We found many examples and were able to construct a framework that groups the approaches into the following three models:

  • market-based models, which calculate data’s value in terms of cost and revenue/profit

  • economic models, which estimate data’s value in terms of economic and public benefit

  • dimensional models, which value data based on dimensions, both data-specific and contextual, such as data quality and ownership

Determining which model to leverage relies heavily on the given use case (e.g., purchasing a data-intensive company, calculating economic impact of data policy, or internally valuing a data set). Our preliminary research shows that each model is most useful in different circumstances and no approach alone is effective in every case. We also found overlap between models (e.g., cost, utility, policy). Depending on the context, the three models could conceivably be used in some combination.

For the second part of our research, we built a simple tool that we think can help organizations quickly and proficiently assess the value of data sets for specific use cases using a small, consistent set of dimensions. We focused on the dimensional model since this model allowed us to score the value of two similar data sets or the value of adding a data set to an existing data pool. Based on our experience and working with several internal projects, these use cases reflected real-world needs. We evaluated these two use cases against four different data sets, and we examined multiple perspectives against two of those data sets.

Our model shows that dimensions can be used effectively to compare two similar data sets or to evaluate the addition of a data set to an existing data pool. Our model also shows that context and perspective matter, based on factors like how the data set can and will be used. The dimensional model falls short of being able to value data in monetary terms. This likely requires the additional application of a market-based or economic model. Our model expands on previous dimensional models by suggesting a larger set of data valuation dimensions and applying weighting and perspectives to scoring. The model will benefit from being applied against additional data sets and use cases and then being evolved accordingly.

The demand for data valuation is fast growing.10 We see our research as one step toward a data valuation methodology that includes survey questions, feedback loops, and, eventually, a maturity model. Our intent is to expand our work to more data sets, both to verify and to enhance our model. We would like to investigate some of the stated items that we were not able to focus on in our current research. We would very much like to explore perspectives more deeply by, for example, working more directly with the JHU team. We plan to make all details of our current model, as well as future versions, publicly available for others to leverage, collaborate on, and enhance.


Acknowledgments

We thank Dr. Nitin Naik and Dr. Kris Rosjford for useful insights and discussions. We thank the MITRE Corporation Innovation Program (MIP) for funding this research.

Disclosure Statement

The views, opinions, and/or findings contained in this report are those of The MITRE Corporation and should not be construed as an official government position, policy, or decision, unless designated by other documentation.

Approved for Public Release. Distribution Unlimited. Public Release Case Number: 21-3464.


References

Acil Allen Consulting. (2015, December). The value of earth observations from space to Australia. Spatial Information Systems Research Ltd. https://www.crcsi.com.au/assets/Program-2/The-Value-of-Earth-Observations-from-Space-to-Australia-ACIL-Allen-FINAL-20151207.pdf

Adams, E., & Gounardes, A. (2020, June 1). A tax on data could fix New York’s budget. The Wall Street Journal. https://www.wsj.com/articles/a-tax-on-data-could-fix-new-yorks-budget-11591053159

Ahituv, N. (1980, January). A systematic approach towards assessing the value of an information system. MIS Quarterly, 4(4), 61–75. https://doi.org/10.2307/248961

Ahituv, N. (1989). Assessing the value of information: Problems and approaches. ICIS 1989 Proceedings, 45. https://aisel.aisnet.org/cgi/viewcontent.cgi?article=1007&context=icis1989

Bakker, C. (2013, April). Valuing the census. Statistics New Zealand. https://www.stats.govt.nz/assets/Research/Valuing-the-Census/valuing-the-census.pdf

Bergemann, D., & Bonatti, A. (2019, March). The economics of social data: An introduction (Cowles Foundation Discussion Paper No. 2171). Yale University Cowles Foundation for Research in Economics. https://cowles.yale.edu/sites/default/files/files/pub/d21/d2171.pdf

Brennan, R., Attard, J., & Helfert, M. (2018). Management of data value chains, a value monitoring capability maturity model. In Proceedings of the 20th International Conference on Enterprise Information Systems: Vol. 2: ICEIS (pp. 573–584). SCITEPRESS. https://doi.org/10.5220/0006684805730584

Brennan, R., Attard, J., Petkov, P., Nagle, T., & Helfert, M. (2019). Exploring data value assessment: A survey method and investigation of the perceived relative importance of data value dimensions. In Proceedings of the 21st International Conference on Enterprise Information Systems: Vol. 1: ICEIS (pp. 200–207). SCITEPRESS. https://doi.org/10.5220/0007723402000207

Brynjolfsson, E., Collis, A., & Eggers, F. (2019, March 26). Using massive online choice experiments to measure changes in well-being. PNAS, 116(15), 7250–7255. https://doi.org/10.1073/pnas.1815663116

California Consumer Privacy Act (2018), https://leginfo.legislature.ca.gov/faces/codes_displayText.xhtml?division=3.&part=4.&lawCode=CIV&title=1.81.5

CMMI Institute. (2022). Data Management Maturity (DMM). https://stage.cmmiinstitute.com/dmm

Coyle, D., Diepeveen, S., Wdowin, J., Tennison, J., & Kay, L. (2020, February). The value of data. Bennett Institute for Public Policy, University of Cambridge. https://www.bennettinstitute.cam.ac.uk/wp-content/uploads/2020/12/Value_of_data_Policy_Implications_Report_26_Feb_ok4noWn.pdf

DAMA International. (2020, July). Body of knowledge. https://www.dama.org/content/body-knowledge

Data Cabinet. (2018, October). The federal government data maturity model. GSA. https://my.usgs.gov/confluence/download/attachments/624464994/Federal%20Government%20Data%20Maturity%20Model.pdf?api=v2

Deloitte. (2013, May). Market assessment of public sector information. Department for Business Innovation & Skills. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/198905/bis-13-743-market-assessment-of-public-sector-information.pdf

Dilmegani, C. (2022, September 12). Data marketplaces: What, why, how, types, benefits, vendors. AI Multiple. https://research.aimultiple.com/data-marketplace/

Duch-Brown, N., Martens, B., & Mueller-Langer, F. (2017). The economics of ownership, access and trade in digital data (JRC Digital Economy Working Paper No. 2017-01). EU Commission. https://joint-research-centre.ec.europa.eu/system/files/2017-03/jrc104756.pdf

Federal Trade Commission. (2014, May). Data brokers—A call for transparency and accountability. https://www.ftc.gov/system/files/documents/reports/data-brokers-call-transparency-accountability-report-federal-trade-commission-may-2014/140527databrokerreport.pdf

Fleckenstein, M., & Fellows, L. (2018). Modern Data Strategy. Springer. https://link.springer.com/book/10.1007/978-3-319-68993-7

General Services Administration. (2021, October 14). Office of Shared Solutions and Performance Improvement (OSSPI); Chief Data Officers Council (CDO); Request for Information on Behalf of the Federal Chief Data Officers Council, 86 Fed. Reg. 57147, 57147–57149. https://www.federalregister.gov/documents/2021/10/14/2021-22267/office-of-shared-solutions-and-performance-improvement-osspi-chief-data-officers-council-cdo-request

Harwich, E., & Lasko-Skinner, R. (2018, December). Making NHS data work for everyone. Reform. https://www.abhi.org.uk/media/2272/reform-nhsdata.pdf

Heckman, J. R., Boehmer, E. L., Peters, E. H., Davaloo, M., & Kurup, N. G. (2015). A pricing model for data markets. In iConference 2015 Proceedings. Core. https://core.ac.uk/download/pdf/158298935.pdf

Hempel, J. (2017, March 14). Now we know why Microsoft bought LinkedIn. Wired. https://www.wired.com/2017/03/now-we-know-why-microsoft-bought-linkedin/

Higson, C., & Waltho, D. (2010). Valuing information as an asset. EURIM. http://faculty.london.edu/chigson/research/InformationAsset.pdf

HM Treasury. (2018, October 29). Getting smart about intellectual property and intangible assets. https://www.gov.uk/government/publications/getting-smart-about-intellectual-property-and-intangible-assets

Internal Revenue Service. (2020, September 22). Internal revenue manual – 4.48.5 Intangible property valuation guidelines. http://www.irs.gov/irm/part4/irm_04-048-005.html

Johns Hopkins University. (2021, October). Coronavirus Resource Center. https://coronavirus.jhu.edu/about/how-to-use-our-data

Jones, C. I., & Tonetti, C. (2019, August). Nonrivalry and the economics of data (Working Paper No. 3716). Stanford University. https://www.gsb.stanford.edu/faculty-research/working-papers/nonrivalry-economics-data

Keller, S. A., Shipp, S., Schroeder, A., & Korkmaz, G. (2020, February 21). Doing data science: A framework and case study. Harvard Data Science Review. https://hdsr.mitpress.mit.edu/pub/hnptx6lq/release/10

Laney, D. (2018). Infonomics: How to monetize, manage, and measure information as an asset for competitive advantage. Gartner Research.

Laney, D. (2021, February 1). Data valuation paves the road to the future for Highways England. Forbes. https://www.forbes.com/sites/douglaslaney/2021/02/01/data-valuation-paves-the-road-to-the-future-for-highways-england/?sh=88d6039612c0

Microsoft. (2022). Microsoft buys LinkedIn. https://news.microsoft.com/announcement/microsoft-buys-linkedin/#:~:text=Microsoft's%20%2426.2%2Dbillion%20acquisition%20of,software%2C%20such%20as%20Office%20365.&text=LinkedIn%20retained%20its%20distinct%20brand,to%20Microsoft%20CEO%20Satya%20Nadella

Moody, D., & Walsh, P. (1999). Measuring the value of information – An asset valuation approach. ECIS. https://www.semanticscholar.org/paper/Measuring-the-Value-Of-Information-An-Asset-Moody-Walsh/bc8ee8f7e8509db17e85f8108d41ef3bed5f13cc

Nagle, T., & Sammon, D. (2017). The data value map: A framework for developing shared understanding on data initiatives. In ECIS 2017: Proceedings of the 25th European Conference on Information Systems (pp. 1439–1452). ECIS. https://aisel.aisnet.org/ecis2017_rp/93

Najjar, M. S., & Kettinger, W. J. (2013). Data monetization: Lessons from a retailer’s journey. MIS Quarterly Executive, 12(4), Article 4. https://aisel.aisnet.org/misqe/vol12/iss4/4

Nash, K. S. (2014, June 13). CIOs consider putting a price tag on data. CIO. https://www.cio.com/article/291030/leadership-management-cios-consider-putting-a-price-tag-on-data.html

NEJM Catalyst. (2018, January 1). Healthcare big data and the promise of value-based care. https://catalyst.nejm.org/doi/full/10.1056/CAT.18.0290

Office of Fair Trading. (2006, December). The commercial use of public information. https://webarchive.nationalarchives.gov.uk/ukgwa/20140402164714/http://www.oft.gov.uk/OFTwork/publications/publication-categories/reports/consumer-protection/oft861

Open Data Watch. (2021, October). Value of data inventory. https://docs.google.com/spreadsheets/d/1QRNZUKIrwKxq7J6EEfA6fRLpjYUevaNDpXMbwqx_Ogw/edit#gid=37279104

Organisation for Economic Co-operation and Development. (2019). Measuring the digital transformation: A roadmap for the future. OECD Publishing. https://doi.org/10.1787/9789264311992-en

Organisation for Economic Co-operation and Development, World Trade Organization, & International Monetary Fund. (2020). Handbook on measuring digital trade (Version 1). https://www.oecd.org/sdd/its/Handbook-on-Measuring-Digital-Trade-Version-1.pdf

Own Your Own Data Act, S. 806, 116th Congress. (2019). https://www.congress.gov/bill/116th-congress/senate-bill/806

Price Waterhouse Coopers. (2019, April 4). Leading organizations don’t just have a data strategy, they have a data trust strategy. https://www.pwc.com/gx/en/news-room/press-releases/2019/digital-trust-insights-data-trust.html

Ritter, J., & Mayer, A. (2018). Regulating data as property: A new construct for moving forward. Duke Law & Technology Review, 16(1), 220–277. https://scholarship.law.duke.edu/dltr/vol16/iss1/7/

Sajko, M., Rabuzin, K., & Bača, M. (2006). How to calculate information value for effective security risk assessment. Journal of Information and Organizational Sciences, 30(2). https://jios.foi.hr/index.php/jios/article/view/22

Shelton Leipzig, D. (2019). Transform – Data as a pre-tangible asset for a post-data world: The leader’s playbook.

Short, J. E., & Todd, S. (2017, March 3). What’s your data worth? MIT Sloan Management Review. https://sloanreview.mit.edu/article/whats-your-data-worth/

Steele, M. L. (2017, May). The great failure of the IPXI experiment. Cornell Law Review, 102(4), Article 5. https://scholarship.law.cornell.edu/cgi/viewcontent.cgi?article=4730&context=clr

Taylor, L. (2016). The ethics of big data as a public good: Which public? Whose good? Philosophical Transactions of the Royal Society A, 374(2083). https://doi.org/10.1098/rsta.2016.0126

Todd, S. (2015, August 11). Insurance and data value. Information Playground. https://stevetodd.typepad.com/my_weblog/2015/08/insurance-and-data-value.html

U.K. Copyright and Rights in Databases Regulations of 1997. (1997). U.K. Statutory Instruments, No. 3032. http://www.legislation.gov.uk/uksi/1997/3032/contents/made

Ulloa, J. (2019, May 5). Newsom wants companies collecting personal data to share the wealth with Californians. The Los Angeles Times. https://www.latimes.com/politics/la-pol-ca-gavin-newsom-california-data-dividend-20190505-story.html

U.S. Department of Commerce. (2016, September 30). Measuring the value of cross-border data flows. https://www.commerce.gov/data-and-reports/reports/2016/09/measuring-value-cross-border-data-flows

U.S. Election Assistance Commission. (2020, October 29). Availability of state voter file and confidential information. https://www.eac.gov/sites/default/files/voters/Available_Voter_File_Information.pdf

Viscusi, G., & Batini, C. (2017, March 28). Digital information asset evaluation: Characteristics and dimensions (Working Paper). EPFL and University of Milano-Bicocca.

Yousif, M. (2015). The rise of data capital. IEEE Cloud Computing, 2(2), 4. https://doi.org/10.1109/MCC.2015.39


Appendices

Appendix A. Summary of Model Strengths and Weaknesses

Table A1 summarizes the strengths and challenges of each model.

Table A1. Model strengths and challenges.

All models

Challenges:

  • Standards to determine data value do not exist.

  • May inadequately reflect data value due to lack of intellectual property protection (OECD et al., 2020).

  • Data value is highly context dependent; the same data set may be valued differently for different use cases.

  • Data valuation is speculative.

  • Legal ownership of data is not yet clearly defined (Duch-Brown et al., 2017; Own Your Own Data Act, 2019).

Market-based model

Strengths:

Income based:

  • Easy data valuation in terms of potential revenue or profit.

  • Free services in exchange for data incentivize data markets.

  • Free data collection has led to heavily used free services and products, indirectly attributing value to such data (Brynjolfsson et al., 2019).11,12

Cost based:

  • Data exchange markets could minimize transactional costs while maintaining competition (Steele, 2017).

  • Regularly used for calculating cost of security and insurance.

Stock market based:

  • Easy use of stock price as an indicator for valuing data for data-intensive firms.

Challenges:

Income based:

  • Lack of compensation for freely provided data; individual cost not fully recognized (Bergemann & Bonatti, 2019).13

Cost based:

  • The marketplace is very small. Data valuation is limited to data-intensive firms.14

  • Inability or unwillingness to enter local markets due to legal restrictions (e.g., local storage, privacy, censorship, favoritism, piracy, hacking; U.S. Department of Commerce, 2016).

Stock market based:

  • May reflect other factors besides data (e.g., talent acquisition).

Economic model

Strengths:

  • Has the ability to positively impact data value for the public sector through policies or laws (e.g., through fostering competition; Coyle et al., 2020).

  • May contribute to direct or indirect public income through a data dividend or taxation (Adams & Gounardes, 2019; Shelton Leipzig, 2019).

  • Provides societal benefits through externalities, such as open data (Taylor, 2016), broad data aggregation, and data privacy.

Challenges:

  • The value of data is based on contingencies such as projected use of data and job increases.15

  • The value of data in activities like unpaid data creation, data reuse, and cross-border flow may be difficult to measure (U.S. Department of Commerce, 2016; OECD, 2019).

  • Can negatively affect the value of data, through policies and laws, discouraging competition and wide data reuse (Jones & Tonetti, 2019).

  • Policies vary from one location (e.g., country) to another.

Dimensional model

Strengths:

  • Classification-based data valuation may be useful for relative data value comparisons within a given context when there are no pricing options.

  • Can be used in combination with other models to enhance those models.

  • May be able to apply a standard category hierarchy to aid data value determination (Brennan et al., 2018; Jones & Tonetti, 2019; Sajko et al., 2006).

Challenges:

  • Classification-based data valuation can be complex and is highly context dependent (Short & Todd, 2017; Viscusi & Batini, 2017).

  • Data value estimates based on survey questions can be inconsistent over time (Brennan et al., 2019).

Appendix B. Key Differences and Similarities Between Data and Traditional Assets

Data valuation is particularly difficult in part because data differs, in some ways, from physical assets. For example:

  • Data is nonrivalrous, as it can be consumed simultaneously by multiple parties. However, this must be seen in context, as others argue data value can be diminished through broad consumption (Nash, 2014).

  • Data is an intermediate good. It reveals ways in which to derive value from other assets.

  • Data is freely generated and traded. Personal data that individuals provide to companies for free may include demographics, financial data, health data, activity data, consumption data, and more. The debate over the pros and cons of corporate use of freely provided data, and over taxation of this data (Adams & Gounardes, 2020), is still evolving.

  • Data ownership is an evolving concept. Some locations, like the European Union (EU; Duch-Brown et al., 2017)16 and United Kingdom,17 have passed database copyright laws. Mostly, there have been only studies, calls for guidelines, and proposals on data ownership (Ritter & Mayer, 2018).

  • Data value is impacted by externalities. Data often gains value from being combined with other data. This has been shown, for example, with improved ability to diagnose health problems.18

Data valuation is also similar to valuing physical assets in some ways. Below, we highlight some of these similarities:

  • Data value is impacted by law. Through regulation, the law makes certain data, particularly personal and sensitive data, less accessible. This forces companies to treat personal and sensitive data in more costly ways and likely increases its value.

  • Data value is impacted by exclusivity. There is much debate about data ownership, particularly given how much data a handful of very large companies (e.g., Google, Facebook, Apple, Amazon) gather freely from people. While these companies offer valuable services and products, sometimes for free, they also create barriers to entry, raising questions about anticompetitive practices.


©2023 The MITRE Corporation. All rights reserved.
