Creating Engagements: Bringing the User Into Data Democratization

Published on Apr 02, 2024

The Foundations for Evidence-Based Policymaking Act of 2018 and supporting open science efforts offer a new opportunity for federal agencies to enhance government data ecosystems that improve public access to federal data. Public engagement will be critical to ensure that the efforts are sustainable. This article focuses on learning from other models to provide insights into building successful user engagement, drawing from fields as diverse as marketing, psychology, health care, economics, and information systems.

Keywords: user engagement, technology adoption, data democratization, open data, evidence-based policy

1. Introduction

Data are critical to informing evidence-based policymaking and ensuring reproducible empirical research (National Academies of Sciences, Engineering, and Medicine, 2018; Potok, 2023). The recent passage of the Foundations for Evidence-Based Policymaking Act of 2018 (hereafter Evidence Act) signaled the federal government’s commitment to enhance the efficiency of government programs, as well as public access to government data (Potok, 2023). Its passage also served as a call to action for federal agencies to improve government data ecosystems (Lane et al., 2022; Potok, 2023). New data access platforms have emerged to support federal agencies as they respond to federal mandates requiring them to publish their information online (Title II, Foundations for Evidence-Based Policymaking Act of 2018, Public Law 115-435). One example, born out of the Evidence Act, promotes the concept of “open government,” which “aims to make government more open and accountable” (, 2024). This concept aligns with the democratization of science, which involves ensuring free availability and usability of scholarly information (National Academies of Sciences, Engineering, and Medicine, 2018). It also extends to making data available, accessible, secure, and usable to a broad user community. Achieving successful data democratization requires more than simply providing open access to data. It also involves engaging users and gathering their input on what is most useful. User feedback is central to informing decision-making upstream and enhancing the utility of data assets. As new technologies are developed in response to open data initiatives, much remains to be done to generate an active and engaged user community around federal data. This article focuses on learning from models in related fields to provide insights into building successful user engagement with federal data.

2. Theory of Change

This article uses a Theory of Change (ToC) framework to identify the necessary preconditions in developing an engaged user community and a possible path for specific strategic interventions aimed at sustaining open data initiatives (Taplin & Clark, 2012). User engagement is measured across disciplines, including health care, business, and marketing (Aji et al., 2019; Grover & Kar, 2020; Kumar et al., 2010). A deeper understanding of how customers interact with products or services, including data tools (Yarkoni et al., 2021), is shown to improve participation. Kumar et al. (2010) define engagement as the active interaction of a customer with a firm and the behavioral manifestation toward a brand or firm. Adapting this definition to our setting, we define engagement as the active interaction between users and the government and the adoption of open-data tools. In the spirit of data democratization, engaged users are a prerequisite to open data platforms that seek to promote active civic participation.

Central to the ToC framework are three stages. Assuming a set of initial conditions, strategic interventions, such as the deployment of user-friendly data tools (“Strategies”), work to improve data access for all. These interventions, if successful, drive short-term outcomes (“Targets”), which in turn propel long-term impacts (“Outcomes”) (Chen, 1990). Each element in this chain is interlinked, creating a coherent narrative from the inception of data democratization tools to its goals. In addition, various factors might influence the differential impacts of an intervention on those involved in this process. They include individual characteristics and contextual elements that can either enhance or diminish the program’s effectiveness, thereby providing insights into the nuanced responses of different groups to the implemented strategies. Understanding these factors (“Moderators”) and under which circumstances the process works effectively is relevant for tailoring and optimizing program outcomes (Figure 1).

Figure 1. Theory of Change Framework, adapted from Harvard University’s IDEAS Impact Framework (Center on the Developing Child, Harvard University, 2023).

Figure 1 (left column) illustrates strategies that could be used. These include the development and deployment of technology solutions, the execution of training and education programs, activities that foster user engagement, the promotion of policy advocacy and partnerships, and the provision of tailored support services. We hypothesize that when strategies are implemented together, for example, the development of accessible data platforms coupled with proper incentives, they will directly influence downstream targets like enhanced user engagement. Technology-based interventions are also augmented by educational tools like workshops and online courses. Workshops can cultivate networks that foster a collaborative environment essential for data interoperability and serve as forums for developing partnerships with federal agencies that can lead to a more robust data ecosystem. Support services can be offered to address individual user needs to optimize the utility of data resources. These strategies, though not exhaustive, collectively aim to democratize data access, ensuring users can leverage federal data effectively.

The targets of the ToC (Figure 1, middle column) include active engagement with open data tools and increased knowledge and confidence of users in interpreting and applying data. Data alone has little value; the use of open data platforms reinforces skills to understand and use data effectively to inform decisions—a value-adding process (Wolff et al., 2016). Accessible data platforms and educational strategies like workshops not only improve the awareness of data products and their usage but also give users technical and practical knowledge necessary for improved data literacy, such as instructing new users how to select, clean, analyze, visualize, and interpret federal data from open data platforms (Franklin & Bargagliotti, 2020; Martinez & LaLonde, 2020; Meng, 2020; Wolff et al., 2016). By instilling a sense of competence in data usability, user confidence grows. Promoting active engagement so data platforms are integrated into users’ workflows is only possible by encouraging users to adopt new technology. Strategies from stage one work together comprehensively to achieve these short-term targets in stage two, which are necessary preconditions for building an active community of users of federal data.

This integrated approach leads to long-term outcomes in the third stage (Figure 1, right column) that include accessible federal data, knowledgeable and empowered users, an open data community ethos, sustainable data practices, replicable research, and evidence-based policy actions, each of which is a component of the core mission of democratizing data. While improving data access and usability of federal data is a primary goal of data democratization (Lane, 2024; Lane et al., 2022; Potok, 2023), another possible outcome, if strategies are implemented effectively, is a community of empowered users. Studies show that individuals who feel empowered using services remain loyal to that service provider (Vatanasombut et al., 2004) and that user satisfaction has a positive impact on user retention, defined as the longevity of a user’s loyalty to and interest in a product or service (Bansal et al., 2004; Chen & Li, 2017; Grover & Kar, 2020; Gu et al., 2022). Users who feel equipped with the proficiency to navigate and utilize federal data can provide important feedback to federal agencies that are interested in supporting expanded research and innovation (Potok, 2023). This feedback loop aligns with open government practices and drives the data democratization process forward. The process embeds an open data ethos that promotes research replicability and aims to institutionalize sustainable data practices, that is, accessibility, efficiency, usability, and transparency, which will continue to be important for federal agencies as they improve tools to enhance the effectiveness of evidence-based policymaking across the government (Potok, 2023).

Several factors can moderate the strategies, targets, or outcomes. Technological literacy and access to technology serve as moderators (Figure 1, bottom box), as individuals and institutions with greater digital resource access and proficiency are likely to benefit more substantially. Additionally, resource availability at institutions, particularly in the context of 1890 schools and minority-serving institutions, contributes to the initial conditions that factor into successfully implementing strategies championing data democratization. Cultural contexts and educational backgrounds also significantly modulate the effectiveness of data democratization efforts, suggesting that this conceptual framework might be less effective if we do not address the diverse needs of various groups. These factors stress the need for comprehensive outreach activities that can reach diverse audiences, ensuring the democratization efforts are as inclusive and impactful as possible (Exec. Order No. 14035, 2021). Lastly, psychological barriers, such as a user’s trust in a new service or tool, act as moderators of technology adoption. In this article, we discuss psychological barriers to technology adoption and offer a set of empirically tested strategies that work to boost user engagement.

While open data initiatives target all citizens as their users, we define users here as those who use federal data for empirical studies, although the concepts presented in this article can be applied to a broader community of users. We define users in this way for two reasons. First, it is assumed that users within the research community have incentives to use federal data. As we will discuss later in this article, strategies to build user engagement require underlying motivation to engage with new technology, such as tools that promote increased data accessibility. A research-oriented user is more inclined to engage with a tool and faces fewer moderators than someone who does not intend to use federal data for empirical purposes. Second, the user community of empiricists actively participates in research production, which consists of asking a question, formulating testable hypotheses, identifying data to test the hypotheses, running statistical analyses on the data, deriving conclusions, and disseminating results. It is through the final step of disseminating results, either in research publications or government reports, that it is possible to both measure the reach of federal data and identify opportunities to improve the utility of federal data (Yarkoni et al., 2021). Because the research community is already embedded in a formalized research production process that is incentivized to provide feedback through data analysis, defining users as empirical researchers is the most straightforward way to conceptualize data democratization in this context.

Applying the ToC framework to a practical example that encapsulates the mission of data democratization, we look toward the Sloan Digital Sky Survey (SDSS). The SDSS project’s objective was to compile a comprehensive astronomy data set and provide it as a publicly accessible resource for its user base: the astronomy community (Szalay et al., 2000). At its inception, few examples served as a guide on how to present a large-scale data solution that would meet the long-term needs of their user base (Szalay, 2018). Aside from the capital investment necessary to fund this effort, community trust levels acted as a critical moderator. Initially, there was skepticism about whether SDSS would fulfill its promise of releasing timely data. The project consistently adhered to data release schedules and shortened proprietary periods almost to real-time public access, which helped build trust. Early-stage doubts eventually waned and the SDSS program gained credibility (Szalay, 2018). This shift in community trust strengthened user engagement, and, over time, this project has significantly impacted the astronomy community. Its success as one of the first open e-science archives lies in offering high-quality data through user-friendly platforms, leading to widespread community adoption of virtual telescopes (Szalay, 2018). While the long-term maintenance highlights the complexities of operating open archives, influencing how science manages and curates data, the project’s insights extend beyond astronomy, providing valuable lessons for other disciplines in managing complex, evolving data sets and integrating them into accessible platforms. We revisit this example later in this article, in the section “Lessons Learned.”

3. Technology Adoption and User Engagement

The Evidence Act and other related open data initiatives have catalyzed federal agencies to innovate the way in which federal data are provided to users. From data visualization tools to self-service platforms, new technologies from this unfunded mandate have provided or will provide advancements in the infrastructure and capabilities necessary for users to access, analyze, and interpret data (Potok, 2022). Potok (2022) describes the coordinated efforts required to support the development of a robust search and discovery platform managed by New York University (NYU).1 While this one-of-a-kind platform is a hallmark example of the type of technologies that will support federal agencies’ efforts to broaden their community of users, the psychological barriers associated with new technology adoption remain, serving as a moderator in the ToC framework.

Examining technology development and the bottlenecks related to integrating new technologies into users’ professional lives is not new. A large published literature on the psychological factors that explain technology adoption describes how user attitudes and motivation can significantly influence the adoption of new technologies (Marangunić & Granić, 2015, inter alia). As noted by Mathieson (1991), “Developers employ a number of techniques to ensure users will accept the systems they build” (p. 173). In the development stages of a new technology, estimating eventual use is difficult, which is why intention to use is often used as a proxy measurement. Two models that have been designed and applied across many disciplines to predict an individual’s intention to adopt a new technology are the theory of planned behavior (TPB) and the technology acceptance model (TAM) (Ajzen, 1985; Davis, 1985; Granić & Marangunić, 2019; Marangunić & Granić, 2015; Mathieson, 1991). The TPB suggests that individuals’ intentions to engage in a particular behavior, for example, technology adoption, are influenced by their attitudes, subjective norms, and perceived behavioral control, each of which corresponds to an underlying belief system (Ajzen, 1985; Mathieson, 1991). In the context of data democratization, this theory helps us understand the motivations and barriers that users may face when adopting new technologies. In this case, an example would be the extent to which a user believes the technology will improve her knowledge, skills, or confidence in acquiring federal data. If this outcome is important to the user, then the likelihood of adopting the technology is greater. The TAM, on the other hand, focuses specifically on the acceptance and adoption of technology. It suggests that users’ acceptance of a technology is influenced by two main factors: perceived usefulness and perceived ease of use.
Perceived usefulness refers to the extent to which users believe that a technology will enhance their performance and productivity, while perceived ease of use refers to the degree to which users believe that a technology is easy to use and understand (Davis, 1985; Marangunić & Granić, 2015; Mathieson, 1991). It is important to understand both the TPB and the TAM, as they provide insight into designing activities to communicate the value of newly introduced tools, thereby addressing the psychological barriers introduced as a moderator within the ToC framework (Figure 1).

In the following two sections, we review the literature on how other fields have introduced new technologies and successfully engaged users by presenting a series of empirically tested strategies, and apply these insights to the data democratization tools under development.

4. Building User Engagement

From a psychological perspective, adopting a new technology that facilitates data access is not without its costs. Just as customers navigate a transaction process when making purchases, the concept of transaction costs influences the decision-making process of users as they engage with data and information in new ways. The theory of transaction cost economics (TCE) is a well-established economic framework that explains the factors that influence an economic agent to make a transaction (Liang & Huang, 1998; Rindfleisch & Heide, 1997; Shelanski & Klein, 1995). Although originally applied to the context of economic trade (e.g., Grossman & Hart, 1986; Riordan & Williamson, 1985; Williamson, 1989), its application has extended to other disciplines, including organizational behavior (Rindfleisch & Heide, 1997; Shelanski & Klein, 1995), supply chain (Williamson, 2008), information systems (Cordelia, 2006; Liang & Huang, 1998), and agriculture (Coggan et al., 2015; Heiman et al., 2020), and the fundamental concepts can be applied to technology adoption within the realm of data democratization.

In TCE, perceived transaction costs deter service usage if they outweigh the benefits of a user’s current approach (Rindfleisch & Heide, 1997). These costs are influenced by several factors, among them the activities and resources required to access and leverage resources effectively (such as search costs and opportunity costs), the costs of forgone alternative productive uses of an individual’s time, and the costs associated with the unexpected outcomes of the product or service (Liang & Huang, 1998). Asset specificity, which refers to the degree to which investments are tailored for particular transactions, and uncertainty significantly influence these costs (Grønhaug & Gilly, 1991; Liang & Huang, 1998; Riordan & Williamson, 1985). Shelanski & Klein (1995) note that decision complexity and decision frequency also affect perceived costs. Users’ perceptions of the value gained in exchange for these costs shape their engagement attitude (Rindfleisch & Heide, 1997), which ultimately drives decision outcomes. Applied to data democratization, a user’s core decision involves evaluating perceived transaction costs associated with whether to adopt a platform (uncertainty) while also considering the perceived value (asset specificity). Risk perception can significantly impact users’ evaluation of transaction costs. Yarkoni et al. (2021) acknowledged perceived transaction costs as a threat to effective adoption of automated validation tools. They noted that adoption hinges on the tool’s ease of use, which is positively correlated with the perceived benefit to the user. If automation tools cannot be integrated into platforms commonly used by researchers, the methodological success of the tool, however profound, will not be realized.
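
The cost-benefit comparison described above can be illustrated with a minimal sketch. It is not a model taken from the TCE literature: the additive form, the three cost components, and the names `PerceivedCosts` and `would_adopt` are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class PerceivedCosts:
    """Hypothetical, user-reported cost components on a common scale."""
    search: float        # effort to find and learn the tool
    opportunity: float   # value of time diverted from other work
    uncertainty: float   # expected cost of unexpected outcomes

    def total(self) -> float:
        # Assumption: costs simply add up; real perceptions may interact.
        return self.search + self.opportunity + self.uncertainty


def would_adopt(perceived_value: float,
                costs: PerceivedCosts,
                current_net_benefit: float = 0.0) -> bool:
    """Adopt only if the net perceived benefit beats the current approach."""
    return perceived_value - costs.total() > current_net_benefit


costs = PerceivedCosts(search=3.0, opportunity=2.0, uncertainty=4.0)
adopts_from_scratch = would_adopt(10.0, costs)
adopts_over_existing_workflow = would_adopt(10.0, costs, current_net_benefit=2.0)
```

Under these made-up numbers, the same tool is adopted by a user with no established workflow but not by one whose current approach already yields a net benefit of 2. Lowering the `uncertainty` component is one way to read the interventions discussed in the next section.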

As the federal government and supporting organizations make capital-intensive investments in sophisticated, durable tools to meet users’ targeted needs, marketing strategies that foster user engagement should aim to minimize user uncertainty. In the context of data democratization, both creating a good user experience (UX) and providing incentives will play a role in building a broad user community. A good UX is key for converting first-time visitors into repeat users and keeping visitors engaged (Kierkegaard, 2021), while incentives compensate users for the transaction costs they incur (Sarin et al., 2021; Williamson, 1979, 1981). Together, these activities promote engagement. We present examples of interventions, both incentive-based and UX-focused, in the next section.

4.1. Activities to Build User Engagement

Research from various disciplines, including marketing, economics, psychology, health care, and information systems, has explored the effects of interventions on changes in user engagement. Although common engagement measures include responses to consumer surveys (Calder et al., 2016; Gu et al., 2022; Kim et al., 2013), other forms of engagement can be tracked in a more automated way and used to measure engagement behaviors. In the context of online services, user engagement can be measured by electronic word-of-mouth, recommendations, reviews, bookmarking, site click-throughs, as well as time spent on a site or users’ hazard of ending sessions (Grover & Kar, 2020; Gu et al., 2022; Van Doorn et al., 2010). For instance, resharing of information can be tracked on platforms like X (formerly Twitter) (Grover & Kar, 2020), or the number of views and likes can be measured on YouTube or Facebook (Rossmann et al., 2016). We discuss various interventions that have been used to boost engagement, specifically reward systems, referral programs, social engagement features, user onboarding, and focus groups for gathering user feedback. In proposing these interventions, we align with the theories of TPB and TAM, emphasizing the role of clear communication about technology benefits, and address transaction costs, as outlined in TCE, to encourage technology adoption.

  • Reward systems. Rewards are a form of external incentive that compensates individuals for their time, energy, or knowledge (Hung et al., 2011). Incentives such as rewards (or penalties) can motivate users to “take up an activity and guide the way they perform it” (Hagger et al., 2020, p. 524). Incentives serve as a powerful strategy to motivate and compensate users for their engagement (Friedrich et al., 2020) and should be commensurate with the level of perceived transaction costs for the users. Incentives that align with users’ motivations mediate the perceived costs associated with new technology adoption and are anticipated to drive active engagement.

Tangible rewards are often given when the goal is to strengthen a knowledge-sharing relationship between a seeker and a contributor, such as on knowledge exchange platforms. These rewards can be monetary or nonmonetary, but in many cases take the form of financial incentives. Although financial incentives generate higher levels of engagement, such as in the number of questions answered, the results about the quality of knowledge contributions are mixed (Hsieh et al., 2010; Hung et al., 2011; Kankanhalli et al., 2005). However, Kuang et al. (2019) found that financial incentives spill over to other related nonincentivized engagement behaviors, which suggests that the overall positive effect of monetary incentives on user engagement may be underestimated.

Alternatively, intangible rewards provide the opportunity to reward performance without giving it a material value (Friedrich et al., 2020). Gamification is one way to implement an extrinsic reward system and has been applied in a variety of contexts (Koivisto & Hamari, 2019, inter alia). Often used outside of knowledge-sharing contexts, it makes intangible reward systems more appealing for services that seek to attract new users and build user engagement (Hermawan & Tjhin, 2023). Gamification has been demonstrated to build user engagement as measured by increased intention to use, word-of-mouth, and positive site ratings (Bitrián et al., 2021; Charry et al., 2023). Virtual currencies, progress bars, badges, and ranking systems incentivize users to actively participate and contribute to a platform or app. Users can earn rewards through activities such as completing challenges, participating in competitions, or extensively using the site’s features.

In data democratization platforms, implementing reward systems can motivate users to contribute valuable information, such as insights on data sets. This approach incentivizes users to add to the collective pool of data and knowledge. Rossmann et al. (2016) found that users with higher cognitive capital (knowledge, experience, and skills) were more likely to interact with a site and post more meaningful and useful content that attracted users who had little or no prior experience. In this way, recognition-based strategies can be designed to encourage continuous learning and skills development and foster a more knowledgeable user base. Researchers are the most likely initial users of open data platforms, so a recognition-based reward system might look beyond the traditional research publication model and recognize those who add value by improving or creating high-impact data sets. By rewarding quality contributions to data access and usability, platforms can drive sustained engagement.

  • Referral programs. Referral programs are a special case of extrinsic rewards. Referral behavior is the acquisition of a new customer through a dedicated referral program managed by the organization (Hermawan & Tjhin, 2023). Referral programs have been widely recognized as effective mechanisms to promote user growth by turning prospects into new customers (Hermawan & Tjhin, 2023; Kumar et al., 2010). Referrals can reduce new user acquisition costs (Kumar et al., 2010) by reaching users with otherwise weak ties to a service provider and converting them into newly acquired customers (Ryu & Feick, 2007). By offering exclusive benefits for bringing in new users through a referral program, platforms can stimulate growth.

  • Social engagement features. Two forms of social engagement features are social media and crowdsourcing. First, social media is a critical piece of an organization’s marketing plan and is used as a primary vehicle for promoting services, engaging customers, and recruiting users (Grover & Kar, 2020, inter alia). Social media acts as a form of electronic word-of-mouth communication and is used to increase the scope and reach of services (Rossmann et al., 2016). Studies show that social media marketing influences users’ attitudes and builds trust among users (Grover & Kar, 2020; Kumar et al., 2013; Shareef et al., 2019). Second, platforms have evolved to offer embedded user engagement tools that enable users to interact within sites. Crowdsourcing features, such as the ability to submit content or access crowdsourced content, have been shown to increase engagement and retention among mobile gaming app users (Gu et al., 2022). Giving users the ability to control their product use experience was reported to empower users and had a positive effect on user retention (Gu et al., 2022). Whether used as a marketing tool for promoting new services or integrated into new platforms, for example by highlighting trending data sets and building integrated discussion forums, social engagement features can enhance user interaction, enrich the UX, and foster user retention.

  • User onboarding. User onboarding is a process that enables users to understand and use a product or service by providing resources and information to help users grasp its functionality and benefits (Chiappetta, 2020). User onboarding serves two primary purposes: promoting utility by introducing users to a product or platform, and enhancing value by demonstrating its benefits and helping users recognize their need for it. Together, these purposes mean that effective user onboarding increases a person’s success with a product or service and contributes to user retention (Wernerson & Carlsson, 2019).

The onboarding process typically involves a series of steps, including user registration, product or site tours, interactive guides, and personalized recommendations. These elements are designed to familiarize users with the platform and encourage users to explore its features. An insight that emerging open data platforms should consider is the power of effectively communicating their benefits, such as timely releases of data sets or providing metadata that describes what topics data users are studying, through well-tailored onboarding activities.

Developing an onboarding strategy requires evaluating the user’s experience, typically through usability testing. Usability testing, or usability evaluation, is a common technical and methodological approach that detects problems in site development to minimize negative outcomes of its usage (Bastien, 2010). While usability testing generally requires conducting focus groups, analytics services can track important metrics, such as site abandonment and click-through rates, to determine where potential bottlenecks exist, which can then inform usability testing protocols.

  • Workshops. If onboarding materials are used to quickly communicate the core value of a product or service, then workshops can be used to fill in gaps or strengthen understanding about a product or service. Workshops provide training about the use of the product or platform and offer an agile solution to delivering information that may fall outside of a more traditional learning environment. Workshops complement onboarding materials in two ways. First, they provide greater detail on data resources and their applications as well as the skills or resources needed to use complex data. Second, they create a forum to solicit input from user communities to identify emerging technologies and capabilities (National AI Research Resource Task Force, 2023). The Democratizing Data Project, for example, held a workshop in 2023 to solicit community input about the data usage tool developed for the U.S. Department of Agriculture (USDA, 2023). The participants learned about new site features, including dashboards, Jupyter Notebooks, and an application programming interface (API), in a more intensive learning environment, one that would have fallen outside the scope of user onboarding (USDA, 2023). Open data platforms can use the workshop format in a variety of contexts to highlight their value to their user base.

  • Focus groups. Focus groups, along with other user feedback mechanisms, are tools used to gather insights for continuous improvement. Focus groups are group interviews that ask a series of open-ended questions to a small group of about six to 10 individuals (Morgan et al., 1998) to solicit feedback and encourage users to offer suggestions and report issues. This process creates a feedback loop that contributes to the platform’s ongoing enhancement. The use of focus groups is often seen in the health care field to improve the quality of health care delivery systems (Beaudin & Pelletier, 1996; Peters, 1993; Rabiee, 2004). Strategies for user retention, like focus groups, ensure that users continue to find value and relevance in the initiative, thereby maintaining their engagement. Open government requires feedback from users; conducting focus groups is a strategy that can offer insights into what aspects of a platform need to change to involve a more engaged user base, ultimately driving increased participation from a broader audience.
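
The automated engagement measures mentioned in this section (click-through rates, time spent on a site, and site abandonment) can be computed from ordinary session logs. The following is a minimal sketch under stated assumptions: the `Session` schema, the field names, and the definition of a "bounce" as a single-page visit are hypothetical choices for illustration, not the output format of any particular analytics service.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    """One visit to a hypothetical open data platform (illustrative schema)."""
    user_id: str
    seconds_on_site: float
    pages_viewed: int
    clicked_dataset: bool  # did the visitor click through to any data set?


def engagement_summary(sessions: list[Session]) -> dict[str, float]:
    """Summarize simple engagement measures over a batch of sessions."""
    if not sessions:
        raise ValueError("need at least one session")
    n = len(sessions)
    # Count visits per user to estimate how many users come back.
    visits_per_user: dict[str, int] = {}
    for s in sessions:
        visits_per_user[s.user_id] = visits_per_user.get(s.user_id, 0) + 1
    return {
        "click_through_rate": sum(s.clicked_dataset for s in sessions) / n,
        "mean_seconds_on_site": mean(s.seconds_on_site for s in sessions),
        # Assumption: a one-page visit counts as abandonment ("bounce").
        "bounce_rate": sum(s.pages_viewed <= 1 for s in sessions) / n,
        "returning_user_share": sum(v > 1 for v in visits_per_user.values())
        / len(visits_per_user),
    }


sessions = [
    Session("a", 120.0, 4, True),
    Session("a", 60.0, 2, False),
    Session("b", 10.0, 1, False),
    Session("c", 210.0, 6, True),
]
summary = engagement_summary(sessions)
```

Summaries like this only flag where bottlenecks might exist. Survey responses and focus groups are still needed to explain why users disengage.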

5. Other Considerations

The framework presented in this article makes assumptions about initial conditions that may not always hold true. We discuss several considerations here that should be addressed going forward.

First, although we have previously defined the target user of federal data as the researcher or empiricist, we acknowledge that the intended breadth of federal data users is much wider. A key challenge recognized by developers of data democratization tools has been the difficulty of comprehensively identifying federal data users, given the vast and varied usage of federal data sets. Efforts to quantify the spectrum of users are underway, using machine and deep learning methods (Lane et al., 2020). This task, however, is complex and requires considerable resources.

Identifying the scope of federal data users is important and its consequences extend beyond tracking the number of impressions or downloads of federal data. The most immediate interest is whether target users are being reached. For data sets produced by government statistical agencies, the target audience could be simply defined as all citizens, as efforts to democratize data suppose more inclusive participation from diverse groups of people to make government more accountable (, 2024). With more than 294,941 data sets listed on, identifying whether these resources are used requires collecting user demographics and feedback, thereby tracking not only usage statistics but also information that can enhance the overall utility, relevance, and accuracy of public data.

Second, to fully gauge the impact of data democratization efforts, we must consider underlying disparities and acknowledge that these efforts may not initially reach all user groups equally. This moderator can be problematic and raises questions about whether the core users are representative of the broader user community (Mathieson, 1991). Failure to adequately represent the broader user community could diminish overall effectiveness (Lane, 2024).

Third, sustaining open data tools poses another challenge. The Evidence Act, like other directives and statutes, is an unfunded mandate (Potok, 2023). Continual funding and resource allocation are required to maintain, update, and adapt these technologies. This includes addressing evolving technological needs and user expectations. The question of how to sustain these initiatives inevitably arises, prompting considerations of monetization, commercialization, and potential sponsorships as viable avenues for ensuring ongoing support.

Fourth, the studies on the interventions presented above involve various participant groups, ranging from those in field experiments to app users. While the diversity of participants highlights the broad applicability of the strategies, it also calls attention to the need for approaches tailored to different audiences and aligned with target user preferences. The use of incentives to motivate users, while shown to be beneficial in many contexts, works as an intensifier of the underlying motivation. Incentives cannot create motivation that is not already present (Friedrich et al., 2020). Although a well-designed incentive system strengthens motivation and has a positive effect on engagement, the process of designing incentive-based interventions can be lengthy. Organizations looking to introduce them should prioritize the approach that best fits their needs based on an effort-to-impact ratio.

Fifth, the development of usage statistics that track user engagement and retention without relying on surveys presents a methodological challenge. Today, with the help of site analytics, user interaction data can be collected, but the next step in tracking engagement and retention will be to develop standardized metrics across the network of platforms. Automating and institutionalizing these measurements will enable continuous assessment of and improvement in data democratization tools.
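As one illustration of what a standardized, survey-free metric might look like, the following minimal sketch computes a month-over-month retention rate from interaction logs. It is not drawn from any existing platform: the function name `monthly_retention`, the `(user_id, date)` event format, and the definition of retention as the share of one month's active users who return the next month are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date


def monthly_retention(events):
    """Return {(year, month): share of that month's active users
    who were also active in the following calendar month}.

    `events` is an iterable of (user_id, date) pairs; this input
    format is a hypothetical stand-in for site analytics logs.
    """
    # Collect the set of distinct users seen in each calendar month.
    active = defaultdict(set)
    for user_id, day in events:
        active[(day.year, day.month)].add(user_id)

    rates = {}
    for (year, month), users in sorted(active.items()):
        following = (year + 1, 1) if month == 12 else (year, month + 1)
        # Months with no logged activity in the following month are
        # omitted, since missing data and zero retention look the same.
        if following in active:
            rates[(year, month)] = len(users & active[following]) / len(users)
    return rates


# Hypothetical example log: two users in January, one of whom returns.
events = [
    ("a", date(2024, 1, 5)),
    ("b", date(2024, 1, 9)),
    ("a", date(2024, 2, 2)),
    ("c", date(2024, 2, 20)),
]
rates = monthly_retention(events)
```

Under this definition, half of the January users in the example log return in February, and February itself is left out because no March activity is recorded. A shared definition of this kind would need to be agreed on across platforms before results could be compared.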

6. Discussion

The purpose of this article has been to present a framework that emphasizes the importance of engaging users while building the infrastructure to support the democratization of data. The ToC model highlights that active user engagement with tools (a target, Figure 1) is a necessary precondition to support open data goals (an outcome, Figure 1). This article indicates that psychological barriers can inhibit technology adoption (a moderator, Figure 1). We review two models—the TPB and TAM—that predict a user’s intention to adopt a new technology. At the core of these models is the importance of communicating the value of a new technology. However, without also understanding transaction costs, as noted in the TCE, successful technology adoption could be threatened. We draw on literature from various domains to present a series of empirically supported interventions (strategies, Figure 1) to boost user engagement. These activities not only address barriers to technology adoption but also demonstrate the value of new technologies to potential users, enhancing their engagement and adoption rates.

These insights have yet to be deployed and tested in platforms that support open data at scale. However, we can revisit the use case of the SDSS, which provides lessons for sustaining new and emerging open data tools. We can then apply these insights to ongoing federal initiatives aimed at building a community of data users. The remainder of this section discusses each of these in turn.

6.1. Lessons Learned

The SDSS provides a compelling example of an open data platform that was built to support a broad user community of scientists, primarily in the physics and astronomy fields (Szalay, 2018). Through the lens of the ToC framework, the outcome for SDSS was to provide open access to a scientific data set that combined large-scale surveys of the universe to provide a comprehensive digital map of the cosmos (Loveday, 2019). Active user engagement was a necessary precondition (a target) because users were meant to contribute to the refinement of the data set. Their platform showed success, in part, because they understood the importance of user engagement, and implemented effective strategies to build their user community.

The SDSS identified perceived usefulness as a key moderator in driving user engagement. Initially, the SDSS encountered significant distrust within the astronomy community regarding the timely release of data. It took several years to convince astronomers that the SDSS would deliver on its promises (Szalay, 2018). The SDSS showed that if users found a tool difficult to understand they would be less likely to use it, but if high-quality data were presented intuitively, then an entire community of users would be willing to change their traditional approach (Szalay, 2018). The SDSS team created an intuitive platform that minimized friction so users could easily navigate it, and tracked usage statistics to understand points of resistance that needed to be addressed and to identify areas that the platform was not reaching (Szalay, 2018). A system was created whereby users diagnosed discrepancies within the data set that was curated by an SDSS team, who would then make refinements, allowing for the continuous improvement of the data set and expanding the utility of the SDSS data for a wide range of astronomical research (Szalay, 2018). This feedback cycle is a fundamental component of user engagement, and it resembles what the open data initiatives aim for: creating opportunities for increased government accountability and enhancing the efficiency of government programs. By understanding its moderators and strategically introducing activities that could mitigate poor user adoption rates, the SDSS was successful in achieving its goal, documenting that it had reached half of the professional astronomy community within the first 5 years of its deployment (Singh et al., 2006; Szalay, 2018).

As technological advancements and competition to accelerate open data platforms intensify, it is important to build user engagement quickly. This urgency stems from the risk that even technologies with high potential can become outdated or overshadowed within a short period, emphasizing the need for effective user engagement activities to sustain relevance and adoption. Szalay (2023) writes that as new open data resources become available, building trust within the target community and creating a wide user base are necessary conditions to ensure the progress of data-centric platforms.

The SDSS example provides key lessons for consideration, especially as it pertains to the sustainability of open data tools. First, a reality of ensuring the operational longevity of open data platforms is the need to secure funding from government grants and private organizations. The SDSS has received financial support from several funding sources. Alternative funding streams include product commercialization or fee-based platforms; however, these would be antithetical to the mission of open data. Funding enables the hiring of skilled teams that can manage complex data sets and maintain data accessibility for their users. Extrapolating these insights to use cases beyond SDSS, it is well understood that financial resources will be required to facilitate the development of open data platforms and services.

Beyond funding, SDSS emphasized community engagement and offered nonmonetary incentives for contributions. These activities are seen as important strategies to overcome the psychological barriers that appear when introducing new technologies. The SDSS was committed to timely data releases, which Szalay (2018) identified as a major contributor to sustaining user interest and support. Its initial success in building trust and attracting users is reflected in its open data policy and its provision of web-based tools that could manage and analyze large data sets within the SDSS platform. At the time, integrated computational services were not available (Szalay, 2018). These tools helped build a robust user community, demonstrating a successful model for engaging users and contributors in scientific research platforms. At the time of writing, SDSS has released an 18th edition of its data (SDSS, 2024).

As the number of users grew, the scope of the SDSS evolved (Szalay, 2018). The increasing complexity and volume of data required more sophisticated computational services. SciServer was developed to meet these demands (Szalay, 2023). Unlike SDSS, which primarily focuses on astronomical data collection and distribution, SciServer enables users to perform complex data analyses directly on the server. The evolution from SDSS to SciServer emphasizes the need for adaptability in sustaining open data technologies and fostering a community of engaged data users.

6.2. Further Implications

The lessons from SDSS and SciServer can be applied to the open data initiatives catalyzed by the Evidence Act. The standard application process (SAP) and’s tools, including the federal data catalog, have begun to lay a foundation for a more engaged federal data user community. These efforts have streamlined the process of accessing data and enhanced data discoverability. For long-term adoption, efforts to overcome inhibiting moderators can focus on improving access to technology, building community trust, and improving technical literacy. Partnering with academic, private, and nonprofit sectors can also broaden the impact of federal data, particularly among minority-serving communities. Such efforts may include developing more interactive platforms that facilitate user feedback, creating incentive programs to encourage innovative use of federal data, and implementing educational initiatives to inform a broader community of users, such as the federal workforce and the general public.

These tools are more important now than ever. Given the Supreme Court’s review of Chevron deference in cases like Loper Bright Enterprises v. Raimondo (2022) and Relentless v. Department of Commerce (2024), there is a clear signal that the judicial system may play a more significant role in evaluating the expertise traditionally reserved for federal agencies. This recent development highlights the role of open science in ensuring that policy and legal decisions are based on open, transparent, and publicly accessible information. The SAP and are two important tools for democratizing data access, but the potential shift toward judicially led expertise evaluation makes the case for additional data sharing tools where stakeholders can contribute to, and agree on, the relevance and application of scientific data in policymaking and legal contexts.

7. Summary

This article focuses on learning from models in other fields to enhance user engagement, a necessary precondition to building a community of federal data users. A Theory of Change framework is used to trace the path from specific strategic interventions to improved user engagement in data democratization tools. Common psychological barriers can hinder technology adoption, highlighting the importance of addressing users’ belief systems to minimize perceived transaction costs associated with adopting new technologies. Research from various disciplines, including marketing, economics, psychology, health care, and information systems, has explored the effects of interventions on changes in user engagement. Interventions, such as reward systems, referral programs, social engagement features, user onboarding, and focus groups for gathering user feedback, have been used to boost engagement, improve relevance, and retain a continuous user base. The call to action will be to operationalize these strategies to continue to drive forward the initiatives of data democratization.

Disclosure Statement

Lauren Chenarides has no financial or non-financial disclosures to share for this article.


References

Aji, M., Gordon, C., Peters, D., Bartlett, D., Calvo, R. A., Naqshbandi, K., & Glozier, N. (2019). Exploring user needs and preferences for mobile apps for sleep disturbance: Mixed methods study. JMIR Mental Health, 6(5), Article e13895.

Ajzen, I. (1985). From intentions to actions: A theory of planned behavior. In J. Kuhl & J. Beckmann (Eds.), Action control: From cognition to behavior (pp. 11–39). Springer, Berlin, Heidelberg. 

Bansal, H. S., Irving, P. G., & Taylor, S. F. (2004). A three-component model of customer commitment to service providers. Journal of the Academy of Marketing Science, 32(3), 234–250.

Barczewski, B. M. (2023). Chevron Deference: A Primer. Congressional Research Service.

Bastien, J. C. (2010). Usability testing: A review of some methodological and technical aspects of the method. International Journal of Medical Informatics, 79(4), e18–e23.

Beaudin, C. L., & Pelletier, L. R. (1996). Consumer-based research: Using focus groups as a method for evaluating quality of care. Journal of Nursing Care Quality, 10(3), 28–33.

Bitrián, P., Buil, I., & Catalán, S. (2021). Enhancing user engagement: The role of gamification in mobile apps. Journal of Business Research, 132, 170–185.

Center on the Developing Child, Harvard University. (2023). IDEAS Impact Framework.

Calder, B. J., Isaac, M. S., & Malthouse, E. C. (2016). How to capture consumer experiences: A context-specific approach to measuring engagement: Predicting consumer behavior across qualitatively different experiences. Journal of Advertising Research, 56(1), 39–52.

Charry, K., Poncin, I., Kullak, A., & Hollebeek, L. D. (2023). Gamification’s role in fostering user engagement with healthy food‐based digital content. Psychology & Marketing, 41(1), 69–85.

Chen, H. T. (1990). Theory-driven evaluations. Sage.

Chen, X., & Li, S. (2017). Understanding continuance intention of mobile payment services: An empirical study. Journal of Computer Information Systems, 57(4), 287–298.

Chevron U.S.A. Inc. v. Natural Resources Defense Council, 467 U.S. 837 (1984).

Chiappetta, A. (2020). Designing effective user onboarding experiences for mobile applications [Unpublished master’s thesis]. POLITECNICO.

Coggan, A., van Grieken, M., Boullier, A., & Jardi, X. (2015). Private transaction costs of participation in water quality improvement programs for Australia's Great Barrier Reef: Extent, causes and policy implications. Australian Journal of Agricultural and Resource Economics, 59(4), 499–517.

Cordella, A. (2006). Transaction costs and information systems: Does IT add up? Journal of Information Technology, 21(3), 195–202.

(2024). Open government.

Davis, F. D. (1985). A technology acceptance model for empirically testing new end-user information systems: Theory and results [Unpublished doctoral dissertation]. Massachusetts Institute of Technology.

Exec. Order No. 14035, 2021, 3 C. F. R. 34593 (2021).

Foundations for Evidence-Based Policymaking Act of 2018, Pub. L. No. 115-435, 132 Stat. 5529 (2019).

Franklin, C., & Bargagliotti, A. (2020). Introducing GAISE II: A guideline for precollege statistics and data science education. Harvard Data Science Review, 2(4).

Friedrich, J., Becker, M., Kramer, F., Wirth, M., & Schneider, M. (2020). Incentive design and gamification for knowledge management. Journal of Business Research, 106, 341–352.

Granić, A., & Marangunić, N. (2019). Technology acceptance model in educational context: A systematic literature review. British Journal of Educational Technology, 50(5), 2572–2593.

Grønhaug, K., & Gilly, M. C. (1991). A transaction cost approach to consumer dissatisfaction and complaint actions. Journal of Economic Psychology, 12(1), 165–183.

Grossman, S. J., & Hart, O. D. (1986). The costs and benefits of ownership: A theory of vertical and lateral integration. Journal of Political Economy, 94(4), 691–719.

Grover, P., & Kar, A. K. (2020). User engagement for mobile payment service providers – introducing the social media engagement model. Journal of Retailing and Consumer Services, 53, Article 101718.

Gu, Z., Bapna, R., Chan, J., & Gupta, A. (2022). Measuring the impact of crowdsourcing features on mobile app user engagement and retention: A randomized field experiment. Management Science, 68(2), 1297–1329.

Hagger, M. S., Cameron, L. D., Hamilton, K., Hankonen, N., & Lintunen, T. (Eds.). (2020). The handbook of behavior change. Cambridge University Press. 

Heiman, A., Ferguson, J., & Zilberman, D. (2020). Marketing and technology adoption and diffusion. Applied Economic Perspectives and Policy, 42(1), 21–30.

Hermawan, J. H., & Tjhin, V. U. (2023). The effect of gamification on customer engagement in e-commerce. Journal of Theoretical and Applied Information Technology, 101(19).

Hsieh, G., Kraut, R. E., & Hudson, S. E. (2010). Why pay? Exploring how financial incentives are used for question & answer. In E. Mynatt (Ed.), Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 305–314). ACM.

Hung, S. Y., Durcikova, A., Lai, H. M., & Lin, W. M. (2011). The influence of intrinsic and extrinsic motivation on individuals' knowledge sharing behavior. International Journal of Human-Computer Studies, 69(6), 415–427.

Kankanhalli, A., Tan, B. C., & Wei, K. K. (2005). Contributing knowledge to electronic knowledge repositories: An empirical investigation. MIS Quarterly, 29(1), 113–143.

Kierkegaard, E. (2021). Optimizing the sign-up flow for a fintech company using Google Analytics, Hotjar and A/B testing [Degree project]. KTH Royal Institute of Technology.

Kim, Y. H., Kim, D. J., & Wachter, K. (2013). A study of mobile user engagement (MoEN): Engagement motivations, perceived value, satisfaction, and continued engagement intention. Decision Support Systems, 56, 361–370.

Koivisto, J., & Hamari, J. (2019). The rise of motivational information systems: A review of gamification research. International Journal of Information Management, 45, 191–210.

Kuang, L., Huang, N., Hong, Y., & Yan, Z. (2019). Spillover effects of financial incentives on non-incentivized user engagement: Evidence from an online knowledge exchange platform. Journal of Management Information Systems, 36(1), 289–320.

Kumar, V., Aksoy, L., Donkers, B., Venkatesan, R., Wiesel, T., & Tillmanns, S. (2010). Undervalued or overvalued customers: Capturing total customer engagement value. Journal of Service Research, 13(3), 297–310.

Kumar, V., Bhaskaran, V., Mirchandani, R., & Shah, M. (2013). Practice prize winner—creating a measurable social media marketing strategy: Increasing the value and ROI of intangibles and tangibles for Hokey Pokey. Marketing Science, 32(2), 194–212.

Lane, J. (2024). An invisible hand for creating public value from data. Harvard Data Science Review, (Special Issue 4).

Lane, J., Gimeno, E., Zhang, Z., & Zigoni, A. (2022). Data inventories for the modern age? Using data science to open government data. Harvard Data Science Review, 4(2).

Lane, J., Mulvany, I., & Nathan, P. (2020). Rich search and discovery for research datasets: Building the next generation of scholarly infrastructure. Sage.

Legal Information Institute. (2022). Chevron deference. Cornell Law School. Retrieved from

Liang, T. P., & Huang, J. S. (1998). An empirical study on consumer acceptance of products in electronic markets: A transaction cost model. Decision Support Systems, 24(1), 29–43.

Loper Bright Enterprises v. Raimondo. (2022). Supreme Court pending docket No. 22-451.

Loveday, J. (2019). The Sloan Digital Sky Survey. ArXiv.

Marangunić, N., & Granić, A. (2015). Technology acceptance model: A literature review from 1986 to 2013. Universal Access in the Information Society, 14, 81–95.

Martinez, W., & LaLonde, D. (2020). Data science for everyone starts in kindergarten: Strategies and initiatives from the American Statistical Association. Harvard Data Science Review, 2(3).

Mathieson, K. (1991). Predicting user intentions: Comparing the technology acceptance model with the theory of planned behavior. Information Systems Research, 2(3), 173–191.

Meng, X. L. (2020). Reproducibility, replicability, and reliability. Harvard Data Science Review, 2(4).

Morgan, D. L., Krueger, R. A., & King, J. A. (1998). The focus group guidebook. Sage.

National AI Research Resource Task Force. (2023). Strengthening and democratizing the U.S. artificial intelligence innovation ecosystem: An implementation plan for a national artificial intelligence research resource. Retrieved December 11, 2023, from

National Academies of Sciences, Engineering, and Medicine. (2018). Open science by design: Realizing a vision for 21st century research. The National Academies Press. 

Peters, D. A. (1993). Improving quality requires consumer input: Using focus groups. Journal of Nursing Care Quality, 7(2), 34–41.

Potok, N. A. (2022). Show US the data. Harvard Data Science Review, 4(2).

Potok, N. (2023). Continuing implementation of the Foundations for Evidence-Based Policymaking Act of 2018: Who is using the data? Harvard Data Science Review, 5(4).

Rabiee, F. (2004). Focus-group interview and data analysis. Proceedings of the Nutrition Society, 63(4), 655–660.

Relentless v. Department of Commerce. (2024). Supreme Court pending docket No. 22-1219. https://

Rindfleisch, A., & Heide, J. B. (1997). Transaction cost analysis: Past, present, and future applications. Journal of Marketing, 61(4), 30–54.

Riordan, M. H., & Williamson, O. E. (1985). Asset specificity and economic organization. International Journal of Industrial Organization, 3(4), 365–378.

Rossmann, A., Ranjan, K. R., & Sugathan, P. (2016). Drivers of user engagement in eWoM communication. Journal of Services Marketing, 30(5), 541–553.

Ryu, G., & Feick, L. (2007). A penny for your thoughts: Referral reward programs and referral likelihood. Journal of Marketing, 71(1), 84–94.

Sarin, P., Kar, A. K., & Ilavarasan, V. P. (2021). Exploring engagement among mobile app developers – Insights from mining big data in user generated content. Journal of Advances in Management Research, 18(4), 585–608.

Shareef, M. A., Mukerji, B., Dwivedi, Y. K., Rana, N. P., & Islam, R. (2019). Social media marketing: Comparative effect of advertisement sources. Journal of Retailing and Consumer Services, 46, 58–69.

Shelanski, H. A., & Klein, P. G. (1995). Empirical research in transaction cost economics: A review and assessment. The Journal of Law, Economics, and Organization, 11(2), 335–361.

Singh, V., Gray, J., Thakar, A. R., Szalay, A. S., Raddick, J., Boroski, B., Lebedeva, S., & Yanny, B. (2006). SkyServer Traffic Report – The first five years. Microsoft Technical Report No. MSR-TR-2006-190.

Sloan Digital SkyServer. (2024). SkyServer: Explore the universe with the Sloan Digital SkyServer. Retrieved February 6, 2024, from

Szalay, A. S. (2018). From SkyServer to SciServer. The ANNALS of the American Academy of Political and Social Science, 675(1), 202–220.

Szalay, A. S. (2023). Data-driven science in the era of AI: From patterns to practice. In A. Choudhary, G. Fox, & T. Hey (Eds.), Artificial intelligence for science: A deep learning revolution (pp. 29–52). World Scientific. 

Szalay, A. S., Kunszt, P. Z., Thakar, A., Gray, J., Slutz, D., & Brunner, R. J. (2000). Designing and mining multi-terabyte astronomy archives: The Sloan Digital Sky Survey. ACM SIGMOD Record, 29(2), 451–462.

Taplin, D. H., & Clark, H. (2012). Theory of change basics: A primer on theory of change. Actknowledge.

U.S. Department of Agriculture. (2023). USDA / ERS & NASS Workshop on Data Usage Statistics Report. Internal report: unpublished.

Van Doorn, J., Lemon, K. N., Mittal, V., Nass, S., Pick, D., Pirner, P., & Verhoef, P. C. (2010). Customer engagement behavior: Theoretical foundations and research directions. Journal of Service Research, 13(3), 253–266.

Vatanasombut, B., Stylianou, A. C., & Igbaria, M. (2004). How to retain online customers. Communications of the ACM, 47(6), 64–70.

Wernerson, N., & Carlsson, E. S. (2019). Increasing user engagement and feature discoverability through user onboarding for business-to-business [Unpublished master’s thesis, Lund University].

Williamson, O. E. (1979). Transaction-cost economics: The governance of contractual relations. The Journal of Law and Economics, 22(2), 233–261.

Williamson, O. E. (1981). The economics of organization: The transaction cost approach. American Journal of Sociology, 87(3), 548–577.

Williamson, O. E. (1989). Transaction cost economics. In R. Schmalensee & R. Willig (Eds.), Handbook of industrial organization (Vol. 1, pp. 135–182). North-Holland.

Williamson, O. E. (2008). Outsourcing: Transaction cost economics and supply chain management. Journal of Supply Chain Management, 44(2), 5–16.

Wolff, A., Gooch, D., Montaner, J. J. C., Rashid, U., & Kortuem, G. (2016). Creating an understanding of data literacy for a data-driven society. The Journal of Community Informatics, 12(3).

Yarkoni, T., Eckles, D., Heathers, J. A. J., Levenstein, M. C., Smaldino, P. E., & Lane, J. (2021). Enhancing and accelerating social science via automation: Challenges and opportunities. Harvard Data Science Review, 3(2).

©2024 Lauren Chenarides. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.
