
A Value-Driven Approach to Building Data Infrastructures: The Example of the MidWest Collaborative

Published on Jan 27, 2022

Column Editor’s Note: In this Effective Policy Learning article, Jessica Cunningham, Executive Director of KYStats, Anna Hui, Director at the Missouri Department of Labor, Julia Lane, Director of the Coleridge Initiative, and George W. Putnam, Director of Labor Market Information at the Illinois Department of Employment Security, explain how a regional collaborative of states in the Midwest has joined forces to share education, training, and workforce data across state lines. Overcoming numerous legal and privacy hurdles, the states have established an ongoing governance structure using a secure platform to greatly increase the value of their data to policymakers. In the process, they have built state capacity to continue to support evidence-based policymaking. The multi-state products that have been developed, along with the ongoing training in data linkages, have served the states particularly well as they have had to respond quickly to the need for accurate and timely data during the COVID pandemic.

Keywords: state employment data, MidWest Collaborative, state pandemic response, Administrative Data Research Facility, education and training data, evidence-based policy, state driven data products


The ‘evidence-based policy movement’ has created a great deal of interest across the world in how to make better use of administrative government data (Elias, 2018; Head, 2016). In the case of the United States, the passage of Public Law 115-435, known as the Foundations for Evidence-Based Policymaking Act (2018) or the ‘Evidence Act,’ led to substantive changes in the ways in which federal data can be accessed and used. Many of these data are generated by state and local governments, which can use them to make better investments in education, training, health care, and criminal justice programs. Yet because the data contain confidential information on individuals participating in government programs, they have historically been difficult to access for informing policy. In particular, even though individuals live, go to school, and work in regional economies and societies, state data end at state lines. The result is data gaps that create substantial blind spots in producing high-quality state-level evidence to inform policy.

In this article we describe a new approach to building data infrastructures driven by state and local needs. We spell out how a collaborative effort among a subset of states managed to close data gaps by establishing a regional initiative grounded in producing value and focused on using evidence to improve education, training, and workforce outcomes. We hope that the lessons learned can help inform other entities seeking to share data, whether they be cities, counties, or nations. We describe each of the steps necessary to forming a collaborative as follows.

First, it was necessary to establish a technical environment within which confidential data could be hosted and that could meet the legal and privacy requirements for sharing data. The collaborative did so by making use of new technologies, specifically a secure cloud-computing based platform and a data stewardship application that tracks access to, and use of, confidential data. Many of the states had already used the platform to link their own state-level education and workforce data to inform evidence-based policymaking and programmatic investments (Kreuter et al., 2019). The infrastructure made it easier to leverage both state governance structures and cross-agency data-sharing agreements to develop a state-owned and administered data environment in the Administrative Data Research Facility (ADRF). Each state could link its hashed data in the cloud with data from other states for agreed-upon projects, control access to and use of the data through the data stewardship application, and build the capacity of state analysts to answer critical policy questions.

Second, it was necessary to build the capacity of agency staff to work with the data in that secure environment through training programs (Kreuter et al., 2019). The approach taken was to establish applied data analytics training programs so that state agency staff could work together to define common data models, improve measures, and apply advanced methodologies to describe and capture cross-state flows.

Third, data ecosystems form because value is created. Many efforts to develop secure and collaborative environments to share data fail because they don’t establish value quickly and from the bottom up, driven by the needs of the community. The approach taken was to establish two very successful pilots that made the case for rooting evidence-based practices at the local level and established the rationale for collaboration. The first example described in this article shows the development of a tool that enables departments of education and labor to share data and quickly answer questions like ‘how many graduates in a specific area of study found work in another state?’ The second example describes a tool that is being used to provide timely local information about job loss and reemployment resulting from the massive labor market shock engendered by COVID-19.

Each of these steps provided the rationale and impetus to establish a governance structure that can achieve key goals: facilitate interstate collaboration on data, define a state-led data analytics infrastructure, build production-level technical capacity, address privacy concerns, establish a professional development curriculum, develop processes for the collective use of data for research and evaluation, and inform and shape the national evidence strategy.

Context

The establishment of the secure environment and the training programs is described in more detail elsewhere (Kreuter et al., 2019; Lane, 2020). The collaboration described in this article—the MidWest Collaborative—grew out of a 2018 meeting that was intended to capitalize on both the environment and the training (Kreuter et al., 2019). That meeting identified an agenda to build evidence that could be used to understand the returns to different investments in education. The decision was made to pilot training programs in Missouri, Ohio, and Indiana, and assess the results.

The training programs were extremely successful: over 80 agency staff were trained, 20 different possible projects developed, and strong cross-state and cross-agency networks established. As a result, nine states and nearly 50 individuals from 20 organizations reconvened in early March 2020 (Lane, 2020). That meeting resulted in the design of a more formal infrastructure to use data and evidence to support the health of the region’s interconnected economies and societies. The resulting agenda identified the products and analyses that could be created through a cross-state data collaborative that made use of a secure environment. The core products emphasized workforce and education outcome measures, particularly student and worker in-flows and out-flows within and across the states in the collaborative. Two discussion groups—“Governance” and “Data and Data Models”—were created. And the MidWest Collaborative was formed.

A Secure Platform to Share Data

The foundation of the collaborative was a secure cloud-based environment—the Coleridge Initiative’s ADRF. The ADRF itself developed initially as a pilot to inform the decision-making of the Commission on Evidence-Based Policymaking (Hart, 2018), and then grew to be a platform used by multiple agencies. The ADRF was established using a federally approved and standardized approach for the certification of secure cloud-based platforms called FedRAMP (Arbuckle & Ritchie, 2019). That certification depends on an Authorization to Operate (ATO), which must be issued by at least one federal agency. In the case of the ADRF, the ATO was issued by the U.S. Census Bureau, the U.S. Department of Agriculture, and the National Science Foundation. The data are hosted in the Amazon Web Services GovCloud, which complies with a wide range of security requirements, such as the Department of Justice’s Criminal Justice Information Systems Security Policy, the U.S. International Traffic in Arms Regulations, the Export Administration Regulations, and the Department of Defense Cloud Computing Security Requirements Guide for Impact Levels 2, 4, and 5. 

The operational approach is to apply the “five safes” framework (“safe projects,” “safe people,” “safe settings,” “safe data,” “safe exports”) to data protection (Arbuckle & Ritchie, 2019). That framework provides automated and structured workflows, which allow data owners to manage approvals (“safe projects”). It is designed to automate researcher onboarding by streamlining their search for and discovery of relevant data and their approvals for access. It tracks data access and use by analysts and researchers through an automated data stewardship application (“safe people”). The FedRAMP approval provides the “safe settings.” A collaboration with Amazon Web Services resulted in the development of a standalone Windows application to simplify the hashing of data elements (the transformation of direct identifiers like name and Social Security number to a meaningless string of letters and numbers) prior to being transmitted to the ADRF (“safe data”). That application is open source and is provided free of charge to any interested users. The export module automates the export review process (“safe exports”).
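To make the hashing step concrete, the following is a minimal sketch of how a direct identifier might be turned into a meaningless string before transmission. It is not the ADRF hashing application: the choice of a keyed hash (HMAC-SHA256), the normalization rule, and the names `normalize`, `hash_identifier`, and `SHARED_KEY` are all assumptions made for illustration.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Canonicalize an identifier so formatting differences do not change the hash."""
    text = unicodedata.normalize("NFKC", value).upper()
    # Keep only letters and digits (drops spaces, hyphens, punctuation).
    return "".join(ch for ch in text if ch.isalnum())


def hash_identifier(value: str, secret_key: bytes) -> str:
    """Return a hex digest that stands in for the raw identifier."""
    message = normalize(value).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would be agreed on and protected by the data owners.
SHARED_KEY = b"hypothetical-key-agreed-by-data-owners"

hashed_a = hash_identifier("123-45-6789", SHARED_KEY)
hashed_b = hash_identifier("123 45 6789", SHARED_KEY)
hashed_c = hash_identifier("123-45-6780", SHARED_KEY)
```

Because every data owner in this sketch applies the same normalization and the same key, the same person yields the same hashed value in each state's file, which is what allows records to be linked without the raw identifier ever leaving the agency.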

The experience of the states in the MidWest Collaborative was that the Coleridge data stewardship model as implemented in the ADRF reduced administrative burden, lowered cost, and provided stewards with essential reports on data access and usage (“safe use”) (Coleridge Initiative, 2021). The engagement of the research community was critical in the design, particularly with respect to the data security and privacy protocols. For example, we found that it was important to continuously reinforce users’ understanding of the access rules, which was done by requiring users to answer security questions before they could log into the ADRF.

The Training Program to Build Workforce Capacity

The applied data analytics training classes were established as an innovation sandbox that uses modular active learning techniques to train participants to use complex data. Agency staff and university researchers work together to address agency questions; the resulting networked training program reflects the spirit of the university agricultural extension programs established by the 1862 Morrill Act (Nevins, 1962). The combined approach has been successful because it: (1) develops teams of practitioners who can demonstrate the value of the new types of data for solving real-world practical problems and (2) creates a pipeline of new prototype products for stakeholders.

While the classes are expensive to develop, philanthropic foundations and federal agencies support the development of the programs; the subsequent per-unit costs for class participants are low enough that they can be covered through professional development (training) funds. The class training materials are open source and collaborating universities can modify and customize them to deliver their own applied data analytics training class.

The Value of the Collaborative: Develop Common State-Driven Reports

Cross-State Education and Workforce Transitions

One of the first opportunities for states to produce cross-state reports examining education-to-workforce transitions was the Multi-State Postsecondary Report (MSPSR) (Council on Postsecondary Education, 2021). The product evolved out of the Coleridge Initiative training at the Ohio State University. A team of Kentucky Center for Statistics (KYSTATS) state analysts worked with Ohio data in that class to study education-to-workforce transitions. Upon their return, they partnered with Ohio State University and Ohio’s state workforce and education agencies to develop a cross-state dashboard. Currently, the MSPSR allows the user to filter by credential level, academic major group, state of origin, and postsecondary institution to show employment and wages, both in and out of state, at 1, 3, and 5 years after graduation for Kentucky and Ohio postsecondary graduates.

The value-add of the MSPSR for Kentucky can be seen by comparing its 3-year wage outcomes with those in the Kentucky Postsecondary Feedback Report (PSFR) (https://kystats.ky.gov/Latest/PSFR). While methodologies differ slightly between the PSFR and MSPSR, the addition of Ohio data reduced the unknown employment outcomes by about 15% across all credential levels and academic major groups relative to those using only Kentucky data. Wage estimates will also be more exact for the subset of people who worked in the state of the postsecondary institution as well as another included state during a given employment timeframe. Institutions located on state borders (like Northern Kentucky University, which serves the greater Cincinnati area) are more likely to see their graduates get jobs in Ohio and are therefore able to document the labor market outcomes of almost twice as many of their graduates as when only Kentucky data are used. This information is critical to institutions when attempting to determine the performance of programs, as envisioned by proposed legislation like the College Transparency Act and actual Kentucky legislation like the Right to Know Act. It also can help provide transparent information to students about credential opportunities linked to employment outcomes (Marken, 2021). As state trainings continue, states are collaborating on more granular common postsecondary measures, such as time to degree and failure to complete within set time periods. Improved employment metrics are also being developed that capture information on employment stability, starting earnings, and earnings growth. Finally, states are developing standard measures of the characteristics of firms that hire and employ their graduates so that they can describe the demand for different types of skills.
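The mechanism behind the drop in unknown outcomes can be illustrated with a small sketch: a graduate's outcome is "known" once a wage record for that graduate is found in any participating state. The function `outcome_coverage`, the hashed IDs, and the counts below are invented for illustration and are not figures from the MSPSR or PSFR.

```python
def outcome_coverage(graduates, wage_records_by_state, states):
    """Share of graduates with at least one wage record in the given states."""
    found = set()
    for state in states:
        # Match on the hashed ID that appears in both files.
        found |= graduates & wage_records_by_state[state]
    return len(found) / len(graduates)


# Toy data: hashed IDs of ten graduates of a hypothetical Kentucky institution.
graduates = {"a1", "b2", "c3", "d4", "e5", "f6", "g7", "h8", "i9", "j0"}

# Toy wage records; "zz" and "yy" are workers who are not in the graduate file.
wage_records_by_state = {
    "KY": {"a1", "b2", "c3", "d4", "e5", "zz"},
    "OH": {"e5", "f6", "g7", "yy"},
}

ky_only = outcome_coverage(graduates, wage_records_by_state, ["KY"])
ky_and_oh = outcome_coverage(graduates, wage_records_by_state, ["KY", "OH"])
unknown_before = 1 - ky_only
unknown_after = 1 - ky_and_oh
```

In this toy example, graduate "e5" worked in both states, so adding Ohio does not change whether that person is found, but it would change the measured wages, which is the second benefit described above.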

The initial pilot is now expanding to include other states participating in the training, notably Indiana and Tennessee, and to evolve into a dynamic dashboard. Although each state will have some unique components, the process of building the dashboard encourages common data mapping, methodology discussions, and ultimately cross-state collaboration. New Jersey, Tennessee, Arkansas, and Texas are hosting state trainings similar to those in Kentucky and Ohio, in which each state is piloting the potential to develop its own postsecondary-to-workforce reports. The eventual goal is to develop a state-specific interactive dashboard built on aggregated and pre-redacted data, displaying broader, regional employment and wage outcomes for the state’s postsecondary completers. In each case, participating states in the collaborative will develop the state-specific code and documentation necessary to continue refining this work in the future, kickstarting the capacity for cross-state collaboration.
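One common way to produce "aggregated and pre-redacted" data is to count records per cell and suppress any cell that falls below a minimum size. The sketch below assumes that approach; the threshold of 10, the field names, and the function `aggregate_and_redact` are hypothetical and do not describe the collaborative's actual disclosure rules.

```python
from collections import Counter

# Hypothetical minimum cell size; real disclosure rules are set by the data owners.
MIN_CELL_SIZE = 10


def aggregate_and_redact(records, min_cell_size=MIN_CELL_SIZE):
    """Count graduates per (major, work_state) cell and suppress small cells."""
    counts = Counter((r["major"], r["work_state"]) for r in records)
    # None marks a suppressed cell, so a dashboard never sees the small count.
    return {cell: (n if n >= min_cell_size else None) for cell, n in counts.items()}


# Toy person-level records, as they might exist inside the secure environment.
records = (
    [{"major": "Nursing", "work_state": "KY"}] * 12
    + [{"major": "Nursing", "work_state": "OH"}] * 3
)

table = aggregate_and_redact(records)
```

This sketch handles only primary suppression. If row or column totals were also published, a suppressed cell could be recovered by subtraction, so a production process would need complementary suppression or a similar safeguard.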

The Characteristics of Unemployed Workers and Their Transitions

State agencies faced an immediate need in March of 2020. That need, to provide an effective, data-based response to the COVID-19 pandemic, has been unabated since then. Many states, inundated with millions of unemployment insurance (UI) claims, did not have the capacity to translate the claims data on transactions to the need felt by claimants. Fortunately, the March 2020 meeting provided the basis for action. The MidWest Collaborative moved swiftly to develop an unemployment-to-reemployment portal (a UI portal) to inform policymakers (Coleridge Initiative, 2021), in essence responding to an urgent U.S. Government Accountability Office COVID imperative even before the imperative was issued. The structure of the portal highlights weekly (timely), county-based (local), and actionable information on UI claimant composition and transitions.

Why was access to these data so critical? The need for state and local data was never greater. Yet survey data were not granular enough at the local level, for subpopulations, or timely enough. They only captured point-in-time data, not the experiences of people over time. The local workforce boards were faced with devising effective interventions for worker populations in a whole new world. The sheer volume was overwhelming—initial claims for unemployment per 1,000 population increased 16-fold from the March 2020 convening to April 2020.

The occupational composition of the unemployed changed drastically, shifting to a population unused to job loss. Within some industries, new occupational strata of the unemployed emerged; in others, layoff activity became more concentrated in traditional occupational groups. Some occupations also showed greater concentration and geographic dispersion than others. And a whole new category of workers became eligible for unemployment benefits: independent contractors and self-employed individuals. Finally, the very definition of separation from the labor market became more confounded during the COVID-19 recession. Faced with the prospect of losing a significant portion of their trained workforce, employers responded with a variety of furlough plans to incentivize return to work among former employees. Temporary unemployment (individuals who have been given a return-to-work date or expect to return within 6 months) rose to nearly 80% during the initial stages of the pandemic, in contrast to the peak of less than 20% during the recessions of the past 30 years.

There were important differential impacts by race as well. The inflow and concentration of African American UI claimants during the pandemic highlight local data patterns that suggest the need for strategic intervention. Data showed that Whites represented 60% of all claimants prior to the COVID-19 restrictions and African Americans only 20%. The disparate racial impact of the crisis is such that 14 months later, the percentage distribution is 49.5% (Whites) and 33.3% (African Americans). Remediation strategies require understanding the demographic, industry, and occupational composition of the unemployed to better align resource allocation with local need.

This confluence of labor market dynamics, notably resource crowding for remediation, new unemployed populations, and ambiguous job attachment, exacerbated the need among local workforce board administrators for timely, local, and actionable information to develop effective remediation strategies, particularly for important subpopulations. The UI portal has become an important tool for workforce boards to not only identify target populations for remediation, but also distinguish cohort-based spell behavior of these populations. Boards can anchor the receipt of reemployment services or training by a particular subgroup of targeted unemployed to a specific point in time based on the spell behavior of the constituent unemployed. This person-based unemployment framework also affords the advantage of longitudinal linkage with employment and earnings records to construct highly granular aggregate statistics. The augmentation of the targeted unemployed populations with their pre-separation and reemployed workforce outcomes enhances the evaluation methodology for effective unemployment intervention strategies.
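The idea of cohort-based spell behavior can be sketched as follows: claimants are grouped by the week of their first claim, and each cohort is followed to see what share is still claiming a fixed number of weeks later. This is an illustrative simplification rather than the UI portal's method; the function `cohort_survival`, the input layout, and the toy claims are assumptions.

```python
from collections import defaultdict


def cohort_survival(claims, horizon_weeks):
    """Share of each entry cohort that files a claim horizon_weeks after its first claim.

    claims: iterable of (claimant_id, week_number) pairs, one per week claimed.
    Returns {first_claim_week: share still claiming}.
    """
    weeks_by_claimant = defaultdict(set)
    for claimant, week in claims:
        weeks_by_claimant[claimant].add(week)

    cohort_sizes = defaultdict(int)
    still_claiming = defaultdict(int)
    for weeks in weeks_by_claimant.values():
        start = min(weeks)  # the cohort is defined by the first week claimed
        cohort_sizes[start] += 1
        if start + horizon_weeks in weeks:
            still_claiming[start] += 1

    return {cohort: still_claiming[cohort] / size for cohort, size in cohort_sizes.items()}


# Toy weekly claims keyed by hashed claimant ID.
claims = [
    ("p1", 1), ("p1", 2), ("p1", 3),
    ("p2", 1), ("p2", 2),
    ("p3", 2), ("p3", 3), ("p3", 4),
]

survival = cohort_survival(claims, horizon_weeks=2)
```

Here a claimant counts as "still claiming" only if a claim exists in exactly the target week, so breaks within a spell are ignored. Because the unit is the person rather than the transaction, the same hashed ID could in principle be joined to wage records to add pre-separation and reemployment outcomes for each cohort.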

The Governance Structure and Future Agenda

The MidWest Collaborative has four components in its governance leadership structure: a policy council, a data stewardship board, an administering organization, and a platform organization. State representatives from the policy council and stewardship board serve on the executive committee that exercises final approval on all policy recommendations and project proposals.

The administering organization and platform organization serve in a supportive, advisory role. The National Association of State Workforce Agencies (NASWA) currently serves as the administering organization, and the Coleridge Initiative as the platform organization. NASWA is a trusted, established organization with a record of successful interstate collaboration, and has the expertise to effectively develop and implement final governance arrangements. The Coleridge Initiative ADRF has all the components necessary to serve as the common data platform for the MidWest Collaborative, in which participating states deposit data on a common platform with a shared security boundary, strong data stewardship, and collaborative and analytic capabilities.

In sum, the success of the MidWest Collaborative has made it clear that the well-known challenges to building a regional agenda can be surmounted (Goerge, 2018). There is now the potential for creating value while protecting privacy by linking needed data across state lines. The evidence can be used to improve people’s lives by improving services to low-income welfare recipients, developing effective training programs for formerly incarcerated individuals, and examining racial and social disparities in program take-up and use. As other collaboratives in the South and East take shape, there is a very real opportunity to build a sound evidence base grounded in local needs that informs national policies.


Disclosure Statement

Jessica Cunningham, Anna Hui, Julia Lane, and George Putnam have no financial or non-financial disclosures to share for this article.


References

Arbuckle, L., & Ritchie, F. (2019). The five safes of risk-based anonymization. IEEE Security & Privacy, 17(5), 84–89. https://doi.org/10.1109/MSEC.2019.2929282

Coleridge Initiative. (2021). ADRF user guide.

Council on Postsecondary Education. (2021, March 29). First-of-its-kind analytics tool helps colleges eliminate blind spots in jobs, salaries, track success. Northern Kentucky Tribune. https://www.nkytribune.com/2021/03/first-of-its-kind-analytics-tool-helps-colleges-eliminate-blind-spots-in-jobs-salaries-track-success/

Elias, P. (2018). The UK administrative data research network: Its genesis, progress, and future. The ANNALS of the American Academy of Political and Social Science, 675(1), 184–201. https://doi.org/10.1177/0002716217741279

Foundations for Evidence-Based Policymaking Act of 2018, Pub. L. No. 115–435, 132 Stat. 5529 (2018).

Goerge, R. M. (2018). Barriers to accessing state data and approaches to addressing them. The ANNALS of the American Academy of Political and Social Science, 675(1), 122–137. https://doi.org/10.1177/0002716217741257

Hart, N. (2018, February). Recommendations of the US Commission on Evidence-Based Policymaking. Paper presented at the 2018 AAAS Annual Meeting, Austin, TX. American Association for the Advancement of Science.

Head, B. W. (2016). Toward more “evidence‐informed” policy making? Public Administration Review, 76(3), 472–484. https://doi.org/10.1111/puar.12475

Kreuter, F., Ghani, R., & Lane, J. (2019). Change through data: A data analytics training program for government employees. Harvard Data Science Review, 1(2). https://doi.org/10.1162/99608f92.ed353ae3

Lane, J. (2020). Democratizing our data: A manifesto. MIT Press.

Marken, S. (2021). Ensuring a more equitable future: Exploring the relationship between wellbeing and postsecondary value. Postsecondary Value Commission.

Nevins, A. (1962). The origins of the land-grant colleges and state universities: A brief account of the Morrill Act of 1862 and its results. Civil War Centennial Commission.


©2022 Jessica Cunningham, Anna Hui, Julia Lane, and George Putnam. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.
