
Private Numbers in Public Policy: Census, Differential Privacy, and Redistricting

Published on Jun 24, 2022

Abstract

The 2020 Decennial Census in the United States was released with a new disclosure avoidance system in place, putting differential privacy in the spotlight for a wide range of data users. We consider several key applications of census data in redistricting, developing tools and demonstrations for practitioners who are concerned about the impacts of this new noising algorithm called TopDown. Based on a close look at nine localities in Texas and Arizona, we find reassuring evidence that TopDown did not threaten the ability to balance districts, describe their demographic composition accurately, or detect signals of racial polarization.

Keywords: Census, TopDown, differential privacy, redistricting, Voting Rights Act


1. Introduction

This article is the culmination of a multiyear collaboration studying the ways that a new technique going by the name of ‘differential privacy’ will change the census data used to draw electoral districts around the United States. In July 2021, one of us was invited to present on our findings to the Arizona Independent Redistricting Commission (IRC), which was preparing to do just that—redraw the districts in that state. In the current political climate, the IRC determined that presentations should be balanced, with one person on each side of the issue. Our presentation gave some reassuring numbers (discussed further here), such as the finding that the discrepancies introduced in Native American population counts in small rural districts were likely to be under five people. The other presentation was not short on drama. Over a still image of a napalm-dropping helicopter from Apocalypse Now, the other presenter informed the commission and the public that the best way to understand the new practices was that “the Bureau found that they had to destroy the data that we had in order to protect it” (Arizona Independent Redistricting Commission, 2021).

What is going on here? An academic intervention in a fairly obscure bureaucratic tabulation practice has raised alarms—and sometimes alarmism—from all sides.1 This merits a serious look at the new idea, the justification for the shift, and the actual impacts on our ability to draw districts that are legally suited for holding fair elections.

At the center is the new disclosure avoidance system at the U.S. Census. The 2020 Decennial Census releases were run through a randomized algorithm called TopDown to protect against increasingly feasible “reconstruction attacks” that can disclose respondents’ records in full (Abowd et al., 2022). (The threats to privacy are discussed in Section 2.3.) In the recent past (1990, 2000, and 2010), the system of disclosure avoidance was ad hoc record swapping: literally exchanging pairs of census records in order to hide some features of the data. Like swapping, TopDown introduces noise into the aggregated data with the goal that any detailed information reconstructed from the aggregates will not be reliable. But unlike swapping, TopDown satisfies a precise mathematical formulation for disclosure avoidance called differential privacy (DP), which provides a controlled balance between privacy and accuracy (Dwork, McSherry et al., 2006).

Figure 1. Differentially private (DP) algorithms give users the choice in a gradated trade-off between accurate/vulnerable data and noisy/protected data. The ‘privacy-loss budget’ can also be allocated or targeted to particular kinds of information to be released. Among privacy specialists in computer science, ε ≤ 1 is seen as providing strong privacy guarantees. This figure is a schematic only.

This privacy–accuracy tradeoff is managed through a parameter ε, sometimes called the privacy-loss parameter or the privacy-loss budget. The definition of differential privacy requires that published data sets incorporate some randomness, and guarantees that any particular feature of the published data would be almost as likely (up to a factor that is a function of ε) even if any single input (like one person’s census responses) is completely changed. DP provides strong guarantees when ε is a small number. For instance, ε ≈ 1 guarantees that a single record cannot alter event probabilities by more than a factor of 3 (the exact bound is e^ε ≈ 2.72). For ε much larger than 1 (as in the production settings for TopDown20), the privacy guarantees become exponentially weaker, but conversely the data becomes highly accurate. When ε → ∞, the noise and uncertainty vanish, leaving the original data perfectly intact; when ε → 0, the algorithm gives pure noise with no fidelity to the input data. And this budget can be carefully allocated to put the greatest accuracy where it is deemed to be most needed: in this case, within the complicated geographical and demographic constructions that make up the Census data.
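To make the role of ε concrete, the following is a minimal sketch of the multiplicative guarantee for a single count query, assuming pure ε-differential privacy and two-sided geometric (discrete Laplace) noise. The function names are hypothetical and the example is illustrative only; it is not the Bureau's code.

```python
import math


def max_probability_ratio(epsilon: float) -> float:
    """Worst-case factor by which one record can change an event's probability."""
    return math.exp(epsilon)


def geometric_pmf(k: int, epsilon: float) -> float:
    """PMF of two-sided geometric noise for a count query of sensitivity 1."""
    alpha = math.exp(-epsilon)
    return (1 - alpha) / (1 + alpha) * alpha ** abs(k)


def output_ratio(true_a: int, true_b: int, observed: int, epsilon: float) -> float:
    """Likelihood of `observed` if the true count is true_a, relative to true_b."""
    return geometric_pmf(observed - true_a, epsilon) / geometric_pmf(
        observed - true_b, epsilon
    )


# Two neighboring data sets whose counts differ by one person (10 versus 11).
for eps in (0.5, 1.0, 2.0):
    bound = max_probability_ratio(eps)
    worst = output_ratio(10, 11, 10, eps)
    print(f"epsilon={eps}: bound={bound:.3f}, worst observed ratio={worst:.3f}")
```

In this toy setting the worst-case likelihood ratio meets the bound e^ε exactly, which is why the guarantee weakens exponentially as ε grows.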

Redistricting is the process of dividing a polity (like a state) into territorial pieces in which elections will be conducted. The Census has a special release—named the PL 94-171 after the law that requires it—that reports the number of residents by race, ethnicity, and voting age in every small geographic unit, or census block, in the country (U.S. Census Bureau, 2017). A districting plan is then built out of these atoms by assigning each block to a district. The 2020 data was published in August 2021, and as we write (in early 2022), redistricting is complete or well underway for many thousands of districts across the country: not only have lines for U.S. congressional districts been redrawn in the last year, but also for state legislatures, county commissions, city councils, and more.

There are two major worries about the effects of TopDown on redistricting. First is that if the numbers themselves are noisy, then various detailed counts applied to redistricting plans may be unreliable. Second is the concern that statistical tests run on the noisy numbers could be undermined. In particular, when the outputs of TopDown are used for redistricting, we may worry:

Is the population really equal across the districts?

The context for this is “One Person, One Vote” case law, which calls for balancing population across the electoral districts in a jurisdiction, whether they are small, like city council districts, or large, like congressional districts. The (counterintuitive) common practice is to require the strictest balance for the largest districts: most states balance congressional districts to within one person based on decennial census counts (National Conference of State Legislatures, 2020).

Does a seemingly majority-minority district actually pass the 50% threshold?

The most reliable legal tool against gerrymandering for the last 50-plus years has been the Voting Rights Act of 1965 (VRA), which safeguards the opportunity for racial, ethnic, and language minority groups to elect candidates of choice (Voting Rights Act, 1965). In order to advance a VRA lawsuit, plaintiffs must pass some threshold tests, including the creation of demonstration districts that have over 50% minority population. Sometimes the plaintiffs cut the margins quite close to 50% in their demonstrative plans and defendants may push back on whether the share is really past the all-important threshold.

And finally:

Could the noise make it harder to detect patterns, such as racially polarized voting?

VRA plaintiffs must also demonstrate racially polarized voting (RPV)—that is, they must show that the minority group in their lawsuit votes cohesively, while the majority is also cohesive and blocks minority-preferred candidates from election. Noisy data might suffer from an attenuation bias, diminishing signals of patterns in the raw data.

We give experimental evidence addressing these questions below for districting at a range of scales and with varied demographics (Sections 4-6). Together with a discussion of how to understand what makes data ‘fit for use’ in redistricting (Sections 1.2 and 2.2), this lets us conclude that, at least in our particular case studies, the answers to these motivating questions are reassuring.

1.1. Overview of Methods and Questions for Study

The Census Bureau has made efforts toward transparency in the development of TopDown, making working code publicly available along with documentation and research papers describing the algorithm. Since the complexity of the algorithm makes it extremely difficult to study analytically, researchers may well want to run it on realistic data and draw empirical conclusions. However, person-level census data remain confidential for 72 years after collection, so recent detailed input data for TopDown is not public. Data users who would like to understand its impacts might be reluctant to settle for either decades-old data (from the 1940s and 1950s) or a limited set of demonstration outputs released by the Bureau. In this article, we get around this empirical obstacle by using reconstructed block-level 2010 microdata. We then run the TopDown code and study the effects on the detailed data.

We investigate questions about the numerical discrepancies created by TopDown in units of census geography and so-called ‘off-spine’ aggregations like districts and precincts. In order to create a varied collection of districts to study, we use a Markov chain algorithm called recombination to get a large supply of examples.

With these tools—reconstructed microdata, the ability to run the TopDown code, and a supply of randomly generated districts—we can study questions about counts and statistics in various geographical settings. We include several case studies, expanding on the Dallas County case that was the main focus of our previous work (Cohen et al., 2021).

  • Dallas County, Texas. This county had roughly 2.4 million residents in the 2010 census. We divide it into four districts (each roughly 600,000) or 175 districts (each roughly 13,000) with various allocations of ε = 1 from the state level to the block level. We study counts, demographics, and estimates of voter preference.

  • Smaller localities in Texas. We then turn to Bell, Brazoria, Cameron, Galveston, and Nueces counties, in addition to the City of Galveston, to examine the effects of noising on polarization estimates. The localities are chosen to exhibit a range of sizes and demographic compositions.

  • Pima County and Navajo County, Arizona. Pima is a large county containing the city of Tucson, while Navajo is small and rural. We examine TopDown impacts on both, with particular attention to effects on Native American/American Indian populations.

1.2. Overview of Findings

Our conclusions may be useful for several communities. We preview the findings here and explain our methods and standards in greater depth below.

For voting rights attorneys:

  • Census data that was noised with differential privacy remained ‘fit for use’ with respect to population counts: across all instances studied, the practical effects of TopDown on relevant counts at the district level are on a scale that does not materially threaten any intended uses we considered.

  • Weighted ecological regression and standard ecological inference were found to be quite stable when raw data is run through TopDown, so differential privacy did not undermine the detection of racial polarization.

For redistricting practitioners (line-drawers and litigation experts):

  • Districts that are drawn from larger geographic building blocks (drafted from larger units and tuned with smaller units) accumulated less error than districts drawn freely from the smallest building blocks.

  • Ecological regression should always be weighted (by population or cast votes). Unweighted ER was a source of silent error before the introduction of differential privacy.

For the census disclosure avoidance team:

  • Equal allocation of the privacy-loss budget across geographic levels, or slightly higher allocation at the lower levels of geography, is well suited to the needs related to redistricting.

To put these conclusions in context requires attention to the role that census data play in the law around redistricting. “One Person, One Vote”—the legal principle described above declaring that districts must be balanced—was instituted in the 1960s, an era when some proposed state plans had districts that had more than 10 times the population of other districts, a difference amounting to 900% of ideal district size. Sixty years later, the pendulum has swung the other way, and the one-person deviation found in most states is in the neighborhood of 0.0001% of ideal district size.

This premium on mathematical equality is maintained even though we are collectively well aware that Census’s complete enumeration of the population is far from perfect. (Error magnitudes will be discussed in more detail in Section 2.2.) The Supreme Court explicitly recognizes these imperfections in Karcher v. Daggett (1983):

Even if one cannot say with certainty that one district is larger than another merely because it has a higher census count, one can say with certainty that the district with a larger census count is more likely to be larger than the other district than it is to be smaller or the same size. That certainty is sufficient for decisionmaking.

This theme is extended and amplified in Georgia v. Ashcroft (2003):

When the decennial census numbers are released, States must redistrict to account for any changes or shifts in population. But before the new census, States operate under the legal fiction that even 10 years later, the plans are constitutionally apportioned.

This last point is helpful for understanding the role of accuracy in the usability of the data for redistricting. It is possible, and in fact frequent, that districting plans get redrawn quite late in a decennial cycle due to litigation: for instance, new plans were put in place in 2018 and 2019 in Pennsylvania and North Carolina. That late in the cycle, the count data that was meant to be maximally accurate on ‘Census Day’—in this case, April 1, 2010—will be badly out of date, and there will be much more current population estimates available, even from other census products. Still, the law prefers the decennial data, and even late-cycle plans are balanced on that basis. This makes it clear that the practice of zero-balancing is meant to be a constraint on the line-drawers and not a reflection of a ground truth about the districts.2

In essence, decennial census data has numerous desirable features: it is based on a door-to-door enumeration, vetted by statistical experts, and released by a putatively apolitical government bureaucracy. It is accurate enough to prevent the blatant abuses and imbalances that were often seen before the Reapportionment Revolution of the 1960s. These properties, and the law’s need for a manageable and solid standard, are enough to make it the “best population data available.”3 As such, sound policy will result from treating it as perfectly accurate for 10 years—a known ‘legal fiction.’ Below we will study the effects of the new privacy protections on census data; we find that, in particular, discrepancies due to differential privacy are considerably smaller than known errors from other sources, and any group-level skews that can be detected are far subtler than known undercount patterns. Courts of law have robustly confirmed that previously documented sources of inaccuracy do not undermine the legal fiction of precision. The effects of differential privacy—which are smaller, less systematically skewed, and do not undermine the utility of inference techniques—are less threatening than the previously identified issues. With this reasoning, we find that the use of TopDown keeps the data fit to be used as the gold standard for redistricting and voting rights applications.

2. Background on Census and Redistricting

2.1. The Structure of Census Data and the Redistricting Data Products

Every 10 years the U.S. Census Bureau initiates a comprehensive collection of person-level data, called microdata, from every household in the country. The microdata records are confidential, and are only published in aggregated tables subject to disclosure avoidance controls. The decennial census records information on the sex, age, race, and ethnicity of each member of each household, using categories set by the Office of Management and Budget (U.S. Census Bureau, 2012). The 2020 Census used six primary racial categories: White, Black, American Indian, Asian, Native Hawaiian/Pacific Islander, and Some Other Race. An individual can select these in any combination but must choose at least one, creating $2^6 - 1 = 63$ possible choices of race. Separately, ethnicity is represented as a binary choice of Hispanic/Latino or not.

Census data is structured in a hierarchy of nesting geographic units covering the whole country. The standard hierarchy has six levels: nation—state—county—tract—block group—block. TopDown allocates the total privacy-loss budget ε > 0 across the levels of this central spine, then adds noise at each level in a differentially private way with a magnitude controlled by the budget at that level. The algorithm continues with a postprocessing step that leaves an output data set that is designed to be suitable for public use.

While TopDown operates on the central spine, there are many ‘off-spine’ units within census geography (like cities or tribal areas); additionally, many important off-spine geographies are built from census geography after release (like electoral districts). For redistricting, it is the effect on population estimates aggregated on these off-spine units that ultimately matters most.

Figure 2. A geographical hierarchy can be represented as a tree, shown here in a simple three-level example, with the whole geography at the top and the smallest units at the bottom. A district (or any off-spine geography) is defined as a collection of bottom-level vertices—in this case, the starred census blocks—that is not necessarily built out of whole pieces from higher levels.
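The tree picture can be sketched in code as follows. This is a toy two-level hierarchy (tracts and blocks) with hypothetical unit names and populations, not real census data; it shows that a district is just a set of bottom-level units, and that it is off-spine whenever it splits a higher-level unit.

```python
from collections import defaultdict

# Hypothetical block populations keyed by (tract, block).
block_pop = {
    ("T1", "B1"): 40, ("T1", "B2"): 25, ("T1", "B3"): 0,
    ("T2", "B4"): 60, ("T2", "B5"): 15,
}


def tract_totals(blocks):
    """Aggregate block counts up one level of the hierarchy."""
    totals = defaultdict(int)
    for (tract, _block), pop in blocks.items():
        totals[tract] += pop
    return dict(totals)


def district_population(blocks, district):
    """Sum the counts of the blocks assigned to a district."""
    return sum(blocks[unit] for unit in district)


def is_union_of_whole_tracts(blocks, district):
    """True if the district contains every block of each tract it touches."""
    touched = {tract for tract, _block in district}
    whole = {unit for unit in blocks if unit[0] in touched}
    return whole == set(district)


# A district that takes part of T1 and part of T2, so it is off-spine.
district = {("T1", "B2"), ("T1", "B3"), ("T2", "B4")}
print(district_population(block_pop, district))
print(is_union_of_whole_tracts(block_pop, district))
```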

The tabular data containing aggregated counts on the geographic units is then used in an enormous range of official capacities, from the apportionment of seats in the U.S. House of Representatives to the allocation of many streams of federal and state funding. The redistricting (PL 94-171) data includes several such tables: H1, a table of housing units whose types are occupied/vacant; and four tables of population, P1 (63 races), P2 (Hispanic, and 63 races of non-Hispanic population), and P3/P4 (same as P1/P2 but for voting age population). Each table can be thought of as a histogram, with each included type constituting one histogram bin. For instance, in table P1 there is one person in the t = White+Asian bin in the Middlesex County, MA, block numbered 31021002.

Treating the 2010 tables as accurate, it is easy to infer information not explicitly presented in the tables. For instance, the same bin in the P3 table (race for voting age population) also has a count of 1, implying that there are no White+Asian people under 18 years old in block 31021002. This is the beginning of a reconstruction process that would enable an attacker, in principle, to learn much of the person-level microdata behind the aggregate releases.
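The differencing step described here can be sketched as follows. Only the White+Asian count of 1 in both tables comes from the text; the second bin and the function name are hypothetical, added for illustration.

```python
def under_18_counts(p1_total: dict, p3_voting_age: dict) -> dict:
    """Infer counts of residents under 18 by subtracting P3 bins from P1 bins."""
    minors = {}
    for race, total in p1_total.items():
        adults = p3_voting_age.get(race, 0)
        if adults > total:
            raise ValueError(f"inconsistent tables for bin {race!r}")
        minors[race] = total - adults
    return minors


# One block's histogram bins: total population (P1) and voting age population (P3).
p1 = {"White+Asian": 1, "White": 12}  # "White" row is a made-up example
p3 = {"White+Asian": 1, "White": 9}

minors = under_18_counts(p1, p3)
print(minors)
```

Each such subtraction yields a fact that no table states directly, and an attacker can chain many of them across tables and geographies to narrow down the underlying records.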

2.2. Redistricting and the Question of ‘Fitness for Use’

The PL 94-171 tables (also known as the redistricting data) are the authoritative source of population counts for the purposes of apportionment to the U.S. House of Representatives, and with a very small number of exceptions also for within-state legislative apportionment. The most famous use of these counts is to decide how many members of the 435-seat House of Representatives are assigned to each state. In “One Person, One Vote” case law initiated in the Reynolds v. Sims case of 1964, the balancing of census population is required not only for congressional districts within a state but also for districts that elect to a state legislature, a county commission, a city council or school board, and so on (Avery v. Midland County, 1968; Reynolds v. Sims, 1964; Wesberry v. Sanders, 1964).

Today, the congressional districts within a state usually balance the official total population extremely tightly: each of Alabama’s seven congressional districts drawn after the 2010 Census had a population of either 682,819 or 682,820 according to official definitions of districts and the Table P1 count, while Massachusetts districts all had a population of 727,514 or 727,515.4 Astonishingly, though no official rule demands it, more than half of the states maintain this ‘zero-balancing’ practice (no more than one-person deviation) for congressional districts (National Conference of State Legislatures, 2020). (For districts below the congressional level, the usual standard is that the range of district populations be no more than 10% of ideal district size.) If disclosure avoidance practices introduce some systematic bias—say by creating significant net redistribution toward rural and away from urban areas—then it becomes hard to control overall malapportionment, which could in principle trigger constitutional scrutiny. In the end, redistricters may not care very much how many people live in a single census block, but it could be quite important to have good accuracy at the level of a district. Practitioners’ ingrained habit of zero-balancing districts to protect from the possibility of a malapportionment challenge is the first source of worry in the redistricting sphere—but it is important for those practitioners to remember that the numbers are only treated as perfect for this purpose, even though they are well known to have some systematic errors.
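As a rough illustration of these balance standards, the sketch below computes the top-to-bottom population deviation as a percentage of ideal district size. The two Alabama population values come from the text; the split of four districts at the lower value and three at the higher one is an assumption made for the example, and the function names are hypothetical.

```python
def ideal_size(populations):
    """Ideal district size: total population divided by number of districts."""
    return sum(populations) / len(populations)


def top_to_bottom_deviation_pct(populations):
    """Range of district populations as a percentage of ideal district size."""
    spread = max(populations) - min(populations)
    return 100 * spread / ideal_size(populations)


def is_zero_balanced(populations):
    """Zero-balancing: no more than a one-person deviation across districts."""
    return max(populations) - min(populations) <= 1


# Seven districts at 682,819 or 682,820 (the 4/3 split is assumed).
alabama = [682_819] * 4 + [682_820] * 3

deviation = top_to_bottom_deviation_pct(alabama)
print(f"deviation: {deviation:.6f}% of ideal size")
print("zero-balanced:", is_zero_balanced(alabama))
print("within the 10% standard used below the congressional level:", deviation <= 10)
```

A one-person spread on districts of this size comes to roughly 0.00015% of ideal district size, which is several orders of magnitude tighter than the 10% standard.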

To understand this better, we can look at magnitudes of errors that have been known and tolerated. The Bureau publishes a Post-Enumeration Survey (PES) after each decennial census assessing the accuracy of the population count. According to the 2020 PES point estimates, the Black, Hispanic, and Native American (on reservation) populations were undercounted by 3.30%, 4.99%, and 5.64% respectively, while the non-Hispanic White alone population was overcounted by 1.64% (U.S. Census Bureau, 2022). The level of disparity varies from cycle to cycle, but the phenomenon is not anomalous, with 1990 having seen a particularly severe undercount in the same minority groups. Since one percent of a congressional district is over 7,000 people, this can easily lead to errors in the thousands, particularly in a majority-minority district. Separately, the Census Bureau used simulations to give a conservative estimate of the count inaccuracy due to nonsampling variability and coverage errors—two issues faced in the data collection process itself (U.S. Census Bureau, 2021c). These simulations give a mean absolute error across all counties of 117 people and 964 people, respectively, with errors an order of magnitude higher for counties with over one million people. Since off-spine entities likely accrue these errors at the same rate as counties of similar size, this suggests that these same sources will lead to congressional district errors in the thousands.

Errors from TopDown are much smaller at these scales. In our experiments using TopDown18 on Dallas County districts, for instance, absolute population deviations were usually in the hundreds (Figure 4). Considering the many ways that the production version TopDown20 improved over TopDown18, we expect the actual errors in such districts to be much smaller. Indeed, in the 2010 Census demonstration data using the production parameters (see Section 2.5), the mean absolute error from differential privacy across all counties was just 1.75 people. The middle 90% range was -4 to 4 people (U.S. Census Bureau, 2021c). Separately, Wright and Irimata (2021, Tables 7v-12v) report outputs of 25 runs of TopDown using the production parameters on districts of different sizes in Rhode Island and Mississippi. Of all the districts they examined, the largest root mean square error (RMSE) is 237. For most, the error is in the tens. Since the RMSE upper-bounds the mean absolute error, this too lends support to our rough estimate that discrepancies due to TopDown20 were likely less than previous known errors by two orders of magnitude.
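The claim that the RMSE upper-bounds the mean absolute error follows from the Cauchy–Schwarz inequality. Writing $e_1, \dots, e_n$ for the errors in a district's count over $n$ runs (this notation is ours, introduced only for this step):

```latex
\[
\mathrm{MAE}
  = \frac{1}{n}\sum_{i=1}^{n} |e_i| \cdot 1
  \;\le\; \frac{1}{n}\Bigl(\sum_{i=1}^{n} e_i^{2}\Bigr)^{1/2}\Bigl(\sum_{i=1}^{n} 1^{2}\Bigr)^{1/2}
  = \Bigl(\frac{1}{n}\sum_{i=1}^{n} e_i^{2}\Bigr)^{1/2}
  = \mathrm{RMSE}.
\]
```

In particular, a largest reported RMSE of 237 implies that the mean absolute error in that district is at most 237 as well.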

The second major locus of concern for redistricting practitioners is the enforcement of the Voting Rights Act. Here, histogram data is used to estimate the share of voting age population held by members of minority racial and ethnic groups.5 Voting rights litigants must start by satisfying three threshold tests, without which no suit can go forward.

  • Gingles 1: the first ‘Gingles factor’ in VRA liability is satisfied by creating a demonstration district where the minority group makes up over 50% of the voting age population. At the end of the day, VRA-compliant districts do not have to be majority-minority. But to initiate litigation, you must show that such districts could have been drawn.

  • Gingles 2–3: the voting patterns in the disputed area must display racial polarization. The minority population must be cohesive in its candidates of choice, while bloc voting by the majority prevents these candidates from being elected. In practice, several now-standard inference techniques are used to estimate voting preferences by race.
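Returning to the Gingles 1 showing, a simple way to think about a close call is to ask whether a demonstration district stays above 50% under a worst-case shift in its counts. The sketch below is a hypothetical robustness check with made-up numbers; `max_error` is an analyst-chosen bound on count discrepancies, not a legal standard or a quantity published by the Bureau.

```python
def minority_vap_share(minority_vap: int, total_vap: int) -> float:
    """Share of voting age population (VAP) held by the minority group."""
    if total_vap <= 0:
        raise ValueError("total VAP must be positive")
    return minority_vap / total_vap


def clears_threshold_robustly(minority_vap, total_vap, max_error, threshold=0.5):
    """Check the threshold when the minority count is lowered and the
    non-minority count is raised by max_error people each."""
    worst_minority = max(minority_vap - max_error, 0)
    worst_other = (total_vap - minority_vap) + max_error
    worst_total = worst_minority + worst_other
    return worst_total > 0 and worst_minority / worst_total > threshold


# Hypothetical demonstration district with a 50.6% minority share of VAP.
share = minority_vap_share(5_060, 10_000)
print(f"reported share: {share:.3f}")
print("robust to 50-person errors:", clears_threshold_robustly(5_060, 10_000, 50))
print("robust to 100-person errors:", clears_threshold_robustly(5_060, 10_000, 100))
```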

We give a quick overview of RPV methods for Gingles 2–3. When elections are conducted by secret ballot, it is fundamentally impossible to precisely determine voting patterns by race from the reported outcomes alone. The standard methods for estimating these patterns use the cast votes at the precinct level, combined with the demographics by precinct, to infer racial polarization. Because the general aggregate-to-individual inference problem is called “ecological” (cf. ecological paradox, ecological fallacy), the leading techniques are called ecological regression (ER) and ecological inference (EI).

ER is a simple linear regression, fitting a line to the data points determined by the precincts on a demographics-vs-votes plot. A high slope (positive or negative) indicates a likely strong difference in voting preferences, which is necessary to provide the Gingles 2–3 showings for a VRA lawsuit. EI is a more complicated process to describe, in which MCMC methods are used to learn parameters for a probability distribution on voting behavior matched to the observed data. We focus on ER here because it lends itself to easily interpretable pictures, but we also present EI findings and discuss the relationship between the two methods in practice.
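A minimal sketch of weighted ecological regression follows, written in plain Python under simplifying assumptions: two groups, one candidate, and synthetic precinct data generated to lie exactly on a line. The function names and numbers are hypothetical, and this is not the code used for the analyses in this article.

```python
def weighted_er(minority_share, candidate_share, weights):
    """Weighted least-squares fit of candidate vote share on minority share.

    Each precinct is one data point; weights are precinct population or cast votes.
    Returns (intercept, slope).
    """
    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, minority_share)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, candidate_share)) / w_sum
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, minority_share))
    sxy = sum(
        w * (x - x_bar) * (y - y_bar)
        for w, x, y in zip(weights, minority_share, candidate_share)
    )
    if sxx == 0:
        raise ValueError("minority share does not vary across precincts")
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def group_estimates(intercept, slope):
    """Read group support off the fitted line at 0% and 100% minority share."""
    return {"majority_support": intercept, "minority_support": intercept + slope}


# Synthetic precincts: candidate share is exactly 0.2 + 0.6 * minority share.
x = [0.1, 0.3, 0.5, 0.7, 0.9]
y = [0.2 + 0.6 * xi for xi in x]
votes_cast = [1200, 800, 1500, 400, 900]

intercept, slope = weighted_er(x, y, votes_cast)
print(group_estimates(intercept, slope))
```

Here the fitted line recovers estimated support of about 20% among majority voters and 80% among minority voters, a steep slope of the kind that signals polarization. Setting all weights equal reduces this to unweighted ER, which lets small precincts count as much as large ones.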

As we write, in April 2022, the U.S. Supreme Court has taken up a redistricting case from Alabama, putting the VRA on notice. So it is highly possible that the VRA goalposts are about to move again, but we will probably not learn that fate until Summer 2023. For now, the ability to meet the Gingles tests remains of paramount importance for fair redistricting. Because the VRA has been a powerful tool against gerrymandering for over 50 years, we might worry that even where the raw data would clear the Gingles preconditions, the noised data will tend toward uniformity—blocking deserving plaintiffs from a cause of action.

2.3. Disclosure Avoidance and TopDown

Title 13 of the U.S. Code requires the Bureau to take measures to protect the privacy of respondents’ data (13 U.S.C. 9, 1976). As discussed above, the swapping techniques from the last several censuses are no longer considered adequate to protect against more sophisticated (but mathematically straightforward) data attacks that seek to reconstruct the individual microdata. The practical risks of reconstruction are amplified by the relative ease of pairing the anonymous person-level database with commercially available data to match names, phone numbers, and addresses with census responses to achieve reidentification. But, importantly, it is the reconstruction itself that is the impermissible disclosure from the point of view of the Census Bureau, because it is the individual records themselves that they are required to keep confidential. For example, suppose that a landlord rents out units under a lease that limits the number of occupants, but a family living in poverty exceeds that limit. If one apartment complex makes up an entire census block, then a block-level reconstruction could disclose the number of residents per household to the landlord. The lease violation could be discovered as a direct consequence of the census response. This is precisely why features like the size of particular households need to be disguised, while maintaining the ability of government, courts, researchers, and the public to make accurate aggregate inferences for a wide range of applications. And this illustrates that some press coverage has badly missed the mark on the goals and threats that motivate differential privacy; see, for instance, the March 2022 Slate article “Why the Census Invented Nine Fake People in One House: The Bureau’s Zealous Efforts to Protect Americans’ Privacy Are Doing Bizarre Things to Its Data” (Lee).

With the reconstruction threat in mind, the Bureau has developed the TopDown algorithm (Abowd et al., 2022), which begins with a noising step that is differentially private. Like a fiscal budget, the privacy budget can be allocated until it is fully spent, in this case by spending parts of the budget on particular queries and on levels of the hierarchy. The data is altered to thwart the accuracy of reconstructions.

TopDown is an algorithm for modifying population counts, and counts by type, in every geographic unit (block, tract, etc.) in the census hierarchy. The Bureau has released the code for two versions, which we call TopDown18 and TopDown20, with the latter taking user feedback into account and increasing accuracy in numerous ways to be described below. TopDown works by taking an individual-level table of census data and creating a modified data set that will be used in its place to generate the PL 94-171 tables. In other words, TopDown starts with a histogram having bins for each person-type (i.e., a combination of race, sex, ethnicity, etc.) and outputs an altered version of the same histogram. We sketch the noising and postprocessing here. See Cohen et al. (2021) for more detail.

The algorithm proceeds in two stages. First, it noises the raw histogram counts $\mathsf{C}_{\mathsf{raw}}$, adding random numbers chosen to ensure the required level of differential privacy (according to the budget ε) to produce new counts $\mathsf{C}_{\mathsf{noised}}$. The choices are governed by a collection of histogram count queries called the workload. For each bin of each histogram in the workload and for each unit of the geographic hierarchy, TopDown adds random draws from different probability distributions (specifically, TopDown18 adds Geometric/Laplace noise; TopDown20 adds discrete Gaussian noise).

Second, TopDown executes postprocessing on the noisy histograms to satisfy a list of additional plausibility constraints. This amounts to solving a constrained optimization problem to find adjusted outputs $\mathsf{C}_{\mathsf{final}}$ that are close to $\mathsf{C}_{\mathsf{noised}}$ while enforcing other features. (If an exact solution cannot be found, a sophisticated secondary algorithm finds an approximate solution.) Among other things, postprocessing ensures that the resulting histograms contain only nonnegative integers, are hierarchically consistent, and agree with the raw input data on a handful of invariants.6 We will refer to the final data as processed, so that the overall trajectory is from raw to noisy to processed counts and data tables. TopDown is named after the iterative approach to postprocessing: one geographic level at a time, starting at the top (nation) and working down to the bottom (blocks).
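The raw-to-noisy-to-processed trajectory can be illustrated with a toy version on a single histogram. This sketch makes strong simplifying assumptions: it uses two-sided geometric noise (the TopDown18 noise type), a single geographic unit, and a stand-in postprocessing rule (clip negatives, then rescale to an invariant total with largest-remainder rounding) in place of the constrained optimization that TopDown actually solves. All names are hypothetical.

```python
import math
import random


def two_sided_geometric(epsilon, rng):
    """Discrete Laplace draw, sampled as the difference of two geometric draws."""
    alpha = math.exp(-epsilon)

    def geometric():
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        return int(math.floor(math.log(u) / math.log(alpha)))

    return geometric() - geometric()


def noise_histogram(raw, epsilon, rng):
    """Stage 1: add independent noise to every bin (counts may go negative)."""
    return [count + two_sided_geometric(epsilon, rng) for count in raw]


def postprocess(noised, invariant_total):
    """Stage 2 (toy): nonnegative integers that sum to the invariant total."""
    clipped = [max(count, 0) for count in noised]
    if sum(clipped) == 0:
        clipped = [1] * len(noised)  # no signal left: spread evenly
    s = sum(clipped)
    # Exact integer arithmetic for proportional scaling with remainders.
    final = [c * invariant_total // s for c in clipped]
    remainders = [c * invariant_total % s for c in clipped]
    shortfall = invariant_total - sum(final)
    by_remainder = sorted(range(len(final)), key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:shortfall]:
        final[i] += 1
    return final


rng = random.Random(0)
c_raw = [120, 3, 0, 45]  # hypothetical bins for one block
c_noised = noise_histogram(c_raw, epsilon=1.0, rng=rng)
c_final = postprocess(c_noised, invariant_total=sum(c_raw))
print(c_raw, c_noised, c_final)
```

Even in this toy, postprocessing is where nonnegativity and the invariant total are enforced, and it is also where small bins can be pushed away from their noised values.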

Due to the complexity of the algorithm and the data, the overall guarantees of TopDown are poorly understood. The work in this article attempts to shed some light on the impacts through empirical study.

2.4. 2020 Updates to TopDown

The TopDown algorithm was under active development for several years, with the final version published in September 2021 just after the data itself was released in August. The most significant updates in TopDown20 with respect to TopDown18 are described here.

Discrete Gaussian noise. TopDown20 uses discrete Gaussian noise instead of the Geometric (Laplace) noise used in TopDown18. Qualitatively, discrete Gaussian noise has thinner tails: large deviations from the mean are less likely. The switch requires a different method of accounting for the privacy loss, called approximate or (zero-)concentrated differential privacy (Bun & Steinke, 2016; Dwork, Kenthapadi et al., 2006). Technically, concentrated DP uses a different parameter $\rho$, which was set to 2.56 for the PL data release (as announced in June 2021 [U.S. Census Bureau, 2021b]). To interpret this as an ε as described above requires another tuneable parameter $\delta$. (Very roughly speaking, $\delta$ bounds the probability of failure for the ε privacy guarantee.) The value $\rho$ = 2.56 provides a family of valid (ε, $\delta$) pairs. The Census Bureau reported the ε corresponding to $\delta$ = $10^{-10}$.
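To make the idea of a family of (ε, δ) pairs concrete, the sketch below applies the basic conversion from Bun and Steinke (2016), under which ρ-zCDP implies (ρ + 2√(ρ ln(1/δ)), δ)-differential privacy. The function name is ours. This simple bound is not necessarily the conversion the bureau used: at ρ = 2.56 and δ = $10^{-10}$ it gives ε of about 17.9, somewhat above the bureau's reported figure, which presumably reflects a tighter conversion formula.

```python
import math


def zcdp_to_epsilon(rho, delta):
    """Basic bound: rho-zCDP implies (rho + 2*sqrt(rho*ln(1/delta)), delta)-DP."""
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))


# Smaller delta (a stricter failure probability) costs a larger epsilon.
for delta in (1e-6, 1e-10, 1e-14):
    print(f"delta = {delta:.0e}: epsilon <= {zcdp_to_epsilon(2.56, delta):.2f}")
```

Each printed pair is a valid guarantee for the same mechanism; choosing δ only selects which point on the curve to report.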

Modified spine. Instead of the usual central spine, TopDown20 uses a different hierarchy for noising (while leaving the reporting units used for the final data products unchanged). The modified hierarchy splits off American Indian and Alaska Native (AIAN) areas and also creates state-specific ‘optimized block groups’ in order to bring important geographies—Native American areas as well as municipalities and sometimes townships, depending on the state—closer to the spine. This is done so that the numbers for these geographies are kept more accurate in the application of noise.

Privacy-loss budget and split. Fixing $\delta$ = $10^{-10}$, the production parameter corresponds to ε = 17.14 for the persons file, plus an additional ε = 2.47 for the housing units file, totaling ε = 19.61 (U.S. Census Bureau, 2021b). This is far higher than the values in earlier demonstration data products created by Census. The budget was divided among the geographic levels, with each getting a fraction of the total: a 104/4,099 fraction for the nation; 1,440/4,099 for states; 447/4,099 for counties; 687/4,099 for tracts; 1,256/4,099 for optimized block groups; and 165/4,099 for blocks. Most experiments below use an equal split of ε = 1 over the levels from state to block, except where noted.
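The fractions above can be checked and turned into per-level shares with a few lines of exact arithmetic. In this sketch the dictionary and the `allocate` helper are our own names, and the helper is agnostic about which privacy parameter is being divided.

```python
from fractions import Fraction

# Per-level fractions of the production budget, as listed in the text.
LEVEL_SHARES = {
    "nation": Fraction(104, 4099),
    "state": Fraction(1440, 4099),
    "county": Fraction(447, 4099),
    "tract": Fraction(687, 4099),
    "optimized block group": Fraction(1256, 4099),
    "block": Fraction(165, 4099),
}

# Exact arithmetic confirms that the six fractions account for the whole budget.
assert sum(LEVEL_SHARES.values()) == 1


def allocate(total_budget):
    """Split a total budget across levels in proportion to the shares."""
    return {level: float(share) * total_budget for level, share in LEVEL_SHARES.items()}


for level, share in LEVEL_SHARES.items():
    print(f"{level:>22}: {float(share):6.1%}")
```

Together, states and optimized block groups receive roughly two-thirds of the budget, while blocks receive about 4%.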

Multi-pass postprocessing. TopDown20 performs postprocessing in multiple stages, solving for different attributes in different passes. For example, for the PL 94-171 data, it first solves for the total population within a geo-unit, then solves for the remaining attributes for that geo-unit while treating the total population as fixed. According to the bureau, this helps mitigate the effects of sparsity, including the statistical bias introduced by nonnegativity (U.S. Census Bureau, 2021a).

As we will see below, TopDown18 at ε = 1 already gives quite reliable outputs, even better than the performance of a simplified hierarchical model that we introduce for comparison purposes. Each change described here improves the accuracy of the data outputs for TopDown20 relative to TopDown18. Thus the discrepancies introduced into the real census data were likely far smaller than the already modest discrepancies in the experiments here.

2.5. Related Work and Materials

We briefly survey the empirical work on TopDown released by the Census Bureau and by outside researchers. This includes an influential paper by Petti and Flaxman (2020) as well as more recent papers by Kenny et al. (2021), two by Santos-Lozada and coauthors (Mueller & Santos-Lozada, 2022; Santos-Lozada et al., 2020), and a presentation by McDonald (2019). In addition, there have been several white papers and workshop proceedings involving civil rights organizations and NGOs, including IPUMS National Historical Geographic Information System (2021), Mexican American Legal Defense and Educational Fund & Asian Americans Advancing Justice (2021), National Academies of Sciences, Engineering, and Medicine (n.d.) and National Conference of State Legislatures (2021); litigation filings such as Alabama v. Department of Commerce (2021); and analyses from within the Census Bureau (e.g., Abowd et al. [2022]; Abowd et al. [2021]; Wright & Irimata [2021]) and by an independent scientific advisory group (JASON, 2022).

Since 2019, the Census Bureau has periodically published demonstration data created by applying the then-current version of TopDown to 2010 Decennial Census responses and tabulating the results (IPUMS National Historical Geographic Information System, 2021). Most versions of the demonstration data used a total privacy-loss budget of ε = 4 with various allocations across the geographic hierarchy and query workload, though later demonstration data used ε = 4.5, 12.2, and 19.61 (the final PL 94-171 parameter). Except for our previous work (Cohen et al., 2021) and publications from the Disclosure Avoidance team at the Census Bureau, prior analyses of TopDown have relied on these demonstration data products. These analyses are then limited by the choices made in the data release; they cannot vary the parameters of TopDown, execute additional runs, or layer in properties of the complex algorithm to isolate the impactful features.

Multiple groups, including civil rights groups (Mexican American Legal Defense and Educational Fund & Asian Americans Advancing Justice, 2021) and the state of Alabama in its lawsuit over the use of differential privacy (Alabama v. Department of Commerce, 2021), have called for analyses of how TopDown will affect measurements of racially polarized voting (RPV) and funding formulas for government programs. To the best of our knowledge, only one other author carries out RPV analysis with TopDown data (McDonald, 2019). McDonald performs ecological regression and ecological inference in two contests using a run of TopDown from the demonstration data. In these examples, he finds that TopDown changes the numerical estimates by between 0 and 6 percentage points, usually by less than 1 percentage point. The current work significantly extends these findings. A paper by Pujol et al. (2020) considers privacy impacts on funding formulas.

Perhaps the most thought-provoking finding from the prior literature is the observation by Petti and Flaxman that the errors introduced in demonstration TopDown outputs have a positive bias in areas where there are small values in the raw counts (at least in some population categories), and a negative bias in other areas (Executive Committee of the Legislative Council, Colorado General Assembly, 2020; Petti & Flaxman, 2020). They characterize this as a tendency for TopDown to inflate counts in demographically homogeneous areas (where some histogram bins are empty) and to reduce counts in more racially and ethnically diverse areas. Petti and Flaxman note that the biases arise from the combination of nonnegativity and hierarchical consistency constraints in the postprocessing stage of TopDown. That is, nonnegativity requires that negative-valued noise cannot be added if the raw count is zero; thus, very low counts will tend to increase. As a result of hierarchical consistency, the tendency to increase low counts (sometimes informally called ‘bouncing off zero’) must be balanced by a tendency to decrease relatively large counts. This seems more persuasive to us than an alternative hypothesis advanced by Kenny et al. (2021) that the choice of census accuracy targets is responsible for the phenomenon. We address worries about systematic changes to minority populations in several experiments below.
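The ‘bouncing off zero’ mechanism can be seen in a two-bin toy simulation. The sketch below is an illustration under strong simplifying assumptions, not TopDown itself: one bin has a true count of 0, its sibling has 100, their total is held fixed, continuous Laplace noise is added to each, and the noisy pair is replaced by the closest nonnegative pair with the correct sum. All names and parameter values are hypothetical.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def postprocess_pair(small, large, total):
    """Closest nonnegative pair (in least squares) that sums to `total`."""
    shift = (total - small - large) / 2.0  # restore the fixed sum
    small, large = small + shift, large + shift
    if small < 0:  # nonnegativity binds, so the sibling absorbs the deficit
        small, large = 0.0, total
    elif large < 0:
        small, large = total, 0.0
    return small, large


def simulate(runs=10000, scale=5.0, seed=0):
    rng = random.Random(seed)
    true_small, true_large = 0.0, 100.0
    total = true_small + true_large
    sum_small = sum_large = 0.0
    for _ in range(runs):
        s, l = postprocess_pair(
            true_small + laplace(scale, rng), true_large + laplace(scale, rng), total
        )
        sum_small += s
        sum_large += l
    return sum_small / runs, sum_large / runs


mean_small, mean_large = simulate()
print(f"empty bin averages {mean_small:.2f}; large bin averages {mean_large:.2f}")
```

On average the empty bin ends up above zero and the large bin ends up below 100 by the same amount, even though the added noise is symmetric.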

The central findings of several of the most critical studies reflect the fact that large geographies like counties and districts have lower relative count errors (i.e., smaller discrepancy as a share of population) than small geographies (Kenny et al., 2021; Mueller & Santos-Lozada, 2022; Santos-Lozada et al., 2020). Still, on the basis of these discrepancies, these papers conclude that noise introduced by TopDown may undermine the validity of existing techniques for using census data, so that the outputs may not be fit for use. Empirical findings in the present article, together with the context we provide for the magnitudes of count discrepancies, offer an alternative perspective. When data practices seem very sensitive to small absolute discrepancies, those practices themselves are sometimes worth revisiting. As in the case of ecological regression (Section 4.2), we are optimistic that simple (and overdue) changes to techniques and practices will often suffice for statistically valid uses. That has been the case for each use case that we considered.

In one strongly worded criticism, Kenny et al. (2021) argue that TopDown “makes it impossible to follow the principle of One Person, One Vote, as it is currently interpreted by courts and policy-makers.” We disagree: line-drawers can continue to treat census data as exact, as they have done for the entire tenure of the U.S. Census. Kenny et al.’s reading of the case law conflates the responsibility of line-drawers with the responsibility of the Census Bureau. Notably, we find no contradictions between our respective quantitative results about error magnitudes due to TopDown. The core difference is that Kenny et al. (2021) effectively treat prior decennial census releases as being free of any error. This ignores the effects of swapping, whose impacts we are unable to even estimate, as well as all of the other documented sources of error known to courts for many years. This error-free viewpoint also affects the choice of experiments: a number of the paper’s analyses are particularly sensitive to small perturbations (Kenny et al., 2021, Figures 3, 4, 7), such as by using a bright-line notion of ‘invalid’ population deviation that can be triggered by even a tiny count discrepancy on districts that were close to the threshold. Later, the authors consider an inference technique called BISG, or Bayesian Improved Surname Geocoding, which is used in the measurement of racially polarized voting. As with our RPV investigations, they find that this technique is not impeded by differential privacy. But strikingly, rather than framing this as a reassuring finding, they argue that successful inference of race—using not only census data, but individuals’ names and locations—is itself an impermissible disclosure. As others have pointed out, this argument misunderstands the goals of statistical disclosure limitation ("Statistical Inference Is Not a Privacy Violation," 2021). 
In particular, the accuracy of BISG for determining an individual’s race does not depend at all on whether they even responded to the census, so it has no bearing on the privacy of an individual record.

3. Methods

In the work reported below, we repeatedly ran TopDown18 and a simplified hierarchical model in various configurations on reconstructed person-level data sets created by applying a reconstruction technique to the block-level data from the 2010 Census. Our collaborators Mark Hansen and Denis Kazakov provided reconstructed microdata for Texas containing block-level sex, age, ethnicity, and race information consistent with a collection of tables from 2010 Census Summary File 1, and we later extended this to examples in Arizona with an independent reconstruction.

3.1. Simplified Hierarchical Model

In our previous work (Cohen et al., 2021), we developed a simplified model that has some of the key structural features of TopDown but is simple enough to be amenable to exact mathematical analysis. As a ‘toy model’ for studying TopDown, we named it ToyDown.

The simplified model works by using the real census geographical hierarchy in a manner that is easy to describe: at each level, random noise values are added to counts of persons by type (drawn from a Laplace distribution parametrized by the ε allocation at that level). Then, working from top to bottom, the noisy counts $\mathsf{C}_{\mathsf{noised}}$ are replaced with the closest real numbers satisfying hierarchical consistency (and nonnegativity, if we choose to impose it) under the distance function given by mean squared error. (The postprocessing step can be thought of as closest-point projection to a polyhedral feasible region cut out by the equalities and inequalities that are enforced.) This is simple enough that solutions can often be obtained symbolically.
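One parent-to-children step of this projection can be sketched as follows. This is our reading of a single step, under the assumption that the parent's count has already been fixed and the children's noisy counts must be adjusted to sum to it; the function name and the example numbers are hypothetical. Without nonnegativity, the least-squares solution spreads the discrepancy evenly across children; with nonnegativity, it is the standard sort-based Euclidean projection onto a scaled simplex.

```python
def project_children(noisy, parent_total, nonnegative=True):
    """Closest point to `noisy` (least squares) whose entries sum to parent_total."""
    n = len(noisy)
    if not nonnegative:
        # Only the sum constraint: shift every child by the same amount.
        shift = (parent_total - sum(noisy)) / n
        return [v + shift for v in noisy]
    if parent_total < 0:
        raise ValueError("a nonnegative solution needs parent_total >= 0")
    if parent_total == 0:
        return [0.0] * n
    # Sort-based projection onto {x >= 0, sum(x) = parent_total}.
    theta, running = 0.0, 0.0
    for k, value in enumerate(sorted(noisy, reverse=True), start=1):
        running += value
        candidate = (running - parent_total) / k
        if value - candidate > 0:
            theta = candidate  # the last k passing this test fixes the threshold
    return [max(v - theta, 0.0) for v in noisy]


# Hypothetical noisy child counts under a parent whose count is fixed at 10.
noisy_children = [3.2, -1.5, 10.4]
print(project_children(noisy_children, 10, nonnegative=False))
print(project_children(noisy_children, 10, nonnegative=True))
```

In the example, the unconstrained solution leaves the second child negative, while the nonnegative solution sets it to zero and takes the difference out of the other two children.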

3.2. Budget Splits

We executed multiple runs of TopDown18 and the simplified model with a range of different allocations of the privacy budget across the five subnation levels of the census geographic hierarchy: $\varepsilon = \varepsilon_{\text{state}} + \varepsilon_{\text{county}} + \varepsilon_{\text{tract}} + \varepsilon_{\text{bg}} + \varepsilon_{\text{block}}$. The allocations consist of five different splits across the levels (shown in Table 1) for various levels of ε. Note that TopDown operates on the six-level census hierarchy and requires specifying $\varepsilon_{\text{nation}}$. To conform with those needed inputs, our experiments ran TopDown18 with $\varepsilon_{\text{nation}} = 10 - \varepsilon$, fixing $\varepsilon_{\text{total}} = 10$. Because the nation-level budget is so much higher than the lower level budgets, its effect is extremely small and we omit further discussion of it.

Table 1. Designated budget splits used in the noising runs below, each with a budget of $\varepsilon_{\text{nation}}$ = 9 on the nation and a total of 1 allocated below the national level.

| Split name | state ($\varepsilon_{\text{state}}$) | county ($\varepsilon_{\text{county}}$) | tract ($\varepsilon_{\text{tract}}$) | BG ($\varepsilon_{\text{bg}}$) | block ($\varepsilon_{\text{block}}$) |
|---|---|---|---|---|---|
| equal | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| state-heavy | 0.5 | 0.25 | 0.083 | 0.083 | 0.083 |
| tract-heavy | 0.083 | 0.167 | 0.5 | 0.167 | 0.083 |
| BG-heavy | 0.083 | 0.083 | 0.167 | 0.5 | 0.167 |
| block-heavy | 0.083 | 0.083 | 0.083 | 0.25 | 0.5 |
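The splits in Table 1 can be turned into per-level budgets for any subnational ε, with the nation receiving the remainder of a total of 10. The sketch below is illustrative only: the names are ours, and the renormalization step (needed because the table entries are rounded to three decimals) is an assumption rather than a documented part of the experimental setup.

```python
LEVELS = ("state", "county", "tract", "bg", "block")

# Rows of Table 1, in the order given by LEVELS.
SPLITS = {
    "equal": (0.2, 0.2, 0.2, 0.2, 0.2),
    "state-heavy": (0.5, 0.25, 0.083, 0.083, 0.083),
    "tract-heavy": (0.083, 0.167, 0.5, 0.167, 0.083),
    "BG-heavy": (0.083, 0.083, 0.167, 0.5, 0.167),
    "block-heavy": (0.083, 0.083, 0.083, 0.25, 0.5),
}


def level_budgets(split_name, epsilon, total=10.0):
    """Per-level budgets: `epsilon` below the nation, `total - epsilon` on the nation."""
    shares = SPLITS[split_name]
    norm = sum(shares)  # assumption: renormalize the rounded table entries
    budgets = {level: epsilon * s / norm for level, s in zip(LEVELS, shares)}
    budgets["nation"] = total - epsilon
    return budgets


print(level_budgets("tract-heavy", epsilon=1.0))
```

With ε = 1 this reproduces the nation-level budget of 9 stated in the table caption.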

The workload used here was modeled after the workload used in the 2018 End-to-End test release. It used the ‘detailed’ histogram and the ‘voting-age by ethnicity by race’ histogram, allocated 10% and 90% of the budget respectively. We omit household invariants, and our workload omits additional household queries that are used in Census’s demonstration data products. Our configuration files, code, and data for all runs are available in this article’s accompanying repository (Matthews et al., 2021).

3.3. District Generation

At its core, TopDown introduces block-level noise in order to protect privacy. In principle, small block-level discrepancies could aggregate to large discrepancies for off-spine districts, depending on the degree of cancellation or correlation. We use an ensemble of randomly generated districts to understand the effects of off-spine aggregation.

In particular, we employ the Markov chain sampling algorithm called recombination (or ReCom), which iteratively fuses and randomly repartitions pairs of neighboring districts to create large collections of alternative configurations (DeFord et al., 2021).

Recombination can be performed at the level of various building blocks. In this article we consider random districts built from large pieces (census tracts), medium pieces (block groups), and the smallest possible pieces (census blocks). Typical tract population is in the 1,000–8,000 person range, while many blocks have zero population and typical blocks have up to a few hundred residents. In several experiments, we compare random districts built from the largest units that fit the district scale to random districts built from the smallest available atoms.

Figure 3. Sample districts (yellow) in Dallas County, each within 2% of the ideal population for $k$ = 4 districts. These are drawn by tract ReCom and block ReCom, respectively. The recombination algorithm tends to produce districts that are realistically compact.

Some of the effects that we want to capture have to do with spatial autocorrelation: the fact that geographical units that are near each other are likely to be similar in other ways, like demographics. In order to distinguish the effects of mere aggregation from spatial effects, we also consider an unrealistic district-forming process called Disconn, which randomly assigns units with no attention to adjacency or proximity until a district-sized collection is obtained. Tract Disconn randomly assigns census tracts to the district, whereas block Disconn randomly assigns census blocks.
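A Disconn-style draw can be sketched in a few lines. The version below is one plausible reading of the description, not the implementation used for the experiments: it shuffles the units, adds them one at a time regardless of adjacency, skips any unit that would overshoot the population window, and stops once the district is within a tolerance of the target. The 2% tolerance, the skipping rule, and the synthetic tract populations are all assumptions made for illustration.

```python
import random


def disconn_district(unit_populations, target, tolerance=0.02, seed=0, max_tries=1000):
    """Randomly collect units, ignoring adjacency, until the district is near `target`."""
    rng = random.Random(seed)
    units = sorted(unit_populations)  # fixed starting order for reproducibility
    low, high = target * (1 - tolerance), target * (1 + tolerance)
    for _ in range(max_tries):
        rng.shuffle(units)
        chosen, population = [], 0
        for unit in units:
            if population + unit_populations[unit] > high:
                continue  # this unit would overshoot the window, so skip it
            chosen.append(unit)
            population += unit_populations[unit]
            if population >= low:
                return chosen, population
    raise RuntimeError("no district within tolerance was found")


# Synthetic stand-in for a county: 400 tracts of 1,000 to 8,000 people each.
data_rng = random.Random(42)
tracts = {f"tract_{i:03d}": data_rng.randint(1000, 8000) for i in range(400)}
target = sum(tracts.values()) / 4  # ideal population for k = 4 districts

chosen, population = disconn_district(tracts, target)
print(len(chosen), "tracts,", population, "people")
```

Because membership depends only on the shuffle, the resulting district is scattered across the county, which is what separates the effect of aggregation from the effect of spatial structure.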

4. Dallas County

4.1. District-Level Counts Under TopDown

We begin with county commission districts in Dallas County, with $k$ = 4 seats. Since the 2010 population of Dallas County was roughly 2.4 million, the districts will have roughly 600,000 people, making them nearly as big as congressional districts. For these, we will be able to use tracts as large building blocks, compared to blocks as small ones. We also include divisions of the county into $k$ = 175 districts of between 13,000 and 14,000 people each for a small-district comparison. Figure 4 plots the data from our noising runs on a logarithmic scale. The simplified hierarchical model was run with a nonnegativity constraint in postprocessing.

Even with much noisier parameters (ε = 1) than those used in production, the realistic district-generation methods (tract ReCom and block ReCom, green and blue) give typical count discrepancies of under 1,000 people on districts of 600,000—less than two-tenths of a percent. Between the two methods, building districts from larger units fares significantly better than building from smallest atoms. On small districts, where population deviation can be larger, the discrepancies tend to be in the low hundreds, or about 1–2%.

Next, we compare ReCom to Disconn to understand the effects of spatiality. On blocks (blue versus orange in Figure 4), the difference is clear: the compact and connected districts have far less error than the random disconnected alternatives. This makes sense, because ReCom districts will tend to preserve larger geographical units intact, just by virtue of having a large interior, while the disconnected districts fragment far more units. Preserving larger units can be helpful for accuracy because postprocessing corrections will be imposed at multiple stages. These fragmentation ideas, which are related to the ‘off-spine distance’ investigations of Abowd et al. (2021), are explored more fully in our earlier article (Cohen et al., 2021).

Figure 4: These histograms show district-level population changes on a log scale for various combinations of budget splits (rows), district-drawing algorithms (colors), and noising algorithms (columns). We include both large districts ($k$ = 4) and small districts ($k$ = 175). Each histogram displays 400 values, one for each district drawn by the specified algorithm, plotting the mean observed district-level population error magnitude over 16 executions of the noising algorithm with ε = 1 and the specified budget allocation.

On districts built from tracts, the ReCom/Disconn (green versus red) difference is smaller, but for TopDown18 runs, the disconnected districts actually have lower error. At first this seems puzzling, because compact and connected districts are being punished by the geography-aware TopDown algorithm. But there is a very plausible explanation: the spatial autocorrelation, or demographic similarity of geographic units to nearby units, is causing the postprocessing corrections to move nearby tracts in the same direction, impeding the cancellation that usually makes counts more accurate on larger geographies. (Compare the Petti–Flaxman discussion from Section 2.5.) The unit-integrity benefits are smaller in this setting, since the process starts with pieces farther up the spine.

Overall, drawing districts that keep larger pieces whole (e.g., by building with tracts instead of blocks, or by using ReCom to create districts with fat interior) lowers error magnitude significantly in the best case and has little or no effect in the worst case.

In the end, the story that emerges from these investigations is that, with full TopDown, the best accuracy that can be observed for large districts occurs when they are made from whole tracts and the allocation is tract-heavy; an equal split is not much worse. For city council–sized districts with populations around 13,000, ε = 1 noising creates errors in the low hundreds for compact, connected districts, with the best performance for block-heavy allocations. Again, an equal split is not much worse, suggesting that this might be a good policy choice for accuracy in districts across many scales. Also, the simple hierarchical model does a very respectable job of capturing the effects across scenarios, with the exception of detecting an accuracy penalty on tract ReCom, discussed above.

4.2. Polarization in Dallas

Practitioners who use standard inference techniques to measure racially polarized voting have raised two questions regarding the effect of differential privacy: (1) How robust will the estimate be after the noising? (2) Will noising tend to systematically diminish the estimates of candidate support from minority populations? We analyzed the effects of TopDown on the 2018 Texas Democratic primary runoff election for governor, where Lupe Valdez was a clear minority candidate of choice in Dallas County. (We also examined the general elections for president in 2012 and comptroller in 2018, with similar findings.)

We first consider linear regression, with a setup typically used by redistricting practitioners to measure polarization in voting. (We focus on the point estimates and their variance, rather than the confidence or credible intervals, following the primary way these techniques are cited in expert reports.)

Results are shown in Table 2. We find that the noise introduces an attenuation bias that seems alarming at first. However, this turns out to call attention to a data vulnerability that was already present. Of the 827 precincts in Dallas County, 201 have fewer than 10 cast votes from the election in question—in fact, 99 precincts recorded zero cast votes. These precincts are a big driver of instability under DP. This is not surprising; modest levels of injected noise can cause a big percentage swing when the overall numbers are low. Filtering out these precincts—or down-weighting their effect using weighted regression—makes the estimates using processed data closely agree with the estimates from raw data.

Table 2: Ecological regression (ER) point estimates of support for Lupe Valdez in racial/ethnic groups and their complements, using voting age population (VAP) from the 2010 Census as raw data. The simplified hierarchical model and TopDown18 estimates are based on 16 runs with ε = 1 and an equal budget split. Empirical variance (marked *) is calculated over 16 runs of each noising algorithm and is reported in units of $10^{-8}$ to two significant digits. In the filtered precincts case, precincts with fewer than 10 cast votes are excluded. In the weighted precincts case, the line fit uses the number of cast votes to weight each term in the objective function.


We can also visualize these findings. The top row of Figure 5 shows the corresponding ER scatterplots using un-noised data. The columns show the results of using all precincts with the same filtered and weighted alternatives described above. The blue data points are precincts, with each plotted according to its percentage of Hispanic voting age population or HVAP ($x$-axis) and the share of cast votes that went to Lupe Valdez ($y$-axis). Strong racial polarization would show up as a fit line of high slope. This linear regression produces a point estimate of Hispanic support for Valdez, found by intersecting the fit line with the $x = 1$ line, which represents the scenario of 100% Hispanic population. The point estimate of non-Hispanic support for Valdez is at the intersection of the fit line with $x = 0$. (As noted above, it is usually the point estimates themselves, and not error estimates or correlation coefficients, that drive expert reports in this area.)
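The regression setup just described, together with the filtering and weighting variants, can be sketched with a closed-form least-squares fit. The function below is a minimal illustration with hypothetical names and made-up precinct data; it is not the code used to produce the tables. It fits a line to (group share, candidate share) pairs and reads off the point estimates at $x = 1$ and $x = 0$.

```python
def ecological_regression(group_share, candidate_share, cast_votes,
                          min_votes=0, weighted=False):
    """Fit candidate_share ~ group_share and return point estimates at x=1 and x=0."""
    rows = [
        (x, y, float(v) if weighted else 1.0)
        for x, y, v in zip(group_share, candidate_share, cast_votes)
        if v >= min_votes  # filtering drops precincts with too few cast votes
    ]
    total_w = sum(w for _, _, w in rows)
    mean_x = sum(w * x for x, _, w in rows) / total_w
    mean_y = sum(w * y for _, y, w in rows) / total_w
    sxx = sum(w * (x - mean_x) ** 2 for x, _, w in rows)
    sxy = sum(w * (x - mean_x) * (y - mean_y) for x, y, w in rows)
    if sxx == 0:
        raise ValueError("group share does not vary across precincts")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {"group": intercept + slope, "complement": intercept}


# Made-up precincts: five lie on the line y = 0.2 + 0.6x; the sixth is a
# three-vote precinct with an extreme share, standing in for a noisy tiny precinct.
x = [0.1, 0.3, 0.5, 0.7, 0.9, 0.9]
y = [0.26, 0.38, 0.50, 0.62, 0.74, 0.0]
votes = [500, 400, 600, 450, 550, 3]

print("all precincts:", ecological_regression(x, y, votes))
print("filtered:     ", ecological_regression(x, y, votes, min_votes=10))
print("weighted:     ", ecological_regression(x, y, votes, weighted=True))
```

In this toy data the single tiny precinct flattens the unweighted fit considerably, while both the filtered and the weighted fits stay close to the underlying line.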

Figure 5: Visualization of ER results. The blue dots in each plot are the same raw (un-noised) data, shown with a blue fit line; the pink dots show 16 noising runs (ε = 1, equal split) with red fit lines re-computed each time. Histograms show the point estimates of Latino (gold) and non-Latino (teal) support for Valdez from data noised by the simplified hierarchical model (lighter) and TopDown18 (darker). Unweighted regression shows an attenuation bias from noising, though it is much more modest for TopDown18 than for the simplified model. If precincts are filtered or weighted, neither of the noising alternatives destabilizes the measurement of racially polarized voting.

The second and third rows of Figure 5 display the same data after applying the respective noising algorithms 16 times. Red lines are the fits to the noised data. When using unweighted regression with all precincts, we see substantial variance in the slopes of the ER lines and the corresponding point estimates. When filtering or weighting, the variation all but vanishes.

We corroborate the finding that RPV methods are quite stable, this time with a more complex, also widely used technique called ecological inference in place of regression.7 The results are in Table 3.

Table 3. EI point estimates of support for Lupe Valdez, before and after data is noised by TopDown18 with ε = 1 and an equal split. Empirical variance over 16 noising runs (*) is reported in units of $10^{-8}$, to two significant digits.

To summarize the RPV findings: the weighted ER results are quite stable, and the EI results are quite stable, in the sense that the estimates based on processed TopDown18 data are very close to the estimates based on raw data. But the astute reader will have noticed that the two would-be ground truths derived from raw data do not closely agree. This is widely understood in the practitioner sphere, where ER is thought to often exaggerate differences between group preferences, even sometimes giving support estimates above 100% or below zero. In recent years, ER is most often used as qualitative corroboration for EI findings, lending support to a determination of whether polarization is present. So here we find that differential privacy leaves the status quo much as it was, letting both ER and EI operate as they did before, as long as we attend to the worst sources of data instability that were already present.

5. Smaller Texas Localities

Dallas County is a large, multiracial county, home to one of the largest cities in the country. It is worthwhile to turn to smaller localities and see if the results are similarly reassuring. To that end, we add five counties and a city in Texas, whose basic information with respect to overall and Latino demographics is recorded in Table 4.

Table 4. Demographics of Texas localities.

| Locality | TOTPOP | HPOP | HPOP% | HVAP | HVAP% | All Precincts | Filtered Precincts |
|---|---|---|---|---|---|---|---|
| Bell County | 310,235 | 67,010 | 21.6% | 41,675 | 18.8% | 49 | 44 |
| Brazoria County | 313,166 | 86,643 | 27.7% | 55,099 | 24.4% | 93 | 67 |
| Cameron County | 406,220 | 357,747 | 88.0% | 231,515 | 85.1% | 63 | 54 |
| Galveston City | 47,743 | 14,925 | 31.3% | 10,668 | 27.7% | 16 | 12 |
| Galveston County | 291,309 | 65,270 | 22.4% | 42,649 | 19.6% | 102 | 96 |
| Nueces County | 340,223 | 206,293 | 60.6% | 142,995 | 56.7% | 128 | 115 |

Figures 6–11 show the results of ER on these localities, varying how ε = 1 is allocated from state- to block-level and computing the fit line.

Figure 6. TopDown18 results for Bell County (TOTPOP 310,235, HVAP 18.8%) for ε = 1. Top rows show linear regression on 16 runs of noised data; bottom rows show the point estimates of Hispanic and non-Hispanic support for Valdez.

Figure 7. TopDown18 results for Brazoria County (TOTPOP 313,166, HVAP 24.4%) for ε = 1. Top rows show linear regression on 16 runs of noised data; bottom rows show the point estimates of Hispanic and non-Hispanic support for Valdez.

Figure 8. TopDown18 results for Cameron County (TOTPOP 406,220, HVAP 85.1%) for ε = 1. Top rows show linear regression on 16 runs of noised data; bottom rows show the point estimates of Hispanic and non-Hispanic support for Valdez.

Figure 9. TopDown18 results for Galveston City (TOTPOP 47,743, HVAP 27.7%) for ε = 1. Top rows show linear regression on 16 runs of noised data; bottom rows show the point estimates of Hispanic and non-Hispanic support for Valdez.

Figure 10. TopDown18 results for Galveston County (TOTPOP 291,309, HVAP 19.6%) for ε = 1. Top rows show linear regression on 16 runs of noised data; bottom rows show the point estimates of Hispanic and non-Hispanic support for Valdez.

Figure 11. TopDown18 results for Nueces County (TOTPOP 340,223, HVAP 56.7%) for ε = 1. Top rows show linear regression on 16 runs of noised data; bottom rows show the point estimates of Hispanic and non-Hispanic support for Valdez.

If regression is unweighted and unfiltered, it is not only remarkably unstable, but can even identify the wrong trend—see, for instance, the city of Galveston (Figure 9), where simple regression shows Latinos voting for Valdez at a higher rate, reversed from the filtered or weighted variants.8 But when TopDown18 has an equal budget split, weighted regression is consistently quite stable under noising, even when the locality is small and the polarization level is low. This reinforces the message that RPV measurement is not threatened by differential privacy.

6. Pima County and Navajo County, Arizona

Next we turn to Arizona, where the districts redrawn post-2010 ranged from congressional districts with ideal population of 710,224 to small local districts such as Navajo County commission districts, with an ideal population of 21,490.

We used 2010 data from two counties, one large and one small. The first is Pima County, home of Tucson, with total population 980,263 (55% White, 35% Hispanic, 2.5% Native American/American Indian). The second is rural Navajo County, total population 107,449 (44% White, 11% Hispanic, and 42% American Indian).

Reconstructing census records in Navajo County and Pima County took a matter of hours on a student-grade laptop, allowing us to recover a complete person-by-person list of location, ethnicity, sex, age, and race for every enumerated resident of Navajo County in 2010. The reconstructed table is 100% consistent with the aggregate numbers released by the Census; the only inaccuracies come from the existence of multiple solutions to the linear system. This could easily be extended to the full nation with a few days of runtime.

We then created random districts in Navajo County to particularly home in on the count discrepancies in American Indian (AMIN) population created by noising. We ran these repeatedly through the simplified noising algorithm at various ε settings and splits. We find that, particularly for an equal split, the typical count discrepancy for AMIN population is under 500 even at ε = 1; with ε = 19, the discrepancy is under 5 people. (See Figure 12, where the left-hand side compares count discrepancies in various noising runs.)

Figure 12. Here, each column within a plot is one of 16 noising runs on county commission districts in Navajo County. The y-axis shows the range of observed counts for 100 random districts. On the left we see that equal and block-heavy splits are best for small districts, and that count discrepancies are typically in the single digits for ε = 19 and equal allocation. On the right we see again that districts made from larger pieces fare better.

We again find that district construction matters. The right-hand side of Figure 12 shows that districts randomly made from small units (blocks) have visibly larger discrepancies. At this scale, the largest suitable units are block groups. When block group–based districts are noised instead, the count discrepancies all but vanish. Note that block groups are the largest unit of census geography that can easily make these small districts with tolerable population balance.
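A back-of-the-envelope calculation suggests why larger pieces help. If each unit's count receives independent Laplace noise of scale 1/ε, its variance is 2/ε², so a district summed from n units carries noise with standard deviation sqrt(2n)/ε. The sketch below assumes the same per-unit budget at both levels and ignores post-processing. Neither assumption holds exactly for TopDown, and the unit counts are hypothetical.

```python
import math


def district_noise_std(num_units, epsilon_per_unit):
    """Std. dev. of summed noise when each of num_units counts gets
    independent Laplace noise of scale 1/epsilon_per_unit."""
    return math.sqrt(2 * num_units) / epsilon_per_unit


# Hypothetical district assembled two ways (unit counts are made up).
from_blocks = district_noise_std(num_units=400, epsilon_per_unit=1.0)
from_block_groups = district_noise_std(num_units=16, epsilon_per_unit=1.0)
ratio = from_blocks / from_block_groups  # equals sqrt(400 / 16) up to rounding
print(from_blocks, from_block_groups, ratio)
```

Under these assumptions, building the same district from 25 times fewer pieces cuts the accumulated noise by a factor of about five.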

For a fuller picture, we look at the effect of TopDown-style noising on the apparent demographic makeup of districts. We examine four individual districts with different demographic composition and measure the demographic composition after 16 noising runs of our simplified hierarchical DP model (see Figure 13). This shows, in particular, that the AMIN population share of a Navajo County district tends to change by less than .001 (a tenth of a percent of the district’s population) on a noising run with ε = 2 and equal split. And notably, even in the district at lower left, which has a narrow White majority of 50.7%, that measurement never crosses the 50% line. With the updated algorithm, full workload, modified spine, and production parameters, we would have even more confidence that this slim majority remains a majority after noise is applied.

Figure 13. The large pie charts show the demographic breakdown in four examples of small districts in Navajo County (population roughly 20,000) produced by a random process. The corresponding small pie charts show the same breakdown after noising by a simplified hierarchical differential privacy model with ε = 2 and an equal split. (We omit the noised numbers, so that the visual gives only an overall qualitative impression.) In a total of 64 runs across these four districts with very different demographic compositions, the maximum additive error in any group’s demographic share was 0.46%. Actual TopDown20 settings will retain even greater accuracy.
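The share comparison behind Figure 13 is simple arithmetic, sketched here for a single noising run. All counts below are hypothetical: the true counts are chosen only so that the White share is 50.7% of a 20,000-person district, and the noised counts are invented for illustration rather than taken from our runs.

```python
def shares(counts):
    """Convert group counts into shares of the district total."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def max_additive_share_error(true_counts, noised_counts):
    """Largest absolute change in any group's share after noising."""
    before, after = shares(true_counts), shares(noised_counts)
    return max(abs(before[g] - after[g]) for g in before)


# Hypothetical district with a narrow White majority (10,140 / 20,000 = 50.7%).
true_counts = {"White": 10140, "AMIN": 7000, "Hispanic": 2200, "Other": 660}
# Invented noised counts for one run; the total shifts along with each group.
noised_counts = {"White": 10128, "AMIN": 7009, "Hispanic": 2195, "Other": 664}

error = max_additive_share_error(true_counts, noised_counts)
still_majority = shares(noised_counts)["White"] > 0.5
print(error, still_majority)
```

For scale, a 50.7% majority sits 0.7 percentage points above the 50% line, which is more than the 0.46% maximum additive error reported in the caption.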

Finally, we execute another set of ecological regressions, estimating the support for Joe Biden in the presidential contest of 2020 for American Indian voters in Navajo County and Hispanic voters in Pima County. (See Table 5.) As in the other trials, we find that noising does not meaningfully attenuate the polarization inferences drawn from weighted regression.

Table 5. Point estimates from weighted ecological regression of racial/ethnic group support for Joe Biden in two Arizona counties. Noising uses the simplified hierarchical model with an equal ε split, and the reported values are the highest and lowest observed in 16 trials. AMIN = American Indian.

| Navajo County | un-noised | min noised (ε = 2) | max noised (ε = 2) | min noised (ε = 19) | max noised (ε = 19) |
| --- | --- | --- | --- | --- | --- |
| AMIN | 0.886 | 0.887 | 0.892 | 0.885 | 0.886 |
| non-AMIN | 0.169 | 0.167 | 0.170 | 0.169 | 0.169 |

| Pima County | un-noised | min noised (ε = 2) | max noised (ε = 2) | min noised (ε = 19) | max noised (ε = 19) |
| --- | --- | --- | --- | --- | --- |
| Hispanic | 0.661 | 0.653 | 0.663 | 0.659 | 0.662 |
| non-Hispanic | 0.573 | 0.572 | 0.575 | 0.572 | 0.573 |

7. Conclusion

Differential privacy offers meaningful protection from a real threat to census data—as illustrated in Section 6, we can produce 100% consistent microdata reconstructions in multiple counties in a matter of hours. The central goal of this study has been to take the concerns of redistricting practitioners seriously and to investigate potential destabilizing effects of TopDown on the status quo. A second major goal is to make actionable recommendations, both to the Disclosure Avoidance team at the Census Bureau and to the same practitioners—the attorneys, experts, and redistricting line-drawers in the field.

Our top-line conclusion is that, at least for the localities and election data we examined, TopDown18 (and a simplified hierarchical alternative) already performed well in accuracy and signal detection for election administration and voting rights law. With its many improvements and an extremely high privacy-loss budget, the newer TopDown20 that was ultimately run in production was surely even less disruptive.

This work has led us to isolate several elements of common redistricting practice that lead to higher-variance outputs and more error under TopDown. The first example is using a full precinct data set, with no population weighting, in running racial polarization inference techniques. The second example is building with the smallest available blocks for districts, placing no particular priority on intactness for larger units of census geography. In both cases, these were already likely sources of silent error or false precision. Filtering or weighting the precincts in an inference model, on one hand, and building districts that prioritize preserving whole the largest units that are suited to district scale, on the other hand, are two examples of simple updates to redistricting practice. Besides being sound on first principles, these adjustments can insulate data users from DP-related distortions and help safeguard the important work of fair redistricting.


Acknowledgments

This is an extended and updated version of a paper that originally appeared in the Foundations of Responsible Computing conference (Cohen et al., 2021).

Authors are listed alphabetically. We thank our collaborators Denis Kazakov, Mark Hansen, and Peter Wayner. Kazakov developed a reconstruction algorithm (employed here in the Texas experiments) as a member of Hansen’s research group. Wayner guided our deployment of TopDown18 in AWS and was an invaluable team member for our earlier technical report. We thank Justin Levitt for illuminating conversations about the relevant case law.

AC is supported by the DARPA SIEVE program under Agreement No. HR00112020021 and the National Science Foundation under Grant No. 1915763. MD is supported by the National Science Foundation under Grant No. DMS-2005512. We received additional support from NSF OIA-1937095 (Convergence Accelerator), from the Alfred P. Sloan Foundation, and from the Arizona Independent Redistricting Commission, which invited us to extend our earlier study to Arizona counties for a presentation to the IRC. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of our funders.

Contributions

AC and MD led the conceptualization, investigation, methodology, formal analysis, original drafting, and editing. JM and BS led the data curation, analysis, software development, and visualization. MD led and AC supported the administration and supervision for the project. MD acquired funding for the project. All authors provided critical feedback, review, and editing, and all authors shaped the investigations and analysis.

Disclosure Statement

MD is serving as the Gingles 1 expert in the redistricting lawsuit Merrill v. Milligan.


References

Abowd, J. M., Ashmead, R., Cumings-Menon, R., Garfinkel, S., Heineck, M., Heiss, C., Johns, R., Kifer, D., Leclerc, P., Machanavajjhala, A., Moran, B., Sexton, W., Spence, M., & Zhuravlev, P. (2022). The 2020 Census disclosure avoidance system TopDown Algorithm. Harvard Data Science Review, (Special Issue 2). https://doi.org/10.1162/99608f92.529e3cb9

Abowd, J. M., Ashmead, R., Cumings-Menon, R., Kifer, D., Leclerc, P., Ocker, J., Ratcliffe, M., & Zhuravlev, P. (2021). Geographic spines in the 2020 Census disclosure avoidance system TopDown Algorithm (tech. rep.). U.S. Census Bureau.

Arizona Independent Redistricting Commission. (2021, July 13). Reporter’s transcript of video-conference public meeting. https://irc.az.gov/sites/default/files/meeting-files/07-13-2021%5C%20IRC%5C%20Public%5C%20Session.pdf

Avery v. Midland County, 390 U.S. 474. (1968). https://www.oyez.org/cases/1967/39

Bouk, D., & boyd, d. (2021, March 18). Democracy’s data infrastructure: The technopolitics of the U.S. census. https://knightcolumbia.org/content/democracys-data-infrastructure

Bun, M., & Steinke, T. (2016). Concentrated differential privacy: Simplifications, extensions, and lower bounds. In M. Hirt & A. Smith (Eds.), Lecture notes in computer science: Vol. 9985. Theory of cryptography (pp. 635–658). Springer. https://doi.org/10.1007/978-3-662-53641-4_24

Cohen, A., Duchin, M., Matthews, J., & Suwal, B. (2021). Census TopDown: The impacts of differential privacy on redistricting. In K. Ligett & S. Gupta (Eds.), 2nd symposium on foundations of responsible computing (FORC 2021) (5:1–5:22). Schloss Dagstuhl – Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPIcs.FORC.2021.5

DeFord, D., Duchin, M., & Solomon, J. (2021). Recombination: A family of Markov Chains for redistricting. Harvard Data Science Review, 3(1). https://doi.org/10.1162/99608f92.eb30390f

Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., & Naor, M. (2006). Our data, ourselves: Privacy via distributed noise generation. In S. Vaudenay (Ed.), Lecture notes in computer science: Vol. 4004. Advances in cryptology - EUROCRYPT 2006 (pp. 486–503). Springer. https://doi.org/10.1007/11761679_29

Dwork, C., McSherry, F., Nissim, K., & Smith, A. D. (2006). Calibrating noise to sensitivity in private data analysis. In S. Halevi & T. Rabin (Eds.), Lecture notes in computer science: Vol. 3876. Theory of cryptography (pp. 265–284). Springer. https://doi.org/10.1007/11681878_14

Executive Committee of the Legislative Council, Colorado General Assembly. (2020). Letter to Dr. Steven Dillingham, Director, U.S. Census Bureau. https://www.ncsl.org/Portals/1/Documents/Elections/CO_State_Legislative_Leadership_Letter.pdf?ver=2020-08-04-132435-780&timestamp=1596569177678

Georgia v. Ashcroft, 539 U.S. 461, 488 n. 2. (2003). https://www.oyez.org/cases/2002/02-182

IPUMS National Historical Geographic Information System. (2021). Privacy-protected 2010 Census demonstration data.

JASON. (2022). Consistency of data products and formal privacy methods for the 2020 Census (Panel Report JSR 21-02).

Karcher v. Daggett, 462 U.S. 725, 738. (1983). https://www.oyez.org/cases/1982/81-2057

Kenny, C. T., Kuriwaki, S., McCartan, C., Rosenman, E. T. R., Simko, T., & Imai, K. (2021). The use of differential privacy for census data and its impact on redistricting: The case of the 2020 U.S. Census. Science Advances, 7(41), Article eabk3283. https://doi.org/10.1126/sciadv.abk3283

Knudson, K. C., Schoenbach, G., & Becker, A. (2021). PyEI: A Python package for ecological inference. Journal of Open Source Software, 6(64), Article 3397. https://doi.org/10.21105/joss.03397

Lee, T. B. (2022, March 7). Why the census invented nine fake people in one house. Slate. https://slate.com/technology/2022/03/privacy-census-fake-people.html

Matthews, J., Suwal, B., & Wayner, P. (2021). Accompanying GitHub repository. https://github.com/mggg/census-diff-privacy

McDonald, M. (2019). Redistricting and differential privacy [Video and slides]. https://www.nationalacademies.org/event/12-11-2019/workshop-on-2020-census-data-products-data-needs-and-privacy-considerations

Mexican American Legal Defense and Educational Fund, & Asian Americans Advancing Justice. (2021). Preliminary report: Impact of differential privacy & the 2020 Census on Latinos, Asian Americans and redistricting (tech. rep.). https://www.advancingjustice-aajc.org/report/preliminary-report-impact-differential-privacy-2020-census-latinos-asian-americans

Mueller, T., & Santos-Lozada, A. R. (2022, January 19). The 2020 U.S. Census differential privacy method introduces disproportionate error for rural and non-white populations (research brief). Population Research and Policy Review. https://doi.org/10.1007/s11113-022-09698-3

National Academies of Sciences, Engineering, and Medicine. (2020). 2020 Census data products: Data needs and privacy considerations: Proceedings of a workshop (D. L. Cork, C. F. Citro, & N. J. Kirkendall, Eds.). The National Academies Press. https://doi.org/10.17226/25978

National Conference of State Legislatures. (2020). 2010 redistricting deviation table. https://www.ncsl.org/research/redistricting/2010-ncsl-redistricting-deviation-table.aspx

National Conference of State Legislatures. (2021). Differential privacy for census data explained. https://www.ncsl.org/research/redistricting/differential-privacy-for-census-data-explained.aspx

Petti, S., & Flaxman, A. (2020). Differential privacy in the 2020 US Census: What will it do? quantifying the accuracy/privacy tradeoff [version 2; peer review: 2 approved]. Gates Open Research, 3(1722). https://doi.org/10.12688/gatesopenres.13089.2

Pujol, D., McKenna, R., Kuppam, S., Hay, M., Machanavajjhala, A., & Miklau, G. (2020). Fair decision making using privacy-protected data. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 189–199). https://doi.org/10.1145/3351095.3372872

Reynolds v. Sims, 377 U.S. 533. (1964). https://www.oyez.org/cases/1963/23

Santos-Lozada, A. R., Howard, J. T., & Verdery, A. M. (2020). How differential privacy will affect our understanding of health disparities in the United States. Proceedings of the National Academy of Sciences, 117(24), 13405–13412. https://doi.org/10.1073/pnas.2003714117

The State of Alabama et al v. United States Department of Commerce et al Docket No. 3:21-cv-00211 (M.D. Ala. Mar 10, 2021). (2021).

Statistical inference is not a privacy violation. (2021). Retrieved 2021, from https://differentialprivacy.org/inference-is-not-a-privacy-violation/

U.S. Census Bureau. (2012). 2010 Census summary file 1. https://www.census.gov/data/datasets/2010/dec/summary-file-1.html

U.S. Census Bureau. (2017). Census P.L. 94-171 redistricting data. https://www.census.gov/programs-surveys/decennial-census/about/rdo/summary-files.html

U.S. Census Bureau. (2021a). Differential privacy 201 and the TopDown algorithm. https://www.census.gov/data/academy/webinars/2021/disclosure-avoidance-series/differential-privacy-201-and-the-topdown-algorithm.html

U.S. Census Bureau. (2021b). Disclosure avoidance for the 2020 Census: An introduction (tech. rep.). https://www2.census.gov/library/publications/decennial/2020/2020-census-disclosure-avoidance-handbook.pdf

U.S. Census Bureau. (2021c). Understanding disclosure avoidance-related variability in the 2020 Census redistricting data. https://www.census.gov/content/dam/Census/library/factsheets/2022/variability.pdf

U.S. Census Bureau. (2022). Post-enumeration survey and demographic analysis help evaluate 2020 Census results. https://www.census.gov/newsroom/press-releases/2022/2020-census-estimates-of-undercount-and-overcount.html

13 U.S. Code Section 9. (1976). https://www.govinfo.gov/app/details/USCODE-2010-title13/USCODE-2010-title13-chap1-subchapI-sec9

Voting Rights Act of 1965, Pub. L. 89-110, 79 Stat. 437. (1965). https://www.govinfo.gov/content/pkg/STATUTE-79/pdf/STATUTE-79-Pg437.pdf

Wesberry v. Sanders, 376 U.S. 1. (1964). https://www.oyez.org/cases/1963/22

Wright, T., & Irimata, K. (2021). Empirical study of two aspects of the TopDown Algorithm output for redistricting: Reliability & variability (August 5, 2021 update) (Study Series Statistics #2021-02). Center for Statistical Research & Methodology, Research and Methodology Directorate, U.S. Census Bureau.


©2022 Aloni Cohen, Moon Duchin, JN Matthews, and Bhushan Suwal. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.
