What Does Differential Privacy in Census Data Mean for the Task of Redistricting?

The law requires that any identifying information you give the Census Bureau be kept confidential for 72 years, but simply removing your information from what is published is no longer enough. Big data and powerful computing technology now allow almost anyone to “reconstruct” the seemingly anonymized information. That means it is increasingly possible to identify who you are, where you live, and other information from the census results. Here’s how the Census Bureau plans to combat that.

New for the 2020 census, the U.S. Census Bureau will be using a process called differential privacy to inject “statistical noise” into the database so that database reconstruction efforts will not work. (Click here for a simple, real-life example of database reconstruction and a detailed explanation of differential privacy.)
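
To make “statistical noise” concrete, here is a minimal Python sketch of the textbook Laplace mechanism, one standard way noise is injected under differential privacy. It is an illustration of the general idea only: the Bureau’s actual disclosure avoidance system (its TopDown Algorithm) is far more elaborate, and the function name, counts, and epsilon values below are hypothetical choices for the example.

```python
import numpy as np

def laplace_noisy_count(true_count, epsilon, sensitivity=1.0):
    """Return a differentially private version of a single count.

    The Laplace mechanism adds noise drawn from Laplace(0, sensitivity/epsilon):
    a smaller epsilon means stronger privacy and a larger typical error.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    # Post-processing: published counts are non-negative integers.
    return max(0, int(round(true_count + noise)))

# Hypothetical example: a census block with 37 voting-age residents.
for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps}: reported count {laplace_noisy_count(37, eps)}")
```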

What is the practical effect of differential privacy for redistricting?

  • Total state population counts will be reported unchanged.
  • Census block-level total housing units and group quarter counts will also remain unchanged.
  • Statistical noise will be added to population, voting-age population, and racial demographic counts at every geographic reporting level (counties, tracts, census blocks, etc.), with the exception of population at the state level.
  • Local jurisdictions, along with smaller racial and ethnic populations, will be most affected by statistical noise. (A toy sketch of how this noise coexists with the unchanged state total follows this list.)
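
A toy sketch can show how the points above fit together: block-level populations receive noise, but the noisy values are then post-processed so that they still add up to the unchanged (invariant) state total. The proportional rescaling below is chosen purely for illustration and is not the Bureau’s method; its TopDown Algorithm enforces invariants through a far more sophisticated constrained optimization across the geographic hierarchy.

```python
import numpy as np

def noisy_blocks_with_state_invariant(block_pops, epsilon, seed=0):
    """Toy illustration only: noise the block populations, then rescale them
    so they again (approximately) sum to the true, invariant state total.

    The real TopDown Algorithm uses constrained optimization, not this simple
    rescaling; rounding here can still leave the toy total off by a person or two.
    """
    rng = np.random.default_rng(seed)
    block_pops = np.asarray(block_pops, dtype=float)
    state_total = block_pops.sum()                      # invariant: published as-is
    noisy = block_pops + rng.laplace(0.0, 1.0 / epsilon, size=block_pops.size)
    noisy = np.clip(noisy, 0.0, None)                   # no negative populations
    noisy *= state_total / noisy.sum()                  # pull back toward the invariant
    return np.round(noisy).astype(int), int(state_total)

blocks = [12, 0, 87, 340, 5]                            # hypothetical block populations
print(noisy_blocks_with_state_invariant(blocks, epsilon=0.5))
```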

At the heart of the matter is the trade-off between how well privacy is maintained in the census data and the reduction in accuracy these measures create. When it comes to redistricting, the trade-off is even more complicated because census data has many uses beyond redistricting. The Bureau’s chief scientist, Dr. John Abowd, has publicly acknowledged that “tuning accuracy for any given analysis can reduce accuracy for other analysis.”

The biggest worry that states have at this point is whether the redistricting data supplied by the Bureau will be fit for the very precise task of redistricting. This is a key question, since many other users of census data are not as sensitive to “noise” as a state that must draw congressional districts equal in population to within one or two people.
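
To see why noise that is tiny at the block level still worries mapmakers, consider that a congressional district is assembled from thousands of blocks, and the per-block errors do not all cancel. The simulation below uses made-up parameters (2,000 blocks, independent Laplace noise with a scale of 2 people) rather than the Bureau’s actual settings and post-processing, so it only illustrates the order of magnitude of the concern.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a district built from 2,000 census blocks, each block's
# reported population carrying independent Laplace noise with scale 2 people.
n_blocks, noise_scale, n_trials = 2_000, 2.0, 1_000

# District-level error in each trial is the sum of that trial's per-block errors.
district_errors = rng.laplace(0.0, noise_scale, size=(n_trials, n_blocks)).sum(axis=1)

print("typical (std. dev.) district population error:", round(district_errors.std(), 1))
print("share of trials off by more than 2 people:",
      float(np.mean(np.abs(district_errors) > 2)))
```

Under these illustrative assumptions the typical district-level error runs well beyond a one-or-two-person tolerance; the Bureau’s hierarchical noise allocation and post-processing change the picture in practice, but the sensitivity of district totals to block-level noise is the states’ core concern.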

Measuring the effect of differential privacy

The Census Bureau has been gracious enough to partner with states and solicit feedback by offering demonstration, or sample, data tables so that states can see how treating the data from the last census in 2010 with differential privacy (or “disclosure avoidance,” the term oft-used by the Bureau) actually changes it. The results have been striking.

One example is this map of the country’s current congressional districts that Caliper Corp. built using the 2010 census dataset. When the districts are rebuilt with the noise-injected sample file provided by the Census Bureau, the result is a wildly malapportioned map: many of the districts vary in population by thousands of people. Most would say that this defeats the purpose of redistricting in the first place.

Caliper Corp. (click map to go to the interactive map at Caliper.com)

The Bureau is continuing to fine-tune the statistical noise and will be releasing more refined sample files, but at this point there is no high level of confidence that it can deploy differential privacy in a way suited to the precision that redistricting, at the very least, requires.
