Washington State Data Office: Privacy Protected Census Data it Sampled is “Unfit” to Use

“The majority of the data output from the DAS [disclosure avoidance system] appears to be unfit for most uses.” So begins a letter to the Census Bureau’s Disclosure Avoidance Team from the Washington State Office of Financial Management, which runs the state data center. The letter sums up the results of the state’s usability test of census data treated with a disclosure avoidance technique called “differential privacy,” which introduces “statistical noise” into the dataset in order to protect the privacy of individuals’ data.

Sampling the Technique: If you are unfamiliar with the technique, you can read more about it here. The Census Bureau has been applying “statistical noise” to the 2010 census data tables so that states can compare the differential privacy data with the actual 2010 census data, which is the most current population count available until the 2020 census is finalized.

Most states must use this data to redistrict, but there is much concern about how a process like redistricting – which requires drawing districts of equal population – will be possible when the published population numbers are so dramatically skewed.

Population Discrepancies: The Washington letter is just the latest in a series of feedback documents from the states. It describes in detail how the statistical noise will affect actual, on-the-ground decisions. Perhaps the most disturbing example from Washington’s analysis concerns the 76,000 or so census blocks with zero population in the real 2010 file: those same blocks have a total of over 15,000 people in them under the “privacy protected” file. This is a dramatic consequence for anyone who must redistrict. Imagine adding such blocks to a congressional district and including their reported population in your count when in reality no one lives in those blocks. Your district’s actual population would be meaningfully lower than the total you see, without your knowing it.
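To see why empty blocks can end up with people in them, here is a minimal sketch in Python. It is not the Census Bureau’s actual algorithm: it assumes integer noise drawn from a two-sided geometric distribution, a hypothetical privacy parameter `epsilon`, and simple clamping of negative counts to zero as a stand-in for whatever post-processing the DAS really applies. The totals it prints are illustrative only and should not be expected to match Washington’s figures.

```python
import math
import random


def two_sided_geometric(rng, epsilon):
    """Draw integer noise as the difference of two geometric variables.

    The result is symmetric around zero, with P(k) proportional to
    exp(-epsilon * |k|). epsilon is an assumed, illustrative value.
    """
    def geometric():
        # Inverse-transform sample: number of failures before first success,
        # where the failure probability is exp(-epsilon).
        u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is defined
        return int(math.floor(math.log(u) / -epsilon))

    return geometric() - geometric()


def noisy_block_counts(true_counts, epsilon, seed=0):
    """Add noise to each block count, then clamp negatives to zero.

    Clamping is a simplification assumed here; it is what turns
    zero-mean noise into a positive total for empty blocks.
    """
    rng = random.Random(seed)
    return [max(0, c + two_sided_geometric(rng, epsilon)) for c in true_counts]


# Roughly the number of zero-population blocks cited in the letter.
true_counts = [0] * 76000
noisy = noisy_block_counts(true_counts, epsilon=0.5)

print("true total: ", sum(true_counts))
print("noisy total:", sum(noisy))
```

Because a block cannot report a negative population, the negative draws are discarded while the positive ones remain, so the sum over many empty blocks drifts upward even though the noise itself is centered on zero.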

Here are the highlights from Washington State’s letter:

  • the treated data decouples individuals from their place of residence and from their demographic characteristics (age, sex, race, and ethnicity).
  • communities with similar racial characteristics are more dispersed geographically.
  • for one census block in which a women’s prison is located, the demonstration file reported a female population of just 12%, as opposed to the actual percentage of 99%.

There are various other redistricting-related processes that will also be affected by this altered data, in particular racial voting analysis under Section 2 of the Voting Rights Act. The Census Bureau has not finalized any of its techniques for applying differential privacy and has explained that the system is a work in progress, even though only a few months remain before the data must be released.

More states and other jurisdictions can be expected to raise serious concerns about census data treated under the disclosure avoidance system, but the real debate will likely end up in the courts. Despite the problems with the data’s accuracy, the census dataset may still be the “best data available.” Karcher v. Daggett, 462 U.S. 725, 738 (1983).
