On Monday, the U.S. Census Bureau announced that it has settled on the precise parameters for its disclosure avoidance system, the differential privacy method it will apply to 2020 Census data. In short, the algorithm that injects “noise” into the raw census data to protect privacy will be set to add significantly less noise than the demonstration data released to the data user community over the past 18 months. Read the detailed release below:
Press Release – April 19, 2021: Based on the results of over 600 experimental data runs to optimize and tune the parameters of the new 2020 Census Disclosure Avoidance System (DAS) algorithm, the Census Bureau’s Data Stewardship Executive Policy Committee (DSEP) has chosen the privacy-loss budget (PLB) for the forthcoming set of demonstration data.
The global privacy-loss budget (PLB) for the persons file in the next demonstration data set will be 10.3 and the PLB for the housing units data will be 1.9. As discussed previously, the four demonstration products released to date used a PLB of 4.0 for persons and 0.5 for housing units—significantly lower than we anticipate using for the final 2020 Census data. Those earlier demonstration data were purposefully “tuned” to privacy and not “tuned” for producing highly accurate redistricting data. We held the PLB roughly the same across those four releases to allow us to compare the effects of incremental algorithmic improvements in the system.
While significantly larger than the PLB used in the previous data, the 10.3 PLB is still allocated in a manner that provides a level of protection for every census record and every published characteristic. For those of you trying to understand the increase in accuracy attributable to a shift from a PLB of 4 to a PLB of 10.3, it’s important to understand that the PLB is logarithmic: each one-unit increase in the PLB multiplies the bound on how much any single record can shift the probability of a published result by a factor of e (about 2.7), so the permitted privacy loss grows exponentially along the PLB scale. The forthcoming demonstration data, released as Privacy-Protected Microdata Files (PPMFs), will help data users see that increase in PLB reflected in the accuracy of population counts and demographic characteristics at various levels of geography.
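To make the "logarithmic" point concrete, the short Python sketch below works through the arithmetic under textbook assumptions: pure ε-differential privacy, where the PLB ε bounds the likelihood ratio at e^ε, and a Laplace mechanism on single counts, where the noise scale is sensitivity / ε. The equal split of the budget over six geographic levels (`SHARE`) is hypothetical; the DAS's actual mechanisms and per-level, per-query allocation are more involved than this.

```python
import math

# Global privacy-loss budgets (epsilon) for the persons file, from the release.
OLD_PERSONS_PLB = 4.0
NEW_PERSONS_PLB = 10.3


def likelihood_ratio_bound(epsilon: float) -> float:
    """Under pure epsilon-DP, one person's record can change the probability
    of any published output by at most a factor of e**epsilon."""
    return math.exp(epsilon)


def laplace_scale(epsilon: float, share: float, sensitivity: float = 1.0) -> float:
    """Laplace noise scale b = sensitivity / epsilon_query for a count query
    that receives `share` of the global budget (sequential composition)."""
    return sensitivity / (share * epsilon)


# HYPOTHETICAL allocation: an equal split over six geographic levels.
# This is an illustrative assumption, not the DAS allocation.
SHARE = 1 / 6

old_bound = likelihood_ratio_bound(OLD_PERSONS_PLB)
new_bound = likelihood_ratio_bound(NEW_PERSONS_PLB)
print(f"e^4.0  = {old_bound:,.1f}")
print(f"e^10.3 = {new_bound:,.1f}")
print(f"privacy-loss bound grows by a factor of {new_bound / old_bound:,.1f}")

old_b = laplace_scale(OLD_PERSONS_PLB, SHARE)
new_b = laplace_scale(NEW_PERSONS_PLB, SHARE)
print(f"Laplace scale per query: {old_b:.3f} -> {new_b:.3f}")
print(f"per-query noise shrinks by a factor of {old_b / new_b:.3f}")
```

Under this simplified model the two quantities move very differently: the privacy-loss bound grows by e^6.3 (several hundred times), while per-query noise shrinks only in proportion to ε, by 10.3 / 4 ≈ 2.6 times, whatever the (fixed) share each query receives.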
The Census Bureau announced the new PLB on April 13, 2021, in a declaration submitted in response to litigation.
New Demonstration Data Will Satisfy Redistricting Use Case Needs
In the same declaration we previewed some of the high-level results from the upcoming demonstration data release regarding the accuracy criteria established for the P.L. 94-171 redistricting data (see our previous newsletter for criteria details).
We report that the new demonstration data will fully satisfy those specialized accuracy criteria. Specifically, total populations, voting-age populations, and the proportions of the largest OMB-designated race and ethnicity groups are all reliable for redistricting and Voting Rights Act scrutiny. Because new districts cannot be drawn until the 2020 P.L. 94-171 Redistricting Data Summary File is released, counties, block groups, minor civil divisions, incorporated places, and census designated places were used in their place as on- and off-spine geographic entities for tuning purposes.
Added Noise from Differential Privacy is Less Than Inherent Census Errors
The declaration also revealed high-level results of an analysis comparing the error caused by the new differentially private methods to the other sources of error that are inherent in census data (coverage error, measurement error, etc.) based on post-census analyses.
Our internal analyses have shown that 2010 Census operations resulted in an average county-level uncertainty in total population of +/- 960 people (on average, 1.6% of a county’s census count). By comparison, the noise from differential privacy in the new demonstration data produces an average county-level error of only +/- 5 people (a mean absolute percent error of 0.04% of county population).
At the block level, the differentially private data have an average population error of +/- 3 people, covering both housing unit and group quarters populations. By comparison, simulations of the error inherent in the census put the average uncertainty of block population counts at +/- 6 people.
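The error figures above are averages of absolute differences between noisy and true counts (mean absolute error) and of those differences relative to the true count (mean absolute percent error). The Python sketch below shows how such metrics can be computed after adding noise to counts. It assumes a two-sided geometric (discrete Laplace) mechanism, a common textbook choice for integer counts, and a hypothetical equal budget split; the county totals are made up. This is not the DAS, which also post-processes noisy counts (for example, into non-negative integers consistent across geographies), so the output will not reproduce the Bureau's figures.

```python
import math
import random


def two_sided_geometric(epsilon: float, rng: random.Random) -> int:
    """Draw integer noise k with P(k) proportional to exp(-epsilon * |k|).

    The difference of two independent geometric draws has exactly this
    (discrete Laplace) distribution.
    """
    p = 1.0 - math.exp(-epsilon)  # per-trial success probability

    def failures_before_success() -> int:
        k = 0
        while rng.random() >= p:
            k += 1
        return k

    return failures_before_success() - failures_before_success()


def add_noise(counts, epsilon, rng):
    """Add independent noise to each count (assumes sensitivity 1 per count)."""
    return [c + two_sided_geometric(epsilon, rng) for c in counts]


def mean_absolute_error(true, noisy):
    """Average absolute difference, in people."""
    return sum(abs(n - t) for t, n in zip(true, noisy)) / len(true)


def mean_absolute_percent_error(true, noisy):
    """MAPE in percent; zero counts are skipped because the ratio is undefined."""
    pairs = [(t, n) for t, n in zip(true, noisy) if t > 0]
    return 100 * sum(abs(n - t) / t for t, n in pairs) / len(pairs)


if __name__ == "__main__":
    rng = random.Random(2021)  # fixed seed so the run is repeatable
    epsilon_per_query = 10.3 / 6  # HYPOTHETICAL equal split, not the DAS allocation
    county_totals = [1_200, 8_700, 45_000, 310_000]  # made-up counts
    noisy = add_noise(county_totals, epsilon_per_query, rng)
    print("noisy:", noisy)
    print(f"MAE:  +/- {mean_absolute_error(county_totals, noisy):.2f} people")
    print(f"MAPE: {mean_absolute_percent_error(county_totals, noisy):.4f}%")
```

Because the noise does not scale with population, the same absolute error is a much smaller percentage of a large county than of a small one, which is why a +/- 5 person county error corresponds to a MAPE as low as 0.04%.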
We’ll share more information about these results in our next newsletter.
Next Steps
Per the calendar below, we will release the new demonstration data by April 30. We’ll release two versions to aid your analysis: one using the new (10.3, 1.9) PLB and one using the earlier, development-focused PLB (4.0, 0.5).
We look forward to your feedback after that release.
2021 Key Dates, Redistricting (P.L. 94-171) Data Product:
By April 30:
- Census Bureau releases new Privacy-Protected Microdata Files (PPMFs) and Detailed Summary Metrics.
- Two versions: candidate strategy runs using the new PLB and the old PLB.
By late May:
- Data users submit feedback.
Early June:
- The Census Bureau’s Data Stewardship Executive Policy (DSEP) Committee makes the final determination of the PLB and system parameters for P.L. 94-171, based on data user feedback.
Late June:
- Final DAS production run and quality control analysis begins for P.L. 94-171 data.
By August 16:
- Release 2020 Census P.L. 94-171 data as Legacy Format Summary File*.
September:
- Census Bureau releases PPMFs and Detailed Summary Metrics from applying the production version of the DAS to the 2010 Census data.
- Census Bureau releases production code base for P.L. 94-171 redistricting summary data file and related technical papers.
By September 30:
- Release 2020 Census P.L. 94-171 data** and Differential Privacy Handbook.
* Released via Census Bureau FTP site.
** Released via data.census.gov.