Skip to main content
SearchLoginLogin or Signup

A Penny Synthesized is a Penny Earned? An Exploratory Analysis of Accuracy in the SIPP Synthetic Beta

Forthcoming. Now Available: Just Accepted Version.
Published onJan 05, 2024
A Penny Synthesized is a Penny Earned? An Exploratory Analysis of Accuracy in the SIPP Synthetic Beta


The Census Bureau has expressed interest in using modern synthetic data modeling techniques for privacy and confidentiality protection in future microdata releases. To aid understanding of the accuracy and usability of synthetic microdata going forward, we perform an exploratory analysis comparing results generated using an early synthetic microdata release known as the SIPP Synthetic Beta (SSB) to results from the same analyses using the corresponding confidential microdata. We compare numerous descriptive and model-based use cases of the data and discuss explanations for how performance of the synthetic data relates to modeling decisions by the data provider and methodology choices by the data user. We also summarize differences in confidence interval overlap and statistical conclusions. There is a strong association between the synthetic and confidential results in terms of both magnitudes and statistical conclusions, but the synthetic data is not a perfect replication of the confidential data. Finally, we discuss the implication of our results for the role of modeling decisions and user feedback when creating synthetic data, validation and verification options, and the evolving science of creating synthetic data. Importantly, we consider our findings to be something of a lower bound for the accuracy of future synthetic microdata because of improvement in synthetic data modeling since the SSB was created and the fact that we do not account for other sources of survey error when comparing the confidential data to the synthetic data.

Keywords: synthetic data, data privacy, data accuracy, statistical disclosure limitation, labor economics, applied microeconomics

01/05/2024: To preview this content, click below for the Just Accepted version of the article. This peer-reviewed version has been accepted for its content and is currently being copyedited to conform with HDSR’s style and formatting requirements.

©2024 Jordan Stanley and Evan Totty. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.

No comments here
Why not start the discussion?