This piece is a commentary on the article: Coming To Our Census: How Social Statistics Underpin Our Democracy (And Republic)
I would like to congratulate Teresa A. Sullivan (2020) on such a well laid out, thought-provoking, and topical article. I largely agree with her propositions but have made a few personal comments toward the end of this note.
Her underlying concern is whether a successful census can be conducted in 2020 given what has happened in the lead-up to the census. A successful census definitely does rely on public trust in the census-taking institution and its past record in maintaining the confidentiality of personal data. A high level of public trust in the government of the day is certainly helpful, but it is not sufficient to guarantee the success of the census. For example, some countries have not run successful censuses for decades even though there is a reasonable level of trust in their governments. This seems to be due to misuse of personal data (e.g., population register data rather than census data in the case of the Netherlands) during World War II. In Europe, there has been a widespread move to using linked administrative data as a substitute for the census because of the increasing cost and difficulty of running traditional censuses and the availability of alternative data.
On the other hand, it is possible to run a successful census even though public trust in the government, or politicians more generally, is low. I do agree with Sullivan that mistrust of government could be a major obstacle to a successful census, but it may not be insurmountable. Overcoming it does require:
A clear separation in the mind of the public between the census-taking institution and the government;
Trust in the census-taking institution and, on this point, I agree with Sullivan that professional integrity of the statisticians is the best defense of the census; and
A record of maintaining the confidentiality of personal data.
It is the last point that may cause particular concern for the forthcoming census. Why do I have this concern? As outlined in the Sullivan (2020) article (page 7), it has been “deduced that the Bureau had in fact collaborated to help identify Japanese-American citizens and Japanese resident aliens and their locations” during World War II. Although this happened a long time ago and there appear to have been no significant confidentiality issues since then, the incident was identified relatively recently in Seltzer and Anderson (2007), so it could be raised in the context of a somewhat controversial census. The very significant public debate associated with the potential citizenship question is likely to raise the issue of privacy in the American psyche. More broadly, privacy is becoming a greater concern as personal data from a range of sources (e.g., Facebook, loyalty schemes) is being used without consent. Although the citizenship question was not ultimately included, there will still be privacy concerns among noncitizens that the census data might be used to identify them, with subsequent impacts. I expect it will be much more difficult to have an accurate enumeration of these people and a significant undercount of them is likely. I do note, as outlined in the Sullivan (2020) article, that “Since 1947, the Census Bureau and all of its personnel are required by statute not to share any individually identifiable information and the Bureau has repeatedly made strong public affirmations of confidentiality.”
Another factor that may have an impact is the perception that the federal government is not agile enough and skilled enough to prevent the hacking of census data.
A less accurate enumeration will affect the so-called intercensal discrepancy. That is, there is likely to be a larger difference between the population estimates based on the 2010 census and those measured by the 2020 census than was the case with previous censuses. This will result in a tricky exercise of reconciling the two measures. It will be made more difficult because undercoverage will vary by age, ethnicity, and region. It is not beyond the realm of possibility that the 2020 population census cannot be used, or can only partially be used, to rebase the population estimates. We certainly hope that this is not the case.
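The discrepancy described above is simple arithmetic, and a minimal sketch may make it concrete. The age groups and figures below are hypothetical, and the sign convention (rolled-forward estimate minus new census-based estimate) is an assumption made for illustration rather than an official definition.

```python
def intercensal_discrepancy(rolled_forward, census_based):
    """Difference between the estimate carried forward from the previous
    census and the estimate based on the new census (assumed convention)."""
    return rolled_forward - census_based


def percent_discrepancy(rolled_forward, census_based):
    """Discrepancy expressed as a percentage of the new census-based estimate."""
    return 100.0 * intercensal_discrepancy(rolled_forward, census_based) / census_based


# Hypothetical figures (in thousands), for illustration only:
# (estimate rolled forward from the previous census, new census-based estimate)
estimates_by_age = {
    "0-17": (1000, 980),
    "18-64": (3000, 2985),
    "65+": (800, 800),
}

# Discrepancy for each age group, since undercoverage varies by group.
discrepancies = {
    group: intercensal_discrepancy(rolled, census)
    for group, (rolled, census) in estimates_by_age.items()
}

# Overall discrepancy is the sum across groups.
total_discrepancy = sum(discrepancies.values())
```

Because the discrepancy differs across groups, reconciling the two sets of estimates cannot be done with a single overall correction.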
There will need to be a special public relations campaign targeted at noncitizens. It will need to use advocates who are trusted by these people and who can explain the importance of the census to them and provide assurances that their census data will be safe. This will be easier if identified census data is not retained any longer than necessary.
1. Adjustment for the Undercount
The 1976 Australian census was the most difficult since World War II. It resulted in a relatively large undercount that was going to impact the accuracy of the population estimates. The 5-yearly census was the main source for rebasing the population estimates. Following analysis of the census data, it was decided to make an adjustment for the undercount using data from the Post-Enumeration Survey (PES). As director of methodology at the time, I was consulted and strongly supported the decision. It proved to be noncontroversial, was strongly supported by the demography community, and raised no comment from the government or opposition.
In Australia, it is the population estimates that are used to determine the number of electoral seats to be distributed to each state. Likewise, it is population estimates that are used in the formula determining the amount of funds to be distributed to states and local government areas within states (although they are not the only variable used). Unless there was an adjustment, those states with a higher undercount would be at a disadvantage. It is disadvantaged people who are more likely to be undercounted.
This has been the arrangement for every subsequent census. That is, there is an adjustment for the undercount determined by the demographers using PES data when they rebase the population estimates following the census. Although Australia was the first country to make such an adjustment, several other countries are now doing so. I believe it would be appropriate for the United States as well if the political opportunity arises. It is very difficult when political advantage seems to be the major consideration in determining whether to do this or not. There is a brief description of the methodology in Appendix B of Harding et al. (2017).
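The effect of such an adjustment on the relative standing of states can be illustrated with a minimal sketch. This is not the ABS methodology (for that, see Appendix B of Harding et al., 2017). It assumes only that a net undercount rate is available for each state, for instance from a PES, and that the rate is defined as the share of the true population missed by the census. The state names, counts, and rates are hypothetical.

```python
def adjust_for_undercount(census_count, net_undercount_rate):
    """Scale a census count up to an estimate of the true population.

    Assumes net_undercount_rate = (true - counted) / true,
    which rearranges to true = counted / (1 - rate).
    """
    if not 0 <= net_undercount_rate < 1:
        raise ValueError("net undercount rate must be in [0, 1)")
    return census_count / (1.0 - net_undercount_rate)


def population_shares(populations):
    """Each state's share of the total population."""
    total = sum(populations.values())
    return {state: pop / total for state, pop in populations.items()}


# Hypothetical figures, for illustration only.
census_counts = {"State A": 1_000_000, "State B": 1_000_000}
undercount_rates = {"State A": 0.01, "State B": 0.04}

adjusted = {
    state: adjust_for_undercount(census_counts[state], undercount_rates[state])
    for state in census_counts
}

raw_shares = population_shares(census_counts)
adjusted_shares = population_shares(adjusted)
```

In this example the two states have equal raw counts, but the state with the higher undercount receives a larger share after adjustment. Without the adjustment, any allocation of seats or funds in proportion to population would disadvantage that state.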
2. The 2016 Australian Population Census
The 2016 census faced some very strong headwinds that might have impacted its accuracy.
There was some uncertainty about whether the census would be properly funded and some soundings on the feasibility of changing the law that guaranteed a census every 5 years. A late decision to proceed limited the time for preparation.
There was a much stronger push for an online census (digital first) with a 65% online target.
There were other significant methodology changes, including the use of a newly established address register for the first time.
There were difficulties identifying which dwellings were occupied on census night as a result of a smaller census field force.
There was a very public debate about the privacy of the census given proposals to link census data with other data sets using census identifying information. This was to create linked data sets for research purposes, although microdata sets released to accredited researchers would be deidentified.
The online census form had to be switched off on census night because of cybersecurity concerns.
Nevertheless, the census was successful with a high level of public cooperation. Because of public doubts about the accuracy of the census, an Independent Quality Assurance Panel was established to report publicly on the accuracy of the census—see Harding et al. (2017). The Australian Bureau of Statistics plans to establish a similar panel for the 2021 census.
The privacy concerns did have some impacts even though it was judged that the overall impact on the census was not great. More people did not report their names or provided first names only. There was a significant increase in the number of persons reporting age rather than date of birth and there was a large decrease in the number of persons agreeing to have their census form archived to be available for public release in 99 years' time. This confirms that there is increasing concern about the privacy of the census, something that will need to be addressed for future censuses.
3. Do We Still Need a Census?
By law, there has to be a census in Australia every 5 years. This law was determined in the 1970s and was mostly concerned with the accuracy of the population estimates used for electoral purposes. Australia has high external migration but it is an island, so its borders are easier to manage, and the data collected at entry and exit points enable external migration to be determined with a reasonable level of accuracy. However, the issue is with internal migration. The Australian population is highly mobile. Furthermore, many migrants do not know where they will settle at the time of entry. For these reasons, it was decided by law to run a 5-yearly census to ensure population estimates are accurate given their importance for electoral and other purposes. Also, Australia does not have a system of population registration like many European countries.
In Australia we are a long way from being able to conduct a census using linked administrative data and I suspect that is the situation in the United States as well. You need one or more data sources that cover the whole population or close to the whole population. We don’t have that in Australia at this time. We go close for the adult population with a combination of taxation and income support data but children are missing, as well as a significant number of adults. Nevertheless, administrative data can be used to conduct a more efficient and better quality census. Examples include address lists, identification of dwellings that should have been enumerated in the census, and imputation for nonresponse, including for difficult-to-enumerate populations.
Some of these data sources contain geographic information and, together with big data sources, it might be possible to get sufficiently accurate internal migration data to reduce the frequency of the census to, say, 7-yearly or even 10-yearly.
4. Concluding Remarks
To conclude, I will provide short responses to the statement and questions posed by Sullivan (2020) in her article.
Democracy requires numbers for its proper functioning. I agree strongly and those who weaken the capacity to provide the numbers are actually weakening the democracy.
There is a social statistics infrastructure. Agree, and as noted in the article, this goes beyond official statistics.
The U.S. Census is the cornerstone of the social statistics infrastructure. This has certainly been the case in the past, but it may become less so in the future with the increasing availability of other data sources, including big data.
How mistrust of government threatens the quality of census data. Mistrust of government certainly makes it more difficult to conduct a successful census, but a successful census is still possible unless other factors that reduce trust in the census are in play.
The citizenship question. The political debate surrounding the possible inclusion of this question would have reduced trust in the census if it had been included. Interestingly, citizenship is a standard question in every Australian census and has been without controversy.
Is there an alternative to the census? This varies by country and, as noted above, I don’t believe there is an alternative yet for Australia. Other data sources may assist in the 2030 U.S. Census, and reduce the cost, but I would suggest a core census data collection will still be required.
Can and will the government keep the census data confidential? This is the key question here. Perceptions are important. It is the Census Bureau rather than the government that holds the data, but many will not make that distinction. Also, there may be a feeling that the Census Bureau might weaken under pressure from the government. Cybersecurity is another consideration. I believe this is the biggest risk area for the U.S. Census and there will need to be a range of strategies in place to address the risk.
What about all the errors already in data? All statistical data collections contain errors or areas of uncertainty. This is true of the census. The census should be trusted more if there is transparency around errors and uncertainties and documentation that allows census data users to interpret the suitability of census data for their purposes.
Errors of coverage and accuracy reinforce mistrust of government. This is because there are no adjustments for deficiencies in coverage and the reasons for the lack of adjustment are largely political. All censuses have coverage deficiencies. As explained above, the Australian population estimates include adjustments for census coverage deficiencies and it is noncontroversial.
Neither checks nor balances preserve the integrity of the census. The integrity of the census will be much greater if there is certainty about the operating environment, adequate funding, and clear authority for the Census Bureau to conduct the census and publish the results.
Statisticians with integrity are the best defenders of this pillar of democracy. I agree strongly. There are many documents guiding how statisticians should behave. One of significant influence is the Fundamental Principles of National Official Statistics (see United Nations, 2014), which have been endorsed by the General Assembly of the United Nations. Of particular relevance is Principle 2:
To retain trust in official statistics, the statistical agencies need to decide according to strictly professional considerations, including scientific principles and professional ethics, on the methods and procedures for the collection, processing, storage and presentation of statistical data.
Harding, S., Jackson Pulver, L., McDonald, P., Morrison, P., Trewin, D., & Voss, A. (2017). Report on the quality of the 2016 census data. Retrieved from http://www.abs.gov.au
Seltzer, W., & Anderson, M. (2007). Census confidentiality under the Second War Powers Act (1942–1947). Paper prepared for the Annual Meeting of the Population Association of America. New York, March 30.
Sullivan, T. A. (2020). Coming to our census: How social statistics underpin our democracy (and republic). Harvard Data Science Review, 2(1).
United Nations. (2014). Fundamental principles of national official statistics. Retrieved from http://www.unstats.un.org
This article is © 2020 by Dennis Trewin. The article is licensed under a Creative Commons Attribution (CC BY 4.0) International license (https://creativecommons.org/licenses/by/4.0/legalcode), except where otherwise indicated with respect to particular material included in the article. The article should be attributed to the author identified above.