For a given field of scientific research and technological application, there is a natural evolution driven by changes in stakeholder needs; in the underlying technical infrastructure and empirical results; in the related operating environment; and in complex connections with other fields that can have different cultures of intellectual inquiry, evidence, and communication. Many features of that natural evolution have arisen in recent developments in the production, dissemination, and interpretation of statistical information for public use.
Some of the material in Bailey (2023) and related publications can be viewed from this general perspective. The following discussion will focus on context related to: (1) connections with several dimensions of the literature on survey methodology and related areas of data science; (2) a few questions on (conditional) bias and other statistical properties of some classes of estimators commonly used with data from sample surveys and related information sources; and (3) communication with multiple stakeholder groups regarding statistical concepts, methodology, empirical results, and limitations thereof. Review of the landscape defined by (1)–(3) will not resolve all of the statistical and data science issues identified by Bailey (2023), but it can provide traction on a range of approaches developed previously to address some of those issues, and it can provide context and nuance regarding prospective changes in paradigms for statistical information ecosystems.
I do not have any expertise in opinion polling as such, so this discussion will not include particular comments on the polling-specific aspects of Bailey (2023).
Bailey (2023) directs substantial attention toward the importance of data quality. In considering that discussion, it is good to bear in mind several features of the extensive general literature on survey quality.
First, that literature often highlights the importance of balanced consideration of multiple dimensions of data quality, for example, accuracy, comparability, cross-sectional and temporal granularity, punctuality, relevance, interpretability, and accessibility. See, for example, Brackstone (1999), Eurostat (2015), Committee on National Statistics (2017), Federal Committee on Statistical Methodology (2020) and references cited therein. Some of those dimensions (e.g., relevance, granularity, and accessibility) are often viewed as inherently qualitative in nature. Those qualitative dimensions can have very important effects on practical decisions about the methodology and production processes for statistical information, and about value conveyed to key stakeholders and the general public.
Second, the quantitative dimension of ‘accuracy’ often receives especially prominent attention in the literature, frequently within the framework of ‘total survey error’ (TSE) models, and related extensions. TSE models are intended to provide a systematic approach to evaluation of multiple error sources that can affect accuracy. Sources of principal interest include those attributable to important features of the design and operations (defined broadly) of the statistical information production process, and often include frame error, sample selection error, nonresponse, and measurement error, as well as the effects of processing steps like editing, imputation, and weighting. See, for example, Amaya et al. (2020), Andersen et al. (1979), Biemer et al. (2017), Brown (1967), Deming (1944), Groves and Lyberg (2010), and references cited therein. Some notable approaches to total survey error consider complementary processes under the general headings of ‘measurement’ and ‘representation,’ respectively (e.g., Groves & Lyberg, 2010, Figure 3). For the current discussion, special attention may center on parts of the TSE literature on ‘representation’ that address issues with frame coverage error, sampling error, and nonresponse effects.
Third, challenges arising from declining response rates, questions related to nonprobability samples, and opportunities flowing from expanded availability of information from administrative records and other nonsurvey data sources, have led to numerous methodological developments that extend customary survey methodology and TSE models in ways that are potentially applicable to the issues highlighted in Bailey (2023). Taken as a whole, these developments provide a broad perspective on (a) a superpopulation model that leads to a given finite population realization thereof; (b) imperfections in frames, and related issues with frame enhancements and multiple-frame surveys; (c) sampling error (including cases with problematic selection mechanisms); (d) unit noncontact and nonresponse; (e) nonresponse follow-up methods, including use of incentives and various forms of adaptive and responsive designs, as well as related uses of paradata to guide the applications of those follow-up methods; (f) measurement error; (g) effects of edit and imputation procedures; and (h) effects of various weighting procedures. See, for example, Beaumont (2020), Chen et al. (2020), Citro (2014), Coffey et al. (2023), Elliott and Valliant (2017), Harter et al. (2016), Imbens and Rubin (2015), Lohr (2021), Lohr and Raghunathan (2017), Mahalanobis (1944), Pfeffermann (2015), Rao and Molina (2015), Rosenblum et al. (2019), Tourangeau (2017), Tourangeau et al. (2017), van Berkel et al. (2020), Wagner et al. (2020), Yang and Kim (2020), and references cited therein.
In-depth analyses and applications of these approaches depend heavily on the availability of—or limitations on—empirical information regarding (i) the principal sources of variability in both the underlying survey variables
It would be of interest to explore in depth some ways in which the above-mentioned literature may offer additional perspectives on the concerns stated in Bailey (2023), and on that article’s related comments regarding applications of methodology from Meng (2018), Sun et al. (2018), Jackman and Spahn (2019), and Bradley et al. (2021).
Much of the material in Bailey (2023) centered on that article’s expressions (1) and (2):
where the individual terms in these expressions are presented in Section 3 of that article. Several features of these expressions may warrant further discussion.
The author’s interpretation of expression (1) appears to consider some of its principal terms as random, and others as equal to certain expectations or conditional expectations, for example, a covariance term and a variance term. Consequently, it would be of interest to clarify the sources of variability under consideration, as well as the form of conditioning involved in a given expectation term, and the extent to which some of those features may be identical to, or different from, the specifications included in Meng (2018) and Bradley et al. (2021), where applicable. In addition, it can be useful to align the above-mentioned sources of random variability with specific terms and models considered in the literature on sample design, nonprobability sampling, and the total survey error framework described in the section Connections With Other Literature.
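As a hedged illustration of why the distinction between random and fixed terms matters, the finite-population identity of Meng (2018) can be checked numerically. The sketch below assumes that identity in the form error = rho x sqrt((N - n)/n) x sigma, with all moments computed over the N population units; whether this matches every detail of expression (1) in Bailey (2023) is an assumption. The function name `meng_decomposition`, the simulated population, and the logistic response mechanism are hypothetical choices made only for illustration.

```python
import numpy as np


def meng_decomposition(y, r):
    """Return (error, rho, scale, sigma) for one finite population.

    y : outcome values for all N population units
    r : 0/1 indicators, 1 if the unit is in the observed sample
    All moments are finite-population moments (divisor N, not N - 1).
    """
    y = np.asarray(y, dtype=float)
    r = np.asarray(r, dtype=float)
    N = y.size
    n = r.sum()
    error = y[r == 1].mean() - y.mean()   # respondent mean minus population mean
    rho = np.corrcoef(r, y)[0, 1]         # correlation between inclusion and outcome
    scale = np.sqrt((N - n) / n)          # sqrt((1 - f) / f), with f = n / N
    sigma = y.std()                       # population standard deviation (ddof=0)
    return error, rho, scale, sigma


# Hypothetical population and response mechanism, for illustration only.
rng = np.random.default_rng(2023)
N = 10_000
y = rng.normal(size=N)
propensity = 1.0 / (1.0 + np.exp(-(-1.5 + 0.4 * y)))  # response depends on y
r = (rng.uniform(size=N) < propensity).astype(int)

error, rho, scale, sigma = meng_decomposition(y, r)
print("error:", error)
print("rho * scale * sigma:", rho * scale * sigma)
```

For a single realized population and a single realized set of respondents, all three factors are fixed numbers and the identity holds exactly. Randomness enters only once one specifies what is being averaged over (the inclusion indicators, the population values, or both), which is the conditioning question raised above.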
For example, practical interpretation of the ‘data defect correlation’
Additional discussion of these sources of variability, and related specification of conditioning and integration used in moment expressions, also can help to clarify the Bailey (2023) description of
Depending on the conditioning under consideration, the term
On a related note, some of the discussion of expression (1) appears to involve convergence results, and it would be of interest to specify in somewhat more detail the asymptotic framework under consideration. This might involve, for example, standard triangular-array conditions for a sequence of finite-population realizations of a superpopulation model, along with a corresponding sequence of sample designs with increasing sample sizes, plus related sequences of collection and adjustment methods. As usual with asymptotics, the technical specifications are of interest less as formalities than as potential guidance to connect underlying population, design, and operational features with approximations of competing effects for specific applications, for example those in Bailey (2023).
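One informal way to see what such a sequence looks like is a small simulation. The sketch below is an assumption-laden illustration, not the framework of Bailey (2023): it draws finite populations of increasing size from a standard normal superpopulation, then compares the error of the observed mean under simple random sampling (sampling fraction fixed at 10%) with the error under a hypothetical logistic response mechanism that depends on the outcome. The helper name `respondent_mean_error` and all numeric settings are illustrative.

```python
import numpy as np


def respondent_mean_error(y, included):
    """Observed-unit mean minus finite-population mean."""
    return y[included].mean() - y.mean()


rng = np.random.default_rng(7)
sizes = [1_000, 10_000, 100_000]   # sequence of finite-population sizes
srs_errors, sel_errors = [], []

for N in sizes:
    # Finite-population realization of a N(0, 1) superpopulation model.
    y = rng.normal(size=N)

    # Design 1: simple random sampling without replacement, n = N / 10.
    n = N // 10
    srs = np.zeros(N, dtype=bool)
    srs[rng.choice(N, size=n, replace=False)] = True

    # Design 2 (hypothetical): inclusion probability increases with y.
    p = 1.0 / (1.0 + np.exp(-(-2.2 + 0.5 * y)))
    sel = rng.uniform(size=N) < p

    srs_errors.append(respondent_mean_error(y, srs))
    sel_errors.append(respondent_mean_error(y, sel))

for N, e_srs, e_sel in zip(sizes, srs_errors, sel_errors):
    print(f"N={N:>7}  SRS error={e_srs:+.4f}  outcome-dependent error={e_sel:+.4f}")
```

Under these assumptions the simple random sampling error shrinks as the sample size grows, while the error under outcome-dependent inclusion settles near a nonzero value. Which of these regimes is the better approximation for a given application depends on the population, design, and operational features that the asymptotic conditions are meant to encode.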
Similar comments apply to expression (2) in Bailey (2023). Notable cases include clarification of the intended statistical definition of ‘random contact’ and of the nuanced distinctions among frame problems, sample inclusion, unit contact, partial or complete unit response, wave-level response, and (where applicable) proxy response. This in turn can shed additional light on related conditioning questions, can help in placing the proposed approach within the broader TSE framework, and can highlight connections with previous literature on noncontact nonresponse (e.g., Groves & Couper, 1998).
Clarification on the above-mentioned statistical questions can help to provide traction in developing and refining methodology to address issues with incomplete data and measurement error, in applying that methodology to specific cases, and in evaluating the strengths and limitations of those specific applications. For example, some previous literature on total survey error and adaptive design may lead to suggestions on prospective models for some specific components of the overall ‘data defect correlation’ term
Bailey (2023) also expressed a range of concerns regarding stakeholder communication in some polling work. In exploring these issues in additional detail, three general points may be of interest.
First, professional associations and other independent third parties can help to establish, socialize, incentivize, and enhance transparent and practical stakeholder communication on methodology and on related indicators of the quality of substantive empirical results. Salient examples include the Transparency Initiative of the American Association for Public Opinion Research (2023) and Committee on National Statistics (2021, 2022), and references cited therein. Comparison of quality standards and stakeholder communication approaches across government, academia, and the private sector also may be mutually beneficial, although some specifics naturally may vary across environments. Some examples from the government sector include: Statistics Canada (2019), United Kingdom Office of National Statistics (2023), United Nations Statistics Division (2019), U.S. Bureau of Labor Statistics (2023), U.S. Census Bureau (2023), and references cited therein. In addition, Biemer et al. (2014) provide an insightful discussion of internal organizational efforts to improve quality, aligned with external reporting on the results of those efforts. Also, Miller et al. (2020) present a careful review of numerous empirical analyses of nonresponse biases, with emphasis on analyses carried out to address requirements in U.S. Office of Management and Budget (2006). Some extensions of those analytic approaches may help one address some of the bias issues highlighted in Bailey (2023).
Second, other prominent areas of statistical applications also have encountered important controversies involving potential biases from the effects of nonrandom unit selection and unit participation. For example, Keiding and Louis (2016) provide a wide-ranging analysis of such issues in epidemiology. It will be of strong interest to compare and contrast the substantive features, and related communication issues, encountered in such controversies across different applications.
Third, Bailey (2023) uses some graphical methods to illustrate some of his principal points. In exploration of the issues highlighted in the sections Clarification of Some Mathematical Results and Statistical Properties of Some Classes above, one could consider use of additional visualization tools to communicate central concepts and empirical results in ways that resonate with key stakeholders and the general public. For example, exploration of multiple sources of random variability, as well as related complex multidimensional relationships and conditioning, can be enhanced through scatterplot matrices or other high-dimensional multivariate displays of estimated response propensities, regression residuals, and related terms. Similarly, visualization tools can help to convey information regarding complex measures of uncertainty (e.g., Weiskopf, 2022, and references cited therein) and related sensitivity analyses (e.g., Friendly, 2013; Guan, 2023; Jin et al., 2023).
The views expressed in this discussion are those of the author and do not represent the policies of the United States Census Bureau. The discussant thanks Wendy Martinez, Tommy Wright and Paul Beatty for insightful comments that improved the exposition of this material.
The author has no financial or non-financial disclosures to report for this article.
Abbasi, A. M., Shad, M. Y., & Ahmed, A. (2022). On partial randomized response model using ranked set sampling. PLoS ONE, 17(11), Article e0277497. https://doi.org/10.1371/journal.pone.0277497
Amaya, A., Biemer, P. P., & Kinyon, D. (2020). Total error in a big data world: Adapting the TSE framework to big data. Journal of Survey Statistics and Methodology, 8(1), 89–119. https://doi.org/10.1093/jssam/smz056
American Association for Public Opinion Research. (2023). Transparency Initiative. https://aapor.org/standards-and-ethics/transparency-initiative/
Andersen, R., Kasper, J., & Frankel, M. R. (1979). Total survey error. Jossey-Bass.
Bailey, M. A. (2023). A new paradigm for polling. Harvard Data Science Review, 5(3). https://doi.org/10.1162/99608f92.9898eede
Beaumont, J.-F. (2020). Are probability surveys bound to disappear for the production of official statistics? Survey Methodology, 46(1), 1–28. https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2020001/article/00001-eng.pdf?st=1Xq2z4b2
Biemer, P. P. (2010). Total survey error: Design, implementation and evaluation. Public Opinion Quarterly, 74(5), 817–848.
Biemer, P. P., de Leeuw, E., Eckman, S., Edwards, B., Kreuter, F., Lyberg, L. E., Tucker, N. C., & West, B. T. (Eds.). (2017). Total survey error in practice. Wiley.
Biemer, P. P., Trewin, D., Bergdahl, H., & Japec, L. (2014). A system for managing the quality of official statistics. Journal of Official Statistics, 30(3), 381–415. https://doi.org/10.2478/jos-2014-0022
Brackstone, G. (1999). Managing data quality in a statistical agency. Survey Methodology, 25(2), 139–149. https://www150.statcan.gc.ca/n1/en/pub/12-001-x/1999002/article/4877-eng.pdf?st=oBwQYqFr
Bradley, V. C., Kuriwaki, S., Isakov, M., Sejdinovic, D., Meng, X.-L., & Flaxman, S. (2021). Unrepresentative big surveys significantly overestimated U.S. vaccine uptake. Nature, 600(7890), 695–700. https://doi.org/10.1038/s41586-021-04198-4
Brown, R. V. (1967). Evaluation of total survey error. Journal of the Royal Statistical Society Series D: The Statistician, 17(4), 335–356. https://doi.org/10.2307/2987089
Chen, Y., Li, P., & Wu, C. (2020). Doubly robust inference with nonprobability survey samples. Journal of the American Statistical Association, 115(532), 2011–2021. https://doi.org/10.1080/01621459.2019.1677241
Citro, C. F. (2014). From multiple modes for surveys to multiple sources for estimates. Survey Methodology, 40(2), 137–161.
Coffey, S. M., Damineni, J., Eltinge, J. L., Mathur, A., Varela, K. M., & Zotti, A. (2023). Some open questions on multiple-source extensions of adaptive-survey design concepts and methods. Working Paper 2023-03. Center for Economic Studies, U.S. Census Bureau. https://www.census.gov/library/working-papers/2023/adrm/CES-WP-23-03.html
Committee on National Statistics. (2017). Federal statistics, multiple data sources, and privacy protection: Next steps. The National Academies Press. https://nap.nationalacademies.org/catalog/24893/federal-statistics-multiple-data-sources-and-privacy-protection-next-steps
Committee on National Statistics. (2021). Principles and practices for a federal statistical agency (7th ed.). The National Academies Press. https://www.nationalacademies.org/our-work/7th-edition-of-principles-and-practices-for-a-federal-statistical-agency
Committee on National Statistics. (2022). Transparency and reproducibility of federal statistics for the National Center for Science and Engineering Statistics. The National Academies Press. https://www.nationalacademies.org/our-work/transparency-and-reproducibility-of-federal-statistics-for-the-national-center-for-science-and-engineering-statistics
Deming, W. E. (1944). On errors in surveys. American Sociological Review, 9(4), 359–369. https://www.jstor.org/stable/2085979
Elliott, M. R., & Valliant, R. (2017). Inference for nonprobability samples. Statistical Science, 32(2), 249–264.
Eurostat. (2015). Quality Assurance Framework of the European Statistical System. https://ec.europa.eu/eurostat/documents/64157/4392716/ESS-QAF-V1-2final.pdf/bbf5970c-1adf-46c8-afc3-58ce177a0646
Federal Committee on Statistical Methodology. (2020). A framework for data quality (FCSM-20-04). https://nces.ed.gov/fcsm/pdf/FCSM.20.04_A_Framework_for_Data_Quality.pdf
Friendly, M. (2013). The generalized ridge trace plot: Visualizing bias and precision. Journal of Computational and Graphical Statistics, 22(1), 50–68. https://doi.org/10.1080/10618600.2012.681237
Groves, R. M., & Couper, M. P. (1998). Nonresponse in household interview surveys. Wiley. https://doi.org/10.1002/9781118490082
Groves, R. M., & Lyberg, L. (2010). Total survey error: Past, present and future. Public Opinion Quarterly, 74(5), 849–879.
Guan, L. (2023). Localized conformal prediction: A generalized inference framework for conformal prediction. Biometrika, 110(1), 33–50. https://doi.org/10.1093/biomet/asac040
Harter, R., Battaglia, M. P., Buskirk, T. D., Dillman, D. A., English, N., Fahimi, M., Frankel, M. R., Kennel, T., McMichael, J. P., McPhee, C. B., Montaquila, J., Yancey, T., & Zukerberg, A. L. (2016). Report of the AAPOR Task Force on address-based sampling. https://www.aapor.org/Education-Resources/Reports/Address-based-Sampling.aspx
Imbens, G. W., & Rubin, D. B. (2015). Causal inference for statistics, social and biomedical sciences: An introduction. Cambridge University Press.
Jackman, S., & Spahn, B. (2019). Why does the American National Election Study overestimate voter turnout? Political Analysis, 27(2), 193–207. https://doi.org/10.1017/pan.2018.36
Jin, Y., Ren, C., & Candès, E. J. (2023). Sensitivity analysis of individual treatment effects: A robust conformal inference approach. PNAS, 120(6), Article e2214889120. https://doi.org/10.1073/pnas.2214889120
Keiding, N., & Louis, T. A. (2016). Perils and potentials of self‐selected entry to epidemiological studies and surveys. Journal of the Royal Statistical Society Series A: Statistics in Society, 179(2), 319–376. https://doi.org/10.1111/rssa.12136
Lohr, S. L. (2021). Multiple-frame surveys for a multiple-data-source world. Survey Methodology, 47(2), 229–263. https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2021002/article/00008-eng.pdf?st=zAhbJs4d
Lohr, S. L., & Raghunathan, T. E. (2017). Combining survey data with other data sources. Statistical Science, 32(2), 293–312.
Mahalanobis, P. C. (1944). On large-scale sample surveys. Philosophical Transactions of the Royal Society B, 231(584), 329–451. https://royalsocietypublishing.org/toc/rstb1934/1944/231/584
Meng, X.-L. (2018). Statistical paradises and paradoxes in big data (1): Law of large populations, big data paradox, and the 2016 presidential election. The Annals of Applied Statistics, 12(2), 685–726. https://doi.org/10.1214/18-AOAS1161SF
Miller, P., Fakhouri, T. H., Earp, M., Downey Piscopo, K., Frenk, S. M., Christopher, E., & Madans, J. (2020). A systematic review of nonresponse bias studies in federally sponsored surveys. Federal Committee on Statistical Methodology (FCSM 20-02). https://nces.ed.gov/fcsm/pdf/A_Systematic_Review_of_Nonresponse_Bias_Studies_Federally_Sponsored_SurveysFCSM_20_02_032920.pdf
Pfeffermann, D. (2015). Methodological issues and challenges in the production of official statistics: 24th Annual Morris Hansen Lecture. Journal of Survey Statistics and Methodology, 3(4), 425–483. https://doi.org/10.1093/jssam/smv035
Rao, J. N. K., & Molina, I. (2015). Small area estimation (2nd ed.). Wiley.
Rosenblum, M., Miller, P., Reist, B., Stuart, E., Thieme, M., & Louis, T. (2019). Adaptive design in surveys and clinical trials: Similarities, differences, and opportunities for cross-fertilization. Journal of the Royal Statistical Society Series A: Statistics in Society, 182(3), 963–982. https://doi.org/10.1111/rssa.12438
Statistics Canada. (2019). Statistics Canada Quality Guidelines (6th ed.). https://www150.statcan.gc.ca/n1/pub/12-539-x/12-539-x2019001-eng.htm
Sun, B., Liu, L., Miao, W., Wirth, K., Robins, J., & Tchetgen-Tchetgen, E. J. (2018). Semiparametric estimation with data missing not at random using an instrumental variable. Statistica Sinica, 28, 1965–1983. https://doi.org/10.5705/ss.202016.0324
Tourangeau, R. (2017). Presidential address: Paradoxes of nonresponse. Public Opinion Quarterly, 81(3), 803–814. https://doi.org/10.1093/poq/nfx031
Tourangeau, R., Brick, J. M., Lohr, S., & Li, J. (2017). Adaptive and responsive survey designs: A review and assessment. Journal of the Royal Statistical Society Series A: Statistics in Society, 180(1), 203–223.
United Kingdom Office of National Statistics. (2023). Quality in official statistics. Office for National Statistics. https://www.ons.gov.uk/methodology/methodologytopicsandstatisticalconcepts/qualityinofficialstatistics
United Nations Statistics Division. (2019). United Nations National Quality Assurance Frameworks manual for official statistics. https://unstats.un.org/unsd/methodology/dataquality/un-nqaf-manual/
U.S. Bureau of Labor Statistics. (2023). Handbook of methods. https://www.bls.gov/opub/hom/
U.S. Census Bureau. (2023). Statistical quality standards. https://www.census.gov/about/policies/quality/standards.html
U.S. Office of Management and Budget. (2006). Standards and guidelines for statistical surveys. https://georgewbush-whitehouse.archives.gov/omb/inforeg/statpolicy/standards_stat_surveys.pdf
van Berkel, K., van der Doef, S., & Schouten, B. (2020). Implementing adaptive survey design with an application to the Dutch health survey. Journal of Official Statistics, 36(3), 609–629.
Wagner, J., West, B. T., Coffey, S. M., & Elliott, M. R. (2020). Comparing the ability of regression modeling and Bayesian additive regression trees to predict costs in a responsive survey design context. Journal of Official Statistics, 36(4), 907–931. https://doi.org/10.2478/jos-2020-0043
Warner, S. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309), 63–69.
Weiskopf, D. (2022). Uncertainty visualization: Concepts, methods, and applications in biological data visualization. Frontiers in Bioinformatics, 2. https://doi.org/10.3389/fbinf.2022.793819
Yang, S., & Kim, J. K. (2020). Statistical data integration in survey sampling: A review. Japanese Journal of Statistics and Data Science, 3, 625–650. https://doi.org/10.1007/s42081-020-00093-w
No rights reserved. This work was authored as part of the Contributor’s official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. law.