
Methodological Context and Communication on Data Quality: Discussion of Bailey (2023)

Published on Oct 06, 2023

For a given field of scientific research and technological application, there is a natural evolution driven by changes in stakeholder needs; in the underlying technical infrastructure and empirical results; in the related operating environment; and in complex connections with other fields that can have different cultures of intellectual inquiry, evidence, and communication. Many features of that natural evolution have arisen in recent developments in the production, dissemination, and interpretation of statistical information for public use.

Some of the material in Bailey (2023) and related publications can be viewed from this general perspective. The following discussion will focus on context related to: (1) connections with several dimensions of the literature on survey methodology and related areas of data science; (2) a few questions on (conditional) bias and other statistical properties of some classes of estimators commonly used with data from sample surveys and related information sources; and (3) communication with multiple stakeholder groups regarding statistical concepts, methodology, empirical results, and limitations thereof. Review of the landscape defined by (1)–(3) will not resolve all of the statistical and data science issues identified by Bailey (2023), but can provide traction on a range of approaches developed previously to address some of those issues; and can provide context and nuance regarding prospective changes in paradigms for statistical information ecosystems.

I do not have any expertise in opinion polling as such, so this discussion will not include particular comments on the polling-specific aspects of Bailey (2023).

Connections With Other Literature on Survey Methodology and Data Science

Bailey (2023) directs substantial attention toward the importance of data quality. In considering that discussion, it is good to bear in mind several features of the extensive general literature on survey quality.

First, that literature often highlights the importance of balanced consideration of multiple dimensions of data quality, for example, accuracy, comparability, cross-sectional and temporal granularity, punctuality, relevance, interpretability, and accessibility. See, for example, Brackstone (1999), Eurostat (2015), Committee on National Statistics (2017), Federal Committee on Statistical Methodology (2020) and references cited therein. Some of those dimensions (e.g., relevance, granularity, and accessibility) are often viewed as inherently qualitative in nature. Those qualitative dimensions can have very important effects on practical decisions about the methodology and production processes for statistical information, and about value conveyed to key stakeholders and the general public.

Second, the quantitative dimension of ‘accuracy’ often receives especially prominent attention in the literature, frequently within the framework of ‘total survey error’ (TSE) models, and related extensions. TSE models are intended to provide a systematic approach to evaluation of multiple error sources that can affect accuracy. Sources of principal interest include those attributable to important features of the design and operations (defined broadly) of the statistical information production process, and often include frame error, sample selection error, nonresponse, and measurement error, as well as the effects of processing steps like editing, imputation, and weighting. See, for example, Amaya et al. (2020), Andersen et al. (1979), Biemer et al. (2017), Brown (1967), Deming (1944), Groves and Lyberg (2010), and references cited therein. Some notable approaches to total survey error consider complementary processes under the general headings of ‘measurement’ and ‘representation,’ respectively (e.g., Groves & Lyberg, 2010, Figure 3). For the current discussion, special attention may center on parts of the TSE literature on ‘representation’ that address issues with frame coverage error, sampling error, and nonresponse effects.

Third, challenges arising from declining response rates, questions related to nonprobability samples, and opportunities flowing from expanded availability of information from administrative records and other nonsurvey data sources, have led to numerous methodological developments that extend customary survey methodology and TSE models in ways that are potentially applicable to the issues highlighted in Bailey (2023). Taken as a whole, these developments provide a broad perspective on (a) a superpopulation model that leads to a given finite population realization thereof; (b) imperfections in frames, and related issues with frame enhancements and multiple-frame surveys; (c) sampling error (including cases with problematic selection mechanisms); (d) unit noncontact and nonresponse; (e) nonresponse follow-up methods, including use of incentives and various forms of adaptive and responsive designs, as well as related uses of paradata to guide the applications of those follow-up methods; (f) measurement error; (g) effects of edit and imputation procedures; and (h) effects of various weighting procedures. See, for example, Beaumont (2020), Chen et al. (2020), Citro (2014), Coffey et al. (2023), Elliott and Valliant (2017), Harter et al. (2016), Imbens and Rubin (2015), Lohr (2021), Lohr and Raghunathan (2017), Mahalanobis (1944), Pfeffermann (2015), Rao and Molina (2015), Rosenblum et al. (2019), Tourangeau (2017), Tourangeau et al. (2017), van Berkel et al. (2020), Wagner et al. (2020), Yang and Kim (2020), and references cited therein.

In-depth analyses and applications of these approaches depend heavily on the availability of, or limitations on, empirical information regarding (i) the principal sources of variability in both the underlying survey variables $Y$ and the related error terms (including, but not limited to, coverage and response indicators); (ii) auxiliary variables (e.g., demographic or geographical factors, as well as paradata related to the data capture processes) that may be related to (i); (iii) the extent to which we reasonably can characterize, measure, model, and predict phenomena related to (i) and (ii); (iv) reliable production-level availability of auxiliary variables from (ii) at the right level of granularity and vintage; and (v) the degree to which the results for (i)–(iv) are consistent across different (sub)populations, target estimands, and methodological features.

It would be of interest to explore in depth some ways in which the above-mentioned literature may offer additional perspectives on the concerns stated in Bailey (2023), and the related comments in Bailey (2023) regarding applications of methodology from Meng (2018), Sun et al. (2018), Jackman and Spahn (2019), and Bradley et al. (2021).

Statistical Properties of Some Classes of Survey-Related Estimators

Much of the material in Bailey (2023) centered on that article’s expressions (1) and (2):

$$\overline{Y}_{n} - \overline{Y}_{N} = \rho_{R,Y}\left\{ n^{-1}(N - n) \right\}^{1/2}\sigma_{Y} \tag{1}$$

$$\overline{Y}_{n} - \overline{Y}_{N} = \rho_{R,Y}\left\{ p_{r}^{-1}\left( 1 - p_{r} \right) \right\}^{1/2}\sigma_{Y} \tag{2}$$

where the individual terms in these expressions are presented in Section 3 of that article. Several features of these expressions may warrant further discussion.
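As a point of reference for the discussion that follows, the short Python sketch below checks expressions (1) and (2) numerically for a single simulated finite population. It rests on several assumptions that are not taken from Bailey (2023): all moments are finite-population moments with divisor $N$, the indicators $R_i$ are treated as fixed once realized, $p_r$ is read as the realized fraction $n/N$, and the logistic selection mechanism and the function name `error_identity` are hypothetical.

```python
import math
import random


def error_identity(y, r):
    """Compare both sides of expressions (1) and (2) for one realized population.

    y: outcome values for all N population units
    r: 0/1 indicators (1 = unit is in the responding sample)
    All moments use divisor N; this is an assumption of the sketch.
    """
    N = len(y)
    n = sum(r)
    f = n / N  # realized response fraction, read here as p_r in (2)
    ybar_N = sum(y) / N
    ybar_n = sum(yi for yi, ri in zip(y, r) if ri == 1) / n

    sigma_y = math.sqrt(sum((yi - ybar_N) ** 2 for yi in y) / N)
    sigma_r = math.sqrt(f * (1 - f))
    cov_ry = sum((ri - f) * (yi - ybar_N) for yi, ri in zip(y, r)) / N
    rho = cov_ry / (sigma_r * sigma_y)

    lhs = ybar_n - ybar_N
    rhs_1 = rho * math.sqrt((N - n) / n) * sigma_y  # right side of (1)
    rhs_2 = rho * math.sqrt((1 - f) / f) * sigma_y  # right side of (2)
    return lhs, rhs_1, rhs_2, rho


rng = random.Random(2023)
N = 10_000
y = [rng.gauss(50.0, 10.0) for _ in range(N)]
# Hypothetical mechanism: the chance of responding increases with y
r = [1 if rng.random() < 1 / (1 + math.exp(-(yi - 60.0) / 10.0)) else 0 for yi in y]

lhs, rhs_1, rhs_2, rho = error_identity(y, r)
print(f"realized error of respondent mean: {lhs:.6f}")
print(f"right side of (1):                 {rhs_1:.6f}")
print(f"right side of (2):                 {rhs_2:.6f}")
print(f"finite-population rho:             {rho:.4f}")
```

Under these assumptions the two sides agree up to floating-point error, because the relation is purely algebraic. Other readings, in which some terms are expectations over a selection or superpopulation model, would require a different check.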

The author’s interpretation of expression (1) appears to consider some of its principal terms as random, and others as equal to certain expectations or conditional expectations, for example, a covariance term and a variance term. Consequently, it would be of interest to clarify the sources of variability under consideration, as well as the form of conditioning involved in a given expectation term, and the extent to which some of those features may be identical to, or different from, the specifications included in Meng (2018) and Bradley et al. (2021), where applicable. In addition, it can be useful to align the above-mentioned sources of random variability with specific terms and models considered in the literature on sample design, nonprobability sampling, and the total survey error framework described in the section Connections With Other Literature.

For example, practical interpretation of the ‘data defect correlation’ $\rho_{R,Y}$ as ‘the correlation in the population between $R$ and $Y$,’ would be enhanced by clarifying which of the above-mentioned sources of variability are conditioned on, and integrated over, respectively, for the specific applications considered by Bailey (2023). In addition, if the conditioning involves particular models, it would be of interest to specify the model forms and conditioning variables, as well as related empirical results on model development and validation for a given application. The latter may be of particular interest in the use of joint models for survey outcomes $Y$ and for selection, contact, or response indicators.

Additional discussion of these sources of variability, and related specification of conditioning and integration used in moment expressions, also can help to clarify the Bailey (2023) description of $\sigma_{Y}$ as “the square root of the variance of $Y$,” with the related label “data difficulty” from Meng (2018).
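To make the conditioning question concrete, the display below sketches two possible readings of expression (1). Both are illustrative assumptions and are not taken from Bailey (2023). Reading A treats the finite-population values $Y_i$ and the realized indicators $R_i$ as fixed, so that (1) is an algebraic identity. Reading B treats the indicators as random while holding $Y_1, \ldots, Y_N$ and $n$ fixed, in the spirit of the mean squared error expression in Meng (2018); it follows from squaring (1) and taking the expectation $E_R$ over the selection, contact, and response mechanism. Here $f = n/N$.

```latex
% Reading A: finite-population moments (divisor N), conditional on realized R_1, ..., R_N
\sigma_{Y}^{2} = N^{-1}\sum_{i=1}^{N}\left( Y_{i} - \overline{Y}_{N} \right)^{2},
\qquad
\rho_{R,Y} = \frac{N^{-1}\sum_{i=1}^{N}\left( R_{i} - f \right)\left( Y_{i} - \overline{Y}_{N} \right)}
                  {\left\{ f(1 - f) \right\}^{1/2}\sigma_{Y}}

% Reading B: R random, with Y_1, ..., Y_N and n held fixed
E_{R}\left\{ \left( \overline{Y}_{n} - \overline{Y}_{N} \right)^{2} \right\}
  = E_{R}\left( \rho_{R,Y}^{2} \right) f^{-1}(1 - f)\,\sigma_{Y}^{2}
```

Under reading A, $\rho_{R,Y}$ is a realized quantity that generally cannot be computed from respondent data alone. Under reading B, its properties depend on the assumed mechanism, and a superpopulation model or measurement error model for $Y$ would add a further layer of expectation.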

Depending on the conditioning under consideration, the term $\sigma_{Y}$ often can be viewed as dependent on both the variability of the underlying true $Y$ terms (as considered through either a finite-population approach or a related superpopulation model), and on measurement error terms (generally modeled through some supplementary superpopulation model, possibly conditional on specified auxiliary variables). Standard survey methodology then seeks to reduce $\sigma_{Y}$ through (a) formation of strata at the stage of initial design; (b) improved questionnaires and related field methods to reduce confusion by survey respondents, and thus reduce measurement error biases or variances; and, in some cases, (c) poststratification after data collection has taken place. Case (c) is of special interest for extensions of result (1) to individual weighting cells.
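One way to picture the extension to weighting cells in case (c) is sketched below in Python. It assumes, purely for illustration, that result (1) is applied separately within each cell $h$ using finite-population moments with divisor $N_h$, so that the error of a poststratified mean equals the sum over cells of $W_h \rho_h \{n_h^{-1}(N_h - n_h)\}^{1/2}\sigma_h$ with $W_h = N_h/N$. The three cells, their sizes, and their response probabilities are hypothetical, and response is simulated as unrelated to $Y$ within each cell.

```python
import math
import random


def cell_term(y, r):
    """Return (error, rho, sigma_y) for one cell, using divisor-N_h moments."""
    N_h, n_h = len(y), sum(r)
    f_h = n_h / N_h
    ybar_N = sum(y) / N_h
    ybar_n = sum(yi for yi, ri in zip(y, r) if ri == 1) / n_h
    sigma_y = math.sqrt(sum((yi - ybar_N) ** 2 for yi in y) / N_h)
    cov_ry = sum((ri - f_h) * (yi - ybar_N) for yi, ri in zip(y, r)) / N_h
    rho = cov_ry / (math.sqrt(f_h * (1 - f_h)) * sigma_y)
    return ybar_n - ybar_N, rho, sigma_y


rng = random.Random(7)
# Hypothetical cells: (mean of Y in the cell, response probability, cell size)
cells = {"A": (30.0, 0.10, 3000), "B": (50.0, 0.30, 2000), "C": (70.0, 0.50, 1000)}

pop = {}
for name, (mu, p, size) in cells.items():
    y = [rng.gauss(mu, 5.0) for _ in range(size)]
    # Within a cell, response does not depend on y (an assumption of the sketch)
    r = [1 if rng.random() < p else 0 for _ in range(size)]
    pop[name] = (y, r)

N = sum(len(y) for y, _ in pop.values())
all_y = [yi for y, _ in pop.values() for yi in y]
all_r = [ri for _, r in pop.values() for ri in r]

# Error of the unadjusted respondent mean, with the overall sigma_Y
unadjusted_error, rho_all, sigma_all = cell_term(all_y, all_r)

# Error of the poststratified mean, and the cell-level version of result (1)
ps_error = 0.0
sum_of_cell_terms = 0.0
cell_sigmas = {}
for name, (y, r) in pop.items():
    W_h = len(y) / N
    n_h = sum(r)
    err_h, rho_h, sigma_h = cell_term(y, r)
    cell_sigmas[name] = sigma_h
    ps_error += W_h * err_h
    sum_of_cell_terms += W_h * rho_h * math.sqrt((len(y) - n_h) / n_h) * sigma_h

print(f"unadjusted error:      {unadjusted_error:.4f} (overall sigma_Y = {sigma_all:.2f})")
print(f"poststratified error:  {ps_error:.4f}")
print(f"sum of cell terms:     {sum_of_cell_terms:.4f}")
print("cell-level sigma_Y:   ", {k: round(v, 2) for k, v in cell_sigmas.items()})
```

In this constructed example the cell-level $\sigma_h$ values are much smaller than the overall $\sigma_Y$, and the cell-level correlations are small because response is ignorable within cells. Neither feature should be expected to hold automatically in practice.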

On a related note, some of the discussion of expression (1) appears to involve convergence results, and it would be of interest to specify in somewhat more detail the asymptotic framework under consideration. This might involve, for example, standard triangular-array conditions for a sequence of finite-population realizations of a superpopulation model, along with a corresponding sequence of sample designs with increasing sample sizes, plus related sequences of collection and adjustment methods. As usual with asymptotics, the technical specifications are not of principal interest as formalities as such, but instead as potential guidance to connect underlying population, design, and operational features with approximations of competing effects for specific applications in, for example, Bailey (2023).

Similar comments apply to expression (2) in Bailey (2023). Notable cases include clarification of the intended statistical definition of ‘random contact,’ and of nuanced distinctions among frame problems, sample inclusion, unit contact, partial or complete unit response, wave-level response, and (where applicable) proxy response. This in turn can shed additional light on related conditioning questions, can help in placing the proposed approach within the broader TSE framework, and can highlight connections with previous literature on noncontact nonresponse (e.g., Groves & Couper, 1998).

Impact of Mathematical Conditions on Properties of Prospective Adjustment Methods

Clarification on the above-mentioned statistical questions can help to provide traction in developing and refining methodology to address issues with incomplete data and measurement error, in applying that methodology to specific cases, and in evaluating the strengths and limitations of those specific applications. For example, some previous literature on total survey error and adaptive design may lead to suggestions on prospective models for some specific components of the overall ‘data defect correlation’ term $\rho_{R,Y}$ under a given set of conditions. In addition, clarification on the topics outlined in the section Clarification of Some Mathematical Results may offer additional insights into the properties of the observational-data methods, random response instrument methods, and doubly robust estimation methods considered in Bailey (2023, Section 4), as well as alternative methods considered by other authors, including those cited in the section Connections With Other Literature above. That clarification also may help to flesh out the comparison and contrast on the conceptual foundations and implementation details of, respectively, the ‘random response’ methodology considered in Bailey (2023), and previous work on ‘random response’ methods intended to protect respondent privacy and to enhance cooperation of prospective respondents, for example, Warner (1965) and Abbasi (2022) and references cited therein.
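For readers less familiar with the privacy-oriented side of that contrast, the following Python sketch illustrates the basic estimator from the Warner (1965) randomized response design. It is offered only as background and does not represent the methodology in Bailey (2023). The sketch assumes that each respondent privately receives the statement ‘I have the trait’ with known probability $p \neq 1/2$ and the opposite statement otherwise, and then truthfully answers whether the statement applies. The function names, the true proportion, and the sample size are hypothetical.

```python
import random


def warner_estimate(yes_count, n, p):
    """Estimate the trait proportion from randomized 'yes' answers.

    P(yes) = p * pi + (1 - p) * (1 - pi), so pi = (P(yes) - (1 - p)) / (2p - 1).
    """
    if p == 0.5:
        raise ValueError("p = 0.5 carries no information about pi")
    share_yes = yes_count / n
    return (share_yes - (1 - p)) / (2 * p - 1)


def simulate_yes_count(n, true_pi, p, rng):
    """Simulate truthful answers to the randomly assigned statement."""
    yes = 0
    for _ in range(n):
        has_trait = rng.random() < true_pi
        got_trait_statement = rng.random() < p
        # 'yes' when the assigned statement matches the respondent's status
        if has_trait == got_trait_statement:
            yes += 1
    return yes


rng = random.Random(1965)
n, true_pi, p = 20_000, 0.30, 0.70  # hypothetical values
yes_count = simulate_yes_count(n, true_pi, p, rng)
pi_hat = warner_estimate(yes_count, n, p)
print(f"share answering yes: {yes_count / n:.4f}")
print(f"estimated proportion with the trait: {pi_hat:.4f}")
```

In this design the randomization acts on the question that is answered, with the aim of protecting the respondent. That differs from a design in which randomization is applied to factors that affect whether a unit responds at all, which is one reason the comparison noted above may merit careful treatment.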

Enhancing Clear, Convincing, and Nuanced Communication With Key Stakeholder Groups

Bailey (2023) also expressed a range of concerns regarding stakeholder communication in some polling work. In exploring these issues in additional detail, three general points may be of interest.

First, professional associations and other independent third parties can help to establish, socialize, incentivize, and enhance transparent and practical stakeholder communication on methodology and on related indicators of the quality of substantive empirical results. Salient examples include the Transparency Initiative of the American Association for Public Opinion Research (2023) and Committee on National Statistics (2021, 2022), and references cited therein. Comparison of quality standards and stakeholder communication approaches across government, academia, and the private sector also may be mutually beneficial, although some specifics naturally may vary across environments. Some examples from the government sector include: Statistics Canada (2019), United Kingdom Office of National Statistics (2023), United Nations Statistics Division (2019), U.S. Bureau of Labor Statistics (2023), U.S. Census Bureau (2023), and references cited therein. In addition, Biemer et al. (2014) provide an insightful discussion of internal organizational efforts to improve quality, aligned with external reporting on the results of those efforts. Also, Miller et al. (2020) present a careful review of numerous empirical analyses of nonresponse biases, with emphasis on analyses carried out to address requirements in U.S. Office of Management and Budget (2006). Some extensions of those analytic approaches may help one address some of the bias issues highlighted in Bailey (2023).

Second, other prominent areas of statistical applications also have encountered important controversies involving potential biases from the effects of nonrandom unit selection and unit participation. For example, Keiding and Louis (2016) provide a wide-ranging analysis of such issues in epidemiology. It will be of strong interest to compare and contrast the substantive features, and related communication issues, encountered in such controversies across different applications.

Third, Bailey (2023) uses some graphical methods to illustrate some of his principal points. In exploration of the issues highlighted in the sections Clarification of Some Mathematical Results and Statistical Properties of Some Classes above, one could consider use of additional visualization tools to communicate central concepts and empirical results in ways that resonate with key stakeholders and the general public. For example, exploration of multiple sources of random variability, as well as related complex multidimensional relationships and conditioning, can be enhanced through scatterplot matrices or other high-dimensional multivariate displays of estimated response propensities, regression residuals, and related terms. Similarly, visualization tools can help to convey information regarding complex measures of uncertainty (e.g., Weiskopf, 2022, and references cited therein) and related sensitivity analyses (e.g., Friendly, 2013; Guan, 2023; Jin et al., 2023).

Acknowledgments and Disclaimer

The views expressed in this discussion are those of the author and do not represent the policies of the United States Census Bureau. The discussant thanks Wendy Martinez, Tommy Wright and Paul Beatty for insightful comments that improved the exposition of this material.

Disclosure Statement

The author has no financial or non-financial disclosures to report for this article.

References


Abbasi, A. M., Shad, M. Y., & Ahmed, A. (2022). On partial randomized response model using ranked set sampling. PLoS ONE, 17(11), Article e0277497.

Amaya, A., Biemer, P. P., & Kinyon, D. (2020). Total error in a big data world: Adapting the TSE framework to big data. Journal of Survey Statistics and Methodology, 8(1), 89–119.

American Association for Public Opinion Research. (2023). Transparency Initiative.

Andersen, R., Kasper, J., & Frankel, M. R. (1979). Total survey error. Jossey-Bass.

Bailey, M. A. (2023). A new paradigm for polling. Harvard Data Science Review, 5(3).

Beaumont, J.-F. (2020). Are probability surveys bound to disappear for the production of official statistics? Survey Methodology, 46(1), 1–28.

Biemer, P. P. (2010). Total survey error: Design, implementation and evaluation. Public Opinion Quarterly, 74(5), 817–848.

Biemer, P. P., de Leeuw, E., Eckman, S., Edwards, B., Kreuter, F., Lyberg, L. E., Tucker, N. C., & West, B. T. (Eds.). (2017). Total survey error in practice. Wiley.

Biemer, P. P., Trewin, D., Bergdahl, H., & Japec, L. (2014). A system for managing the quality of official statistics. Journal of Official Statistics, 30(3), 381–415.

Brackstone, G. (1999). Managing data quality in a statistical agency. Survey Methodology, 25(2), 139–149.

Bradley, V. C., Kuriwaki, S., Isakov, M., Sejdinovic, D., Meng, X.-L., & Flaxman, S. (2021). Unrepresentative big surveys significantly overestimated U.S. vaccine uptake. Nature, 600(7890), 695–700.

Brown, R. V. (1967). Evaluation of total survey error. Journal of the Royal Statistical Society Series D: The Statistician, 17(4), 335–356.

Chen, Y., Li, P., & Wu, C. (2020). Doubly robust inference with nonprobability survey samples. Journal of the American Statistical Association, 115(532), 2011–2021.

Citro, C. F. (2014). From multiple modes for surveys to multiple sources for estimates. Survey Methodology, 40(2), 137–161.

Coffey, S. M., Damineni, J., Eltinge, J. L., Mathur, A., Varela, K. M., & Zotti, A. (2023). Some open questions on multiple-source extensions of adaptive-survey design concepts and methods. Working Paper 2023-03. Center for Economic Studies, U.S. Census Bureau.

Committee on National Statistics. (2017). Federal statistics, multiple data sources, and privacy protection: Next steps. The National Academies Press.

Committee on National Statistics. (2021). Principles and practices for a federal statistical agency (7th ed.). The National Academies Press.

Committee on National Statistics. (2022). Transparency and reproducibility of federal statistics for the National Center for Science and Engineering Statistics. The National Academies Press.

Deming, W. E. (1944). On errors in surveys. American Sociological Review, 9(4), 359–369.

Elliott, M. R., & Valliant, R. (2017). Inference for nonprobability samples. Statistical Science, 32(2), 249–264.

Eurostat. (2015). Quality Assurance Framework of the European Statistical System.

Federal Committee on Statistical Methodology. (2020). A framework for data quality (FCSM-20-04).

Friendly, M. (2013). The generalized ridge trace plot: Visualizing bias and precision. Journal of Computational and Graphical Statistics, 22(1), 50–68.

Groves, R. M., & Couper, M. P. (1998). Nonresponse in household interview surveys. Wiley.

Groves, R. M., & Lyberg, L. (2010). Total survey error: Past, present and future. Public Opinion Quarterly, 74(5), 849–879.

Guan, L. (2023). Localized conformal prediction: A generalized inference framework for conformal prediction. Biometrika, 110(1), 33–50.

Harter, R., Battaglia, M. P., Buskirk, T. D., Dillman, D. A., English, N., Fahimi, M., Frankel, M. R., Kennel, T., McMichael, J. P., McPhee, C. B., Montaquila, J., Yancey, T., & Zukerberg, A. L. (2016). Report of the AAPOR Task Force on address-based sampling.

Imbens, G. W., & Rubin, D. B. (2015). Causal inference for statistics, social and biomedical sciences: An introduction. Cambridge University Press.

Jackman, S., & Spahn, B. (2019). Why does the American National Election Study overestimate voter turnout? Political Analysis, 27(2), 193–207.

Jin, Y., Ren, C., & Candès, E. J. (2023). Sensitivity analysis of individual treatment effects: A robust conformal inference approach. PNAS, 120(6), Article e2214889120.

Keiding, N., & Louis, T. A. (2016). Perils and potentials of self‐selected entry to epidemiological studies and surveys. Journal of the Royal Statistical Society Series A: Statistics in Society, 179(2), 319–376.

Lohr, S. L. (2021). Multiple-frame surveys for a multiple-data-source world. Survey Methodology, 47(2), 229–263.

Lohr, S. L., & Raghunathan, T. E. (2017). Combining survey data with other data sources. Statistical Science, 32(2), 293–312.

Mahalanobis, P. C. (1944). On large-scale sample surveys. Philosophical Transactions of the Royal Society B, 231(584), 329–451.

Meng, X.-L. (2018). Statistical paradises and paradoxes in big data (1): Law of large populations, big data paradox, and the 2016 presidential election. The Annals of Applied Statistics, 12(2), 685–726.

Miller, P., Fakhouri, T. H., Earp, M., Downey Piscopo, K., Frenk, S. M., Christopher, E., & Madans, J. (2020). A systematic review of nonresponse bias studies in federally sponsored surveys. Federal Committee on Statistical Methodology (FCSM 20-02).

Pfeffermann, D. (2015). Methodological issues and challenges in the production of official statistics: 24th Annual Morris Hansen Lecture. Journal of Survey Statistics and Methodology, 3(4), 425–483.

Rao, J. N. K., & Molina, I. (2015). Small area estimation (2nd ed.). Wiley.

Rosenblum, M., Miller, P., Reist, B., Stuart, E., Thieme, M., & Louis, T. (2019). Adaptive design in surveys and clinical trials: Similarities, differences, and opportunities for cross-fertilization. Journal of the Royal Statistical Society Series A: Statistics in Society, 182(3), 963–982.

Statistics Canada. (2019). Statistics Canada Quality Guidelines (6th ed.).

Sun, B., Liu, L., Miao, W., Wirth, K., Robins, J., & Tchetgen-Tchetgen, E. J. (2018). Semiparametric estimation with data missing not at random using an instrumental variable. Statistica Sinica, 28, 1965–1983.

Tourangeau, R. (2017). Presidential address: Paradoxes of nonresponse. Public Opinion Quarterly, 81(3), 803–814.

Tourangeau, R., Brick, J. M., Lohr, S., & Li, J. (2017). Adaptive and responsive survey designs: A review and assessment. Journal of the Royal Statistical Society Series A: Statistics in Society, 180(1), 203–223.

United Kingdom Office of National Statistics. (2023). Quality in official statistics. Office for National Statistics.

United Nations Statistics Division. (2019). United Nations National Quality Assurance Frameworks manual for official statistics.

U.S. Bureau of Labor Statistics. (2023). Handbook of methods.

U.S. Census Bureau. (2023). Statistical quality standards.

U.S. Office of Management and Budget. (2006). Standards and guidelines for statistical surveys.

van Berkel, K., van der Doef, S., & Schouten, B. (2020). Implementing adaptive survey design with an application to the Dutch health survey. Journal of Official Statistics, 36(3), 609–629.

Wagner, J., West, B. T., Coffey, S. M., & Elliott, M. R. (2020). Comparing the ability of regression modeling and Bayesian additive regression trees to predict costs in a responsive survey design context. Journal of Official Statistics, 36(4), 907–931.

Warner, S. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309), 63–69.

Weiskopf, D. (2022). Uncertainty visualization: Concepts, methods, and applications in biological data visualization. Frontiers in Bioinformatics, 2.

Yang, S., & Kim, J. K. (2020). Statistical data integration in survey sampling: A review. Japanese Journal of Statistics and Data Science, 3, 625–650.

No rights reserved. This work was authored as part of the Contributor’s official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. law. 
