
# Assuming a Nonresponse Model Does Not Make It True

Published on Oct 05, 2023

# Introduction

Bailey (2023) proposes a new paradigm for polling based on an algebraic identity for the error in a sample mean discussed by Meng (2018). I too appreciated the clear exposition of survey error in Meng’s article, and relied heavily on insights in Meng (2018) and Rao (2021) when writing the new chapter on nonprobability sampling in the latest edition of my sampling textbook (Lohr, 2022, chap. 15). Although it has long been known that nonresponse bias for estimating a population mean depends on the correlation between survey participation and the characteristic being measured, Meng discussed the implications of that correlation in the era of big data. Meng provided a framework for addressing questions such as: “Which one should I trust more: a 1% survey with 60% response rate or a self-reported administrative dataset covering 80% of the population?” (Meng, 2018, p. 685). One of the important conclusions in Meng’s article is the tremendous value of information from probability samples relative to other types of samples, and he gave an example in which the mean squared error (MSE) from a convenience sample of *half* of a population of one billion people (with expected correlation 0.05) is larger than the MSE from a simple random sample (SRS) of size 400.
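The arithmetic behind Meng's comparison can be checked directly. Under his error decomposition, the MSE of a self-selected sample with sampling fraction $f$ and expected squared correlation $E[\rho^2]$ is approximately $E[\rho^2]\,\frac{1-f}{f}\,S_y^2$, which equals the variance of an SRS of effective size $n_{\textrm{eff}} = f/\big((1-f)E[\rho^2]\big)$. A quick Python sketch of the numbers in Meng's example (the function name is mine):

```python
def effective_sample_size(f, rho):
    """Effective SRS size of a self-selected sample with sampling
    fraction f and data-defect correlation rho, using the
    approximation MSE ~ rho**2 * ((1 - f) / f) * S_y**2."""
    return f / ((1 - f) * rho**2)

N = 1_000_000_000   # population of one billion
f = 0.5             # convenience sample covering half the population
rho = 0.05          # expected correlation between participation and y

print(f"convenience sample size: {int(f * N):,}")   # 500,000,000
print(f"effective SRS size: {effective_sample_size(f, rho):.0f}")   # 400
```

Half a billion self-selected responses carry about as much information for estimating the mean as a simple random sample of 400.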

Bailey (2023), arguing that today’s low poll response rates mean that “random sampling is dead,” proposes replacing inference based on probability sampling with efforts to measure the correlations between survey participation and specific outcome variables. Many of his conclusions and recommendations, however, rest on strong implicit assumptions about participation mechanisms. In this discussion I explore some of those assumptions and examine their implications for inference.

The discussion is organized as follows. I begin by reviewing Meng’s equation and defining random variables that may be used for statistical inference. I then discuss the special characteristics of probability samples and argue that there are benefits to selecting the initial sample using probability methods even if the response rate is low. After considering the role of auxiliary information for reducing bias and examining assumptions underlying randomized response instrument models, I conclude with some suggestions for moving forward.

# Meng’s Equation and Random Variables

Let $R_i = 1$ if unit $i$ of the finite population $\mathcal{U} = \{1, \ldots, N\}$ participates in the survey and $R_i = 0$ otherwise, let $\bar{y}_{\mathcal{U}}$ and $\bar{R}_{\mathcal{U}}$ denote the population means of the $y_i$ and the $R_i$, and let $\bar{y}$ denote the mean of $y$ for the $n$ participants. Meng (2018) wrote the error of the sample mean as

$\begin{aligned}
\ \ \
\bar{y} - \bar{y}_{\mathcal{U}} &= \frac{\sum_{i=1}^N R_i (y_i - \bar{y}_{\mathcal{U}})}{\sum_{i=1}^N R_i} \ \ \ &(1)
\\
\ \ \
&= \textrm{Corr}(R,y) \times \frac{N-1}{N}\frac{S_R}{\bar{R}_{\mathcal{U}}} \times S_y \ \ \ &(2)
\\
\ \ \
&= \textrm{Corr}(R,y) \times \sqrt{\frac{N-1}{n} \left(1-\frac{n}{N}\right)} \times S_y, \ \ \ &(3)
\end{aligned}$

where

$\begin{aligned}
S^2_y = \frac{1}{N-1} \sum_{i=1}^N (y_i - \bar{y}_{\mathcal{U}})^2
\end{aligned}$

and

$\begin{aligned}
S^2_R &= \frac{1}{N-1} \sum_{i=1}^N (R_i - \bar{R}_{\mathcal{U}})^2
=\frac{n}{N-1} \left( 1 - \frac{n}{N} \right)
\end{aligned}$

denote the population variances of $y$ and $R$, and

$\begin{aligned}
\textrm{Corr}(R,y) &= \frac{\textrm{Cov}(R,y)}{S_y S_R} = \frac{\sum_{i=1}^N (R_i -\bar{R}_{\mathcal{U}}) (y_i - \bar{y}_{\mathcal{U}})}{(N-1) S_y S_R}
\end{aligned}$

is the population correlation.
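Because Equations (1)–(3) are algebraic identities, they hold exactly for any population and any set of participants. A minimal numerical check in Python (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 10_000, 250

# A fixed population and an arbitrary set of n participants
y = rng.normal(50, 10, size=N)
R = np.zeros(N)
R[rng.choice(N, size=n, replace=False)] = 1

ybar_U = y.mean()            # population mean
ybar = y[R == 1].mean()      # mean of the participants

# Right-hand side of Equation (3)
S_y = y.std(ddof=1)
corr = np.corrcoef(R, y)[0, 1]   # scale-invariant, so ddof cancels
rhs = corr * np.sqrt((N - 1) / n * (1 - n / N)) * S_y

assert np.isclose(ybar - ybar_U, rhs)   # identity holds exactly
```

The identity says nothing about how large the error is until a probability distribution is attached to the $R_i$ or the $y_i$, which is the subject of the next paragraphs.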

The formulation of survey error in Equation (3), involving the correlation between the outcome variable of interest and the response mechanism, has been known for a long time. Hartley and Ross (1954) used a similar algebraic identity when deriving the bias of the ratio estimator, and numerous authors have noted that nonresponse bias depends on the correlation between response indicators or propensities and the outcome variables (see, for example, Platek, 1980; Särndal & Lundström, 2005; Brick, 2013; Haziza & Lesage, 2016).

Equation (3) is an algebraic identity for the difference between the mean of the particular sample collected (the population members with $R_i = 1$) and the mean of the full population. By itself, the identity says nothing about the statistical properties of that difference; inference requires treating some of the quantities in the equation as random variables.

In model-based inference, the values $y_i$ are treated as realizations of random variables, and inferences about the population rely on an assumed model for the joint distribution of those random variables.

In design-based sampling theory—perhaps we should call it participation-based sampling theory in this context since not all samples are designed—the quantities $y_i$ are treated as fixed constants and the participation indicators $R_i$ are the random variables. Inferences derive from the probability distribution of the $R_i$.

For a sample with some degree of self-selection, whether a convenience sample of volunteers or a probability sample with nonresponse, the values of $R_i$ are random but their joint distribution is unknown, because participation depends at least in part on decisions made by the population members themselves.

Many of the conclusions in Bailey (2023) appear to be predicated on an unstated assumption that the distribution of the random variables governing participation follows a particular model whose parameters can be estimated from the observed data.

There is no reason to believe, however, that the distribution of the latent variables posited by such a model describes the actual participation mechanism for a particular survey.

In a probability sample with full response, the probability distribution of the participation indicators $R_i$ is known because it is determined by the survey design. Such a sample is *representative* in the technical sense of the word because it allows construction of confidence intervals that have the claimed coverage probability “*whatever the unknown properties of the population*” (Neyman, 1934, p. 586, emphasis in original).

# Advantages of Probability Sampling

The development of probability sampling theory was a tremendous breakthrough for the age-old problem of how to generalize from a sample we have seen to population members we have not seen. Probability sampling allows this generalization—along with an assessment of the accuracy of estimates—because the probability distribution of the participation indicator is fully under the control of the sampler. The pioneers of probability sampling, however, were well aware that the validity of inferences under probability sampling theory depends on having full response. For example, Deming (1950, p. 35) wrote: “A sample is no longer a probability sample if it is ruined by nonresponse or any other difficulty of execution,” and argued that even a nonresponse rate of 5% could seriously affect results.

When there is nonresponse within a probability sample, the participation indicator can be written as $T_i = R_i Z_i$, where $R_i$ indicates whether unit $i$ is selected for the sample and $Z_i$ indicates whether unit $i$ would respond if selected. The expected error of the respondent mean $\bar{y}_r$ is then

$\begin{aligned}
\ \ \ \ \ \
E\left[\bar{y}_r - \bar{y}_{\mathcal{U}} \right]
&=
E_T\left[\textrm{Corr}(T,y) \times \frac{S_T}{\bar{T}_{\mathcal{U}}} \times S_y \right] + O\left(\frac{1}{n}\right). \ \ \
&(4)
\end{aligned}$

Equation (4) has approximately the same form as the expected value of Equation (2), with the participation indicator $T$ playing the role of $R$. The bias is small when the expected correlation between participation and the outcome is small, when the participation propensities are nearly constant, or when the population is nearly homogeneous.

Consider two survey designs. The first sends a broadcast invitation to everyone on the email list, so that participation $T_i$ depends only on whether person $i$ decides to respond. The second selects an SRS from the list and invites only the sampled members, so that $T_i$ is the product of the selection and response indicators. If the response mechanism is the same under both designs, the expected value of $\textrm{Corr}(T,y)$—and hence, by Equation (4), the bias—is approximately the same for both.

The bias will be reduced for the SRS if the pollster modifies the recruitment procedure to obtain a lower expected value of $\textrm{Corr}(T,y)$—for example, by devoting the resources saved from contacting fewer people to follow-up efforts that raise participation among reluctant groups.
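A small simulation illustrates the point; the propensities below are invented for illustration. Follow-up that flattens the response propensities shrinks the correlation between participation and $y$ and hence the bias of the respondent mean:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100_000
y = rng.binomial(1, 0.5, size=N)   # binary outcome, population mean near 0.5

def respondent_mean(prop):
    """Mean of y among participants for one realization of T."""
    T = rng.binomial(1, prop)
    return y[T == 1].mean()

# Initial protocol: people with y = 1 are twice as likely to participate
p_initial = np.where(y == 1, 0.10, 0.05)
# With follow-up: participation among the y = 0 group is raised,
# flattening the propensities and shrinking Corr(T, y)
p_followup = np.where(y == 1, 0.10, 0.09)

bias_initial = np.mean([respondent_mean(p_initial) for _ in range(200)]) - y.mean()
bias_followup = np.mean([respondent_mean(p_followup) for _ in range(200)]) - y.mean()
print(f"bias, initial protocol: {bias_initial:+.3f}")    # roughly +0.17
print(f"bias, with follow-up:   {bias_followup:+.3f}")   # roughly +0.03
```

Raising the response rate matters much less than equalizing the propensities: the follow-up protocol is nearly unbiased even though its overall response rate is still below 10%.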

Probability sampling by itself does not cure the problem of nonresponse bias, but it allows the pollster to concentrate resources on obtaining high-quality data from a selected sample instead of spreading those efforts among the whole population. Lohr (2022) argued that starting with a probability sample has several advantages even when response rates are low:

- The sampling frame for a probability sample is well defined, and many frames used in practice have high coverage. When coverage is incomplete by design (for example, when the frame excludes persons in institutions), the probability sampler can limit inference to the frame population. Many nonprobability samples lack a sampling frame, which makes it challenging to assess coverage.

- The sampler can devote more resources toward persuading members of the selected sample to participate. This changes the distribution of the response indicators $T_i$. Nonresponse follow-up can concentrate on units that have low response propensities for the initial contact method, which may reduce the variance of the final response propensities.

- It is more difficult for a malevolent actor to influence a probability sample. A population member can participate at most once in a probability survey, thus preventing situations in which one person or machine clicks the ‘take this survey’ button 10,000 times. Even when pollsters have methods for preventing or detecting multiple responses, a nonprobability sample might be manipulable. For example, a malevolent actor knowledgeable about survey weighting could arrange for a large group of people with a variety of claimed demographic characteristics to sign up for an online opt-in poll where participants are recruited by broadcast advertisement. None of the auxiliary information that would be available for weighting or nonresponse modeling would be able to compensate for the bias caused because all of these demographically diverse poll participants say they have opinion X. A would-be malevolent actor cannot arrange for people with opinion X to flood a probability sample because the initial selection is random.

- The probability sample often has more information available that can be used for weighting, imputation, or other types of nonresponse modeling. For example, a sample drawn from a list of registered voters may have information on age, political party, and past voting behavior that can be used to study and adjust for nonresponse bias.

# Weighting and Nonresponse Modeling

If the pollster computes a weighted estimator of the population mean, the error has a form similar to Equation (2). Write the weighted estimator as

$\begin{aligned}
\bar{y}_w &= \sum_{i=1}^N V_i y_i / \sum_{i=1}^N V_i,
\end{aligned}$

where $V_i = w_i R_i$ is the product of unit $i$’s weight and its participation indicator (so $V_i = 0$ for nonparticipants). Then the same algebra that produces Equation (2) gives

$\begin{aligned}
\bar{y}_w - \bar{y}_{\mathcal{U}} &= \frac{\sum_{i=1}^N V_i (y_i - \bar{y}_{\mathcal{U}})}{\sum_{i=1}^N V_i}
= \textrm{Corr}(V,y) \times \frac{N-1}{N}\frac{S_V}{\bar{V}_{\mathcal{U}}} \times S_y ,
\end{aligned}$

where $\textrm{Corr}(V,y)$, $S_V$, and $\bar{V}_{\mathcal{U}}$ are defined analogously to the corresponding quantities in Equation (2). Weighting reduces the bias to the extent that it reduces the correlation between the weighted participation indicators and the outcome variable.
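Like Equations (1)–(3), this weighted-error expression is an exact algebraic identity for any set of weights, which can be checked numerically. A short Python sketch (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 5_000
y = rng.gamma(2.0, 10.0, size=N)

# Participation indicators and arbitrary (nonresponse-adjustment) weights
R = rng.binomial(1, 0.3, size=N)
w = rng.uniform(1.0, 4.0, size=N)
V = w * R                          # zero for nonparticipants

ybar_U = y.mean()
ybar_w = np.sum(V * y) / np.sum(V)   # weighted estimator

S_y, S_V = y.std(ddof=1), V.std(ddof=1)
corr_Vy = np.corrcoef(V, y)[0, 1]
rhs = corr_Vy * (N - 1) / N * S_V / V.mean() * S_y

assert np.isclose(ybar_w - ybar_U, rhs)   # identity holds exactly
```

The identity makes clear that weights help only insofar as they drive $\textrm{Corr}(V,y)$ toward zero; nothing in the algebra guarantees that any particular weighting scheme does so.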

Weighting is a powerful tool for reducing bias, and high-quality auxiliary information can improve estimates even from surveys with a high degree of selection bias. For example, Bailey (2023) cites the 1936 *Literary Digest* survey as an example of a sampling “fiasco” and indeed the unweighted estimates were poor. Fifty-four percent of the *Digest*’s sample of nearly 2.4 million respondents said they were supporting Landon for president, but Roosevelt ended up winning the election with more than 60% of the popular vote. If, however, the *Digest* editors had used available auxiliary information to weight the data, the weighting would have partially corrected for the overrepresentation of Republican voters in the sample and led the editors to predict that Roosevelt would win the election (Lohr & Brick, 2017).

As Bailey (2023) mentions, however, most inferences based on weighted estimates rely on an assumption that the response mechanism is missing at random (MAR) and that the weighting model removes the nonresponse bias. If the MAR assumption is not true, then confidence intervals based on formulas from probability sampling no longer have the stated coverage probability.

The problem is that one cannot assess the MAR assumption from the sample itself—one needs external information such as independent estimates of the means of outcome variables. For example, Mercer et al. (2018), comparing weighted estimates from online opt-in surveys to benchmarks from high-quality federal surveys, found that even the most effective weighting adjustments were able to remove only about 30% of the bias from the unweighted estimates, and thus the response mechanisms for those surveys cannot be considered to be MAR. They concluded that the quality of the auxiliary information mattered more than the particular statistical method for using such information, and advocated obtaining a richer set of auxiliary variables that go beyond core demographics.

Because weighted estimates can still be biased, there is a large body of research exploring not-missing-at-random (NMAR) models in which the response mechanism depends on the values of unobserved data as well as on observed data. The statistical properties for estimators calculated from these models depend on the validity of assumptions made about the nonparticipants—theorems often have the form ‘If assumptions A, B, and C hold, then our proposed estimate is approximately unbiased with approximate variance given in Equation D.’

NMAR models are useful for exploring potential nonresponse mechanisms and are an important part of nonresponse bias analyses. Users can assess whether the explicitly stated assumptions of a particular NMAR model apply to their own circumstances. But, as Molenberghs et al. (2008) pointed out, “the correctness of the alternative model can only be verified in as far as it fits the *observed* data” (p. 372, emphasis in original). Molenberghs et al. (2008) showed that every NMAR model has a MAR counterpart that fits the observed data equally well. MAR and NMAR models with equivalent fits may give different estimates of the population quantities of interest, and the observed data cannot distinguish between them.

# Randomized Response Instruments

Bailey (2023) suggests collecting additional auxiliary information for modeling nonresponse through use of ‘randomized response instruments’ in which persons in the selected sample are randomly assigned to different survey protocols that are designed to have different response rates. Many pollsters use randomized experiments to test questions and explore methods for improving response rates, but Bailey (2023) claims that assigning some of the selected sample members to a protocol that is known to achieve a low response rate can “allow us to assess whether the response mechanism is ignorable or not” and produce population estimates that are purged of the malign effects of nonignorable nonresponse.

This sounds like it is too good to be true, and I believe it is. Every method of nonresponse adjustment or bias estimation requires assumptions about the nonrespondents. In this case, the very strong assumptions required for randomized response instrument methods are implicit in the model identifiability conditions.

Consider the fictional data in Table 1 from an experiment with two sampling protocols and a binary outcome $y$. An SRS of size 10,000 is selected from the population. Sample members who are randomly assigned to the group with $x = 1$ are surveyed under protocol 1, which achieves a 40% response rate; members assigned to the group with $x = 0$ are surveyed under protocol 0, which achieves a 20% response rate.

#### Table 1. Fictional data from an experiment with randomly assigned sampling protocols.

| Protocol | Sample size | Nonrespondents | Respondents with $y = 0$ | Respondents with $y = 1$ | Mean of $y$ for respondents |
|---|---|---|---|---|---|
| 0 | 5,000 | 4,000 | 350 | 650 | 0.65 |
| 1 | 5,000 | 3,000 | 1,000 | 1,000 | 0.50 |
| All | 10,000 | 7,000 | 1,350 | 1,650 | 0.55 |

If there were no nonresponse bias, we would expect the means of the respondents from the two protocols in Table 1 to be approximately equal because the sample members were randomly assigned. Thus, Table 1 clearly exhibits a problem with nonresponse bias in one or both of the protocols.
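A quick two-proportion check, using the respondent counts and means from Table 1, confirms that a difference this large between the two randomly assigned groups is incompatible with equal response mechanisms:

```python
import math

# Respondent means and counts from Table 1
mean0, n0 = 0.65, 1000   # respondents under the 20%-response protocol
mean1, n1 = 0.50, 2000   # respondents under the 40%-response protocol

# Standard error of the difference between two independent proportions
se = math.sqrt(mean0 * (1 - mean0) / n0 + mean1 * (1 - mean1) / n1)
z = (mean0 - mean1) / se
print(f"z statistic: {z:.1f}")   # about 8
```

A z statistic near 8 puts the difference far beyond what chance assignment could produce.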

Figure 1 depicts three of the many possible relationships between response probabilities and $y$ that are consistent with the data in Table 1.

The saturated logistic regression model relating the response probability to the protocol assignment $x$ and the outcome $y$ is

$\begin{aligned}
\ \ \ \ \ \ \ \textrm{logit}\, P(R=1 | x,y) &= \theta_0 + \theta_1 x + \theta_2 y + \theta_3 xy \ \ \ \ \ \ \ \
&(5)
\end{aligned}$

but Sun et al. (2018) showed in their Example 1 that this saturated model is not identifiable and therefore cannot be fit. Thus, an assumption must be made to reduce the number of parameters. Sun et al. (2018) fixed $\theta_3 = 0$, so that the reduced model

$\begin{aligned}
\ \ \ \ \ \ \ \ \ \ \
\textrm{logit}\, P(R=1 | x,y) &= \theta_0 + \theta_1 x + \theta_2 y \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
&(6)
\end{aligned}$

is identifiable.
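With the Table 1 counts, the four remaining unknowns (the three $\theta$ parameters and $p = P(y=1)$) can be recovered by matching expected to observed respondent counts. The reduction below to a one-dimensional bisection is my own sketch, and the pairing of protocols with response rates follows my reading of the table; it is for illustration only:

```python
import math

# Table 1 respondent fractions, out of the 5,000 assigned to each protocol:
# 20%-response protocol: 7% respond with y = 0, 13% with y = 1
# 40%-response protocol: 20% respond with y = 0, 20% with y = 1

def logit(q):
    return math.log(q / (1 - q))

def theta2_gap(p):
    """Equation (6) forces logit P(R=1|x, y=1) - logit P(R=1|x, y=0)
    to equal the same constant theta_2 under both protocols; return
    the discrepancy between the two implied values of theta_2,
    given a candidate population mean p = P(y = 1)."""
    gap_low = logit(0.13 / p) - logit(0.07 / (1 - p))
    gap_high = logit(0.20 / p) - logit(0.20 / (1 - p))
    return gap_low - gap_high

# All implied response probabilities must lie in (0, 1), so p is in (0.2, 0.8);
# bisect for the p at which both protocols give the same theta_2
lo, hi = 0.21, 0.79
for _ in range(60):
    mid = (lo + hi) / 2
    if theta2_gap(lo) * theta2_gap(mid) <= 0:
        hi = mid
    else:
        lo = mid

print(f"estimated population mean of y: {lo:.3f}")   # about 0.26
```

Under the $\theta_3 = 0$ assumption, the adjusted estimate of the population mean is roughly 0.26—less than half the overall respondent mean of 0.55. A different identifying assumption would yield a very different answer from the same data.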

When $\theta_3 = 0$, the remaining parameters of Equation (6) can be estimated from the observed data, and the estimated response probabilities can then be used to adjust the weights for nonresponse.

The assumption that $\theta_3 = 0$ cannot be checked from the data, however. It asserts that the effect of $y$ on the log odds of response is exactly the same under both protocols—that the protocol assignment does not interact with the outcome.

The protocols for many probability surveys have been refined through years of research and experimentation, and often nonresponse follow-up efforts are designed to raise the response rates for subpopulations that are underrepresented after initial contact efforts. If a well-researched protocol 1 has a moderately high response rate, and protocol 0 modifies protocol 1 so as to reduce the response rate (for example, by skipping nonresponse follow-up), I think assuming a structure similar to Figure 1(a), where protocol 1 results in MAR data, is more reasonable than assuming a structure such as (c) that forces the interaction to be zero. And if the structure in (a) holds, then one should take the entire sample using protocol 1 because estimates from that protocol are unbiased. Why would a sampler want to allocate half of the sample to an inferior protocol with a lower response rate and then adjust the weights for the superior protocol according to the results from the inferior protocol?

In this example, I assumed no other auxiliary information was available. If other information is available, the weighting adjustments from the randomized response instrument method are likely to be less extreme because some of the difference between the protocols will be explained by the other auxiliary variables. But anyone using this method must still assume that, after conditioning on other covariates, the response instrument does not interact with $y$.

In general, different sampling protocols are likely to have different relationships between response propensity and $y$—that is, one would *expect* to see an interaction. An incentive will often raise response rates more for some population subgroups than others. One would not expect to be able to learn about the expected correlation between participation and $y$ under one protocol from data collected under a completely different protocol.

The current paradigm used by pollsters relies on strong assumptions that missing data are MAR and nonresponse weighting removes the bias from estimates. Assumptions for the new paradigm are not stated in Bailey (2023), but they appear to be equally strong if not stronger. External information about the nonrespondents is needed to assess the validity of the assumptions in either paradigm for a particular survey.

Equations (1)–(3) are algebraically equivalent, and thus the information that can be learned from each expression for a particular survey is identical. Nevertheless, looking at the expression for survey error in multiple ways can provide additional insights for survey planning, and many statisticians have studied nonresponse bias through examining the correlation of response propensities and outcome variables. Bethlehem (1988), for example, used the expected value of Equation (2) to provide guidance for constructing poststrata designed to minimize nonresponse bias, and his proposed methods are now standard practice.

# Improving Polls

Modern-day polls are not representative in the sense defined by Neyman (1934). Confidence intervals calculated under the assumption of unbiasedness have too-low coverage probability when there is nonresponse bias. These problems are well known, and there is no magic cure for low response rates. To date, full-response probability sampling is the only method guaranteed to produce unbiased estimates and accurate margins of error.

There are, however, a number of steps that pollsters can take to improve methodology and acknowledge the limitations of estimates. Transparency is paramount, and every poll should be accompanied by a full methodology report that describes how the sample was recruited, how concepts were measured, and how estimates were calculated. This report should include breakdowns of the response rate and a discussion of how nonresponse might affect the results of estimates for the full population and for subgroups.

Montaquila and Olson (2012) provided numerous practical tools for conducting nonresponse bias analyses, including fitting statistical models to look at the relationship between response propensity and variables that are available for both respondents and nonrespondents.

Pollsters can take a cue from the language of mathematics and state (in lay language, of course) the conditions under which the estimates are approximately unbiased and the margin of error describes the uncertainty. Some poll reports (see, for example, Lopes et al., 2023) refer to ‘margin of sampling error’ and emphasize that other sources of error may also affect the estimates, and I think this is a much better practice than simply stating a margin of error without explaining that it measures only one type of uncertainty.

Finally, pollsters can be guided by solid mathematical and empirical research on methods for estimation and reducing nonresponse bias. There is a huge amount of excellent work, including recent research on the nature of statistical information and bias (Meng, 2018), statistical properties of estimates from nonprobability samples (Rao, 2021; Wu, 2022), empirical investigations comparing estimates from different sampling protocols and with different types of weighting variables (Dutwin & Buskirk, 2017; Mercer et al., 2018), investigation of new variables that can be used for weighting (Peytchev et al., 2018), and investigations into respondents who provide deliberately misleading answers to polls (Kennedy et al., 2020). Couper (2017) described recent methodological and technological advances that can improve survey quality and Jamieson et al. (2023) provided additional suggestions for protecting the integrity of survey research. While surveys face a number of major challenges, I am optimistic that the survey research community is up to the task.

# Acknowledgments

I would like to thank Professor Meng for inviting me to provide this discussion, and I am grateful to J.N.K. Rao and Mike Brick for many helpful discussions on inferential issues related to big data, nonprobability samples, and data integration.

# Disclosure Statement

Sharon L. Lohr has no financial or non-financial disclosures to share for this article.

# References

Bailey, M. A. (2023). A new paradigm for polling. *Harvard Data Science Review*, *5*(3). https://doi.org/10.1162/99608f92.9898eede

Bethlehem, J. G. (1988). Reduction of nonresponse bias through regression estimation. *Journal of Official Statistics*, *4*(3), 251–260.

Brick, J. M. (2013). Unit nonresponse and weighting adjustments: A critical review. *Journal of Official Statistics*, *29*(3), 329–353. https://doi.org/10.2478/jos-2013-0026

Couper, M. P. (2017). New developments in survey data collection. *Annual Review of Sociology*, *43*, 121–145. https://doi.org/10.1146/annurev-soc-060116-053613

Deming, W. E. (1950). *Some theory of sampling*. John Wiley & Sons.

Dutwin, D., & Buskirk, T. D. (2017). Apples to oranges or Gala versus Golden Delicious? Comparing data quality of nonprobability internet samples to low response rate probability samples. *Public Opinion Quarterly*, *81*(S1), 213–239. https://doi.org/10.1093/POQ%2FNFW061

Hartley, H., & Ross, A. (1954). Unbiased ratio estimators. *Nature*, *174*(4423), 270–271. https://doi.org/10.1038/174270a0

Haziza, D., & Lesage, É. (2016). A discussion of weighting procedures for unit nonresponse. *Journal of Official Statistics*, *32*(1), 129–145. https://doi.org/10.1515/jos-2016-0006

Jamieson, K. H., Lupia, A., Amaya, A., Brady, H. E., Bautista, R., Clinton, J. D., Dever, J. A., Dutwin, D., Goroff, D. L., Hillygus, D. S., Kennedy, C., Langer, G., Lapinski, J. S., Link, M., Philpot, T., Prewitt, K., Rivers, D., Vavreck, L., Wilson, D. C., & McNutt, M. K. (2023). Protecting the integrity of survey research. *PNAS Nexus*, *2*(3), Article pgad049. https://doi.org/10.1093/pnasnexus/pgad049

Kennedy, C., Hatley, N., Lau, A., Mercer, A., Keeter, S., Ferno, J., & Asare-Marfo, D. (2020). *Assessing the risks to online polls from bogus respondents*. Pew Research. https://www.pewresearch.org/methods/wp-content/uploads/sites/10/2020/02/PM_02.18.20_dataquality_FULL.REPORT.pdf

Lohr, S. L. (2022). *Sampling: Design and analysis* (3rd ed.). CRC Press.

Lohr, S. L., & Brick, J. M. (2017). Roosevelt predicted to win: Revisiting the 1936 *Literary Digest* poll. *Statistics, Politics and Policy*, *8*(1), 65–84. https://doi.org/10.1515/spp-2016-0006

Lopes, L., Kearney, A., Washington, I., Valdes, I., Yilma, H., & Hamel, L. (2023, August). *KFF health misinformation tracking poll pilot*. KFF. https://www.kff.org/coronavirus-covid-19/poll-finding/kff-health-misinformation-tracking-poll-pilot/

Meng, X.-L. (2018). Statistical paradises and paradoxes in big data (I): Law of large populations, big data paradox, and the 2016 US presidential election. *The Annals of Applied Statistics*, *12*(2), 685–726. https://doi.org/10.1214/18-AOAS1161SF

Mercer, A., Lau, A., & Kennedy, C. (2018). *For weighting online opt-in samples, what matters most?* Pew Research. https://www.pewresearch.org/methods/2018/01/26/for-weighting-online-opt-in-samples-what-matters-most/

Molenberghs, G., Beunckens, C., Sotto, C., & Kenward, M. G. (2008). Every missingness not at random model has a missingness at random counterpart with equal fit. *Journal of the Royal Statistical Society Series B: Statistical Methodology*, *70*(2), 371–388. https://doi.org/10.1111/j.1467-9868.2007.00640.x

Montaquila, J., & Olson, K. M. (2012). *Practical tools for nonresponse bias studies* [Webinar]. Survey Research Methods Section of the American Statistical Association; American Association for Public Opinion Research. https://community.amstat.org/surveyresearchmethodssection/programs/new-item2

Neyman, J. (1934). On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. *Journal of the Royal Statistical Society*, *97*(4), 558–625. https://doi.org/10.2307/2342192

Peytchev, A., Presser, S., & Zhang, M. (2018). Improving traditional nonresponse bias adjustments: Combining statistical properties with social theory. *Journal of Survey Statistics and Methodology*, *6*(4), 491–515. https://doi.org/10.1093/jssam/smx035

Platek, R. (1980). Causes of incomplete data, adjustments and effects. *Survey Methodology*, *6*(2), 93–132. https://www150.statcan.gc.ca/n1/en/pub/12-001-x/1980002/article/54945-eng.pdf?st=k9d-jaTD

Rao, J. N. K. (2021). On making valid inferences by integrating data from surveys and other sources. *Sankhyā Series B*, *83*(1), 242–272. https://doi.org/10.1007/s13571-020-00227-w

Särndal, C.-E., & Lundström, S. (2005). *Estimation in surveys with nonresponse*. John Wiley & Sons.

Sun, B., Liu, L., Miao, W., Wirth, K., Robins, J., & Tchetgen, E. J. T. (2018). Semiparametric estimation with data missing not at random using an instrumental variable. *Statistica Sinica*, *28*(4), 1965–1983. https://doi.org/10.5705%2Fss.202016.0324

Wu, C. (2022). Statistical inference with non-probability survey samples (with discussion). *Survey Methodology*, *48*(2), 283–373. https://www150.statcan.gc.ca/n1/pub/12-001-x/2022002/article/00002-eng.pdf

©2023 Sharon L. Lohr. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.
