
Assuming a Nonresponse Model Does Not Make It True

Published on October 5, 2023

This article is a commentary on Bailey (2023), “A New Paradigm for Polling.”

Introduction

Bailey (2023) proposes a new paradigm for polling based on an algebraic identity for the error in a sample mean discussed by Meng (2018). I too appreciated the clear exposition of survey error in Meng’s article, and relied heavily on insights in Meng (2018) and Rao (2021) when writing the new chapter on nonprobability sampling in the latest edition of my sampling textbook (Lohr, 2022, chap. 15). Although it has long been known that nonresponse bias for estimating a population mean depends on the correlation between survey participation and the characteristic being measured, Meng discussed the implications of that correlation in the era of big data. Meng provided a framework for addressing questions such as: “Which one should I trust more: a 1% survey with 60% response rate or a self-reported administrative dataset covering 80% of the population?” (Meng, 2018, p. 685). One of the important conclusions in Meng’s article is the tremendous value of information from probability samples relative to other types of samples, and he gave an example in which the mean squared error (MSE) from a convenience sample of half of a population of one billion people (with expected correlation 0.05) is larger than the MSE from a simple random sample (SRS) of size 400.
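To give a sense of the magnitudes in Meng’s example, the following sketch (mine, not a calculation from Meng, 2018) plugs the numbers into the mean squared error implied by the identity below, treating the expected correlation of 0.05 as if it were the square root of $E[\textrm{Corr}^2(R,y)]$. Both MSEs come out near $0.0025\,S_y^2$, with the convenience sample slightly worse.

```python
import numpy as np

# A rough back-of-the-envelope sketch (my own, not Meng's exact calculation) of the MSE
# comparison described above, using Equation (3): squaring and taking expectations gives
# MSE approximately E[Corr^2(R, y)] * (N-1)/n * (1 - n/N) * S_y^2.
# Assumption: for the convenience sample we plug in (0.05)^2 for E[Corr^2(R, y)],
# ignoring the variance of the correlation; for an SRS, E[Corr^2(Z, y)] = 1/(N-1).

S_y2 = 1.0           # population variance of y (common factor, drops out of the comparison)
N = 1_000_000_000    # population of one billion

# Convenience sample covering half the population, expected correlation 0.05
n_conv = N // 2
mse_conv = 0.05**2 * (N - 1) / n_conv * (1 - n_conv / N) * S_y2

# Simple random sample of size 400: E[Corr^2(Z, y)] = 1/(N-1)
n_srs = 400
mse_srs = (1 / (N - 1)) * (N - 1) / n_srs * (1 - n_srs / N) * S_y2

print(f"MSE, convenience sample of {n_conv:,}: {mse_conv:.10f} * S_y^2")
print(f"MSE, SRS of {n_srs}:                   {mse_srs:.10f} * S_y^2")
# Both are roughly 0.0025 * S_y^2, with the convenience sample slightly larger -- the
# sense in which 500 million self-selected responses carry about as much information
# as a simple random sample of 400.
```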

Bailey (2023), arguing that today’s low poll response rates mean that “random sampling is dead,” proposes replacing inference based on probability sampling with efforts to measure the correlations between survey participation and specific outcome variables. Many of his conclusions and recommendations, however, rest on strong implicit assumptions about participation mechanisms. In this discussion I explore some of those assumptions and examine their implications for inference.

The discussion is organized as follows. I begin by reviewing Meng’s equation and defining random variables that may be used for statistical inference. I then discuss the special characteristics of probability samples and argue that there are benefits to selecting the initial sample using probability methods even if the response rate is low. After considering the role of auxiliary information for reducing bias and examining assumptions underlying randomized response instrument models, I conclude with some suggestions for moving forward.

Meng’s Equation and Random Variables

Let $\bar{y}_{\mathcal{U}} = \sum_{i=1}^N y_i/N$ denote the population mean of an outcome variable $y$ for a population (universe) of size $N$, and let $\bar{y}$ denote the sample mean of a sample from the population. The particular sample being considered is determined by the participation indicators $R_i$, where $R_i = 1$ if population unit $i$ participates in the sample and 0 otherwise. The identity in Meng (2018) follows from simple algebra:

$$\begin{aligned} \bar{y} - \bar{y}_{\mathcal{U}} &= \frac{\sum_{i=1}^N R_i (y_i - \bar{y}_{\mathcal{U}})}{\sum_{i=1}^N R_i} &(1) \\ &= \textrm{Corr}(R,y) \times \frac{N-1}{N}\frac{S_R}{\bar{R}_{\mathcal{U}}} \times S_y &(2) \\ &= \textrm{Corr}(R,y) \times \sqrt{\frac{N-1}{n} \left(1-\frac{n}{N}\right)} \times S_y, &(3) \end{aligned}$$


where $n = \sum_{i=1}^N R_i = N \bar{R}_{\mathcal{U}}$ is the number of participants,

$$S^2_y = \frac{1}{N-1} \sum_{i=1}^N (y_i - \bar{y}_{\mathcal{U}})^2$$

and

$$S^2_R = \frac{1}{N-1} \sum_{i=1}^N (R_i - \bar{R}_{\mathcal{U}})^2 = \frac{n}{N-1} \left( 1 - \frac{n}{N} \right)$$

denote the population variances of $y$ and $R$, and

$$\textrm{Corr}(R,y) = \frac{\textrm{Cov}(R,y)}{S_y S_R} = \frac{\sum_{i=1}^N (R_i -\bar{R}_{\mathcal{U}}) (y_i - \bar{y}_{\mathcal{U}})}{(N-1) S_y S_R}$$

is the population correlation.
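Because Equations (1)–(3) are a purely algebraic identity, they can be verified directly on any fixed population and participation vector. The short Python check below (my own illustration, with arbitrary simulated numbers) confirms that all three expressions equal $\bar{y} - \bar{y}_{\mathcal{U}}$.

```python
import numpy as np

# Numerical check (not from the article) that Equations (1)-(3) are the same algebraic
# identity: for any fixed population y_1,...,y_N and any 0/1 participation vector R,
# the three expressions give exactly the same value of ybar - ybar_U.
rng = np.random.default_rng(42)
N = 1000
y = rng.normal(50, 10, size=N)           # fixed population values
R = (rng.random(N) < 0.3).astype(float)  # one realized set of participation indicators
n = R.sum()

ybar_U = y.mean()
ybar = y[R == 1].mean()

S_y = y.std(ddof=1)                      # population standard deviations with divisor N-1
S_R = R.std(ddof=1)
Rbar = R.mean()
corr = np.sum((R - Rbar) * (y - ybar_U)) / ((N - 1) * S_y * S_R)

eq1 = np.sum(R * (y - ybar_U)) / np.sum(R)
eq2 = corr * (N - 1) / N * S_R / Rbar * S_y
eq3 = corr * np.sqrt((N - 1) / n * (1 - n / N)) * S_y

print(ybar - ybar_U, eq1, eq2, eq3)      # all four numbers agree
```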

The formulation of survey error in Equation (3), involving the correlation between the outcome variable of interest and the response mechanism, has been known for a long time. Hartley and Ross (1954) used a similar algebraic identity when deriving the bias of the ratio estimator, and numerous authors have noted that nonresponse bias depends on the correlation between response indicators or propensities and $y$ (see, for example, Bethlehem, 1988; Brick, 2013; Platek, 1980). Meng’s insight was to apply these results to large nonprobability samples and compare the relative amount of information from different types of samples.

Equation (3) is an algebraic identity for the difference between the mean of the particular sample collected (the population members with $R_i = 1$) and the unknown population mean. By itself, it does not give a framework for inference to the population. For statistical inferences to be made, one or both of $\{y_i\}$ and $\{R_i\}$ must be considered to be random variables. I prefer the formulation in Equation (2) to that in Equation (3) for inferential purposes, because the former makes clear the dependence on the random variables $R_i$ that describe participation. Statistical properties of $\bar{y}$ depend on the joint probability distribution of the participation indicators $R_i$ and the outcome variables $y_i$. The bias of $\bar{y}$ is the expected value of the quantity in Equation (2) and the MSE is the expected value of the square of the quantity in Equation (2).

In model-based inference, the values $y_i$ for nonparticipants are random variables and the assumed probability distribution of $y \mid \mathbf{x}$, where $\mathbf{x}$ is a vector of auxiliary information, provides the basis for inference. Models can also be adopted to study variability in answers that can be ascribed to interviewers, survey modes, or the tendency of some respondents to give different answers on repeated administrations of a survey instrument. The accuracy of a model-based estimate depends on how well the auxiliary information predicts $y_i$ for units not in the sample.

In design-based sampling theory—perhaps we should call it participation-based sampling theory in this context since not all samples are designed—the quantities $y_i$ are considered to be fixed constants. The participation indicators are random variables with $P(R_i = 1) = \phi_i$. The values of $\phi_i$ are known for probability samples with full response.

For a sample with some degree of self-selection, whether a convenience sample of volunteers or a probability sample with nonresponse, the values of $\phi_i$ depend on the protocol used to recruit people for the sample and are usually unknown. Some researchers view $R_i$ as a dichotomization of a latent variable $R_i^*$, where $R_i = 1$ if and only if $R_i^* \geq k$ for a particular cutoff $k$. The ability to reach sample members, and sampled persons’ decisions to participate in a survey, depend on many factors including number and type of contact attempts, survey mode, survey topic, and characteristics of interviewers. Two surveys with different recruitment protocols would thus be expected to have different probability distributions for the latent variables $R_i^*$ describing participation and different sets of $\{\phi_i\}$. The response propensity $\phi_i$ is not an immutable intrinsic characteristic of person $i$, but depends on the sampling protocol and on the interaction between person $i$’s attitudes or characteristics and the sampling protocol. If Mabel never answers the telephone when pollsters call, she has $\phi_i \approx 0$ for a telephone survey. But Mabel may be willing to participate in a mail survey, and thus would have $\phi_i > 0$ for that survey. And Mabel may have $\phi_i \approx 1$ for an administrative data set such as income tax records if her income exceeds the filing threshold.

Many of the conclusions in Bailey (2023) appear to be predicated on an unstated assumption that the distribution of the random variables $R_i^*$ is the same for different sampling protocols. Figure 1 in Bailey (2023) graphs $y_i$ versus a latent variable $R_i^*$ for a large population of size $N_1$ and for a small population of size $N_2$. The two graphs exhibit the same relationship between $y$ and $R^*$, and the only difference is that the cutoff $k$ for sample inclusion moves to the left for the population of size $N_2$ so that both samples have size $n$.

There is no reason to believe, however, that the distribution of the latent variables $R^*$ would be the same for the two samples. The two surveys can only achieve the same sample size if the response rate for the small population is $N_1/N_2$ times greater than the response rate for the large population. The pollster must do something different to achieve that higher response rate—for example, changing the survey invitation or survey mode, doing nonresponse follow-up, or offering an incentive. Those extra efforts will usually result in a different relationship between $R_i^*$ and $y$. In fact, most pollsters design those efforts to result in a different relationship. A protocol yields unbiased estimates if $\phi_i$ is the same for all units, so many nonresponse follow-up efforts and adaptive sampling protocols are attempts to reduce the variation of the $\phi_i$’s as much as possible. When comparing two surveys using Equations (1)–(3), we need to consider separate sets of random variables, $\{R_{i1}\}$ and $\{R_{i2}\}$, with corresponding latent variables that have distinct probability distributions.

Advantages of Probability Sampling

In a probability sample with full response, participation is described by $R_i = Z_i$, where $Z_i = 1$ if unit $i$ is included in the sample selected under the probability sampling design and 0 otherwise. The probability sampling design determines the probability distribution of $Z_1, \ldots, Z_N$. For example, for an SRS of size $n$, each population subset of size $n$ has probability $n!\,(N-n)!/N!$ of being chosen as the sample, and consequently an SRS has $E[\textrm{Corr}(Z,y)] = 0$ and $E[\textrm{Corr}^2(Z,y)] = 1/(N-1)$ for any variable $y$. A probability sample is representative in the technical sense of the word because it allows construction of confidence intervals that have the claimed coverage probability “whatever the unknown properties of the population” (Neyman, 1934, p. 586, emphasis in original).
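A quick simulation (my own sketch, not from the article) illustrates these two moments of the SRS inclusion indicators: for an arbitrary fixed population, the correlation between $Z$ and $y$ averages to zero across repeated samples, and its square averages to $1/(N-1)$.

```python
import numpy as np

# Simulation sketch of the claim that under simple random sampling the inclusion
# indicators Z satisfy E[Corr(Z, y)] = 0 and E[Corr^2(Z, y)] = 1/(N-1) for any fixed y.
rng = np.random.default_rng(1)
N, n, reps = 500, 50, 20000
y = rng.gamma(2, 3, size=N)                       # an arbitrary fixed population

corrs = np.empty(reps)
for r in range(reps):
    Z = np.zeros(N)
    Z[rng.choice(N, size=n, replace=False)] = 1   # one SRS of size n
    corrs[r] = np.corrcoef(Z, y)[0, 1]

print("mean Corr(Z, y):   ", corrs.mean())        # close to 0
print("mean Corr^2(Z, y): ", (corrs**2).mean())   # close to 1/(N-1)
print("1/(N-1):           ", 1 / (N - 1))
```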

The development of probability sampling theory was a tremendous breakthrough for the age-old problem of how to generalize from a sample we have seen to population members we have not seen. Probability sampling allows this generalization—along with an assessment of the accuracy of estimates—because the probability distribution of the participation indicator is fully under the control of the sampler. The pioneers of probability sampling, however, were well aware that the validity of inferences under probability sampling theory depends on having full response. For example, Deming (1950, p. 35) wrote: “A sample is no longer a probability sample if it is ruined by nonresponse or any other difficulty of execution,” and argued that even a nonresponse rate of 5% could seriously affect results.

When there is nonresponse within a probability sample, the participation indicator can be written as $R_i = Z_i T_i$, where $Z_i = 1$ if unit $i$ is in the probability sample and $T_i = 1$ if unit $i$ will respond to the survey if selected for the sample. The mean of the respondents is $\bar{y}_r = \left( \sum_{i=1}^N Z_i T_i y_i \right) / \left( \sum_{i=1}^N Z_i T_i \right)$. If the sampling design is an SRS of size $n$, $E_T[\sum_{i=1}^N T_i/N] \to \alpha$ for some constant $\alpha > 0$, and $Z_i$ and $T_i$ are independent, standard results on ratio estimation imply that

$$E\left[\bar{y}_r - \bar{y}_{\mathcal{U}} \right] = E_T\left[\textrm{Corr}(T,y) \times \frac{S_T}{\bar{T}_{\mathcal{U}}} \times S_y \right] + O\left(\frac{1}{n}\right). \qquad (4)$$


Equation (4) has approximately the same form as the expected value of Equation (2), with $T$ substituted for $R$. In both equations the middle factor depends on the coefficient of variation for the response indicator.
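A small simulation can illustrate Equation (4). The sketch below is my own construction: it assumes response propensities that depend on a binary $y$, draws the response indicators $T_i$ independently of an SRS indicator $Z_i$, and compares the empirical bias of the respondent mean with the expectation on the right-hand side of Equation (4).

```python
import numpy as np

# Simulation sketch (my construction, not from the article) of Equation (4): with an SRS
# Z independent of the response indicators T, the bias of the respondent mean is
# approximately E_T[ Corr(T, y) * S_T / Tbar * S_y ].
rng = np.random.default_rng(7)
N, n, reps = 20000, 2000, 2000
y = rng.binomial(1, 0.4, size=N).astype(float)   # fixed 0/1 outcome values
phi = np.where(y == 1, 0.6, 0.3)                 # assumed response propensities that depend on y

bias_terms, resp_means = [], []
for r in range(reps):
    T = (rng.random(N) < phi).astype(float)      # who would respond if selected
    Z = np.zeros(N)
    Z[rng.choice(N, size=n, replace=False)] = 1  # SRS of size n, independent of T
    R = Z * T
    resp_means.append(y[R == 1].mean())
    # right-hand side of Equation (4) for this realization of T
    Tbar, S_T, S_y = T.mean(), T.std(ddof=1), y.std(ddof=1)
    corr_Ty = np.corrcoef(T, y)[0, 1]
    bias_terms.append(corr_Ty * S_T / Tbar * S_y)

print("empirical bias of respondent mean:", np.mean(resp_means) - y.mean())
print("E_T[Corr(T,y) S_T/Tbar S_y]:      ", np.mean(bias_terms))
```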

Consider two survey designs for a population consisting of the addresses on an email list. The first sends a broadcast invitation to everyone on the list (with participation indicator $R_i = T_{i1}$), and the second sends the broadcast invitation to an SRS of $n$ email addresses on the list (with participation indicator $R_i = Z_i T_{i2}$). One might expect that since the same recruitment protocol is used for both samples, the distribution of $T_{i1}$ would be the same as the distribution of $T_{i2}$, so that the response rate is the same for the two surveys. Equation (4) implies that if the response mechanism is the same for the two designs, so is the bias. The SRS, however, is expected to have many fewer respondents than the broadcast invitation to the population, and hence the mean of the SRS respondents will have a larger variance.

The bias will be reduced for the SRS if the pollster modifies the recruitment procedure to obtain a lower expected value of $\textrm{Corr}(T,y)\, S_T/\bar{T}_{\mathcal{U}}$ for the variables $T_{i2}$ than would be possible for the variables $T_{i1}$. The pollster might be able to devote more resources in the SRS to increasing the response rate through nonresponse follow-up or providing incentives to respond, or attempt to reduce the variance of final response propensities through special efforts in low-propensity subpopulations. These efforts, however, result in $\{T_{i2}\}$ having a different joint distribution, and likely a different correlation with $y$, than $\{T_{i1}\}$. The statement in Bailey (2023) that “population size matters,” which is conditioned on having a fixed number of respondents, might be alternatively expressed as ‘response rate and sampling protocols matter.’

Probability sampling by itself does not cure the problem of nonresponse bias, but it allows the pollster to concentrate resources on obtaining high-quality data from a selected sample instead of spreading those efforts among the whole population. Lohr (2022) argued that starting with a probability sample has several advantages even when response rates are low:

  • The sampling frame for a probability sample is well defined, and many frames used in practice have high coverage. When coverage is incomplete by design (for example, when the frame excludes persons in institutions) the probability sampler can limit inference to the frame population. Many nonprobability samples lack a sampling frame, which makes it challenging to assess coverage.

  • The sampler can devote more resources toward persuading members of the selected sample to participate. This changes the distribution of the response indicators $T_i$. Nonresponse follow-up can concentrate on units that have low response propensities for the initial contact method, which may reduce the variance of the final response propensities.

  • It is more difficult for a malevolent actor to influence a probability sample. A population member can participate at most once in a probability survey, thus preventing situations in which one person or machine clicks the ‘take this survey’ button 10,000 times. Even when pollsters have methods for preventing or detecting multiple responses, a nonprobability sample might be manipulable. For example, a malevolent actor knowledgeable about survey weighting could arrange for a large group of people with a variety of claimed demographic characteristics to sign up for an online opt-in poll where participants are recruited by broadcast advertisement. None of the auxiliary information that would be available for weighting or nonresponse modeling would be able to compensate for the bias caused because all of these demographically diverse poll participants say they have opinion X.


    A would-be malevolent actor cannot arrange for people with opinion X to flood a probability sample because the initial selection is random.

  • The probability sample often has more information available that can be used for weighting, imputation, or other types of nonresponse modeling. For example, a sample drawn from a list of registered voters may have information on age, political party, and past voting behavior that can be used to study and adjust for nonresponse bias.

Weighting and Nonresponse Modeling

If $\phi_i$ is unknown, survey samplers often attempt to estimate it from auxiliary information $\mathbf{x}_i$ that is known for respondents and nonrespondents to a survey (see Haziza & Lesage, 2016; Särndal & Lundström, 2005, for descriptions of types of auxiliary information that might be available and how to use them). Standard practice is to form weights $w_i = 1/\hat{\phi}_i$ and to estimate the population mean by

$$\bar{y}_w = \sum_{i=1}^N V_i y_i \Big/ \sum_{i=1}^N V_i,$$


where $V_i = R_i w_i$. Then, from Equations (1) and (2), the error is

$$\bar{y}_w - \bar{y}_{\mathcal{U}} = \frac{\sum_{i=1}^N V_i (y_i - \bar{y}_{\mathcal{U}})}{\sum_{i=1}^N V_i} = \textrm{Corr}(V,y) \times \frac{N-1}{N}\frac{S_V}{\bar{V}_{\mathcal{U}}} \times S_y,$$


where $\bar{V}_{\mathcal{U}} = \sum_{i=1}^N V_i/N$ and $S^2_V = \sum_{i=1}^N (V_i - \bar{V}_{\mathcal{U}})^2/(N-1)$. The Cauchy–Schwarz inequality implies that $S^2_V/\bar{V}_{\mathcal{U}}^2 \geq S^2_R/\bar{R}_{\mathcal{U}}^2$, so the ability of weighting to reduce the MSE depends on how much the weighting decreases the squared correlation factor relative to how much the weight variation inflates the variance.
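The following numerical sketch (my own, with an assumed logistic response mechanism and weights built from the true propensities) verifies the weighted-error identity above for one realized sample and illustrates the coefficient-of-variation inequality $S^2_V/\bar{V}_{\mathcal{U}}^2 \geq S^2_R/\bar{R}_{\mathcal{U}}^2$.

```python
import numpy as np

# Numerical sketch of the weighted-estimator identity above and of the claim that
# weighting cannot decrease the coefficient of variation of the participation variable:
# S_V^2 / Vbar^2 >= S_R^2 / Rbar^2, where V_i = R_i * w_i.
rng = np.random.default_rng(3)
N = 5000
y = rng.normal(10, 2, size=N)
phi = 1 / (1 + np.exp(-(-1 + 0.2 * y)))          # assumed response propensities
R = (rng.random(N) < phi).astype(float)
w = 1 / phi                                      # here, weights built from the true propensities
V = R * w

ybar_U = y.mean()
ybar_w = np.sum(V * y) / np.sum(V)

def cv2(a):
    """Squared coefficient of variation with the N-1 divisor used in the article."""
    return a.var(ddof=1) / a.mean() ** 2

corr_Vy = np.corrcoef(V, y)[0, 1]
identity = corr_Vy * (N - 1) / N * np.sqrt(cv2(V)) * y.std(ddof=1)

print("error of weighted mean:  ", ybar_w - ybar_U)
print("identity value:          ", identity)         # identical to the error
print("CV^2 of V vs CV^2 of R:  ", cv2(V), cv2(R))   # CV^2(V) >= CV^2(R)
```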

Weighting is a powerful tool for reducing bias, and high-quality auxiliary information can improve estimates even from surveys with a high degree of selection bias. For example, Bailey (2023) cites the 1936 Literary Digest survey as an example of a sampling “fiasco” and indeed the unweighted estimates were poor. Fifty-four percent of the Digest’s sample of nearly 2.4 million respondents said they were supporting Landon for president, but Roosevelt ended up winning the election with more than 60% of the popular vote. If, however, the Digest editors had used available auxiliary information to weight the data, the weighting would have partially corrected for the overrepresentation of Republican voters in the sample and led the editors to predict that Roosevelt would win the election (Lohr & Brick, 2017).

As Bailey (2023) mentions, however, most inferences based on weighted estimates rely on an assumption that the response mechanism is missing at random (MAR) and that the weighting model removes the nonresponse bias. If the MAR assumption is not true, then confidence intervals based on formulas from probability sampling no longer have the stated coverage probability.

The problem is that one cannot assess the MAR assumption from the sample itself—one needs external information such as independent estimates of the means of outcome variables. For example, Mercer et al. (2018), comparing weighted estimates from online opt-in surveys to benchmarks from high-quality federal surveys, found that even the most effective weighting adjustments were able to remove only about 30% of the bias from the unweighted estimates, and thus the response mechanisms for those surveys cannot be considered to be MAR. They concluded that the quality of the auxiliary information mattered more than the particular statistical method for using such information, and advocated obtaining a richer set of auxiliary variables that go beyond core demographics.

Because weighted estimates can still be biased, there is a large body of research exploring not-missing-at-random (NMAR) models in which the response mechanism depends on the values of unobserved data as well as on observed data. The statistical properties for estimators calculated from these models depend on the validity of assumptions made about the nonparticipants—theorems often have the form ‘If assumptions A, B, and C hold, then our proposed estimate is approximately unbiased with approximate variance given in Equation D.’

NMAR models are useful for exploring potential nonresponse mechanisms and are an important part of nonresponse bias analyses. Users can assess whether the explicitly stated assumptions of a particular NMAR model apply to their own circumstances. But, as Molenberghs et al. (2008) pointed out, “the correctness of the alternative model can only be verified in as far as it fits the observed data” (p. 372, emphasis in original). Molenberghs et al. (2008) showed that every NMAR model has a MAR counterpart that fits the observed data equally well. MAR and NMAR models with equivalent fits may give different estimates of $\bar{y}_{\mathcal{U}}$, but the data provide no information for preferring one over the other.

Randomized Response Instruments

Bailey (2023) suggests collecting additional auxiliary information for modeling nonresponse through use of ‘randomized response instruments’ in which persons in the selected sample are randomly assigned to different survey protocols that are designed to have different response rates. Many pollsters use randomized experiments to test questions and explore methods for improving response rates, but Bailey (2023) claims that assigning some of the selected sample members to a protocol that is known to achieve a low response rate can “allow us to assess whether the response mechanism is ignorable or not” and “create population estimates that [are] purged of the malign effects of $\rho$ [$\textrm{Corr}(R,y)$].” The only assumption Bailey (2023) mentions is that the response instrument should not directly affect $y$, and he refers the reader to Sun et al. (2018) for “a formal proof of the conditions under which population quantities are statistically identified.” The implication is that these conditions are technicalities that are easily met.

This sounds like it is too good to be true, and I believe it is. Every method of nonresponse adjustment or bias estimation requires assumptions about the nonrespondents. In this case, the very strong assumptions required for randomized response instrument methods are implicit in the model identifiability conditions.

Consider the fictional data in Table 1 from an experiment with two sampling protocols. An SRS of size 10,000 is selected from the population. Sample members who are randomly assigned to the group with $x=0$ receive a sampling protocol that is expected to achieve a low response rate, and members of the group with $x=1$ receive a protocol that is expected to achieve a higher response rate. For example, protocol 0 might be an internet survey while protocol 1 is face-to-face, or protocol 0 might include no incentive while protocol 1 includes an incentive of \$20. For simplicity, assume that no other auxiliary information is available—Table 1 contains all the data we have.

Table 1. Fictional data from an experiment with randomly assigned sampling protocols.

| Protocol | $n$ | Nonrespondents | Respondents with $y_i = 0$ | Respondents with $y_i = 1$ | $\bar{y}_r$ |
|---|---|---|---|---|---|
| $x=0$ | 5,000 | 4,000 | 350 | 650 | 0.65 |
| $x=1$ | 5,000 | 3,000 | 1,000 | 1,000 | 0.50 |
| All | 10,000 | 7,000 | 1,350 | 1,650 | 0.55 |

If there were no nonresponse bias, we would expect the means of the respondents from the two protocols in Table 1 to be approximately equal because the sample members were randomly assigned. Thus, Table 1 clearly exhibits a problem with nonresponse bias in one or both $x$ groups. But the data have little information on the nature of the bias. We know that the sample mean for the selected sample of 5,000 persons in the $x=0$ group is somewhere between 0.13 (assuming all nonrespondents have $y_i = 0$) and 0.93 (assuming all nonrespondents have $y_i = 1$), and the sample mean for the selected sample of 5,000 persons in the $x=1$ group is somewhere between 0.2 and 0.8, but anything beyond that requires assumptions about the nature of the nonresponse.
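These respondent means and bounds follow directly from the cell counts in Table 1, as the short calculation below (mine, for illustration) shows.

```python
# Check of the respondent means and the bounds quoted above, computed directly from the
# cell counts in Table 1.
cells = {  # protocol x: (selected, nonrespondents, respondents with y=0, respondents with y=1)
    0: (5000, 4000, 350, 650),
    1: (5000, 3000, 1000, 1000),
}
for x, (n_sel, n_nonresp, r0, r1) in cells.items():
    ybar_r = r1 / (r0 + r1)                    # mean among respondents
    lower = r1 / n_sel                         # if all nonrespondents have y = 0
    upper = (r1 + n_nonresp) / n_sel           # if all nonrespondents have y = 1
    print(f"x={x}: respondent mean {ybar_r:.2f}, selected-sample mean in [{lower:.2f}, {upper:.2f}]")
# x=0: respondent mean 0.65, bounds [0.13, 0.93]
# x=1: respondent mean 0.50, bounds [0.20, 0.80]
```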

Figure 1 depicts three of the many possible relationships between response probabilities and $y$ that are consistent with the observed data in Table 1. For each picture, the mean of the selected sample with protocol $x=0$ equals the mean of the selected sample with protocol $x=1$, as one would expect since $x$ is assigned randomly. In Figure 1(a), the response probabilities for the group assigned sampling protocol 1 are the same for $y=0$ and $y=1$ but differ by $y$ for the group with protocol 0. The estimate $\bar{y}_r$ from the group with $x=1$ is thus unbiased and the estimate from the $x=0$ group is biased. A situation similar to this might occur if protocol 1 has nonresponse follow-up or follows an adaptive sampling design that reduces variation in response rates. In Figure 1(b), sampling protocol 1 has larger bias than protocol 0 despite having a higher overall response rate—perhaps $y$ is related to income, and offering an incentive led to a disproportionate number of lower income respondents in the $x=1$ group. Figure 1(c) is consistent with the implicit assumption in Figure 4 of Bailey (2023) that the two protocols share the same set of latent response variables $R_i^*$ but the cutoff $k$ is lower for protocol 1. In Figure 1(c), protocol 1 captures a higher percentage of the people with $y=0$ than protocol 0, but the majority of the people with $y=0$ are still nonrespondents.

Figure 1. Three possible relationships between mean response probabilities and $y$.

The saturated logistic regression model relating $P(R=1)$, $x$, and $y$ is

$$\textrm{logit}\, P(R=1 \mid x,y) = \theta_0 + \theta_1 x + \theta_2 y + \theta_3 xy \qquad (5)$$


but Sun et al. (2018) showed in their Example 1 that this saturated model is not identifiable and therefore cannot be fit. Thus, an assumption must be made to reduce the number of parameters. Sun et al. (2018) fixed $\theta_3 = 0$, and showed that the model

$$\textrm{logit}\, P(R=1 \mid x,y) = \theta_0 + \theta_1 x + \theta_2 y \qquad (6)$$

is identifiable.

When $\theta_3$ is assumed to be 0, as in Equation (6), the predicted probabilities for the data in Table 1 are those in Figure 1(c) (the lines are parallel if graphed in logit scale). Inverse propensity weights for observations with $y=0$ are higher than those for observations with $y=1$ in both $x$ groups, leading to an estimated population mean of approximately 0.25. This is lower than the mean of the respondents from each sampling protocol considered separately, reflecting the strong assumption that protocol 1 increases all response probabilities (in logit scale) by a constant amount.
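One way to see where the 0.25 figure comes from is to solve the cell-probability equations implied by model (6) directly, using the fact that randomization makes $P(y=1)$ the same in both protocol groups. The sketch below is my own reconstruction of that fit, not the semiparametric estimator of Sun et al. (2018); it searches for the value of $P(y=1)$ at which the log odds ratio of response between $y=1$ and $y=0$ is identical in the two groups (that is, $\theta_3 = 0$).

```python
import numpy as np
from scipy.optimize import brentq

# Sketch (my own, not Sun et al.'s 2018 estimator) of fitting model (6) to Table 1.
# With x randomized, P(y=1) = p is the same in both groups, and each observed cell
# fraction satisfies P(y) * phi(x, y) = observed fraction responding with that y, where
# phi(x, y) = expit(theta0 + theta1*x + theta2*y). The no-interaction assumption
# theta3 = 0 means the log odds ratio of response between y=1 and y=0 is the same in
# both x groups; solving that constraint for p gives the estimate of the population mean.

obs = {(0, 0): 350/5000, (0, 1): 650/5000,      # (x, y): fraction of the selected group
       (1, 0): 1000/5000, (1, 1): 1000/5000}    #         responding with that y value

def logit(u):
    return np.log(u / (1 - u))

def no_interaction_gap(p):
    # implied response probabilities phi(x, y) = obs[(x, y)] / P(y)
    phi = {(x, y): obs[(x, y)] / (p if y == 1 else 1 - p) for (x, y) in obs}
    # difference of the two log odds ratios; zero means theta3 = 0 holds exactly
    return (logit(phi[0, 1]) - logit(phi[0, 0])) - (logit(phi[1, 1]) - logit(phi[1, 0]))

p_hat = brentq(no_interaction_gap, 0.21, 0.49)   # search within the feasible range for p
print(f"estimated population mean of y under model (6): {p_hat:.3f}")   # about 0.25
```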

The assumption that $\theta_3 = 0$ is arbitrary, however, and fixing $\theta_3$ as $-0.7693$, $-1.32$, or another value also leads to identifiability for model (5) but will result in different predicted probabilities and a different estimate of $\bar{y}_{\mathcal{U}}$. All three models in Figure 1 provide perfect fits to the data in Table 1. Without additional external information, there is no reason to believe the predicted probabilities in Figure 1(c) are more accurate than those in Figure 1(a) or (b), which both have nonzero values of $\theta_3$, or that the estimate of $\bar{y}_{\mathcal{U}}$ from Figure 1(c) is purged of nonresponse bias.

The protocols for many probability surveys have been refined through years of research and experimentation, and often nonresponse follow-up efforts are designed to raise the response rates for subpopulations that are underrepresented after initial contact efforts. If a well-researched protocol 1 has a moderately high response rate, and protocol 0 modifies protocol 1 so as to reduce the response rate (for example, by skipping nonresponse follow-up), I think assuming a structure similar to Figure 1(a), where protocol 1 results in MAR data, is more reasonable than assuming a structure such as (c) that forces the interaction to be zero. And if the structure in (a) holds, then one should take the entire sample using protocol 1 because estimates from that protocol are unbiased. Why would a sampler want to allocate half of the sample to an inferior protocol with a lower response rate and then adjust the weights for the superior protocol according to the results from the inferior protocol?

In this example, I assumed no other auxiliary information was available. If other information is available, the weighting adjustments from the randomized response instrument method are likely to be less extreme because some of the difference between the protocols will be explained by the other auxiliary variables. But anyone using this method must still assume that, after conditioning on other covariates, the response instrument does not interact with $y$ in affecting the response propensities. There may be response instruments for which this assumption is reasonable, but the assumption needs to be justified.

In general, different sampling protocols are likely to have different relationships between $R_i$ (or $R_i^*$) and $y$—in other words, we would often expect to see an interaction. An incentive will often raise response rates more for some population subgroups than others. One would not expect to be able to learn about the expected correlation between $R$ and $y$ for the random variables associated with the Current Population Survey’s sampling protocol by using data from a convenience sample. If the randomized response instrument model’s strong assumptions are not met, estimates from this approach may well have more bias than would occur if the entire sample were collected using the preferred protocol.

The current paradigm used by pollsters relies on strong assumptions that missing data are MAR and nonresponse weighting removes the bias from estimates. Assumptions for the new paradigm are not stated in Bailey (2023), but they appear to be equally strong if not stronger. External information about the nonrespondents is needed to assess the validity of the assumptions in either paradigm for a particular survey.

Improving Polls

Equations (1)–(3) are algebraically equivalent, and thus the information that can be learned from each expression for a particular survey is identical. Nevertheless, looking at the expression for survey error in multiple ways can provide additional insights for survey planning, and many statisticians have studied nonresponse bias through examining the correlation of response propensities and outcome variables. Bethlehem (1988), for example, used the expected value of Equation (2) to provide guidance for constructing poststrata designed to minimize nonresponse bias, and his proposed methods are now standard practice.

Modern-day polls are not representative in the sense defined by Neyman (1934). Confidence intervals calculated under the assumption of unbiasedness have too-low coverage probability when there is nonresponse bias. These problems are well known, and there is no magic cure for low response rates. To date, full-response probability sampling is the only method guaranteed to produce unbiased estimates and accurate margins of error.

There are, however, a number of steps that pollsters can take to improve methodology and acknowledge the limitations of estimates. Transparency is paramount, and every poll should be accompanied by a full methodology report that describes how the sample was recruited, how concepts were measured, and how estimates were calculated. This report should include breakdowns of the response rate and a discussion of how nonresponse might affect estimates for the full population and for subgroups.

Montaquila and Olson (2012) provided numerous practical tools for conducting nonresponse bias analyses, including fitting statistical models to look at the relationship between $y$ and the level of effort in obtaining a response, and using two-phase sampling to study nonresponse follow-up strategies. Randomized experiments are a powerful tool for comparing questionnaires and improving polling methodologies.

Pollsters can take a cue from the language of mathematics and state (in lay language, of course) the conditions under which the estimates are approximately unbiased and the margin of error describes the uncertainty. Some poll reports (see, for example, Lopes et al., 2023) refer to ‘margin of sampling error’ and emphasize that other sources of error may also affect the estimates, and I think this is a much better practice than simply stating a margin of error without explaining that it measures only one type of uncertainty.

Finally, pollsters can be guided by solid mathematical and empirical research on methods for estimation and reducing nonresponse bias. There is a huge amount of excellent work, including recent research on the nature of statistical information and bias (Meng, 2018), statistical properties of estimates from nonprobability samples (Rao, 2021; Wu, 2022), empirical investigations comparing estimates from different sampling protocols and with different types of weighting variables (Dutwin & Buskirk, 2017; Mercer et al., 2018), investigation of new variables that can be used for weighting (Peytchev et al., 2018), and investigations into respondents who provide deliberately misleading answers to polls (Kennedy et al., 2020). Couper (2017) described recent methodological and technological advances that can improve survey quality and Jamieson et al. (2023) provided additional suggestions for protecting the integrity of survey research. While surveys face a number of major challenges, I am optimistic that the survey research community is up to the task.


Acknowledgments

I would like to thank Professor Meng for inviting me to provide this discussion, and I am grateful to J.N.K. Rao and Mike Brick for many helpful discussions on inferential issues related to big data, nonprobability samples, and data integration.

Disclosure Statement

Sharon L. Lohr has no financial or non-financial disclosures to share for this article.


References

Bailey, M. A. (2023). A new paradigm for polling. Harvard Data Science Review, 5(3). https://doi.org/10.1162/99608f92.9898eede

Bethlehem, J. G. (1988). Reduction of nonresponse bias through regression estimation. Journal of Official Statistics, 4(3), 251–260.

Brick, J. M. (2013). Unit nonresponse and weighting adjustments: A critical review. Journal of Official Statistics, 29(3), 329–353. https://doi.org/10.2478/jos-2013-0026

Couper, M. P. (2017). New developments in survey data collection. Annual Review of Sociology, 43, 121–145. https://doi.org/10.1146/annurev-soc-060116-053613

Deming, W. E. (1950). Some theory of sampling. John Wiley & Sons.

Dutwin, D., & Buskirk, T. D. (2017). Apples to oranges or Gala versus Golden Delicious? Comparing data quality of nonprobability internet samples to low response rate probability samples. Public Opinion Quarterly, 81(S1), 213–239. https://doi.org/10.1093/poq/nfw061

Hartley, H., & Ross, A. (1954). Unbiased ratio estimators. Nature, 174(4423), 270–271. https://doi.org/10.1038/174270a0

Haziza, D., & Lesage, É. (2016). A discussion of weighting procedures for unit nonresponse. Journal of Official Statistics, 32(1), 129–145. https://doi.org/10.1515/jos-2016-0006

Jamieson, K. H., Lupia, A., Amaya, A., Brady, H. E., Bautista, R., Clinton, J. D., Dever, J. A., Dutwin, D., Goroff, D. L., Hillygus, D. S., Kennedy, C., Langer, G., Lapinski, J. S., Link, M., Philpot, T., Prewitt, K., Rivers, D., Vavreck, L., Wilson, D. C., & McNutt, M. K. (2023). Protecting the integrity of survey research. PNAS Nexus, 2(3), Article pgad049. https://doi.org/10.1093/pnasnexus/pgad049

Kennedy, C., Hatley, N., Lau, A., Mercer, A., Keeter, S., Ferno, J., & Asare-Marfo, D. (2020). Assessing the risks to online polls from bogus respondents. Pew Research. https://www.pewresearch.org/methods/wp-content/uploads/sites/10/2020/02/PM_02.18.20_dataquality_FULL.REPORT.pdf

Lohr, S. L. (2022). Sampling: Design and analysis (3rd ed.). CRC Press.

Lohr, S. L., & Brick, J. M. (2017). Roosevelt predicted to win: Revisiting the 1936 Literary Digest poll. Statistics, Politics and Policy, 8(1), 65–84. https://doi.org/10.1515/spp-2016-0006

Lopes, L., Kearney, A., Washington, I., Valdes, I., Yilma, H., & Hamel, L. (2023, August). KFF health misinformation tracking poll pilot. KFF. https://www.kff.org/coronavirus-covid-19/poll-finding/kff-health-misinformation-tracking-poll-pilot/

Meng, X.-L. (2018). Statistical paradises and paradoxes in big data (I): Law of large populations, big data paradox, and the 2016 US presidential election. The Annals of Applied Statistics, 12(2), 685–726. https://doi.org/10.1214/18-AOAS1161SF

Mercer, A., Lau, A., & Kennedy, C. (2018). For weighting online opt-in samples, what matters most? Pew Research. https://www.pewresearch.org/methods/2018/01/26/for-weighting-online-opt-in-samples-what-matters-most/

Molenberghs, G., Beunckens, C., Sotto, C., & Kenward, M. G. (2008). Every missingness not at random model has a missingness at random counterpart with equal fit. Journal of the Royal Statistical Society Series B: Statistical Methodology, 70(2), 371–388. https://doi.org/10.1111/j.1467-9868.2007.00640.x

Montaquila, J., & Olson, K. M. (2012). Practical tools for nonresponse bias studies [Webinar]. Survey Research Methods Section of the American Statistical Association; American Association of Public Opinion Research. https://community.amstat.org/surveyresearchmethodssection/programs/new-item2

Neyman, J. (1934). On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97(4), 558–625. https://doi.org/10.2307/2342192

Peytchev, A., Presser, S., & Zhang, M. (2018). Improving traditional nonresponse bias adjustments: Combining statistical properties with social theory. Journal of Survey Statistics and Methodology, 6(4), 491–515. https://doi.org/10.1093/jssam/smx035

Platek, R. (1980). Causes of incomplete data, adjustments and effects. Survey Methodology, 6(2), 93–132. https://www150.statcan.gc.ca/n1/en/pub/12-001-x/1980002/article/54945-eng.pdf?st=k9d-jaTD

Rao, J. N. K. (2021). On making valid inferences by integrating data from surveys and other sources. Sankhyā Series B, 83(1), 242–272. https://doi.org/10.1007/s13571-020-00227-w

Särndal, C.-E., & Lundström, S. (2005). Estimation in surveys with nonresponse. John Wiley & Sons.

Sun, B., Liu, L., Miao, W., Wirth, K., Robins, J., & Tchetgen, E. J. T. (2018). Semiparametric estimation with data missing not at random using an instrumental variable. Statistica Sinica, 28(4), 1965–1983. https://doi.org/10.5705/ss.202016.0324

Wu, C. (2022). Statistical inference with non-probability survey samples (with discussion). Survey Methodology, 48(2), 283–373. https://www150.statcan.gc.ca/n1/pub/12-001-x/2022002/article/00002-eng.pdf


©2023 Sharon L. Lohr. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.
