
The “Law of Large Populations” Does Not Herald a Paradigm Shift in Survey Sampling

Published on Sep 27, 2023


Data quality in sample surveys, and opinion polls in particular, is an increasing concern, given sampling-frame deficiencies, rising nonresponse rates, and perceived failures in election polls. Concern is evidenced by the recent report from the American Association for Public Opinion Research (AAPOR) on data quality metrics for online surveys (AAPOR, 2022).

Bailey’s (2023) statement that “random sampling is, for all practical purposes, dead” might be defensible for opinion polling, but not for the field of survey sampling as a whole. In areas such as auditing, probability sampling is still practical; in other settings, government statistical agencies like the U.S. Census Bureau and the National Center for Health Statistics, and survey research organizations like Westat, Research Triangle Institute, and the Institute for Social Research at Michigan, strive to conduct high-quality probability surveys. The ideal of random sampling remains a very important concept even when it is not attainable in practice.

Bailey argues that Meng’s (2018) discussion of the role of probability sampling in the era of ‘big data,’ and in particular Meng’s “Law of Large Populations,” heralds a paradigm shift in the assessment of polling data collected by nonrandom sampling methods. Meng has provided useful illustrations of the trade-off between bias and variance in survey data, emphasizing the role of probability sampling in the era of big data. See, for example, Bradley et al. (2021) and Meng’s (2016) discussion of Keiding and Louis (2016), a wide-ranging debate of the pros and cons of probability sampling in the context of epidemiologic studies. However, I have reservations about his proposed Law of Large Populations (Meng, 2018), as represented by what Bailey calls “Meng’s equation”:

$$\bar{Y}_{n} - \bar{Y}_{N} = \rho_{R,Y} \times \sqrt{\frac{N - n}{n}} \times \sigma_{Y} \tag{1}$$

where $\rho_{R,Y}$ is called the “data defect correlation,” $\sqrt{(N - n)/n}$ the “data quantity,” and $\sigma_{Y}$ the “data difficulty.” Bailey follows Meng in arguing that the data defect correlation is a fundamental measure of data quality, and reiterates Meng’s Law of Large Populations, which implies that the population size is a major driver of selection bias. He also discusses ‘randomized response’ approaches to reducing $\rho_{R,Y}$.

I argue that $\rho_{R,Y}$ is a derived measure of selection bias, and to regard it as a ‘universal constant’ is misleading, as is the conclusion that the population size is a key quantity for assessing the trade-off between bias and precision. I present a more traditional view of selection bias and the bias-variance trade-off, which leads to the conclusion that the sample size, not the population size, is the main driver of the effects of selection. I also discuss relationships of Meng’s equation with earlier models for nonignorable nonresponse, including Heckman’s (1976) model, and proxy pattern-mixture analysis (Andridge & Little, 2011), which provides a sensitivity analysis on the potential impact of selection bias on survey estimates.
Meng is a friend who is aware of my reservations, and I appreciate his generous invitation to write about them in this commentary.

‘Classical’ Selection Bias 101

The left side of Equation 1, $\bar{Y}_{n} - \bar{Y}_{N}$, involves both sampling uncertainty and selection bias for inference about a population mean. Strictly speaking, as a measure of bias, $\bar{Y}_{n}$ should be replaced by its expectation. In the absence of random sampling, taking this expectation requires a superpopulation model, which Meng avoids. Under simple random sampling and, more generally, probability sampling, the bias is zero in expectation and $\bar{Y}_{n} - \bar{Y}_{N}$ is $O(1/\sqrt{n^{*}})$, where $n^{*}$ is the effective sample size. An alternative expression of the selection bias that separates the selected and nonselected values is

$$b = (1 - f)(\bar{Y}_{n} - \bar{Y}_{N - n}) \tag{2}$$

where $\bar{Y}_{N - n}$ is the mean of the unselected units, and $f = n/N$ is the selection fraction. The fraction $f$ seems a more intuitive measure of “data quantity” than $\sqrt{(N - n)/n}$ in Equation 1, and Equation 2 captures the idea that, other things being equal, the bias is reduced as $f$ increases.
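As a quick check that Equations 1 and 2 are two expressions for the same quantity, the following sketch builds a small artificial finite population and evaluates both. The population size, outcome prevalence, and selection probabilities are arbitrary assumptions chosen for illustration, not values from Meng (2018) or Bailey (2023); the standard deviation and correlation are finite-population quantities (divisor N).

```python
import math
import random

random.seed(0)

# Hypothetical finite population: binary Y, with selection more likely when Y = 1.
N = 10_000
y = [1 if random.random() < 0.3 else 0 for _ in range(N)]
r = [1 if random.random() < (0.10 if yi else 0.05) else 0 for yi in y]

n = sum(r)                      # number of selected units
f = n / N                       # selection fraction
ybar_N = sum(y) / N             # population mean
ybar_sel = sum(yi for yi, ri in zip(y, r) if ri) / n
ybar_unsel = sum(yi for yi, ri in zip(y, r) if not ri) / (N - n)

# Finite-population standard deviation of Y and correlation between R and Y.
sigma_y = math.sqrt(sum((yi - ybar_N) ** 2 for yi in y) / N)
sigma_r = math.sqrt(f * (1 - f))
cov_ry = sum((ri - f) * (yi - ybar_N) for ri, yi in zip(r, y)) / N
rho_ry = cov_ry / (sigma_r * sigma_y)

lhs = ybar_sel - ybar_N                                  # left side of Equation 1
eq1 = rho_ry * math.sqrt((N - n) / n) * sigma_y          # right side of Equation 1
eq2 = (1 - f) * (ybar_sel - ybar_unsel)                  # Equation 2

print(f"n = {n}, f = {f:.4f}")
print(f"Ybar_n - Ybar_N          = {lhs:.6f}")
print(f"Equation 1 (rho form)    = {eq1:.6f}")
print(f"Equation 2 (mean diff)   = {eq2:.6f}")
```

All three printed values should agree up to floating-point error, since both equations are algebraic identities for any finite population and any selection indicator.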

The parameter $\sigma_{Y}$ in Equation 1, which Meng calls “data difficulty,” does not impact bias, but it does impact precision. In my very first course in statistics as an MS student at Imperial College, London, Sir David Cox distinguished between ‘precision’ (the variability of the estimate) and ‘accuracy’ (the quality of the estimate), which involves both precision and negligible bias. A standard statistical measure of accuracy is the root mean squared error:

$$\text{RMSE} = \sqrt{\left( E(b) \right)^{2} + \text{var}} \tag{3}$$

where E(b) is the expected bias and var is the variance of the sample estimate. In a simple superpopulation model where units are assumed exchangeable and unit outcomes modeled as independent, a particular form of the RMSE is

$$\text{RMSE} = \sqrt{\left( E(b) \right)^{2} + (1 - f)\sigma_{Y}^{2}/n} \tag{4}$$

where (1 – f) is the finite population correction.

A characteristic of this formulation is that bias from nonrandom selection, unlike precision, remains relatively constant as a function of sample size n. One might argue that bias actually increases with n, because collection of a larger sample is harder to control and more subject to measurement error; but measurement error issues are not a part of Meng’s (2018) discussion. As n increases, the relative contribution of precision (as measured by the variance) to the RMSE decreases, and bias increasingly dominates. Thus, for ‘big data,’ it is bias, not precision, that is the key issue. Meng has illustrated the impact of even small amounts of bias on the RMSE of sample estimates.

Provided f is small, it is the sample size n, not the population size N, that controls both the RMSE and the relative contribution of bias and variance. For example, a random sample of n = 1,000 yields the same precision whether the population size is 100,000 or 100 million; for nonrandom forms of sampling, if $\bar{Y}_{n} - \bar{Y}_{N - n}$ is relatively independent of the population size, it is again the sample size that determines RMSE. Meng (and Bailey) argue based on Equation 1 that the population size is what matters for the assessment of bias. I describe in Section 4 why I view this logic as flawed. Before doing so, I question the interpretability of the data defect correlation, and discuss its relationship with previous superpopulation models for selection bias.
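The role of n versus N in Equation 4 can be illustrated with a short calculation. This is a sketch under stated assumptions: the values of the expected bias (0.02) and $\sigma_{Y}$ (0.5) are hypothetical, and the bias is held fixed across sample and population sizes, as in the discussion above.

```python
import math

def rmse(expected_bias, sigma_y, n, N):
    """RMSE from Equation 4: squared expected bias plus fpc-adjusted variance."""
    f = n / N
    variance = (1 - f) * sigma_y ** 2 / n
    return math.sqrt(expected_bias ** 2 + variance)

def bias_share(expected_bias, sigma_y, n, N):
    """Fraction of the mean squared error contributed by squared bias."""
    return expected_bias ** 2 / rmse(expected_bias, sigma_y, n, N) ** 2

SIGMA_Y = 0.5          # assumed value, for illustration only
EXPECTED_BIAS = 0.02   # assumed value, held constant in n and N

for N in (100_000, 100_000_000):
    for n in (100, 1_000, 10_000):
        print(
            f"N = {N:>11,}  n = {n:>6,}  "
            f"RMSE = {rmse(EXPECTED_BIAS, SIGMA_Y, n, N):.4f}  "
            f"bias share of MSE = {bias_share(EXPECTED_BIAS, SIGMA_Y, n, N):.2f}"
        )
```

Under these assumptions the RMSE for a given n is nearly the same for both population sizes, while the share of the mean squared error due to bias grows as n increases.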

The Data Defect Correlation Is Not an Easily Interpreted Measure of Selection Bias

Equations 1 and 4 are both valid expressions, so the underlying mathematics is not an issue. The data defect correlation is a measure of selection bias, but I do not think it is very easy to interpret; the correlation usually measures association between continuous variables, and is not a natural measure of association when one of the variables, R, is binary. The fact that it is a dimensionless quantity makes it potentially transportable across studies (but see the next section for a counterargument); it is less useful for assessing bias in a particular substantive setting. For example, I do not have a particularly strong intuitive notion of the difference between a correlation of 0.01 and 0.05. On the other hand, the more basic difference in means $\bar{Y}_{n} - \bar{Y}_{N - n}$ in Equation 2 has a real meaning in the context of a particular survey variable. If the variable is binary, it is the difference in proportions for selected and unselected units, and from a Bayesian perspective, a subject-matter expert might be able to posit a subjective prior distribution for this quantity.

The ability to address selection bias is limited without auxiliary information. Addressing selection bias more generally requires a superpopulation model for the joint distribution of R and Y, given auxiliary covariates X known for the whole population. Assuming for simplicity independence over population units i, two main approaches for formulating these models can be distinguished. Selection models factor the joint distribution of $r_{i}$ and $y_{i}$ as

$$f_{Y,R}\left( r_{i}, y_{i} \mid x_{i}, \theta, \psi \right) = f_{Y}\left( y_{i} \mid x_{i}, \theta \right) f_{R|Y}\left( r_{i} \mid x_{i}, y_{i}, \psi \right), \tag{5}$$

where the first factor characterizes the distribution of $y_{i}$ and the second factor models the selection mechanism. Alternatively, pattern-mixture models factor the joint distribution as

$$f_{Y,R}\left( r_{i}, y_{i} \mid x_{i}, \xi, \omega \right) = f_{Y|R}\left( y_{i} \mid x_{i}, r_{i}, \xi \right) f_{R}\left( r_{i} \mid x_{i}, \omega \right), \tag{6}$$

where the first distribution characterizes the distribution of $y_{i}$ in the strata defined by selected and unselected units, and the second distribution models the probabilities of selection given the covariates (Glynn et al., 1986, 1993; Little, 1993; Rubin, 1977).

The data defect correlation $\rho_{R,Y}$ in Meng’s formula (1) is analogous to the correlation $\rho^{*}$ in Heckman’s famous model for selectivity bias (Heckman, 1976), where selection into the sample occurs when a normal latent variable U crosses a threshold. This models the joint distribution in Equation 6 as:

$$\left\lbrack \begin{pmatrix} y_{i} \\ u_{i} \end{pmatrix} \,\middle|\, x_{i}, \theta, \psi, \sigma_{Y}^{2}, \rho_{U,Y} \right\rbrack \sim_{\text{ind}} G_{2}\left( \begin{pmatrix} \theta_{0} + \theta^{T}x_{i} \\ \psi_{0} + \psi^{T}x_{i} \end{pmatrix}, \begin{pmatrix} \sigma_{Y}^{2} & \rho_{U,Y}\sigma_{Y} \\ \rho_{U,Y}\sigma_{Y} & 1 \end{pmatrix} \right), \quad r_{i} = 1 \text{ when } u_{i} > 0; \tag{7}$$

where $G_{2}(\mu,\Sigma)$ is the bivariate Gaussian distribution with mean $\mu$ and covariance matrix $\Sigma$, and U is scaled to have variance 1. The correlation $\rho_{U,Y}$ is a more natural measure of association than $\rho_{R,Y}$ because it is between two normal variables Y and U. The model (7) implies the following probit selection model:

Example 1. A Probit Selection Model for Univariate Data

Suppose that for unit i,

$$\begin{aligned} (y_{i} \mid x_{i}, \theta, \sigma_{Y}^{2}) &\sim_{\text{ind}} G(\theta_{0} + \theta^{T}x_{i}, \sigma_{Y}^{2}) \\ (r_{i} \mid x_{i}, y_{i}, \psi, \lambda) &\sim_{\text{ind}} \text{BERN}\left( \Phi(\psi_{0} + \psi^{T}x_{i} + \lambda y_{i}) \right), \end{aligned} \tag{8}$$

where BERN denotes the Bernoulli distribution, $\Phi$ denotes the probit (cumulative normal) distribution function, and $\lambda = \rho_{U,Y}/\sigma_{Y}$.

The parameter $\lambda$ in (8) is thus a rescaled analog of the data defect correlation, adjusted for the covariates X; the probit transformation addresses the fact that R is binary. Selection is ignorable if $\lambda = 0$, and nonignorable if $\lambda \neq 0$. In practice, $\lambda$ is only properly estimable if restrictions are placed on the regressions of R and/or Y on X, by setting one or more regression coefficients to zero. Results are then highly sensitive to whether these assumptions are correct.

The following pattern-mixture model can be viewed as a natural generalization of the expression in Equation 2:

Example 2. A Normal Pattern-Mixture Model for Univariate Data

An alternative to (8) is the following pattern-mixture model:

$$\begin{aligned} (y_{i} \mid r_{i} = r, x_{i}, \xi, \sigma^{(0)}, \sigma^{(1)}) &\sim_{\text{ind}} G(\xi_{0}^{(r)} + \xi x_{i}, \sigma^{(r)2}) \\ (r_{i} \mid x_{i}, \omega) &\sim_{\text{ind}} \text{BERN}\left( \Phi(\omega_{0} + \omega x_{i}) \right). \end{aligned} \tag{9}$$

The difference in means for selected and unselected cases, namely $\delta = \xi_{0}^{(1)} - \xi_{0}^{(0)}$, characterizes the selection bias, adjusted for the covariates X.

Little and Rubin (2019, chap. 15) argue that the pattern-mixture model (Example 2) is easier to interpret than the selection model (Example 1). In particular, in (9), the selection effect is characterized by $\delta$, a parameter that has a simple interpretation as an adjusted difference in means. In Equation 8, the effect of selection is characterized by the parameter $\lambda$, which has the (more obscure) interpretation as the effect of increasing Y by one unit on the probit of the probability of selection, adjusting for the covariates. In Greenlees et al. (1982), the probit is replaced by a logit model, but the interpretation of $\lambda$ remains difficult to grasp. I feel the same way about Meng’s data defect correlation.

Other useful features of the pattern-mixture model are that it is often much easier to fit than the probit selection model, given assumptions to render the parameters estimable, and that it can include covariates that are only available in aggregate data form. Also, imputations of the missing values are based on the predictive distribution of Y given X and R = 0, which is modeled directly in the pattern-mixture factorization. For sensitivity analysis for nonignorable selection, pattern-mixture models are easier to implement and (in my view) easier to interpret for nonstatisticians.

The Data Defect Correlation Is Not a Universal Constant

I now return to the simpler case without auxiliary covariates. The data defect correlation has another feature that limits its value as a universal constant: its value varies depending on the selection fraction f.

If N is increased holding $\rho_{R,Y}$, $\sigma_{Y}$, and n fixed, then the left side of Equation 1 increases, which seems to imply that bias is related to population size. This conclusion is a consequence of assuming that $\rho_{R,Y}$ has the same interpretation as a measure of selection bias for all values of N. But Equation 1 implies that

$$\rho_{R,Y} = \frac{(\bar{Y}_{n} - \bar{Y}_{N - n})}{\sigma_{Y}} \times \sqrt{\frac{n}{N - n}}$$

the standardized selection bias multiplied by $\sqrt{n/(N - n)}$. Thus, given values of n, $\bar{Y}_{n}$, $\bar{Y}_{N - n}$, and $\sigma_{Y}$, the absolute value of $\rho_{R,Y}$ shrinks toward zero as N is increased. Thus, the interpretation of $\rho_{R,Y}$ varies as a function of N; it is different for a population size of 100,000 and for a population size of 100 million. For this reason, it does not make sense (as Meng and Bailey imply) to treat $\rho_{R,Y}$ as a ‘universal constant,’ with the same interpretation irrespective of N. On the other hand, the difference in selected and unselected means is independent of n and N aside from precision considerations.

As a numerical example, suppose $n = 100$, Y is binary, and $\bar{Y}_{n} = 0.2$, $\bar{Y}_{N - n} = 0.3$.

If $N = 10^{5}$, then $\bar{Y}_{N} \simeq 0.3$, $\sigma_{Y} = 0.458$, $(\bar{Y}_{n} - \bar{Y}_{N - n})/\sigma_{Y} = -0.218$, $\rho_{R,Y} = -0.0069$.

If $N = 10^{6}$, then $\bar{Y}_{N} \simeq 0.3$, $\sigma_{Y} = 0.458$, $(\bar{Y}_{n} - \bar{Y}_{N - n})/\sigma_{Y} = -0.218$, $\rho_{R,Y} = -0.0022$.

The data defect correlation is very different in these cases, but the selection bias is approximately the same.
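The numerical example can be reproduced with a few lines of arithmetic. The sketch below is an illustration, not code from the original commentary: the function name is hypothetical, $\sigma_{Y}$ is computed as the finite-population standard deviation of a binary variable, and $\rho_{R,Y}$ is obtained by solving Equation 1 exactly. Combining Equations 1 and 2 shows that the exact value carries an extra factor $(1 - f)$ relative to the displayed expression, which is close to 1 for the small selection fractions used here.

```python
import math

def data_defect_summary(n, N, ybar_sel, ybar_unsel):
    """Return (sigma_Y, standardized mean difference, rho_{R,Y}) for binary Y."""
    f = n / N
    ybar_N = f * ybar_sel + (1 - f) * ybar_unsel       # population mean
    sigma_y = math.sqrt(ybar_N * (1 - ybar_N))         # SD of a binary variable
    std_diff = (ybar_sel - ybar_unsel) / sigma_y       # standardized selection bias
    # Solve Equation 1 for rho_{R,Y}.
    rho = (ybar_sel - ybar_N) / sigma_y * math.sqrt(n / (N - n))
    return sigma_y, std_diff, rho

for N in (10**5, 10**6):
    sigma_y, std_diff, rho = data_defect_summary(
        n=100, N=N, ybar_sel=0.2, ybar_unsel=0.3
    )
    print(
        f"N = {N:>9,}: sigma_Y = {sigma_y:.3f}, "
        f"standardized difference = {std_diff:.3f}, rho_RY = {rho:.4f}"
    )
```

The standardized difference in means is essentially unchanged between the two population sizes, while the data defect correlation changes by roughly a factor of $\sqrt{10}$.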

Approaches to Assessing Selection Bias

Joint models for R and Y all suffer from lack of identifiability of parameters. In the context of nonresponse, a close relative of the problem of selection, Little and Rubin (2019, chap. 15) describe five strategies for missing not-at-random models:

  1. Collect data on a subsample of nonrespondents, and use information from this sample to weight the selected units or impute the values of unselected units. Simulation studies in Glynn et al. (1986) in the context of nonresponse suggest that even a small nonrespondent subsample can markedly reduce sensitivity of inference to nonignorable nonresponse.

  2. Use Bayesian modeling, including a prior distribution for unidentified parameters. An early example is Rubin (1977).

  3. Impose restrictions to identify parameters. Two applications of this approach in the context of the Heckman (1976) model, one successful and one not successful, are described in Little and Rubin (2019, Examples 15.11 and 15.12). Bailey’s proposal for randomized response designs is an example, though this approach seems more tuned to missing data than to selection effects.

  4. Selectively discard data based on assumed missing not-at-random assumptions. See in particular subsample ignorable likelihood methods (Little & Zhang, 2011).

  5. Conduct a sensitivity analysis, varying values of the parameters measuring deviations from ignorable selection. Example 3 below illustrates one such approach. The approach was originally developed in the context of nonresponse (Andridge & Little, 2011), but has recently been adapted to develop indices of selection bias for nonrandom samples for means (Boonstra et al., 2021; Little et al., 2020) and regression coefficients (West et al., 2021). Extensions to a binary outcome are provided in Andridge and Little (2020) and Andridge et al. (2019).

Example 3. Proxy Pattern-Mixture Analysis

Let Y be a continuous survey variable recorded on the selected sample, and $Z = (Z_{1},...,Z_{p})$ a set of variables that are observed for the population. Replace Z by a single proxy variable X that has the highest correlation with Y. This proxy variable can be estimated by regressing Y on Z using the selected sample, and taking X to be the predicted values of Y, available for both selected and unselected units. This regression should include important predictors of Y, as well as interactions and nonlinear terms where appropriate. Let $\rho^{(1)}$ be the correlation of Y and X among the selected cases, which we assume is positive. If $\rho^{(1)}$ is high (say, 0.8), X is called a strong proxy for Y, and if $\rho^{(1)}$ is low (say, 0.2), X is called a weak proxy for Y. The proposed method is based on a bivariate pattern-mixture model (Little, 1994) for the distribution of (X,Y) for selected and unselected units:

$$\left( (X,Y) \mid R = r \right) \sim G_{2}\left( (\mu_{x}^{(r)}, \mu_{y}^{(r)}), \Sigma^{(r)} \right)$$

$$\begin{gathered} R \sim \text{Bernoulli}(\pi) \\ \Sigma^{(r)} = \begin{pmatrix} \sigma_{xx}^{(r)} & \rho^{(r)}\sqrt{\sigma_{xx}^{(r)}\sigma_{yy}^{(r)}} \\ \rho^{(r)}\sqrt{\sigma_{xx}^{(r)}\sigma_{yy}^{(r)}} & \sigma_{yy}^{(r)} \end{pmatrix} \\ \Pr(R = 1 \mid X,Y) = f(Y^{*}), \quad Y^{*} = (1 - \phi)X\sqrt{\sigma_{yy}^{(1)}/\sigma_{xx}^{(1)}} + \phi Y, \quad 0 \leq \phi \leq 1, \end{gathered}$$

where f is an arbitrary function. With some additional assumptions, missingness can also depend on auxiliary variables uncorrelated with X. The maximum likelihood estimate of the population mean of Y is

$$\widehat{\mu}(\phi) = \bar{Y}_{n} + g(\widehat{\rho}^{(1)})(s_{y}/s_{x})(\bar{X}_{N} - \bar{X}_{n}), \quad g(\widehat{\rho}^{(1)}) = \frac{\phi + (1 - \phi)\widehat{\rho}^{(1)}}{1 - \phi + \phi\widehat{\rho}^{(1)}},$$

where $\bar{Y}_{n}$ is the mean of Y in the selected sample, $\bar{X}_{n}$ and $\bar{X}_{N}$ are the means of X in the selected sample and population, $s_{y}$ and $s_{x}$ are standard deviations of Y and X in the selected sample, and $\widehat{\rho}^{(1)}$ is the sample correlation between X and Y. The proposed sensitivity analysis calculates estimates for three values of $\phi$:

$$\begin{aligned} \phi = 0: &\quad g(\widehat{\rho}^{(1)}) = \widehat{\rho}^{(1)} && \text{(Selection at random, usual regression estimator)} \\ \phi = 0.5: &\quad g(\widehat{\rho}^{(1)}) = 1 && \text{(Selection not at random, carries over the bias adjustment estimated from proxy)} \\ \phi = 1: &\quad g(\widehat{\rho}^{(1)}) = 1/\widehat{\rho}^{(1)} && \text{(Selection not at random, inverse regression estimator)} \end{aligned}$$
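The three-point sensitivity analysis amounts to evaluating the estimator $\widehat{\mu}(\phi)$ at $\phi = 0$, $0.5$, and $1$. The following is a minimal sketch of that calculation from summary statistics; the function names and all numerical inputs are hypothetical and serve only to show how the spread of the three estimates depends on the strength of the proxy.

```python
def g(phi, rho_hat):
    """Multiplier g from the estimator of the population mean, for 0 <= phi <= 1."""
    return (phi + (1 - phi) * rho_hat) / (1 - phi + phi * rho_hat)

def ppm_mean(phi, ybar_n, xbar_n, xbar_N, s_y, s_x, rho_hat):
    """Estimate of the population mean of Y for a given value of phi."""
    return ybar_n + g(phi, rho_hat) * (s_y / s_x) * (xbar_N - xbar_n)

# Hypothetical summary statistics (not taken from any real survey).
summary = dict(ybar_n=50.0, xbar_n=48.0, xbar_N=46.0, s_y=10.0, s_x=8.0)

for label, rho_hat in (("strong proxy", 0.8), ("weak proxy", 0.2)):
    estimates = {
        phi: ppm_mean(phi, rho_hat=rho_hat, **summary) for phi in (0.0, 0.5, 1.0)
    }
    spread = max(estimates.values()) - min(estimates.values())
    print(label, {phi: round(est, 2) for phi, est in estimates.items()},
          "spread:", round(spread, 2))
```

With these made-up inputs, the estimates for the strong proxy lie much closer together than those for the weak proxy, which is the sense in which a good proxy reduces sensitivity to the assumed value of $\phi$.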

Bayesian inference for this model is also relatively straightforward. For recent applications, see Andridge and Thompson (2015), West and Andridge (2023), and Andridge (2023).

Sensitivity to nonignorable selection bias in the above analysis is reduced when $\rho$ is large, which means having auxiliary variables that are good predictors of Y; a weakness in many current surveys is that auxiliary variables are confined to demographic variables that do not have this property. An important role for existing high-quality probability surveys is to provide good auxiliary information for other nonprobability samples. One practical step is to include in the nonprobability survey any variables that (a) are good predictors of the survey content and (b) are available in large probability samples like the American Community Survey. These variables can then be incorporated as auxiliary variables in a proxy pattern-mixture analysis.


Acknowledgments

I appreciate useful suggestions on this article by the editor, and my colleagues Michael Elliott and Yajuan Si.

Disclosure Statement

Roderick J. Little has no financial or non-financial disclosures to share for this article.


References

American Association for Public Opinion Research. (2022). Data quality metrics for online samples: Considerations for study design and analysis.

Andridge, R. R. (2023). Using proxy pattern-mixture models to explain bias in estimates of COVID-19 vaccine uptake from two large surveys. arXiv.

Andridge, R. R., & Little, R. J. (2011). Proxy pattern-mixture analysis for survey nonresponse. Journal of Official Statistics, 27(2), 153–180.

Andridge, R. R., & Little, R. J. (2020). Proxy pattern-mixture analysis for a binary survey variable subject to nonresponse. Journal of Official Statistics, 36(3), 703–728.

Andridge, R. R., & Thompson, K. J. (2015). Assessing nonresponse bias in a business survey: Proxy pattern-mixture analysis for skewed data. Annals of Applied Statistics, 9(4), 2237–2265.

Andridge, R. R., West, B. T., Little, R. J., Boonstra, P. S., & Alvarado-Leiton, F. (2019). Indices of non-ignorable selection bias for proportions estimated from non-probability samples. Journal of the Royal Statistical Society Series C: Applied Statistics, 68(5), 1465–1483.

Bailey, M. A. (2023). A new paradigm for polling. Harvard Data Science Review, 5(3).

Boonstra, P. S., Little, R. J., West, B. T., Andridge, R. R., & Alvarado-Leiton, F. (2021). A simulation study of diagnostics for bias in non-probability samples. Journal of Official Statistics, 37(3), 751–769.

Bradley, V. C., Kuriwaki, S., Isakov, M., Sejdinovic, D., Meng, X.-L., & Flaxman, S. (2021). Unrepresentative big surveys significantly overestimated US vaccine uptake. Nature, 600, 695–700.

Glynn, R. J., Laird, N. M., & Rubin, D. B. (1986). Selection modeling versus mixture modeling with nonignorable nonresponse. In H. Wainer (Ed.), Drawing inferences from self-selected samples (pp. 115–142). Springer-Verlag.

Glynn, R. J., Laird, N. M., & Rubin, D. B. (1993). Multiple imputation in mixture models for nonignorable nonresponse with follow-ups. Journal of the American Statistical Association, 88(423), 984–993.

Greenlees, W. S., Reece, J. S., & Zieschang, K. D. (1982). Imputation of missing values when the probability of response depends on the variable being imputed. Journal of the American Statistical Association, 77, 251–261.

Heckman, J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables, and a simple estimator for such models. Annals of Economic and Social Measurement, 5, 475–492.

Keiding, N., & Louis, T. A. (2016). Perils and potentials of self-selected entry to epidemiological studies and surveys (with discussion). Journal of the Royal Statistical Society Series A: Statistics in Society, 179(2), 319–376.

Little, R. J. (1993). Pattern‑mixture models for multivariate incomplete data. Journal of the American Statistical Association, 88(421), 125–134.

Little, R. J. (1994). A class of pattern‑mixture models for normal missing data. Biometrika, 81(3), 471–483.

Little, R. J., & Rubin, D. B. (2019). Statistical analysis with missing data (3rd ed.). Wiley.

Little, R. J., West, B. T., Boonstra, P. S., & Hu, J. (2020). Measures of the degree of departure from ignorable sample selection. Journal of Survey Statistics and Methodology, 8(5), 932–964.

Little, R. J., & Zhang, N. (2011). Subsample ignorable likelihood for regression analysis with missing data. Journal of the Royal Statistical Society Series C: Applied Statistics, 60(4), 591–605.

Meng, X.-L. (2016). Discussion of paper by Keiding & Louis. Journal of the Royal Statistical Society Series A: Statistics in Society, 179(2), 351–352.

Meng, X.-L. (2018). Statistical paradises and paradoxes in big data (I): Law of large populations, big data paradox, and the 2016 US presidential election. Annals of Applied Statistics, 12(2), 685–726.

Rubin, D. B. (1977). Formalizing subjective notions about the effect of nonrespondents in sample surveys. Journal of the American Statistical Association, 72(359), 538–543.

West, B. T., & Andridge, R. R. (2023). An evaluation of 2020 pre-election polling estimates using new measures of non-ignorable selection bias. Public Opinion Quarterly, 87(S1), 575–601.

West, B. T., Little, R. J., Andridge, R. R., Boonstra, P., Ware, E. B., Pandit, A., & Alvarado-Leiton, F. (2021). Assessing selection bias in regression coefficients estimated from nonprobability samples with applications to genetics and demographic surveys. Annals of Applied Statistics, 15(3), 1556–1581.

©2023 Roderick J. Little. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.
