
# Abstract

# Media Summary

# 1. Toward a New Polling Paradigm

# 2. Paradigm Lost

# 3. Paradigm Found

## 3.1. The Importance of $\rho$

## 3.2. Population Size Matters

## 3.3. Random Contact

## 3.4. How Current Practice Fits Into the New Paradigm

# 4. The New Paradigm in Practice

### Example 1: Using Observational Data to Account for $\rho$

### Example 2: Randomized Response Instruments

# 5. The Future of Polling

# Acknowledgments

# Disclosure Statement

# References

# Appendix

## Derivation of the Meng equation

## Derivation of sampling error in random contact case


A New Paradigm for Polling

Additional discussions and rejoinder forthcoming.

Published on Jul 27, 2023

Scientific fields operate within paradigms that define problems and solutions for a community of researchers. The dominant paradigm in polling centers on random sampling, which is unfortunate because random sampling is, for all practical purposes, dead. The pollsters who try to produce random samples fail because hardly anyone responds. And more and more pollsters do not even try. The field therefore has folded weighting-type adjustments into the paradigm, but this too is unfortunate because weighting works only if we assume away important threats to sampling validity, threats that loom particularly large in the growing nonprobability polling sector. This article argues that the polling field needs to move to a more general paradigm built around the Meng (2018) equation that characterizes survey error for *any* sampling approach, including nonrandom samples. Moving to this new paradigm has two important benefits. First, this new paradigm elevates new insights, including the fact that survey error increases with population size when individuals’ decisions to respond are correlated with how they respond. This insight helps us understand how small sampling defects can metastasize into large survey errors. Second, the new paradigm points the field toward new methods that more directly identify and account for sampling defects in a nonrandom sampling environment. This article describes the intuition and potential power of these new tools, tools that are further elaborated in Bailey (2024).

**Keywords:** survey nonresponse, survey design, survey methods

Low response rates and low-cost internet polls have for all practical purposes killed the random sampling paradigm that built the public opinion field. This article argues that the polling field needs to move to a more general paradigm built around the Meng (2018) equation that characterizes survey error for *any* sampling approach, including nonrandom samples. Moving to this new paradigm elevates new insights and points the field toward new methods that address more of the challenges of the contemporary polling environment. The article summarizes work that uses randomized response instruments that provide a systematic way to determine whether the people who respond to polls differ from those who do not, even after controlling for demographics. Such work has found that polls in the Midwest understated Trump support and overstated the liberalism of Democratic voters.

A scientific paradigm provides a model for articulating problems, solutions, and future research directions for a community of practitioners (Kuhn, 1970). In polling, the main paradigm has long revolved around random sampling, a tool that provides an elegant way to make inferences about a large population based on information from a relatively small, randomly chosen subset of people.

Because it is incredibly difficult to randomly sample in the contemporary polling environment, most pollsters augment random sampling with weighting and related tools such as quota sampling and multilevel regression with poststratification. These weighting-type adjustments make the nonrandom samples resulting from nonresponse look like they came from a random sample, but with a cost: the techniques require us to assume that the decision to respond is independent of the content of response once the weighting variables have been controlled for.

I argue in this article that the weighting-augmented random sampling paradigm is ill-suited for the contemporary polling environment. First, the random sampling heart of the paradigm is hardly relevant today given low response rates and nonprobability samples. Nonetheless, polls are routinely ‘pollwashed’ in ways that make them appear to have inherited the precision and distributional properties of random sampling even though they have not. Second, weighting-type adjustments bear the weight for fixing nonrandomness in modern polling, but are built on assumptions that are quite restrictive, especially in the current environment in which respondents are often recruited via nonrandom mechanisms.

The field needs a better paradigm, one that moves beyond random sampling without relying on the strong assumptions involved in weighting. The simple decomposition of survey error provided by Meng (2018) provides the foundation for such a paradigm. Instead of reducing pollsters to explaining their work in terms of idealized and never-seen random samples, we can characterize survey error for *any* sampling approach, including nonrandom samples and samples that arise when survey response is related to survey content.

Shifting to a modern polling paradigm produces two important payoffs. First, the new paradigm provides intuition that is more relevant to current polling practice. A key element of the Meng equation is a so-called data defect parameter that characterizes the degree to which *whether* someone responds is related to *how* someone responds. This parameter tends to get lost in the dominant polling paradigms: random sampling essentially minimizes it, while weighting-type methods assume it away. The Meng equation makes clear that this parameter is centrally important and interacts with population size—and not, to be clear, sample size. Even a small data defect in sampling can create large survey errors when surveying large populations (Bradley et al., 2021). The Meng equation also helps us appreciate why random contact is worthwhile even when response rates are low.

The second payoff of the new paradigm is that it helps us chart a path forward for research on survey methods. A great deal of survey research—including research on nonrandom samples, as described in Wu (2023) —focuses on adjustments that assume there is no correlation between whether and how people respond after controlling for population-benchmarked variables. Given the critical role of data defects in the new paradigm, it is no longer tenable to focus so heavily on approaches that assume them away. Instead, the new paradigm points us toward tools that minimize, measure, and counteract any relationships between whether and how people respond to surveys. Bailey (2024) elaborates these benefits and provides additional context and tools.

To appreciate the challenges of the current paradigmatic ambiguity, consider two polls: one uses a probability-based sample by a respected newspaper with a one percent response rate. The other is a Twitter poll initiated by an unpredictable billionaire. Suppose they both have the same sample size and that demographic data is available so that the weighted results are ‘nationally representative.’ Most polling experts will have a strong preference for one of the polls, but random sampling provides little direct guidance, other than helping us appreciate that neither sample is random and hence both could be biased. Within the new paradigm, on the other hand, the Meng equation enables us to clearly show why the newspaper poll is higher in quality, as I discuss below.

The goal of this article is to provide an overview of a new way of thinking about polling that is better suited to the contemporary polling environment than today’s focus on weighting and other tools that assume ignorable nonresponse. Section 2 highlights what we already know: random sampling is a distant echo of polling as practiced. Section 3 presents the Meng equation, focusing on its distinctive intuition. Section 4 shows how a paradigm built around the Meng equation naturally points to new research agendas, providing two examples in which approaches motivated by the new paradigm are able to address important survey challenges.

Modern polling began with a commonsense but not deeply theorized paradigm of more-is-better (Bailey, 2024; Converse, 2009). The exemplar of this approach was the *Literary Digest*, a magazine that sent millions of surveys to voters before presidential elections. They had a decent track record until 1936, when their polls infamously indicated that Republican Alf Landon would win in a landslide. He lost in a landslide, discrediting the early big-data approach to polling. Quota samplers such as George Gallup filled the void, showing how relatively small representative samples were more accurate. They did well until 1948, when they predicted Republican Thomas Dewey would defeat President Harry Truman. Truman won, famously hoisted the “Dewey Defeats Truman” newspaper, and sent the polling community scrambling for a more robust paradigm.

Random sampling filled the gap (Neyman, 1934). Using standard statistical theory, one could characterize the statistical properties of the mean of a random sample in ways that enabled accurate and systematic reasoning about population attributes from samples in the hundreds or low thousands. Remarkably, the accuracy of random sampling depends on the sample size, not the population size. Fortuitously, widespread adoption of telephones made random sampling cheap to implement.

The theory assumes that everyone randomly contacted for a survey responds. This was never true, but response rates were high and the connection between response and political views was attenuated enough that random sampling provided a decent approximation to guide political polling.

Over the last several decades, the relevance of random sampling theory has declined, largely due to accelerating levels of nonresponse. In the late 1990s, 60% of those contacted for political polls did not respond; today, that number is often 95% or higher (Cohn, 2022, October 12; Kennedy & Hartig, 2019).

The first problem that low response creates is that it attenuates—and probably breaks—the connection between survey theory and practice. No one thinks that the 1% of people who respond when contacted are truly a random sample of the population. The field therefore accommodated large-scale nonresponse by augmenting random sampling with weighting. Weighting involves placing more weight on respondents from groups who are underrepresented in a sample relative to their population proportion and placing less weight on respondents from groups who are overrepresented relative to their population proportion. Weighting requires identifying variables that affect response and the attribute being surveyed from among those variables for which the pollster knows the totals in the population. Typically, these variables are demographic variables such as age, race, gender, income, region, and education.
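The mechanics of this kind of weighting can be sketched in a few lines of Python. The example poststratifies on a single hypothetical variable (education) with made-up respondents and made-up population shares; real weighting schemes typically combine several variables, which this sketch does not attempt.

```python
from collections import Counter

# Hypothetical respondents: (education group, supports candidate? 1/0).
respondents = [
    ("college", 1), ("college", 1), ("college", 0), ("college", 1),
    ("college", 1), ("college", 0), ("no_college", 0), ("no_college", 1),
    ("no_college", 0), ("no_college", 0),
]

# Hypothetical known population shares for the weighting variable.
population_share = {"college": 0.4, "no_college": 0.6}

counts = Counter(group for group, _ in respondents)
sample_share = {g: counts[g] / len(respondents) for g in counts}

# Weight = population share / sample share: above 1 for underrepresented groups.
weight = {g: population_share[g] / sample_share[g] for g in sample_share}

raw_mean = sum(y for _, y in respondents) / len(respondents)
weighted_mean = (
    sum(weight[g] * y for g, y in respondents)
    / sum(weight[g] for g, _ in respondents)
)

print(f"weights      : {weight}")
print(f"raw mean     : {raw_mean:.3f}")
print(f"weighted mean: {weighted_mean:.3f}")
```

The weighted mean equals the average of the within-group respondent means, each weighted by its population share. It recovers the population value only if respondents within each group resemble the nonrespondents in that group, which is the ignorability assumption discussed in the text.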

The shift from the random sampling paradigm to the random sampling-plus-weighting paradigm is so pervasive that it is unremarkable to many pollsters, even as they acknowledge the many decisions that must be made when weighting data (Gelman, 2007). Weighting is not costless, however, as it requires pollsters to assume that nonresponse is ignorable, meaning that the decision to respond is independent of the content of response once we have controlled for the weighting variables. This assumption implies that the people who choose to reply are representative samples given the covariates used in the weighting. A violation of this assumption means that nonresponse is non-ignorable, meaning that even after weighting, poll respondents differ from nonrespondents.

The assumption that the response mechanism is ignorable is also referred to as a mechanism that produces data that is ‘missing at random.’ Little and Rubin (2014, p. 22) note that virtually every approach to dealing with missing data makes this strong assumption. This list includes multilevel regression with poststratification (so-called MrP models) (Gelman & Hill, 2007) and nearest-neighbor imputation (YouGov, 2014).

Non-ignorable nonresponse is concerning in many contexts.

Virtually every postmortem of the 2016 and 2020 U.S. presidential elections raised the possibility that weighting failed to properly adjust for the possibility that voters favoring Trump were less likely to respond, especially in the Midwest; see, e.g., Clinton et al. (2021), Kennedy et al. (2018).

Surveys of voting typically overestimate turnout, likely in part due to non-ignorable nonresponse (Jackman & Spahn, 2019).

Bradley et al. (2021) provide evidence that the type of people who get vaccinated are more likely to respond to some polls (especially ones based on nonrandom samples) even after controlling for demographics.

In marketing, evidence suggests that people’s willingness to provide product feedback depends on their experience with the product (Schoenmueller et al., 2020).

Low response rates have created another problem that has been harder to ignore: rising costs. It is now very expensive to field probability-based polls because pollsters need to wade through dozens of nonrespondents before they reach a single respondent, leading some to doubt the viability of the approach (Cohn, 2022, October 12). An increasing number of pollsters therefore have moved to nonprobability samples that are created by finding people willing to answer polls via ads, outreach to mailing lists and other, often opaque and sometimes novel, methods (Clinton et al., 2021; Wang et al., 2015). Pollsters use weighting-type adjustments to produce samples that are representative with regard to demographic benchmarks.

While true random sampling produces estimates with clear measures of quality, the field has struggled to operationalize quality in the post–random sampling world. Some pollsters anachronistically use the language of random sampling to imply that their polls have the properties of a random sample, a process I call ‘pollwashing.’ One way to do this is to report margins of error even though the theoretical basis of a margin of error is undone by nonresponse (and especially massive nonresponse, to say nothing of a nonrandom sample) (Shirani-Mehr et al., 2018).

Another tool for pollwashing is for pollsters to claim their samples are ‘nationally representative’ (Jamieson et al., 2023). In random sampling, a sample is probabilistically representative of a target population in expectation. In weighting, a sample can be made to share certain distributional characteristics with the population for variables used in the weighting protocol. This provides the survey with an aura of accuracy even for polls that have at best a modest claim at being truly representative in the way that an actual random sample would be. It is easy to see how this concept can stretch the concept of representativeness to the breaking point. Consider, for example, an opt-in internet poll on a candidate’s website. The data could be weighted to be nationally representative with respect to demographics, but no serious pollster would consider the sample representative in the sense that a true random sample would be. University of Michigan polling expert Raphael Nishimura summed it up nicely: “For the laymen, [representative sample] sounds like a well-defined technical sampling term, but it’s not. This is just as vague and meaningless as saying that a sample is ‘robust,’ ‘statistically valid’ or ‘awesome’ ” (Nishimura, 2023).

Pollwashing extends even to sample size. In a random sample, the survey average converges to the population average as the sample size increases, making sample size a useful metric for precision. In nonrandom samples, however, large samples guarantee little. We’ve known this since the 1936 *Literary Digest* fiasco, yet surveys continue to report sample sizes for their ‘nationally representative’ samples with the implication that more is better. When samples are nonrandom, however, our intuition that more is better—one of the core insights for random sampling—fails. Bradley et al. (2021) and others have shown that if response is correlated with opinion, the sample size can be wildly unreflective of the amount of information in a sample. I address this point below, as well.

Given the lack of a clear measure for assessing polls, some in the field use predictive accuracy as a measure of polling quality (Silver, 2023). The danger with this approach is that if surveys have a systematic error, then a tool that counteracts that bias, however crudely, will do well. Survey firms with Republican bias were relatively accurate in 2016 and 2020. Were their methods better? Or were they biased in a fortuitous way for those elections? Many of these same firms performed poorly in 2018 and 2022, suggesting limits to polling accuracy as a measure of quality. Perhaps with enough time and a stable polling environment, track records may prove meaningful, but rather than waiting for polling methods to be exposed in an election, a better aspiration is to have a paradigmatic set of standards against which to judge polling methods.

What remains once we have ruled out metrics of survey quality such as demographic representativeness or large sample sizes or predictive accuracy? In this section, I articulate a framework that answers this question. The framework builds on Meng’s (2018) surprisingly simple and completely general characterization of survey error. It helps us contextualize survey error across protocols and points to shared standards and future research directions.

The framework is built on a simple model of the sample mean of a variable of interest,

$\underbrace{\overline{Y}_n}_{\text{Sample average}} - \underbrace{\overline{Y}_N}_{\text{Population average}}.$

At this point, we are not doing any statistical modeling; we are simply calculating the difference between the average value of

$\overline{Y}_n-\overline{Y}_N =\underbrace{\rho_{R,Y}}_{\text{data defect correlation}} \times \underbrace{\sqrt{\frac{N-n}{n}}}_{\text{data quantity}} \times \underbrace{\sigma_Y}_{\text{data difficulty}}.\hspace{1.in} (1)$

The first term on the right-hand side of the Meng equation is

Because the Meng equation is an accounting identity, we know that if

The second term on the right-hand side of the Meng equation is

The final term on the right-hand side of the Meng equation is

The Meng equation is very general, relatively simple, and remarkably insightful. It motivates intuitions that provide a more robust starting point for thinking about modern polling than random sampling. Here I focus on three important insights that are largely absent in the weighting-augmented random sampling paradigm, but clear in the new paradigm.
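Because Equation 1 is an accounting identity rather than a statistical model, it can be checked numerically on any finite population. The Python sketch below does so for a simulated population. The population, the response rule, and the function name `meng_terms` are illustrative assumptions, not part of the original analysis. All moments are finite-population moments (dividing by $N$), and the correlation is computed between the 0/1 response indicator and the outcome.

```python
import math
import random


def meng_terms(y, r):
    """Return (error, rho, data_quantity, sigma_y) for outcomes y and 0/1 response flags r.

    All moments are finite-population moments (divide by N, not N - 1).
    """
    N = len(y)
    n = sum(r)
    pop_mean = sum(y) / N
    resp_rate = n / N
    sample_mean = sum(yi for yi, ri in zip(y, r) if ri == 1) / n

    cov_ry = sum((ri - resp_rate) * (yi - pop_mean) for yi, ri in zip(y, r)) / N
    sigma_y = math.sqrt(sum((yi - pop_mean) ** 2 for yi in y) / N)
    sigma_r = math.sqrt(resp_rate * (1 - resp_rate))

    rho = cov_ry / (sigma_r * sigma_y)
    data_quantity = math.sqrt((N - n) / n)
    return sample_mean - pop_mean, rho, data_quantity, sigma_y


# Hypothetical population: feeling-thermometer-style scores in [0, 100].
rng = random.Random(2023)
y = [min(100.0, max(0.0, rng.gauss(40, 25))) for _ in range(20_000)]
# Hypothetical response rule: people with higher scores respond slightly more often.
r = [1 if rng.random() < 0.01 + 0.0004 * yi else 0 for yi in y]

error, rho, quantity, sigma = meng_terms(y, r)
print(f"actual error       : {error:.4f}")
print(f"rho * quantity * sd: {rho * quantity * sigma:.4f}")
```

The two printed numbers agree up to floating-point rounding whatever response rule is used, because the covariance between responding and the outcome equals the response rate times the gap between the sample mean and the population mean.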

Survey error is the product of three terms, meaning that we need to think of survey error as a combination of factors. If any one of the terms is zero, then survey error is zero, whatever the value of the other terms. The data quantity term is zero only if the sample (

The only term that we can realistically drive toward zero with survey methods is

Neither of these methods is satisfying in the contemporary context, in which nonresponse renders random samples virtually impossible and in which we would rather not solve problems by assuming them away. As seen in Section 2, there are many plausible scenarios in which response and outcome are correlated, even after controlling for observable covariates with known population proportions.

The data quantity term in the middle of the right-hand side of the Meng equation is a function of sample size

Notice that the sample size is doing something quite different than it does in random sampling. The expected mean from a random sample is the true value *no matter what the sample size is*. The power of a larger sample in random sampling is to reduce the sampling variance of the mean. In the Meng equation, a larger sample is associated with smaller error. Roughly speaking, the Meng equation says that when

Sampling error also depends on the size of the population,

In nonrandom samples, however, population size matters. Figure 1, based on Bailey (2024), displays samples of 20 from a relatively large and a relatively small population. Each square is a person. On the

The blue squares are respondents. In the large population panel on the left, there are 328 people, 20 of whom respond (about 6%). These respondents are quite unrepresentative. Every one of them rates Biden above 40 and their average rating is 68, which is much higher than the population average of 40.

In the small population panel on the right, there are 40 people. As with the panel on the left, the sample size is 20. The respondents are also unrepresentative, but the magnitude of the unrepresentativeness is much smaller because the pollster had to go deeper into the pool to get 20 responses. This means that less unrepresentative people made their way into the sample, leading us to see ratings of Biden as low as 25 and an average rating of Biden among respondents of 50, which is higher than the population average of 40, but not as far off as for the large population example. In other words, the example shows how a sample of size
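The role of population size can be made concrete by plugging numbers into Equation 1. The sketch below holds the data defect correlation, the sample size, and $\sigma_Y$ fixed and varies only $N$. The specific values ($\rho_{R,Y} = 0.0005$, $n = 1{,}000$, $\sigma_Y = 0.5$) and the helper name `meng_error` are assumptions chosen for illustration. In practice $\rho_{R,Y}$ need not stay constant as $N$ changes, so the sketch isolates only the mechanical effect of the data quantity term.

```python
import math


def meng_error(rho, n, N, sigma_y):
    """Survey error implied by Equation 1: rho * sqrt((N - n) / n) * sigma_y."""
    return rho * math.sqrt((N - n) / n) * sigma_y


rho = 0.0005    # assumed (tiny) data defect correlation
n = 1_000       # sample size held fixed
sigma_y = 0.5   # standard deviation of a 50/50 binary outcome

for N in (10_000, 1_000_000, 250_000_000):
    err = meng_error(rho, n, N, sigma_y)
    print(f"N = {N:>11,}: error = {100 * err:.2f} percentage points")
```

Under these assumed values the implied error grows from under a tenth of a percentage point for a population of 10,000 to about 12.5 percentage points for a population of 250 million, even though the sample size and the correlation are unchanged.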

Building from the Meng equation, we can articulate a third important insight for contemporary polling (Bailey, 2024). First, let us distinguish:

**Random sample**: What random sampling theory is built on, but is *not* delivered by probability-based polling.

**Random contact**: What probability-based polling actually does. Pollsters using probability-based polls randomly contact people who may or may not respond.

Given that random contact is very expensive and nonetheless produces nonrandom samples, it is easy to sympathize with pollsters who have given up on random contact. However, random contact does important work even with very low response rates, an intuition that is hard to see in the current random sampling-plus-weighting paradigm.

To show this, I first show graphically how randomly choosing whom to contact unlinks the connection between sampling error and population size. After that, I use Meng’s equation to reconsider how non-ignorable nonresponse affects error when contact (but not response) is randomized.

Figure 2 from Bailey (2024) starts with the ‘large’ population in panel (a) of Figure 1. We know from Figure 1 that a sample of 20 respondents will produce a highly skewed sample, with an average

Random contact does not imply that those who respond are a random sample. After all, people choose to pick up the phone or respond to an email and this process can be influenced by many nonrandom factors, including factors correlated with

Even though the sample is skewed, random contact has done something very important. The sample of 20 respondents from the random contact survey is not as unrepresentative as the sample of 20 respondents from the large population panel of Figure 1. We no longer get the

In other words, while random contact does not eliminate error associated with a positive value of $\rho_{R,Y}$, it does break the link between survey error and *population* size. In terms of the equation, Meng (2021) and Bailey (2024) show that survey error in a random contact survey is

$\overline{Y}_n-\overline{Y}_N =\rho_{R,Y}\times \underbrace{\sqrt{\frac{1-p_r}{p_r}}}_{\text{data quantity}} \times\sigma_Y \hspace{1.5in} (2)$

where
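To see what Equation 2 implies for the two polls described in the introduction, the following sketch compares the data quantity terms of Equations 1 and 2. It assumes that $p_r$ is the share of contacted people who respond, takes a 1% response rate for the random-contact poll, and uses hypothetical values of $n = 1{,}000$ and $N = 250$ million for the nonrandom poll. These numbers and the function names are illustrative assumptions.

```python
import math


def quantity_nonrandom(n, N):
    """Data quantity term of Equation 1: sqrt((N - n) / n)."""
    return math.sqrt((N - n) / n)


def quantity_random_contact(p_r):
    """Data quantity term of Equation 2: sqrt((1 - p_r) / p_r)."""
    return math.sqrt((1 - p_r) / p_r)


n, N = 1_000, 250_000_000   # hypothetical opt-in poll of a large population
p_r = 0.01                  # assumed response rate under random contact

q_nonrandom = quantity_nonrandom(n, N)
q_contact = quantity_random_contact(p_r)

print(f"nonrandom sample : {q_nonrandom:.1f}")
print(f"random contact   : {q_contact:.2f}")
print(f"ratio            : {q_nonrandom / q_contact:.1f}")
```

For the same $\rho_{R,Y}$ and $\sigma_Y$, the nonrandom poll's error is roughly 50 times larger than the random-contact poll's error in this configuration. Holding $\rho_{R,Y}$ equal across the two polls is itself an assumption made only to isolate the data quantity term.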

One of the nice features of the new paradigm based on the Meng equation is that it is general enough to encompass the multiple approaches to surveys that dominate the field. We have already seen that random sampling is a mechanism that drives

Weighting approaches succeed if

The new paradigm also makes it harder to ignore potential weaknesses of weighting. Most polls are reported with no effort to measure

Equation 1 helps us appreciate the scale of the sampling problem we face in a post–random sampling world. It does not, however, provide specific guidance on estimating

While some believe that there is little to be done to measure or undo nonzero

In this section I describe two such approaches. Both rely on response instruments, which are variables that affect the probability of response but do not directly affect the outcome of interest. Sun et al. (2018) show that a broad class of weighting, imputation, and doubly robust models can work if a response instrument is available. Bailey (2024) shows examples of how even parametric models that do not literally require response instruments tend to perform much better when a response instrument is available.

The first example uses an observational response instrument, which is convenient but suffers from the usual concerns the literature has about observational instruments related to whether they actually have no effect on

The intuition behind response rates is straightforward. If response is ignorable conditional on covariates, then the expected value of

Like many polls, the 2020 American National Election Study (ANES) overestimated Biden’s support. Biden won the popular vote by 4.4 percentage points, but the 2020 ANES preelection poll reported that Biden led Trump by 11.8 percentage points. Weighting did not help, as Biden’s margin was 12.6 percentage points when responses were weighted.

Signs that

If people who are more interested in politics are more likely to answer a poll about politics—which hardly seems unreasonable—then the ANES may have too many people interested in politics and thereby produced a sample that was more pro-Biden than the population. While it seems natural to model likely lower support for Biden among nonrespondents, pollsters did not do so because weighting-type adjustments are not feasible for a variable like political interest, which does not have a known population-level distribution.

Grounding our thinking about polling in the Meng equation, however, it becomes harder to dismiss the possibility that

Directly modeling and estimating

Using such models requires new thinking and, of course, does not eliminate the necessity of assumptions. However, instead of assuming that nonrespondents have the same political interest as respondents, as required in standard weighting-type adjustments, these models allow us to incorporate information in the data that suggests nonrespondents differ from respondents in important ways. In recent years, such models have made use of advances in copulas (Gomes et al., 2019), moment estimators (Burger & McLaren, 2017; Sun et al., 2018), and other approaches in ways that allow them to be more robust to distributional assumptions and other concerns.

The challenge with observational models is that it is often hard to definitively defend the assumption that one or more variables affect

There are several attractive features of creating randomized response instruments. First, doing so builds on our long-standing inclination to use sampling design to solve sampling problems. After all, in random sampling the design of the survey has long been accepted as a better way to solve sampling error than increasing sample size in a nonrandom way. Second, this approach is simple to implement, as the pollster need only identify a protocol that affects survey response, a task that is familiar among pollsters who have long explored how to increase response rates.

Figure 4, based on Bailey (2024), illustrates the logic of randomized response instruments. The purpose of the figure is to highlight how and why such instruments can allow us to assess whether the response mechanism is ignorable or not. The configuration of population values is reasonable, but should not be taken by itself as a general characterization of the world. Reality may deviate from the features in the figure and readers are encouraged to review Bailey (2024), which includes tools such as control functions, copulas, and specification searches that can enable modern non-ignorable nonresponse–oriented tools to work under a broad range of circumstances. These tools do not work under all circumstances, though. Sun et al. (2018) provide a formal treatment of identification and Bailey (2024) provides a practical discussion of how to deal with threats to validity in these models.

As with Figures 1 and 2, the panels in Figure 4 plot the response interest and values of

When

The figure shows how variation in

While the logic is straightforward, estimation requires models such as those described in Bailey (2024). These models range from widely known Heckman models, to copula models, to methods of moments estimators. Bailey (2024) argues that the quality of data often dominates the choice of model, meaning that the key step is typically creating a good randomized-response instrument; with that in hand, the models tend to produce similar results.
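The diagnostic logic (though not the estimation models) can be illustrated with a small simulation. In the sketch below, contacted people are randomly assigned to a high- or low-response protocol, and a hypothetical parameter `gamma` controls how strongly the decision to respond depends on the answer itself. The response rates, the population share, and the functional form are all assumptions made for illustration and are not taken from the Ipsos study or from Bailey (2024).

```python
import random


def respondent_means(gamma, n_contacts=200_000, seed=1):
    """Simulate a randomized response instrument and return respondent means by arm.

    gamma controls non-ignorability: 0 means response is unrelated to the outcome.
    """
    rng = random.Random(seed)
    base_rate = {"high": 0.8, "low": 0.2}    # assumed response rates by protocol
    p_y = 0.55                                # assumed population share with y = 1
    totals = {"high": [0, 0], "low": [0, 0]}  # [sum of y, number of respondents]

    for _ in range(n_contacts):
        y = 1 if rng.random() < p_y else 0
        arm = "high" if rng.random() < 0.5 else "low"   # random assignment
        p_respond = base_rate[arm] + gamma * (y - p_y)
        if rng.random() < p_respond:
            totals[arm][0] += y
            totals[arm][1] += 1

    return {arm: s / k for arm, (s, k) in totals.items()}


for label, gamma in (("ignorable", 0.0), ("non-ignorable", 0.2)):
    means = respondent_means(gamma)
    gap = means["low"] - means["high"]
    print(f"{label:>13}: high = {means['high']:.3f}, low = {means['low']:.3f}, gap = {gap:+.3f}")
```

When response is unrelated to the outcome, respondents in the two arms give about the same average answer. When people with $y = 1$ are more willing to respond, the more heavily selected low-response arm drifts further from the population value than the high-response arm does, and the gap between arms is the signal that the models described above use. Turning that signal into a corrected estimate still requires one of those models.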

For example, Bailey (2023) presents doubly robust estimates that use a randomized response instrument to create estimates that are robust to non-ignorable nonresponse. The double robustness comes from incorporating both weighting- and imputation-based approaches in a way that if either or both of the weighting and imputation models is correct, the estimate will be consistent. Note that both the weighting and imputation models allow for non-ignorable nonresponse and can produce quite different estimates from conventional weighting or imputation when there is evidence of non-ignorable nonresponse. Bailey (2024) analyzes this data using parametric and other methods, finding similar results.

Bailey (2024) implements the approach with an Ipsos poll of U.S. voters in 2019. The response instrument was created by randomly assigning potential respondents to high- and low-response protocols. In the low-response protocol, respondents were asked whether they want to discuss politics, sports, health, or movies. Only those who chose politics were retained in the respondent pool for the models discussed here. In the high-response protocol, people were asked the political questions in the standard way and, therefore, had a much higher likelihood of providing answers. The response instrument was strong (changing response rates by 60 percentage points) and thereby provided enough statistical power to estimate

As is typical in polls, the patterns varied by question and across party. Here I provide three examples to give a flavor of the results.

First, the survey asked people how likely they were to vote, giving them five response categories ranging from “absolutely certain to vote” to “will not vote.” In the raw data, 78% of respondents said they were certain to vote. When weighted using conventional weights, 75% of respondents said they were certain to vote. A doubly robust estimate based on weighting and imputation models that used a randomized response instrument to model potential non-ignorable nonresponse found strong evidence of non-ignorable nonresponse ($p < 0.01$), suggesting a strong relationship between willingness to respond and expressing certainty about voting. Such a pattern occurs when the answers in the low-response group differ clearly from the answers in the high-response group, as they did in this case. This model then produced an estimate that 55% of people were certain to vote. Since it is not entirely clear how to map answers to a five-category question to actual turnout (which was 67%), it is difficult to say for certain that accuracy increased. At a minimum, the raw and weighted results seem to overestimate turnout, given that they indicated that 75% or more of people were certain to vote when 67% actually voted and that people in the other categories voted as well. The selection model results, in contrast, moderated the estimate and were consistent with the idea that raw and weighted data in surveys of turnout tend to overestimate turnout.

Second, the poll asked people about support for President Trump. In the whole sample, the raw, conventionally weighted, and non-ignorable nonresponse doubly robust models produced similar results. There was, however, interesting variation by region. Among whites in the Midwest, a group for whom polls have tended to underestimate Trump support, raw support for Trump was 45%, a number that fell to 43% when the data were weighted. In the selection model, in contrast, the parameter associated with non-ignorable nonresponse in the doubly robust model was unlikely to have arisen by chance ($p < 0.05$), which, in turn, led the selection model to estimate higher Trump support among whites in the Midwest (5%). Because the poll was conducted more than a year before the election, it is hard to gauge accuracy, but it is interesting to note the strong signal in the selection model that conventional polls were underestimating Trump’s support in the Midwest.

Third, pollsters worry about $\rho$ on sensitive questions, as it may be the case that people with certain opinions on such matters are less likely to respond. On race, for example, social pressure may make people with more conservative views less likely to provide their opinions to a pollster. On a question about whether it was appropriate for black athletes to kneel during the national anthem, the observed percent conservative among Democrats was about 17%, a number that fell slightly when conventionally weighted. However, when analyzed with the non-ignorable nonresponse doubly robust model, the estimated percent of Democrats with the more conservative answer rose to 33%, almost double the percent estimated by conventional weights.
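The intuition behind a randomized response instrument can be illustrated with a toy simulation. The sketch below is not the doubly robust model used for the results above. It assumes a hypothetical setup in which contacted people are randomly assigned to a low-response or high-response protocol, and in which the probability of responding is a base rate plus an additive bump for people whose answer is "certain to vote." All function names, rates, and sample sizes are assumptions made for illustration. The check compares the share of "certain" answers among respondents in the two arms with a standard two-proportion z statistic.

```python
import math
import random


def simulate_arm(n_contacted, base_rate, extra_if_certain, share_certain, rng):
    """Return (respondents, respondents answering 'certain') for one randomized arm."""
    respondents = 0
    certain = 0
    for _ in range(n_contacted):
        y = 1 if rng.random() < share_certain else 0
        # Assumed additive response model: the answer itself shifts the response rate.
        p_respond = base_rate + extra_if_certain * y
        if rng.random() < p_respond:
            respondents += 1
            certain += y
    return respondents, certain


def two_proportion_z(n1, k1, n2, k2):
    """Pooled two-proportion z statistic for k1/n1 versus k2/n2."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def instrument_check(extra_if_certain, seed=1):
    """Compare respondents in a low-response arm and a high-response arm."""
    rng = random.Random(seed)
    low = simulate_arm(20_000, 0.05, extra_if_certain, 0.5, rng)
    high = simulate_arm(20_000, 0.30, extra_if_certain, 0.5, rng)
    return two_proportion_z(*low, *high)


print(f"ignorable nonresponse:     z = {instrument_check(0.0):.2f}")
print(f"non-ignorable nonresponse: z = {instrument_check(0.05):.2f}")
```

In this toy model, when responding does not depend on the answer, respondents in the two arms look alike apart from sampling noise. When responding does depend on the answer, the low-response arm over-represents "certain" voters relative to the high-response arm, which is the kind of divergence between low-response and high-response groups described above.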

Random sampling is dead. Weighting cannot revive it and the field risks losing coherence as it devolves into a mélange of pollsters using bespoke tools evaluated on past performance rather than common theoretically justified standards. It is time to update our paradigmatic foundations so that they encompass not only the random sampling or assumption-driven weighting methods of the past and present, but also the myriad methods in development that produce nonrandom samples.

Such a new paradigm is indeed available, one that builds on the Meng equation. It is quite general, general enough in fact to be used in ecology (Boyd et al., 2023), the mathematics of multidimensional integration (Hickernell, 2018), and particle physics (Courtoy et al., 2023). The equation characterizes sampling error for any poll, yet is specific enough to provide guidance about sources of this error. This new paradigm not only provides a common language that applies to contemporary polling, but also produces unfamiliar insights. Central to this new paradigm is the correlation between whether and how people respond. When this correlation is nonzero, it interacts with population size, meaning that for large populations, even a small correlation can devastate survey accuracy.
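To make the interaction between this correlation and population size concrete, the following minimal Python sketch evaluates the Meng (2018) identity, error $= \rho \times \sqrt{(N-n)/n} \times \sigma_Y$, for a fixed number of respondents and a growing population. The function name `meng_error` and all numeric inputs are hypothetical and chosen only for illustration.

```python
import math


def meng_error(rho: float, n: int, N: int, sigma_y: float) -> float:
    """Survey error implied by the Meng identity: rho * sqrt((N - n) / n) * sigma_y."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    return rho * math.sqrt((N - n) / n) * sigma_y


# Hypothetical inputs, for illustration only.
rho = 0.001      # small correlation between responding and the answer given
n = 1_000        # number of respondents
sigma_y = 0.5    # standard deviation of a 50/50 binary outcome

for N in (10_000, 1_000_000, 100_000_000):
    err = meng_error(rho, n, N, sigma_y)
    print(f"N = {N:>11,}: error = {err:.4f}")
```

With $\rho$ and $n$ held fixed, the error grows roughly in proportion to $\sqrt{N}$ once $N$ is much larger than $n$, which is the sense in which a small sampling defect can become a large survey error in a large population.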

The new paradigm also points the field in a different direction than it is currently headed. Currently, most survey research relies on weighting-type tools that assume away the correlation between whether and how people respond, conditional on observable covariates with known population distributions. Such tools are useful, of course, but cover only a limited range of possible conditions, a limitation that is becoming more striking as the polling field moves further away from its random-sampling roots.

This article has provided an overview of the kind of work that naturally emerges in the new paradigm. The general theme is that any nonrandom sample needs to minimize, measure, and/or account for $\rho$.

Much work remains to be done as the selection models that measure and account for

With a paradigm that better applies to the contemporary polling environment, more of the field will be drawn to this important work, and they can build from a common foundation that more directly applies to the complicated polling environment of today.

I am grateful for helpful comments from Xiao-Li Meng, Jon Ladd, and anonymous reviewers. All errors are mine.

Michael Bailey has no financial or non-financial disclosures to share for this article.

Bailey, M. A. (2024). *Polling at a crossroads: Rethinking modern survey research*. Cambridge University Press.

Bailey, M. A. (2023, July 12). *Doubly robust estimation of non-ignorable non-response models of political survey data* [Paper presentation]. Fortieth annual meeting of the Society for Political Methodology at Stanford University, Stanford, CA, United States.

Boyd, R. J., Powney, G. D., & Pescott, O. L. (2023). We need to talk about nonprobability samples. *Trends in Ecology & Evolution*, *38*(6), 521–531. https://doi.org/10.1016/j.tree.2023.01.001

Bradley, V. C., Kuriwaki, S., Isakov, M., Sejdinovic, D., Meng, X.-L., & Flaxman, S. (2021). Unrepresentative big surveys significantly overestimated US vaccine uptake. *Nature*, *600*(7890), 695–700. https://doi.org/10.1038/s41586-021-04198-4

Burger, R. P., & McLaren, Z. M. (2017). An econometric method for estimating population parameters from non-random samples: An application to clinical case finding. *Health Economics*, *26*(9), 1110–1122. https://doi.org/10.1002/hec.3547

Clinton, J., Agiesta, J., Brenan, M., Burge, C., Connelly, M., Edwards-Levy, A., Fraga, B., Guskin, E., Hillygus, D. S., Jackson, C., Jones, J., Keeter, S., Khanna, K., Lapinski, J., Saad, L., Shaw, D., Smith, A., Wilson, D., & Wlezien, C. (2021). *Task force on 2020 pre-election polling: An evaluation of the 2020 general election polls*. American Association for Public Opinion Research. https://aapor.org/wp-content/uploads/2022/11/AAPOR-Task-Force-on-2020-Pre-Election-Polling_Report-FNL.pdf

Cohn, N. (2022, October 12). Who in the world is still answering pollsters’ phone calls? *New York Times*. https://www.nytimes.com/2022/10/12/upshot/midterms-polling-phone-calls.html

Cohn, N. (2022, November 8). Are the polls still missing ‘hidden’ Republicans? Here’s what we’re doing to find out. *New York Times*. https://www.nytimes.com/2022/11/08/upshot/poll-experiment-wisconsin-trump.html

Converse, J. M. (2009). *Survey research in the United States: Roots and emergence 1890–1960*. Transaction Publishers.

Courtoy, A., Huston, J., Nadolsky, P., Xie, K., Yan, M., & Yuan, C. (2023). Parton distributions need representative sampling. *Physical Review D*, *107*(3), Article 034008. https://doi.org/10.1103/PhysRevD.107.034008

Gelman, A. (2007). Struggles with survey weighting and regression modeling. *Statistical Science*, *22*(2), 153–164. https://doi.org/10.1214/088342306000000691

Gelman, A., & Hill, J. (2007). *Data analysis using regression and multilevel/hierarchical models*. Cambridge University Press.

Gomes, M., Radice, R., Brenes, J. C., & Marra, G. (2019). Copula selection models for non-Gaussian outcomes that are missing not at random. *Statistics in Medicine*, *38*(3), 480–496. https://doi.org/10.1002/sim.7988

Hickernell, F. J. (2018). The trio identity for quasi-Monte Carlo error. In A. B. Owen & P. W. Glynn (Eds.), *Monte Carlo and quasi-Monte Carlo methods*. Springer. https://doi.org/10.1007/978-3-319-91436-7_1

Jackman, S., & Spahn, B. (2019). Why does the American National Election Study overestimate voter turnout? *Political Analysis*, *27*(2), 193–207. https://doi.org/10.1017/pan.2018.36

Jacobson, G. C. (2022). *Explaining the shortfall of Trump voters in the 2020 pre- and post-election surveys* [Unpublished manuscript]. Department of Political Science, University of California, San Diego.

Jamieson, K. H., Lupia, A., Amaya, A., Brady, H. E., Bautista, R., Clinton, J. D., Dever, J. A., Dutwin, D., Goroff, D. L., Hillygus, D. S., Kennedy, C., Langer, G., Lapinski, J. S., Link, M., Philpot, T., Prewitt, K., Rivers, D., Vavreck, L., Wilson, D. C., & McNutt, M. K. (2023). Protecting the integrity of survey research. *PNAS Nexus*, *2*(3), Article pgad049. https://doi.org/10.1093/pnasnexus/pgad049

Kennedy, C., Blumenthal, M., Clement, S., Clinton, J., Durand, C., Franklin, C., McGeeney, K., Miringoff, L., Olson, K., Rivers, D., Saad, L., Witt, G. E., & Wlezien, C. (2018). An evaluation of the 2016 election polls in the United States: AAPOR task force report. *Public Opinion Quarterly*, *82*(1), 1–33. https://doi.org/10.1093/poq/nfx047

Kennedy, C., & Hartig, H. (2019). Response rates in telephone surveys have resumed their decline. *Pew Research Center*. https://www.pewresearch.org/short-reads/2019/02/27/response-rates-in-telephone-surveys-have-resumed-their-decline/

Kuhn, T. S. (1970). *The structure of scientific revolutions*. University of Chicago Press.

Little, R. J., & Rubin, D. B. (2014). *Statistical analysis with missing data*. Wiley.

Meng, X.-L. (2018). Statistical paradises and paradoxes in big data (1): Law of large populations, big data paradox, and the 2016 presidential election. *The Annals of Applied Statistics*, *12*(2), 685–726. https://doi.org/10.1214/18-AOAS1161SF

Meng, X.-L. (2021). *Data defect index: A unified quality metric for probabilistic sample and nonprobabilistic sample* [Presentation]. Harvard University.

Neyman, J. (1934). On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. *Journal of the Royal Statistical Society*, *97*(4), 558–625. https://doi.org/10.2307/2342192

Nishimura, R. [@rnishimura]. (2023, March 29). *To add my reasons on why we should avoid using "representative sample": For the laymen, it sounds like a* [Tweet]. Twitter. https://twitter.com/rnishimura/status/1641126804030373888

Peress, M. (2010). Correcting for survey nonresponse using variable response propensity. *Journal of the American Statistical Association*, *105*(492), 1418–1430. https://doi.org/10.1198/jasa.2010.ap09485

Schoenmueller, V., Netzer, O., & Stahl, F. (2020). The polarity of online reviews: Prevalence, drivers and implications. *Journal of Marketing Research*, *57*(5), 853–877. https://doi.org/10.1177/0022243720941832

Shirani-Mehr, H., Rothschild, D., Goel, S., & Gelman, A. (2018). Disentangling bias and variance in election polls. *Journal of the American Statistical Association*, *113*(522), 607–614. https://doi.org/10.1080/01621459.2018.1448823

Silver, N. (2023, March 13). FiveThirtyEight’s pollster ratings. *FiveThirtyEight.com*. https://projects.fivethirtyeight.com/pollster-ratings/

Sun, B., Liu, L., Miao, W., Wirth, K., Robins, J., & Tchetgen-Tchetgen, E. J. (2018). Semiparametric estimation with data missing not at random using an instrumental variable. *Statistica Sinica*, *28*(4), 1965–1983. https://doi.org/10.5705/ss.202016.0324

Wang, W., Rothschild, D., Goel, S., & Gelman, A. (2015). Forecasting elections with non-representative polls. *International Journal of Forecasting*, *31*(3), 980–991. https://doi.org/10.1016/j.ijforecast.2014.06.001

Wu, C. (2023). *Statistical inference with non-probability survey samples* [Unpublished manuscript]. Department of Statistics, University of Waterloo.

YouGov. (2014). *Sampling and weighting methodology for the February 2014 Texas statewide study*. https://static.texastribune.org/media/documents/utttpoll-201402-methodology.pdf

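Begin by rewriting the sample average in terms of $R_i$, an indicator that equals 1 if person $i$ in the population of $N$ people responds and 0 otherwise, so that $n = \sum_{i=1}^N R_i$: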

$\overline{Y}_n = \frac{\sum_{i=1}^NR_iY_i}{n} = \frac{\sum_{i=1}^NR_iY_i}{\sum_{i=1}^NR_i} = \frac{\frac{\sum_{i=1}^NR_iY_i}{N}}{\frac{\sum_{i=1}^NR_i}{N}} = \frac{\overline{RY}}{\overline{R}}$

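where $\overline{RY}$ is the average of $R_iY_i$ across the population and $\overline{R}$ is the average of $R_i$ across the population (that is, the share of the population that responds, $n/N$).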

The difference between the mean of $Y$ in the sample ($\overline{Y}_n$) and the mean of $Y$ in the population ($\overline{Y}_N$) is

$\begin{aligned}
\overline{Y}_n -\overline{Y}_N &= \frac{\overline{RY}}{\overline{R}} - \overline{Y}_N \\
&= \frac{\overline{RY} - \overline{Y}_N\overline{R}}{\overline{R}} \\
&= \frac{covar(R, Y)}{\overline{R}}
\end{aligned}$

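where $covar(R, Y) = \overline{RY} - \overline{R}\,\overline{Y}_N$ is the population covariance of $R$ and $Y$.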

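Correlation ($\rho_{R, Y}$) is covariance divided by the product of the standard deviations of the two variables, so $covar(R, Y) = \rho_{R, Y}\sigma_R\sigma_Y$ and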

$\begin{aligned}
\overline{Y}_n -\overline{Y}_N &= \frac{\rho_{R, Y} \sigma_R\sigma_Y}{\overline{R}}\\
&= \rho_{R, Y} \frac{\sigma_R}{\overline{R}}\sigma_Y
\end{aligned}$

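Because $R_i$ is binary, $\sigma_R = \sqrt{\overline{R}(1-\overline{R})}$, and because $\overline{R} = n/N$, the middle term simplifies to $\frac{\sigma_R}{\overline{R}} = \sqrt{\frac{1-\overline{R}}{\overline{R}}} = \sqrt{\frac{N-n}{n}}$. Substituting yields the Meng equation:

$\overline{Y}_n -\overline{Y}_N = \rho_{R, Y} \sqrt{\frac{N-n}{n}}\,\sigma_Y$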
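The identity in this derivation is algebraic, so it can be checked numerically on any finite population. The following minimal sketch builds a hypothetical population in which people with $Y = 1$ are slightly more likely to respond, then compares the actual gap between the respondent mean and the population mean with $\rho_{R,Y}\sqrt{(N-n)/n}\,\sigma_Y$. The population size, response rates, and variable names are assumptions made for illustration. All moments use the divide-by-$N$ (population) definitions, as in the derivation.

```python
import math
import random

random.seed(0)

N = 100_000
# Binary outcome Y for everyone in a hypothetical population.
Y = [1 if random.random() < 0.5 else 0 for _ in range(N)]
# Response indicator R: assumed response rates of 1.2% when Y = 1 and 1.0% when Y = 0.
R = [1 if random.random() < (0.012 if y == 1 else 0.010) else 0 for y in Y]

n = sum(R)
mean_Y = sum(Y) / N
mean_R = n / N
mean_RY = sum(r * y for r, y in zip(R, Y)) / N

# Population (divide-by-N) covariance, standard deviations, and correlation.
covar = mean_RY - mean_R * mean_Y
sigma_Y = math.sqrt(sum((y - mean_Y) ** 2 for y in Y) / N)
sigma_R = math.sqrt(mean_R * (1 - mean_R))
rho = covar / (sigma_R * sigma_Y)

actual_error = mean_RY / mean_R - mean_Y           # respondent mean minus population mean
meng_rhs = rho * math.sqrt((N - n) / n) * sigma_Y  # right-hand side of the Meng equation

print(f"n = {n}, rho = {rho:.5f}")
print(f"sample mean - population mean = {actual_error:.6f}")
print(f"rho * sqrt((N-n)/n) * sigma_Y = {meng_rhs:.6f}")
```

The two printed quantities agree up to floating-point rounding, whatever response mechanism is simulated, because the equation is an identity rather than an approximation.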

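For the random contact case, we assume that the set of contacted people $C$, of size $N_c$, is a random sample from the population, so that the mean of $Y$ among those contacted, $\frac{\sum_{i \in C}Y_i}{N_c}$, can stand in for the population mean $\overline{Y}_N$. Letting $p_r = \overline{R}_c$ denote the response rate among those contacted, the error is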

$\begin{aligned}
\overline{Y}_c-\overline{Y}_N& = \frac{\sum_{i \in C}R_iY_i}{\sum_{i \in C}R_i} - \frac{\sum_{i=1}^{N}Y_i}{N}\\
& = \frac{\sum_{i \in C}R_iY_i - \frac{\sum_{i \in C}Y_i}{N_c}\sum_{i \in C}R_i}{\sum_{i \in C}R_i} \\
& = \frac{\overline{R_cY_c} - \overline{Y}_c\overline{R}_c}{\overline{R}_c} \\
& = \frac{covar(R, Y)}{p_r} \\
& = \rho_{R, Y}\frac{\sigma_R}{p_r}\sigma_Y
\end{aligned}$

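Since $R_i$ is binary and $p_r = \overline{R}_c$ is the response rate among those contacted, $\sigma_R = \sqrt{p_r(1-p_r)}$, so the error in the random contact case simplifies to

$\overline{Y}_c-\overline{Y}_N = \rho_{R, Y}\sqrt{\frac{1-p_r}{p_r}}\,\sigma_Y$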
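As a rough illustration of the random contact result, the sketch below evaluates $\rho_{R,Y}\sqrt{(1-p_r)/p_r}\,\sigma_Y$ for several response rates. It assumes that $R$ is binary among those contacted, so that $\sigma_R = \sqrt{p_r(1-p_r)}$ and the last line of the derivation reduces to this form. The function name `random_contact_error` and the numeric inputs are hypothetical.

```python
import math


def random_contact_error(rho: float, p_r: float, sigma_y: float) -> float:
    """Error under random contact: rho * sqrt((1 - p_r) / p_r) * sigma_y."""
    if not 0 < p_r <= 1:
        raise ValueError("response rate p_r must be in (0, 1]")
    return rho * math.sqrt((1 - p_r) / p_r) * sigma_y


# Hypothetical inputs, for illustration only.
rho = 0.05       # correlation between responding and the answer, among those contacted
sigma_y = 0.5    # standard deviation of a 50/50 binary outcome

for p_r in (0.5, 0.1, 0.01):
    err = random_contact_error(rho, p_r, sigma_y)
    print(f"response rate = {p_r:.2f}: error = {err:.4f}")
```

Under these assumptions the population size $N$ no longer appears. The error depends instead on the response rate among those contacted, and it grows as that rate falls.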

©2023 Michael Bailey. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.
