A New Paradigm for Polling

Scientific fields operate within paradigms that define problems and solutions for a community of researchers. The dominant paradigm in polling centers on random sampling, which is unfortunate because random sampling is, for all practical purposes, dead. The pollsters who try to produce random samples fail because hardly anyone responds, and more and more pollsters do not even try. The field has therefore folded weighting-type adjustments into the paradigm, but this too is unfortunate because weighting works only if we assume away important threats to sampling validity, threats that loom particularly large in the growing non-probability polling sector. This paper argues that the polling field needs to move to a more general paradigm built around the Meng (2018) equation that characterizes survey error for any sampling approach, including non-random samples. Moving to this new paradigm has two important benefits. First, the new paradigm elevates new insights, including the fact that survey error increases with population size when individuals' decisions to respond are correlated with how they respond. This insight helps us understand how small sampling defects can metastasize into large survey errors. Second, the new paradigm points the field toward new methods that more directly identify and account for sampling defects in a non-random sampling environment. This paper describes the intuition and potential power of these new tools, tools that are further elaborated in Bailey (2023b).


Media Summary
Low response rates and low-cost internet polls have for all practical purposes killed the random sampling paradigm that built the public opinion field. This paper argues that the polling field needs to move to a more general paradigm built around the Meng (2018) equation that characterizes survey error for any sampling approach, including non-random samples. Moving to this new paradigm elevates new insights and points the field toward new methods that address more of the challenges of the contemporary polling environment. The article summarizes work that uses randomized response instruments that provide a systematic way to determine whether the people who respond to polls differ from those who do not, even after controlling for demographics. Such work has found that polls in the Midwest understated Trump support and overstated the liberalism of Democratic voters.

Toward a New Polling Paradigm
A scientific paradigm provides a model for articulating problems, solutions and future research directions for a community of practitioners (Kuhn, 1970). In polling, the main paradigm has long revolved around random sampling, a tool that provides an elegant way to make inferences about a large population based on information from a relatively small, randomly chosen subset of people.
Because it is incredibly difficult to randomly sample in the contemporary polling environment, most pollsters augment random sampling with weighting and related tools such as quota sampling and multi-level regression with post-stratification. These weighting-type adjustments make the non-random samples resulting from non-response look like they came from a random sample, but with a cost: the techniques require us to assume that the decision to respond is independent of the content of response once the weighting variables have been controlled for.
I argue in this paper that the weighting-augmented random sampling paradigm is ill-suited for the contemporary polling environment. First, the random sampling heart of the paradigm is hardly relevant today given low response rates and non-probability samples. Nonetheless, polls are routinely "pollwashed" in ways that make them appear to have inherited the precision and distributional properties of random sampling even though they have not. Second, weighting-type adjustments bear the burden of fixing non-randomness in modern polling, but they are built on assumptions that are quite restrictive, especially in the current environment in which respondents are often recruited via non-random mechanisms.
The field needs a better paradigm, one that moves beyond random sampling without relying on the strong assumptions involved in weighting. The simple decomposition of survey error in Meng (2018) provides the foundation for such a paradigm. Instead of reducing pollsters to explaining their work in terms of idealized and never-seen random samples, we can characterize survey error for any sampling approach, including non-random samples and samples that arise when survey response is related to survey content.
Shifting to a modern polling paradigm produces two important payoffs. First, the new paradigm provides intuition that is more relevant to current polling practice. A key element of the Meng Equation is a so-called data defect parameter that characterizes the degree to which whether someone responds is related to how someone responds. This parameter tends to get lost in the dominant polling paradigms: random sampling essentially minimizes it, while weighting-type methods assume it away. The Meng Equation makes clear that this parameter is centrally important and interacts with population size, and not, to be clear, sample size. Even a small data defect in sampling can create large survey errors when surveying large populations (Bradley et al., 2021). The Meng Equation also helps us appreciate why random contact is worthwhile even when response rates are low.
The second payoff of the new paradigm is that it helps us chart a path forward for research on survey methods. A great deal of survey research, including research on non-random samples (as described in Wu (2023)), focuses on adjustments that assume there is no correlation between whether and how people respond after controlling for population-benchmarked variables. Given the critical role of data defects in the new paradigm, it is no longer tenable to focus so heavily on approaches that assume them away. Instead, the new paradigm points us toward tools that minimize, measure and counteract any relationships between whether and how people respond to surveys. Bailey (2023b) elaborates these benefits and provides additional context and tools.
To appreciate the challenges of the current paradigmatic ambiguity, consider two polls: one is a probability-based poll conducted by a respected newspaper with a one percent response rate; the other is a Twitter poll initiated by an unpredictable billionaire. Suppose both have the same sample size and that demographic data are available so that the weighted results are "nationally representative." Most polling experts will have a strong preference for one of the polls, but random sampling provides little direct guidance, other than helping us appreciate that neither sample is random and hence both could be biased. Within the new paradigm, on the other hand, the Meng Equation enables us to show clearly why the newspaper poll is of higher quality, as I discuss below.
The goal of this paper is to provide an overview of a new way of thinking about polling that is better suited to the contemporary polling environment than today's focus on weighting and other tools that assume ignorable non-response. Section 2 highlights what we already know: random sampling is a distant echo of polling as practiced. Section 3 presents the Meng Equation, focusing on its distinctive intuition. Section 4 shows how a paradigm built around the Meng Equation naturally points to new research agendas, providing two examples in which approaches motivated by the new paradigm are able to address important survey challenges.

Paradigm Lost
Modern polling began with a common-sense but not deeply theorized paradigm of more-is-better (Bailey, 2023b; Converse, 2009). The exemplar of this approach was the Literary Digest, a magazine that sent millions of surveys to voters before presidential elections. It had a decent track record until 1936, when its polls infamously indicated that Republican Alf Landon would win in a landslide. He lost in a landslide, discrediting the early big-data approach to polling. Quota samplers such as George Gallup filled the void, showing how relatively small representative samples were more accurate. They did well until 1948, when they predicted Republican Thomas Dewey would defeat President Harry Truman. Truman won, famously hoisted the "Dewey Defeats Truman" newspaper and sent the polling community scrambling for a more robust paradigm.
Random sampling filled the gap (Neyman, 1934). Using standard statistical theory, one could characterize the statistical properties of the mean of a random sample in ways that enabled accurate and systematic reasoning about population attributes from samples in the hundreds or low thousands. Remarkably, the accuracy of random sampling depends on the sample size, not the population size. Fortuitously, widespread adoption of telephones made random sampling cheap to implement.
The theory assumes that everyone randomly contacted for a survey responds. This was never true, but response rates were high and the connection between response and political views was attenuated enough that random sampling provided a decent approximation to guide political polling.
Over the last several decades the relevance of random sampling theory has declined, largely due to accelerating levels of non-response. In the late 1990s, 60 percent of those contacted for political polls did not respond; today that number is often 95 percent or higher (Cohn, 2022a; Kennedy & Hartig, 2019).
The first problem that low response creates is that it attenuates, and probably breaks, the connection between survey theory and practice. No one thinks that the one percent of people who respond when contacted are truly a random sample of the population. The field therefore accommodated large-scale non-response by augmenting random sampling with weighting. Weighting involves placing more weight on respondents from groups who are underrepresented in a sample relative to their population proportion and less weight on respondents from groups who are overrepresented relative to their population proportion. Weighting requires identifying variables that affect both response and the attribute being surveyed, from among those variables for which the pollster knows the population totals. Typically, these are demographic variables such as age, race, gender, income, region and education.
The shift from the random sampling paradigm to the random sampling-plus-weighting paradigm is so pervasive that it is unremarkable to many pollsters, even as they acknowledge the many decisions that must be made when weighting data (Gelman, 2007). Weighting is not costless, however, as it requires pollsters to assume that non-response is ignorable, meaning that the decision to respond is independent of the content of response once we have controlled for the weighting variables. This assumption implies that the people who choose to reply are representative samples given the covariates used in the weighting. A violation of this assumption means that non-response is non-ignorable, meaning that even after weighting, poll respondents differ from non-respondents.
The assumption that the response mechanism is ignorable is also referred to as a mechanism that produces data that is "missing at random." Little and Rubin (2014, p. 22) note that virtually every approach to dealing with missing data makes this strong assumption. The list includes multi-level regression with post-stratification (so-called "MrP" models) (Gelman & Hill, 2007) and nearest neighbor imputation (YouGov, 2014).
Non-ignorable non-response is concerning in many contexts.
• Virtually every post-mortem of the 2016 and 2020 U.S. presidential elections raised the possibility that weighting failed to properly adjust for the possibility that voters favoring Trump were less likely to respond, especially in the Midwest (see, e.g., Clinton et al. (2021) and Kennedy et al. (2018)).
• Surveys of voting typically overestimate turnout, likely in part due to non-ignorable non-response (Jackman & Spahn, 2019).
• Bradley et al. (2021) provide evidence that the type of people who get vaccinated are more likely to respond to some polls (especially ones based on non-random samples) even after controlling for demographics.
• In marketing, evidence suggests that people's willingness to provide product feedback depends on their experience with the product (Schoenmueller et al., 2020).
Low response rates have created another problem that has been harder to ignore: rising costs. It is now very expensive to field probability-based polls because pollsters need to wade through tens of non-respondents before they reach a single respondent, leading some to doubt the viability of the approach (Cohn, 2022a). An increasing number of pollsters therefore have moved to non-probability samples that are created by finding people willing to answer polls via ads, outreach to mailing lists and other, often opaque and sometimes novel, methods (Clinton et al., 2021; Wang et al., 2015). Pollsters use weighting-type adjustments to produce samples that are representative with regard to demographic benchmarks.
While true random sampling produces estimates with clear measures of quality, the field has struggled to operationalize quality in the post-random sampling world. Some pollsters anachronistically use the language of random sampling to imply that their polls have the properties of a random sample, a process I call "pollwashing." One way to do this is to report margins of error even though the theoretical basis of a margin of error is undone by non-response (and especially massive non-response, to say nothing of a non-random sample) (Shirani-Mehr et al., 2018).
Another tool for pollwashing is for pollsters to claim their samples are "nationally representative" (Jamieson et al., 2023). In random sampling, a sample is probabilistically representative of a target population in expectation. In weighting, a sample can be made to share certain distributional characteristics with the population for variables used in the weighting protocol. This provides the survey with an aura of accuracy even for polls that have at best a modest claim at being truly representative in the way that an actual random sample would be. It is easy to see how this usage can stretch the concept of representativeness to the breaking point. Consider, for example, an opt-in internet poll on a candidate's website. The data could be weighted to be nationally representative with respect to demographics, but no serious pollster would consider the sample representative in the sense that a true random sample would be. University of Michigan polling expert Raphael Nishimura summed it up nicely: "For the laymen, [representative sample] sounds like a well-defined technical sampling term, but it's not. This is just as vague and meaningless as saying that a sample is 'robust', 'statistically valid' or 'awesome'" (Nishimura, 2023).
Pollwashing extends even to sample size. In a random sample, the survey average converges to the population average as the sample size increases, making sample size a useful metric for precision. In non-random samples, however, large samples guarantee little. We have known this since the 1936 Literary Digest fiasco, yet surveys continue to report sample sizes for their "nationally representative" samples with the implication that more is better. When samples are non-random, that core intuition of random sampling fails: Bradley et al. (2021) and others have shown that if response is correlated with opinion, the sample size can be wildly unreflective of the amount of information in a sample. I address this point below as well.
Given the lack of a clear measure for assessing polls, some in the field use predictive accuracy as a measure of polling quality (Silver, 2021). The danger with this approach is that if surveys have a systematic error, then a tool that counteracts that bias, however crudely, will do well. Survey firms with a Republican bias were relatively accurate in 2016 and 2020. Were their methods better? Or were they biased in a fortuitous way for those elections? Many of these same firms performed poorly in 2018 and 2022, suggesting limits to predictive accuracy as a measure of quality. Perhaps with enough time and a stable polling environment, track records may prove meaningful, but rather than waiting for polling methods to be exposed in an election, a better aspiration is to have a paradigmatic set of standards against which to judge polling methods.

Paradigm Found
What remains once we have ruled out metrics of survey quality such as demographic representativeness or large sample sizes or predictive accuracy? In this section I articulate a framework that answers this question. The framework builds on Meng (2018)'s surprisingly simple and completely general characterization of survey error. It helps us contextualize survey error across protocols and points to shared standards and future research directions.
The framework is built on a simple model of the sample mean of a variable of interest, Y, from a sample of n i.i.d. observations drawn from a population of size N. (The logic extends to other statistical quantities such as regression coefficients.) I denote the observed sample mean among respondents as $\bar{Y}_n$, where the lower case n subscript indicates the number of people in the sample (i.e., people for whom R = 1). The difference between the mean of Y in the R = 1 group and the entire population is $\bar{Y}_n - \bar{Y}_N$. At this point, we are not doing any statistical modeling; we are simply calculating the difference between the average value of Y for people with R = 1 and the average value of Y for the entire population. Following the simple steps described in Meng (2018) and the appendix, we can re-write sampling error in a way that decomposes it into three conceptually interesting quantities. I present the case with no covariates, but the logic carries over when there are covariates. (One may wish to consider the equation as applied within weighting demographics, for example.)

$$\bar{Y}_n - \bar{Y}_N = \rho_{R,Y} \times \sqrt{\frac{N-n}{n}} \times \sigma_Y \tag{1}$$

The first term on the right-hand side of the Meng Equation is ρ R,Y, the correlation in the population between R and Y. This quantity can be taken to reflect the quality of the data with regard to sampling. Bradley et al. (2021) refer to this quantity as the "data defect correlation" (sometimes referred to as the "confounding correlation"). The higher this correlation, the more response is correlated with outcome. When ρ R,Y = 0 the response mechanism is ignorable; when ρ R,Y ≠ 0, respondents have systematically different values of Y than non-respondents.
Because the Meng Equation is an accounting identity, we know that if ρ R,Y = 0, then the mean of the sample will literally equal the mean of the population. This fact points to the central insight of random sampling: if R is based on a truly random process, then ρ R,Y will be expected to be quite close to zero. Remember, though, that the Meng Equation is an accounting identity, so even when a sample is randomly chosen, it is unlikely that the correlation of R and Y will literally equal zero; hence the sample mean will generally not equal the population mean. Meng shows that as long as the data defect correlation is on the order of $\sqrt{1/N}$ (as it is with random sampling), the response mechanism can be treated as ignorable.

The second term on the right-hand side of the Meng Equation is the data quantity term, $\sqrt{\frac{N-n}{n}}$. It relates to the size of the population (capital N) and the size of the sample (lower case n). Describing survey quality in terms of both N and n runs strongly counter to the intuitions of random sampling, but is crucial to understanding non-random samples. I explore this term in detail momentarily.
The final term on the right-hand side of the Meng Equation is σ Y, the square root of the variance of Y. Meng (2018) refers to this quantity as data difficulty, in the sense that errors will be smaller if Y varies only a little in the population. In an extreme case, Y is the same for everyone in a population, which would mean σ Y = 0 and the sample mean would equal the population mean. Generally, this source of polling error is taken as a given for any survey item.
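Because the Meng Equation is an accounting identity, it can be verified numerically. Below is a minimal sketch in Python with simulated data (the logistic response model and all parameter values are illustrative assumptions, not from the paper): response is allowed to depend on Y, and the product of the three terms reproduces the sampling error.

```python
# A minimal numerical check (simulated data; the response model and all
# parameter values are illustrative assumptions) that the Meng Equation
# holds as an accounting identity.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000                                   # population size
Y = rng.normal(50, 15, size=N)                # survey variable of interest

# Let the probability of responding depend on Y, so rho_{R,Y} != 0.
p_respond = 0.1 / (1 + np.exp(-(Y - 50) / 15))
R = rng.random(N) < p_respond                 # R = 1 for respondents
n = R.sum()

error = Y[R].mean() - Y.mean()                # actual sampling error

rho = np.corrcoef(R.astype(float), Y)[0, 1]   # data defect correlation
data_quantity = np.sqrt((N - n) / n)          # sqrt((N - n) / n)
sigma_Y = Y.std()                             # population standard deviation

# The two numbers agree up to floating-point error.
print(error, rho * data_quantity * sigma_Y)
```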
The Meng Equation is very general, relatively simple and remarkably insightful. It motivates intuitions that provide a more robust starting point for thinking about modern polling than random sampling. Here I focus on three important insights that are largely absent in the weighting-augmented random sampling paradigm but clear in the new paradigm.
The importance of ρ. Survey error is the product of three terms, meaning that we need to think of survey error as a combination of factors. If any one of the terms is zero, then survey error is zero, whatever the value of the other terms. The data quantity term is zero only if the sample (n) is equal to the population (N), and the data difficulty term is zero only if Y does not vary at all in the population; neither is plausible in most polling contexts.
The only term that we can realistically drive toward zero with survey methods is ρ R,Y. Random sampling does this in expectation via randomization. Weighting lowers ρ R,Y by conditioning on covariates that are observed in the sample and for which we have known population-level information. The easiest way to conceptualize this is to consider a case of cell weighting in which the population is broken into cells based on demographics. A single cell may contain college-educated Hispanic women over 65 years old, for example. The Meng Equation applies to the estimates within each cell. It is possible that not accounting for education may induce a non-zero data defect correlation in the overall population because, for example, people with more education may respond at higher rates. Within each cell, however, it could be that there is no systematic difference between those who respond and those who do not. In this case, conditioning on covariates enables low-error sampling estimates within cells and, because we know population proportions for each cell, an analyst using weighting can combine the estimates proportionately to produce a low-bias population estimate. Even as weighting can reduce sampling bias, it requires a strong assumption: that the correlation of response and outcome is small in magnitude within cells, something that is not true if respondents differ from non-respondents conditional on covariates.
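To make the cell-weighting logic concrete, here is a minimal sketch with invented numbers (the cell names, population shares and cell means are hypothetical). The weighted estimate recovers the population mean only because, by construction, respondents are representative within each cell.

```python
# A minimal sketch of cell weighting with invented numbers: within-cell means
# are combined using known population shares. The weighted estimate works
# only because respondents are representative within cells by construction.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population shares for two education cells (assumed known).
pop_share = {"college": 0.35, "no_college": 0.65}

# College-educated people respond at a higher rate, so the raw sample
# over-represents them (70 percent of respondents vs 35 percent of population).
sample = {
    "college":    rng.normal(60, 10, size=700),
    "no_college": rng.normal(40, 10, size=300),
}

raw_mean = np.concatenate(list(sample.values())).mean()

# Combine within-cell means by population proportions.
weighted_mean = sum(pop_share[c] * sample[c].mean() for c in sample)

print(raw_mean, weighted_mean)  # roughly 54 vs 47; the population mean is 47
```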
Neither of these methods, random sampling or weighting, is satisfying in the contemporary context, in which non-response renders random samples virtually impossible and in which we would rather not solve problems by assuming them away. As seen in Section 2, there are many plausible scenarios in which response and outcome are correlated even after controlling for observable covariates with known population proportions.
Population size matters. The data quantity term in the middle of the right-hand side of the Meng Equation is a function of sample size n and population size N. That sample size matters fits comfortably with our random sampling-inflected intuition: as long as ρ R,Y ≠ 0 and σ Y ≠ 0, survey error will decline as the sample size n increases.
Notice that the sample size is doing something quite different than it does in random sampling. The expected mean from a random sample is the true value no matter what the sample size is; the power of a larger sample in random sampling is to reduce the sampling variance of the mean. In the Meng Equation, by contrast, when ρ R,Y ≠ 0 a larger sample reduces the error itself.
Sampling error also depends on the size of the population, N . One of the incredible properties of random sampling is that it decouples the size of the population from the properties of the estimator. A (truly) random sample of 1,000, for example, will be equally accurate in expectation for a given data difficulty for any target population, be it a small state in the U.S. or the entire country of India.
In non-random samples, however, population size matters. Figure 1, based on Bailey (2023b), displays samples of 20 from a relatively large and a relatively small population. Each square is a person. On the x-axis is R*, the latent propensity to respond for each person. We observe a response R = 1 if R* > k, where the threshold k varies across the two panels. The key point for the intuition is that higher latent propensities to respond are associated with higher probabilities of response. On the y-axis is a feeling thermometer rating of, for example, President Biden; this is Y, the survey response of interest. The upward tilt of the shape in Figure 1 suggests that ρ R,Y > 0, meaning that people with higher propensities to respond have higher ratings of Biden.
The blue squares are respondents. In the large population panel on the left, there are 328 people, 20 of whom respond (about 6 percent). These respondents are quite unrepresentative. Every one of them rates Biden above 40 and their average rating is 68, which is much higher than the population average of 40.
In the small population panel on the right, there are 40 people. As with the panel on the left, the sample size is 20. The respondents are also unrepresentative, but the magnitude of the unrepresentativeness is much smaller because the pollster had to go deeper into the pool to get 20 responses. This means that less unrepresentative people made their way into the sample: we see ratings of Biden as low as 25, and the average rating among respondents is 50, which is higher than the population average of 40 but not as far off as in the large population example. 1 In other words, the example shows how a sample of size n will produce smaller error from a smaller population when the data defect correlation is not zero.
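The population-size intuition is easy to reproduce in simulation. The sketch below (all parameters invented) mimics the Figure 1 setup: opinion is correlated with a latent response propensity and the pollster hears only from the n most response-prone people. Holding n fixed, the error grows with population size.

```python
# A minimal simulation of the Figure 1 logic (all parameters invented):
# opinion Y is correlated with a latent response propensity, and only the
# n most response-prone people respond. With n fixed, error grows with N.
import numpy as np

rng = np.random.default_rng(2)

def survey_error(N, n=20):
    r_star = rng.normal(size=N)                  # latent response propensity
    y = 40 + 15 * r_star + rng.normal(0, 10, N)  # opinion, correlated with r_star
    respondents = y[np.argsort(r_star)[-n:]]     # only the most eager respond
    return respondents.mean() - y.mean()

print(survey_error(N=40))       # small population: modest error
print(survey_error(N=328))      # larger population: larger error
print(survey_error(N=100_000))  # large population: respondents are extreme outliers
```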
Random contact. Building from the Meng Equation we can articulate a third important insight for contemporary polling (Bailey, 2023b). First, let us distinguish:
• Random sample: What random sampling theory is built on, but not what probability-based polling delivers.
• Random contact: What probability-based polling actually does. Pollsters using probability-based polls randomly contact people who may or may not respond.
Given that random contact is very expensive and nonetheless produces non-random samples, it is easy to sympathize with pollsters who have given up on random contact. However, random contact does important work even with very low response rates, an intuition that is hard to see in the current random sampling-plus-weighting paradigm.
To see this, I first show graphically how randomly choosing whom to contact severs the connection between sampling error and population size. After that, I use the Meng Equation to reconsider how non-ignorable non-response affects error when contact (but not response) is randomized.
1 The pattern illustrated in Figure 1 actually underestimates the effect of sample size. The quantities plotted are the latent response propensities (on the horizontal axis) and the outcome. The correlation of these quantities in each panel of Figure 1 is visibly quite similar: in the left panel, the correlation of R* and Y is 0.77 and in the right panel the correlation of R* and Y is 0.74. The Meng Equation is based on the correlation of R (not R*) and Y, which depends not only on the correlation of R* and Y but also on the response rate. (To get a sense of why the response rate also matters, note that if the response rate is 0 or 100 percent, the correlation of R and Y must be zero even if there is a large correlation between R* and Y, because there will be no variation in R.) Despite similar relationships between latent response propensity and Y, the relationship between R and Y differs across the two panels: in the left panel of Figure 1, ρ R,Y = 0.36, while in the right panel ρ R,Y = 0.67. In other words, the data defect correlation is substantially higher in the right panel. Even with a higher data defect correlation, we see how a smaller population size induces less sampling error in the panel on the right.

Recall from Figure 1 that a sample of 20 respondents will produce a highly skewed sample, with an average Y of 68, far from the population average of 40. Figure 2 shows what random contact does in the same setting. Each box still represents a person, with their value of Y (e.g., a feeling thermometer for a politician) on the y-axis and their propensity to respond on the x-axis. The filled-in grey boxes are randomly selected individuals contacted by the pollster. The open boxes are people the pollster does not contact.
Random contact does not imply that those who respond are a random sample. After all, people choose to pick up the phone or respond to an email and this process can be influenced by many non-random factors, including factors correlated with Y , the feature we are trying to estimate in the population. The panel on the right of Figure 2 shows who responds among those randomly contacted. This sample continues to be unrepresentative.
Even though the sample is skewed, random contact has done something very important. The sample of 20 respondents from the random contact survey is not as unrepresentative as the sample of 20 respondents from the large population panel of Figure 1. We no longer get the n most responsive people in the whole population (which is wildly unrepresentative for a large population), but instead hear from the n most responsive people in a smaller representative sample. The respondents in the random contact case depicted in the right panel of Figure 2 have an average value of Y of 56, which is larger than the population average, but not as bad as the sample average of 68 that emerged from the no random contact case depicted in the left panel of Figure 1. The random contact converted the large population into a smaller one.
In other words, while random contact does not eliminate error associated with a positive value of ρ R,Y, it de-couples sampling error from population size. In terms of the equation, Meng (2021) and Bailey (2023b) show that survey error in a random contact survey is

$$\bar{Y}_n - \bar{Y}_N = \rho_{R,Y} \times \sqrt{\frac{1-p_r}{p_r}} \times \sigma_Y \tag{2}$$

where p_r is the response rate (see the appendix for the derivation). The crucial difference from Equation 1 is that the data quantity term depends on the response rate, p_r, and not on population size N. Since populations can be very large, this is very useful (although identifying the correct target population for the random contact is a challenge; see, e.g., Jackman & Spahn, 2021).
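A small simulation (again with invented parameters) illustrates this decoupling: under random contact with a fixed response rate, the error is essentially unchanged when the population grows a hundredfold.

```python
# A minimal simulation (invented parameters) of why random contact helps:
# with a fixed response rate p_r among those randomly contacted, error no
# longer grows with population size N, consistent with Equation 2.
import numpy as np

rng = np.random.default_rng(3)

def random_contact_error(N, n_contacted=1000, p_r=0.05):
    r_star = rng.normal(size=N)                  # latent response propensity
    y = 40 + 15 * r_star + rng.normal(0, 10, N)  # opinion, correlated with r_star
    contacted = rng.choice(N, size=n_contacted, replace=False)
    k = int(p_r * n_contacted)                   # number who actually respond
    # Among those contacted, the most response-prone fraction responds.
    responders = contacted[np.argsort(r_star[contacted])[-k:]]
    return y[responders].mean() - y.mean()

# The error is roughly the same despite a hundredfold difference in N.
print(random_contact_error(N=10_000))
print(random_contact_error(N=1_000_000))
```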
How Current Practice Fits into the New Paradigm. One of the nice features of the new paradigm based on the Meng Equation is that it is general enough to encompass the multiple approaches to surveys that dominate the field. We've already seen that random sampling is a mechanism that drives ρ R,Y toward zero in expectation. It continues to be an amazing tool, but is a special case in a more general framework.
Weighting approaches succeed if ρ R,Y goes to zero within each demographic subgroup (or, more precisely, conditional on observed covariates with known population distributions). Weighting protocols then essentially patch together subgroup estimates with good properties via the population proportions of weighting groups.
The new paradigm also makes it harder to ignore potential weaknesses of weighting. Most polls are reported with no effort to measure ρ R,Y , meaning that the results are predicated on a leap of faith that ρ R,Y will be close to zero conditional on covariates. Given that ρ R,Y interacts with population size, this can be quite a leap: even a small non-zero value of ρ R,Y can lead things to go wrong very quickly, something that is hard to see from within the random sampling paradigm and which is particularly concerning when samples are not generated via random contact.

The New Paradigm in Practice
Equation 1 helps us appreciate the scale of the sampling problem we face in a post-random sampling world. It does not, however, provide specific guidance on estimating ρ R,Y and associated standard errors needed for inference.
While some believe that there is little to be done to measure or undo non-zero ρ R,Y , there is in fact a vibrant and growing literature that models, measures and counters ρ R,Y . These models cannot avoid relying on assumptions, of course, but they do not require response to be ignorable and they can produce uncertainty estimates that allow us to rule non-ignorability in or out in many reasonable data contexts.
In this section I describe two such approaches. Both rely on response instruments, which are variables that affect the probability of response but do not directly affect the outcome of interest. Sun et al. (2018) show that a broad class of weighting, imputation and doubly robust models can work if a response instrument is available. Bailey (2023b) shows examples of how even parametric models that do not literally require response instruments tend to perform much better when a response instrument is available.
The first example uses an observational response instrument, which is convenient but suffers from the usual concern about observational instruments: whether they truly have no direct effect on Y. The second example uses a randomized response instrument, which is easier to defend on theoretical grounds. In some circumstances, randomized instruments are practically difficult to implement, but we shall see that they are not that difficult to create in a survey context. For both observational and randomized contexts, the response instrument must affect response and, as is typical in instrumental variable approaches, statistical power rises with the magnitude of the effect of the instrument on response.
The intuition behind response instruments is straightforward. If response is ignorable conditional on covariates, then the expected value of Y should be independent of response propensity conditional on covariates for an individual and, hence, across a number of i.i.d. draws. However, if response is non-ignorable, the expected value of Y differs, conditional on covariates, across those with high and low response propensities. Hence, if we are able to observe data from high and low response contexts, we can assess whether Y differs and infer whether we are looking at data produced by an ignorable or non-ignorable response mechanism. I provide a graphical illustration of this intuition when I discuss randomized response instruments below. Sun et al. (2018) provide a formal proof of the conditions under which population quantities are statistically identified when one has a response instrument.
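Before turning to the examples, a minimal simulation (hypothetical data and response model; the linear-in-Y response probability is an assumption for illustration) shows the intuition in action: compare the respondent mean of Y across randomized low- and high-response arms, and a clear gap signals non-ignorable response.

```python
# A minimal sketch (hypothetical data and response model) of the
# response-instrument intuition: compare the respondent mean of Y across
# randomized low- and high-response arms. Under ignorable response the means
# agree in expectation; a clear gap is evidence that rho_{R,Y} != 0.
import numpy as np

rng = np.random.default_rng(4)

def respondents(n, base_rate):
    """People with higher Y are more willing to respond (non-ignorable)."""
    y = rng.normal(40, 15, size=n)
    p = np.clip(base_rate + 0.01 * (y - 40), 0, 1)
    return y[rng.random(n) < p]

low  = respondents(20_000, base_rate=0.07)   # low-response protocol
high = respondents(20_000, base_rate=0.38)   # high-response protocol

gap = low.mean() - high.mean()
se = np.sqrt(low.var(ddof=1) / low.size + high.var(ddof=1) / high.size)
print(gap, gap / se)  # a large z-statistic flags non-ignorable response
```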
Example 1: Using Observational Data to Account for ρ. Like many polls, the 2020 American National Election Study (ANES) overestimated Biden's support. Biden won the popular vote by 4.4 percentage points, but the 2020 ANES pre-election poll reported that Biden led Trump by 11.8 percentage points. Weighting did not help, as Biden's margin was 12.6 percentage points when responses were weighted. 2 Signs that ρ R,Y was not zero were hiding in plain view. Following Bailey (2023b), Figure 3 displays presidential preferences in ANES data by political interest. Support for Biden and Trump was essentially equal among respondents "not at all" or "not very" interested in politics. Among those "very interested" in politics, however, there was a huge gap: 61.9 percent of such respondents supported Biden. 3 If people who are more interested in politics are more likely to answer a poll about politics, which hardly seems unreasonable, then the ANES may have had too many people interested in politics, producing a sample that was more pro-Biden than the population. While it seems natural to model likely lower support for Biden among non-respondents, pollsters did not do so because weighting-type adjustments are not feasible for a variable like political interest, which does not have a known population-level distribution.

Figure 3. Presidential preferences in 2020 ANES survey, by interest in politics

2 Averages based on many polls indicated that Biden led by 8.3 percentage points. Jacobson (2022) reports that the average lead across a selection of large, high-quality polls (the ANES, the Cooperative Election Study, Nationscape and Pew) was 17.9 percentage points unweighted and 14.7 percentage points when weighted.

3 Controlling for demographics does not change the patterns reported in the figure. I am grateful to Leonie Huddy for suggesting this example.
Grounding our thinking about polling in the Meng Equation, however, it becomes harder to dismiss the possibility that ρ R,Y ≠ 0 due to a non-weighting factor that affected both whether and how people responded to a poll. Rather than shrug our shoulders and say that polling is hard, we can model response and outcome in a way that allows for ρ R,Y ≠ 0. Peress (2010), for example, did this when he modeled survey measures of turnout in the 1980s. As is often the case, surveys at that time overestimated turnout: even though only 50 percent of adults turned out to vote at that time, 70 percent of ANES respondents voted. 4 ANES turnout declined in the weighted data to around 60 percent. Peress incorporated information about response interest, information akin to the political interest variable plotted above, and was able to bring estimates to within 1 percent of actual turnout in 1980 and 1988 and within 2 percent in 1984.
Directly modeling and estimating ρ R,Y was the crucial element that powered the Peress model. His model, like others in this spirit, jointly modeled R, the decision to respond, and Y, the content of response. He linked the two equations via a ρ parameter capturing the degree to which unmeasured factors affected both R and Y. The model was identified by including a variable in the R equation that was not included in the Y equation.
Using such models requires new thinking and, of course, does not eliminate the necessity of assumptions. However, instead of assuming that non-respondents have the same political interest as respondents, as required in standard weighting-type adjustments, these models allow us to incorporate information in the data suggesting that non-respondents differ from respondents in important ways. In recent years, such models have made use of advances in copulas (Gomes et al., 2019), moment estimators (Burger & McLaren, 2017; Sun et al., 2018) and other approaches in ways that make them more robust to distributional assumptions and other concerns.
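To fix ideas, here is a minimal two-step selection-model sketch in the spirit of the joint models just described, using simulated data. Everything here is an illustrative assumption (the parameters, the instrument, and the textbook two-step estimator); it is not the Peress (2010) implementation. A response instrument z shifts response but not the outcome, and the selection-corrected intercept recovers the population mean.

```python
# A minimal two-step selection-model sketch: jointly generated R and Y with
# correlated unobservables, identified by an instrument z that affects R only.
# All parameters and the estimator are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 50_000

z = rng.normal(size=n)                            # response instrument
u = rng.normal(size=n)                            # response-equation shock
eps = 10 * (0.6 * u + 0.8 * rng.normal(size=n))   # outcome error, corr(u, eps) = 0.6
y = 40 + eps                                      # true population mean is 40
r = (-1.0 + 1.5 * z + u) > 0                      # response equation

print(y[r].mean())                                # naive respondent mean, biased upward

# Step 1: probit of R on z by maximum likelihood.
def neg_loglik(b):
    p = np.clip(norm.cdf(b[0] + b[1] * z), 1e-10, 1 - 1e-10)
    return -np.sum(r * np.log(p) + (~r) * np.log(1 - p))

gamma = minimize(neg_loglik, x0=np.zeros(2)).x

# Step 2: regress observed Y on the inverse Mills ratio; the intercept is the
# selection-corrected estimate of the population mean.
xb = gamma[0] + gamma[1] * z[r]
mills = norm.pdf(xb) / norm.cdf(xb)
X = np.column_stack([np.ones(r.sum()), mills])
beta, *_ = np.linalg.lstsq(X, y[r], rcond=None)
print(beta[0])                                    # close to the true mean of 40
```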
Example 2: Randomized Response Instruments. The challenge with observational models is that it is often hard to definitively defend the assumption that one or more variables affect R but not Y. We are not helpless in the face of these concerns, however, as we can use the power of randomization to create variables that affect R but not Y by design. Specifically, we can create randomized response instruments that reflect treatments affecting whether or not someone responds. There are many ways to do this. A pollster can, for example, randomly divide potential respondents into two pools and provide one group incentives to respond. Cohn (2022b) did this by offering money to the treatment group, thereby lifting the response rate by 30 percentage points relative to the control group that was contacted with conventional incentive-less protocols. I describe another approach below.
There are several attractive features of creating randomized response instruments. First, doing so builds on our long-standing inclination to use survey design to solve sampling problems. After all, in random sampling the design of the survey has long been accepted as a better way to address sampling error than increasing sample size in a non-random way. Second, this approach is simple to implement, as the pollster need only identify a protocol that affects survey response, a task familiar to pollsters who have long explored how to increase response rates. Figure 4, based on Bailey (2023b), illustrates the logic of randomized response instruments. The purpose of the figure is to highlight how and why such instruments can allow us to assess whether the response mechanism is ignorable or not. The configuration of population values is stylized but reasonable. Estimation in practice relies on the models described in Bailey (2023b), which include tools such as control functions, copulas and specification searches that can enable modern non-ignorable non-response oriented tools to work under a broad range of circumstances. These tools do not work under all circumstances, though: Sun et al. (2018) provides a formal treatment of identification and Bailey (2023b) provides a practical discussion of how to deal with threats to validity in these models.
As with Figures 1 and 2, the panels in Figure 4 plot the response interest and values of Y of a hypothetical population. The blue dots indicate people who respond to the survey. In the panels on the left, the shapes tilt as in Figures 1 and 2, suggesting that ρ > 0 because people interested in responding tend to have higher values of Y. In the panels on the right, the joint distributions create shapes that are flat, suggesting that ρ = 0 because there is no relationship between interest in responding and Y. The top panels show instances in which the response rate is around seven percent; the panels on the bottom show instances in which the response rate is around 38 percent.
When ρ > 0, as in the panels on the left, the respondent average of Y varies with the response rate. In the panel on the top left, the respondent average of Y is 68, the same as we saw in the left panel of Figure 1. In the panel on the bottom left, it is 57. When ρ = 0, as in the panels on the right, the respondent average of Y is 40 in both the low and high response surveys. In other words, when response is ignorable, the estimates will equal the true value in expectation whatever the sample size (even as precision varies with sample size).
The figure shows how variation in the respondent average of Y associated with the response rate carries information about ρ. The randomized response instrument induces variation in the response rate (something that is easily verifiable), meaning that, conceptually at least, it is easy to assess whether or not ρ = 0. If the average of Y is the same in treatment and control groups, we have evidence that ρ = 0; if it differs across treatment and control groups, we have evidence that ρ ≠ 0. In other words, even though ρ is often characterized in terms of unobserved variables, it is not the case that it leaves no trace: as response rates vary, observed patterns in Y will reflect ρ. While the logic is straightforward, estimation requires models such as those described in Bailey (2023b). These models range from widely known Heckman models to copula models to method of moments estimators. Bailey (2023b) argues that the quality of data often dominates the choice of model, meaning that the key step is typically creating a good randomized response instrument; with that in hand, the models tend to produce similar results.
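As a concrete illustration of the method of moments flavor, the sketch below uses simulated data under an assumed bivariate-normal model of latent propensity and opinion (the model and the 7 and 38 percent response rates are assumptions chosen to echo Figure 4, not the estimators in the cited papers). Two randomized response rates identify both the population mean and the selection effect.

```python
# A minimal method of moments sketch (simulated data; the bivariate-normal
# model and the response rates are illustrative assumptions). Two randomized
# response rates identify the population mean and the selection effect.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
n = 200_000
mu, sigma, rho_star = 40.0, 15.0, 0.5       # true values used in the simulation

r_star = rng.normal(size=n)                 # latent response propensity
y = mu + sigma * (rho_star * r_star
                  + np.sqrt(1 - rho_star**2) * rng.normal(size=n))

half = n // 2
arms = [(slice(0, half), 0.07), (slice(half, None), 0.38)]  # randomized arms

means, lambdas = [], []
for sl, p_r in arms:
    k = norm.ppf(1 - p_r)                   # response threshold for this arm
    means.append(y[sl][r_star[sl] > k].mean())
    lambdas.append(norm.pdf(k) / (1 - norm.cdf(k)))  # E[R* | R* > k]

# Each arm satisfies: mean_j = mu + (sigma * rho_star) * lambda_j.
A = np.column_stack([np.ones(2), np.array(lambdas)])
mu_hat, srho_hat = np.linalg.solve(A, np.array(means))
print(mu_hat, srho_hat)                     # approximately 40 and 7.5
```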
Bailey (2023a), for example, presents doubly robust estimates that use a randomized response instrument to create estimates that are robust to non-ignorable non-response. The double robustness comes from incorporating both weighting- and imputation-based approaches such that if either the weighting or the imputation model is correct (or both are), the estimate will be consistent. Note that both the weighting and imputation models allow for non-ignorable non-response and can produce quite different estimates from conventional weighting or imputation when there is evidence of non-ignorable non-response. Bailey (2023b) analyzes this data using parametric and other methods, finding similar results.
Bailey (2023b) implements the approach with an Ipsos poll of U.S. voters in 2019. The response instrument was created by randomly assigning potential respondents to high and low response protocols. In the low response protocol, respondents were asked whether they wanted to discuss politics, sports, health or movies; only those who chose politics were retained in the respondent pool for the models discussed here. In the high response protocol, people were asked the political questions in the standard way and, therefore, had a much higher likelihood of providing answers. The response instrument was strong, changing response rates by 60 percentage points, and thereby provided enough statistical power to estimate ρ and, in turn, to create population estimates that were purged of the malign effects of ρ. This protocol is feasible in many survey contexts; other randomized response instruments, such as that in Cohn (2022b), have produced large effects on response as well. If one is not able to design a randomized response instrument with large effects on response, the methods described here will likely be underpowered.
As is typical in polls, the patterns varied by question and across party. Here I provide three examples to give a flavor of the results.
• First, the survey asked people how likely they were to vote, giving them five response categories ranging from "absolutely certain to vote" to "will not vote." In the raw data, 78 percent of respondents said they were certain to vote. When weighted using conventional weights, 75 percent of respondents said they were certain to vote. A doubly robust estimate based on weighting and imputation models that used a randomized response instrument to model potential non-ignorable non-response found strong evidence of non-ignorable non-response (p < 0.01), suggesting a strong relationship between willingness to respond and expressing certainty about voting. Such a pattern occurs when the answers in the low response group differ clearly from the answers in the high response group, as they did in this case. This model produced an estimate that 55 percent of people were certain to vote. Since it is not entirely clear how to map answers to a five-category question onto actual turnout (which was 67 percent), it is difficult to say for certain that accuracy increased. At a minimum, the raw and weighted results seem to overestimate turnout, given that they indicated 75 percent or more of people were certain to vote when 67 percent actually voted and people in the other categories voted as well. The selection model results, in contrast, moderated the estimate and were consistent with the idea that raw and weighted data in surveys of turnout tend to overestimate turnout.

• Second, the poll asked people about support for President Trump. In the whole sample, the raw, conventionally weighted and non-ignorable non-response doubly robust models produced similar results. There was, however, interesting variation by region. Among whites in the Midwest, a group for whom polls have tended to underestimate Trump support, raw support for Trump was 45 percent, a number that fell to 43 percent when the data was weighted. In the selection model, in contrast, the parameter associated with non-ignorable non-response in the doubly robust model was unlikely to have arisen by chance (p < 0.05), which, in turn, led the selection model to estimate Trump support among whites in the Midwest that was higher by 5 percentage points. Because the poll was conducted more than a year before the election, it is hard to gauge accuracy, but it is interesting to note the strong signal in the selection model that conventional polls were underestimating Trump's support in the Midwest.

• Third, pollsters worry about ρ on sensitive questions, as it may be that people with certain opinions on such matters are less likely to respond. On race, for example, it is possible that social pressure makes people with more conservative views less likely to provide their opinions to a pollster. On a question about whether it was appropriate for Black athletes to kneel during the national anthem, the observed percent conservative among Democrats was about 17 percent, a number that fell slightly when conventionally weighted. However, when analyzed with the non-ignorable non-response doubly robust model, the estimated percent of Democrats giving the more conservative answer rose to 33 percent, almost double the percent estimated by conventional weights.

The Future of Polling
Random sampling is dead. Weighting cannot revive it and the field risks losing coherence as it devolves into a mélange of pollsters using bespoke tools evaluated on past performance rather than common theoretically-justified standards. It is time to update our paradigmatic foundations so that they encompass not only the random sampling or assumption-driven weighting methods of the past and present, but also the myriad of methods in development that produce non-random samples.
Such a new paradigm is indeed available, one that builds on the Meng Equation. It is quite general: general enough, in fact, to be used in ecology (Boyd et al., 2023), the mathematics of multidimensional integration (Hickernell, 2018) and particle physics (Courtoy et al., 2023). Yet the equation, which characterizes sampling error for any poll, is specific enough to provide guidance about the sources of this error. This new paradigm provides not only a common language that applies to contemporary polling but also produces unfamiliar insights. Central to the new paradigm is the correlation between whether and how people respond. When this correlation is non-zero, it interacts with population size, meaning that for large populations even a small correlation can devastate survey accuracy.
The new paradigm also points the field in a different direction than it is currently headed. Currently, most survey research relies on weighting-type tools that assume away the correlation between whether and how people respond, conditional on observable covariates with known population distributions. Such tools are useful, of course, but cover only a limited range of possible conditions, a limitation that is becoming more striking as the polling field moves further away from its random sampling roots. This paper has provided an overview of the kind of work that naturally emerges in the new paradigm. The general theme is that any non-random sample needs to minimize, measure and/or
account for ρ. I showed examples that do this with observational data and, even better, with randomized response instruments. To take non-ignorable non-response seriously does not mean that we expect to find it everywhere. Indeed, Bailey (2023b) provides examples in which surveys designed and analyzed to address non-ignorable non-response find no evidence that whether and how people respond is correlated. For some survey questions and subgroups, however, these new tools produce estimates that differ importantly from weighted results. As summarized here and elaborated in Bailey (2023b), using randomized response instruments and tools that allow for non-ignorable non-response leads to different and arguably better estimates of turnout, Trump support and racial conservatism.
Much work remains to be done as the selection models that measure and account for ρ involve new survey designs and analytical tools. Some may find these models unfamiliar or complicated, but we are long past the time for wishing for a simple solution to contemporary survey non-response; after all, modern weighting is quite complex and only works by assuming away much of the non-response problem. And the fast-growing non-probability polling industry uses complicated and often opaque protocols.
With a paradigm that better reflects how polling is actually done, more of the field will be drawn to this important work and can build from a common foundation that applies directly to the complicated polling environment of today.
Disclosure Statement. The author has no conflicts of interest to declare.
Derivation of the Meng Equation. The sample mean among respondents can be written as

$$\bar{Y}_n = \frac{\sum_{i=1}^{N} R_i Y_i}{\sum_{i=1}^{N} R_i} = \frac{\overline{RY}}{\bar{R}}$$

where $\overline{RY}$ and $\bar{R}$ are the population averages of R × Y and R, respectively. The difference between the mean of Y in the R = 1 group and the entire population is

$$\bar{Y}_n - \bar{Y}_N = \frac{\overline{RY}}{\bar{R}} - \bar{Y}_N = \frac{\overline{RY} - \bar{R}\,\bar{Y}_N}{\bar{R}} = \frac{\mathrm{covar}(R, Y)}{\bar{R}}$$

where $\mathrm{covar}(R, Y) = \overline{RY} - \bar{R}\,\bar{Y}_N$ is the covariance of R and Y. Correlation (ρ) is the covariance divided by the product of the standard deviations of the two variables ($\sigma_R$ and $\sigma_Y$, respectively); hence $\mathrm{covar}(R, Y) = \rho_{R,Y}\,\sigma_R\,\sigma_Y$, where $\rho_{R,Y}$ is the population correlation of R and Y. Substituting for covar(R, Y) yields

$$\bar{Y}_n - \bar{Y}_N = \rho_{R,Y}\,\frac{\sigma_R\,\sigma_Y}{\bar{R}}$$

In addition, $\bar{R} = \frac{n}{N}$ and, because R is binary, $\sigma_R = \sqrt{\bar{R}(1-\bar{R})}$. Substituting for $\sigma_R$ and $\bar{R}$ and doing some algebra yields the Meng Equation.

Derivation of sampling error in random contact case. For the random contact case, we assume $\frac{1}{N}\sum_{i=1}^{N} Y_i = \frac{1}{N_c}\sum_{i \in C} Y_i$, where C is the set of people contacted and N_c is the number of people contacted. The probability someone responds given that they were contacted is $p_r = \frac{1}{N_c}\sum_{i \in C} R_i$. Since R_i is binary, its standard deviation is $\sigma_R = \sqrt{p_r(1-p_r)}$. Applying the logic above within the contacted group, for which $\bar{R} = p_r$, a bit of algebra yields Equation 2.