Suppose I collect an online survey that shows, after weighting, that 45% of the sample would vote for Donald Trump in the 2024 U.S. presidential election. I incentivize these respondents by paying them $1 per response. Is my survey underestimating or overestimating the Trump vote? Now suppose, in another poll, I incentivize responses by paying respondents $10, while keeping the rest of the poll and the weighting method identical. In this new poll, Trump's support is 42%. What further information does this reveal about the potential selection bias?
Professor Bailey (2023a, in press) introduces this sort of instructive example, and leaves us with an intuition as to what our answer should be: The $10 survey is likely reaching harder-to-reach (i.e., lower response propensity) voters, and evidently, they are less likely to be Trump voters. In other words, there is a positive correlation between response propensity and voting for Trump, which implies that my survey is likely overestimating the true proportion of Trump voters in the electorate. This type of exercise is not commonly mentioned in recent survey modeling guidance (e.g., Caughey et al., 2020). It is indeed an example that goes beyond the “random sampling or assumption-driven weighting methods of the past and present” (Bailey, 2023a).
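Meng's (2018) decomposition of the error of a respondent mean makes the role of this correlation explicit (notation lightly adapted here):

$$
\bar{Y}_n - \bar{Y}_N \;=\; \hat{\rho}_{R,Y} \times \sqrt{\frac{N-n}{n}} \times \sigma_Y,
$$

where $\bar{Y}_n$ is the respondent mean (here, the estimated Trump share), $\bar{Y}_N$ the population mean, $\hat{\rho}_{R,Y}$ the finite-population correlation between the response indicator $R$ and the outcome $Y$, $n/N$ the realized response fraction, and $\sigma_Y$ the population standard deviation of $Y$. A positive $\hat{\rho}_{R,Y}$ means the poll overestimates; the open question is how to learn its sign without ever observing the nonrespondents.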
If the randomized response instrument works, pollsters can sign the nonresponse bias without needing to, for instance, wait until the election occurs (Meng, 2018), assume that the error structure is the same as in the previous election (Isakov & Kuriwaki, 2020), or study only cases where a population ground truth is available (Bradley et al., 2021). But, as Professor Bailey's work is careful to note, these refreshing tools are not entirely new; they come instead from causal inference, which has long grappled with unobserved omitted-variable bias (Angrist et al., 1996). Such tools give us leverage in the face of unobservable confounders, but no leverage comes for free.
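As a minimal sketch of how the paired-incentive comparison can sign the bias, consider the following simulation (all parameters are hypothetical and chosen purely for illustration; this is not Bailey's estimator). Respondents are recruited in order of a latent response propensity that is, by assumption, positively correlated with supporting Trump.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical electorate of one million voters.
N = 1_000_000

# Latent response propensity, positively correlated with supporting
# Trump by construction (this correlation is the unknown quantity the
# paired-incentive design is trying to learn about).
propensity = rng.normal(size=N)
prob_trump = 1 / (1 + np.exp(-(-0.3 + 0.4 * propensity)))  # assumed link
trump = rng.binomial(1, prob_trump)

print(f"True Trump share:  {trump.mean():.3f}")

# $1 poll: only the easiest-to-reach respond (top 5% of propensity).
respond_low = propensity > np.quantile(propensity, 0.95)
print(f"$1 poll estimate:  {trump[respond_low].mean():.3f}")

# $10 poll: the larger incentive also recruits harder-to-reach voters
# (top 15% of propensity), holding everything else fixed.
respond_high = propensity > np.quantile(propensity, 0.85)
print(f"$10 poll estimate: {trump[respond_high].mean():.3f}")

# The $10 estimate falls below the $1 estimate: the marginal respondents
# are less pro-Trump, which signs the correlation as positive and the
# original poll as an overestimate of the true share.
```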
For example, one assumption I slipped into the preceding logic of the opening example is that increasing the survey incentive necessarily means that I have reached harder-to-reach populations. This is reminiscent of the strong instrument condition of instrumental variables (IV). Another condition for the intuition to hold is that my randomized intervention to reach harder-to-reach populations has not shooed away the original population, similar to the monotonicity condition of IV. The response instrument discussed in Bailey (2023b) randomly assigns respondents an encouragement that mimics opting out of a political survey and opting into a survey on sports. But, with apologies to my sports-enthusiast political science colleagues, I would guess that the correlation between being interested in politics and being interested in sports is negative in the population. If different instruments induce changes in response propensity in different subpopulations, the resulting local adjustment may not be the global adjustment that is needed. The assumptions underlying the proposed randomized instrument are addressed by Professor Bailey in a separate manuscript (Bailey, 2023b), more carefully than the brief treatment I have given here. In that work, the highlighted assumption is that the first-stage effect of the instrument on response does not vary by the outcome of interest.
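To see the local-versus-global worry numerically, here is a stylized example (hypothetical shares, and a deliberately simplified stand-in for any actual instrument-based estimator). Suppose the baseline poll's nonrespondents split into a sports-interested group that the sports-survey instrument can recruit and a group it cannot reach at all, and suppose the correction substitutes the recruited group's mean for all nonrespondents.

```python
# Hypothetical composition of the target population: share of the
# electorate and Trump support within each group.
groups = {
    "always-respondents": (0.60, 0.48),  # answer the baseline poll
    "sports compliers":   (0.15, 0.42),  # respond only when the sports option is offered
    "unreachable":        (0.25, 0.30),  # respond under neither design
}

truth = sum(share * support for share, support in groups.values())

# Naive poll: only the always-respondents are observed.
naive = groups["always-respondents"][1]

# Simplified stand-in for an instrument-based correction: use the
# compliers' mean as a proxy for all nonrespondents.
share_resp, support_resp = groups["always-respondents"]
complier_support = groups["sports compliers"][1]
adjusted = share_resp * support_resp + (1 - share_resp) * complier_support

print(f"True Trump share:           {truth:.3f}")     # 0.426
print(f"Unadjusted respondent mean: {naive:.3f}")     # 0.480
print(f"Complier-based adjustment:  {adjusted:.3f}")  # 0.456
# The adjustment improves on the unadjusted mean, but a gap remains:
# the unreachable group's support differs from the compliers', so the
# local correction is not the global correction the poll needs.
```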
Even with the conditions and assumptions it requires, the randomized instrument approach is still something pollsters and researchers should consider incorporating. Recent U.S. examples suggest that weighting unrepresentative surveys to Census demographics alone cannot substantially reduce error (Bradley et al., 2021). It is still possible to improve the sophistication of weighting methods with existing weighting variables, but in practice the gains from those methods over simple weighting may be modest (e.g., Ben-Michael et al., 2023, Figure 5).
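The limits of purely demographic weighting are easy to reproduce in a toy simulation (again with hypothetical parameters): if response is driven by a trait the weights never observe, post-stratifying respondents to a known Census margin barely moves the estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500_000

# Hypothetical population: one Census demographic (young / old) and an
# unobserved trait (e.g., low political trust) that drives both response
# and vote choice.
old = rng.binomial(1, 0.5, N)
low_trust = rng.binomial(1, 0.4, N)
prob_trump = 0.35 + 0.10 * old + 0.20 * low_trust
trump = rng.binomial(1, prob_trump)

# Response depends mostly on the unobserved trait, hardly on the demographic.
prob_respond = 0.10 - 0.06 * low_trust + 0.01 * old
respond = rng.binomial(1, prob_respond).astype(bool)

# Post-stratify the respondents to the known age margin only.
est = 0.0
for a in (0, 1):
    cell = respond & (old == a)
    est += (old == a).mean() * trump[cell].mean()

print(f"True Trump share:           {trump.mean():.3f}")
print(f"Unweighted respondent mean: {trump[respond].mean():.3f}")
print(f"Age-weighted estimate:      {est:.3f}")
# Weighting to the demographic margin barely moves the estimate, because
# the selection runs through a variable the weights never see.
```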
All these considerations bode well for the future of survey modeling research. Tools for correcting unrepresentative surveys (rather than design-based survey sampling strategies, such as the distinction between cluster sampling and random sampling) deserve a more prominent position in the quantitative social scientist's training, particularly alongside causal inference. Survey statistics has provided insights for observational causal inference in the past (e.g., Hainmueller, 2012). I hope Professor Bailey's framework will facilitate this change in how survey methods are taught, and further enrich the cross-pollination between survey sampling and causal inference methods.
For insightful conversations that informed this commentary, I thank Professor Ernesto Calvo.
Shiro Kuriwaki has no financial or non-financial disclosures to share for this article.
Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91(434), 444–455. https://doi.org/10.2307/2291629
Bailey, M. (2023a). A new paradigm for polling. Harvard Data Science Review, 5(3). https://doi.org/10.1162/99608f92.9898eede
Bailey, M. (2023b). Doubly robust estimation of non-ignorable non-response models of political survey data [Paper presentation]. Fortieth Annual Meeting of the Society for Political Methodology at Stanford University, Stanford, CA, United States.
Bailey, M. (in press). Polling at a crossroads: Rethinking modern survey research. Cambridge University Press.
Ben-Michael, E., Feller, A., & Hartman, E. (2023). Multilevel calibration weighting for survey data. Political Analysis. Advance online publication. https://doi.org/10.1017/pan.2023.9
Bradley, V. C., Kuriwaki, S., Isakov, M., Sejdinovic, D., Meng, X.-L., & Flaxman, S. (2021). Unrepresentative big surveys significantly overestimated US vaccine uptake. Nature, 600(7890), 695–700. https://doi.org/10.1038/s41586-021-04198-4
Caughey, D., Berinsky, A. J., Chatfield, S., Hartman, E., Schickler, E., & Sekhon, J. S. (2020). Target estimation and adjustment weighting for survey nonresponse and sampling bias. Cambridge University Press.
Hainmueller, J. (2012). Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis, 20(1), 25–46. https://doi.org/10.1093/pan/mpr025
Isakov, M., & Kuriwaki, S. (2020). Towards principled unskewing: Viewing 2020 election polls through a corrective lens from 2016. Harvard Data Science Review, 2(4). https://doi.org/10.1162/99608f92.86a46f38
Meng, X.-L. (2018). Statistical paradises and paradoxes in big data (I): Law of large populations, big data paradox, and the 2016 US presidential election. The Annals of Applied Statistics, 12(2), 685–726. https://statistics.fas.harvard.edu/files/statistics-2/files/statistical_paradises_and_paradoxes.pdf
©2023 Shiro Kuriwaki. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.