Bailey (2023) argues convincingly for a paradigm shift in opinion polling. The current paradigm works on the basis that (truly) random samples are available, despite the fact that nonresponse and nonprobability samples have essentially consigned them to history. To ease this tension, the current paradigm has incorporated weighting adjustments on the premise that weighted nonrandom samples possess the useful properties of random ones. The problem, Bailey writes, is that weighting and related methods make strong assumptions, assumptions that are untenable in a polling context.
Reading Bailey’s article, I was struck by the relevance of his arguments to my line of work: biodiversity monitoring. Although there are rare cases where random samples are available to biodiversity monitors, the prevalence of nonprobability samples and incomplete uptake of sites (equivalent to nonresponse in polling) mean that they are the exception rather than the rule. Weighting and other adjustments are less common than in polling, but where they are used, their assumptions are similarly untenable. In this commentary, I expand on these points and pose the question of whether biodiversity monitoring, like polling, is due a paradigm shift.
Although very different subjects, opinion polling and biodiversity monitoring are both exercises in descriptive statistical inference, so similar statistical challenges arise. The first step is (or should be) to define a finite population comprising many ‘observation units.’ In polling, the units are people, and the population is all people in some geographic or political boundary (or a subset like registered voters). In biodiversity monitoring, it is simplest to think of the population as a landscape and the observation units as patches of land (sites) within that landscape. The goal in both disciplines is to estimate some parameter describing a variable of interest in the population. Pollsters are interested in parameters like the average political opinion in the electorate; for biodiversity monitors, it is parameters like the average abundance of some species across the landscape that are of interest. Calculating these parameters would be simple if data were available on every observation unit. Typically, however, it is not possible to survey every observation unit, so analysts must estimate them from data on a sample of observation units. Estimating population parameters from a sample—that is, inference—is most straightforward where the sample was selected randomly from the population.
Random sampling has two useful properties that permit straightforward inference. The first is that it ensures that the distribution of the variable of interest in the sample mirrors its distribution in the population in expectation; that is, when averaged over many possible (but never realized) samples. Intuitively, if the distribution of the variable of interest in the sample mirrors its population distribution, then sample-based statistics, like means and proportions, are good approximations of their population analogues (the quantities of interest). The second handy property of random samples is that sample-based statistics converge to their population equivalents as the size of the sample increases, so sample size is a useful metric for error.
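Both properties can be illustrated with a small simulation. The sketch below is purely illustrative: the population of site-level abundances, its gamma distribution, and the sample sizes are assumptions chosen for convenience, not data from any monitoring scheme. It draws repeated simple random samples and records the average error and the root mean squared error of the sample mean.

```python
import random

rng = random.Random(42)

# Hypothetical finite population: abundance of one species at 10,000 sites.
population = [rng.gammavariate(2.0, 5.0) for _ in range(10_000)]
pop_mean = sum(population) / len(population)


def srs_errors(n, reps=500):
    """Sample mean minus population mean for repeated simple random samples."""
    errors = []
    for _ in range(reps):
        sample = rng.sample(population, n)  # sampling without replacement
        errors.append(sum(sample) / n - pop_mean)
    return errors


results = {}
for n in (25, 100, 400, 1600):
    errors = srs_errors(n)
    results[n] = {
        # Average error over many samples: close to zero under random sampling.
        "mean_error": sum(errors) / len(errors),
        # Typical size of the error: shrinks as the sample size grows.
        "rmse": (sum(e * e for e in errors) / len(errors)) ** 0.5,
    }
    print(n, round(results[n]["mean_error"], 3), round(results[n]["rmse"], 3))
```

Under these assumptions, the average error stays near zero at every sample size, while the typical error falls steadily as the sample grows.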
As Bailey explains, in polling, nonresponse has consigned random samples to history. Pollsters might contact a random sample of the population, but few respond, and those who do respond often have opinions on the subject of interest different from those who do not. Bailey demonstrates this point using the example of the American National Election Study (ANES) and its attempt to predict the result of the 2020 U.S. election. People who were more interested in politics were more likely to respond when asked about their political opinions. It turns out that the politically minded were also disproportionately pro-Biden, which meant that ANES underestimated support for Trump. While pollsters contacted a random sample of the population, whose opinions would have been representative of that population (on average), respondents were not a random sample. ANES ended up correctly predicting that Biden would win the election, but it overestimated his share of the vote by a large margin.
The issue of nonresponse in polling has a direct analogy in biodiversity monitoring: incomplete uptake of sites. Many monitoring schemes rely on volunteers to collect the data. While the schemes might select a random sample of sites at which they would like data to be collected, the volunteers are not necessarily able or willing to visit them all (e.g., Pescott et al., 2019). It is possible that the variable of interest, say, a species’ abundance, will differ between the types of sites that volunteers do and do not visit. For example, species tend to be less abundant in intensively managed landscapes, and, largely because they are relatively uninteresting in terms of the wildlife that can be seen, volunteers prefer not to visit them (Pescott et al., 2015). Failing to visit intensively managed sites would result in an overestimate of the species’ average abundance across the wider landscape, in the same way that the low response rate of Trump supporters caused ANES to overestimate support for Biden.
Further tipping the balance away from random samples, in both polling and biodiversity monitoring, is the prevalence of nonprobability samples. Whereas nonresponse and incomplete uptake of sites turn random samples into nonrandom ones, nonprobability samples were never intended to be random. Pollsters might obtain a nonprobability sample by recruiting participants via ads or mailing lists (Bailey, 2023). In biodiversity monitoring, nonprobability samples might comprise records submitted to mobile phone apps by amateur naturalists when they spot species that interest them (August et al., 2015) or digitized records of preserved specimens held in museums and herbaria (Nelson & Ellis, 2019).
Nonrandom samples, whether nonprobability samples or nominally random samples with nonresponse or incomplete uptake, do not share the attractive properties of their truly random counterparts. The distribution of the variable of interest in a nonrandom sample is not guaranteed (or even likely) to be similar to its distribution in the population, so sample means, proportions, and so forth are likely to differ from their population analogues (even in expectation). Moreover, statistics derived from nonrandom samples do not converge to their population equivalents as the size of the sample increases. Consequently, conventional uncertainty intervals, which are based on sample size, can be (wildly) misleading (e.g., Boyd, Powney, & Pescott, 2023; Bradley et al., 2021).
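A short simulation makes the lack of convergence concrete. It is a sketch under assumed conditions: the population, the rule that sites with above-median abundance are twice as likely to be sampled, and the sampling rates are all hypothetical. The point is only that the error of the sample mean stays roughly constant as the sample grows.

```python
import random

rng = random.Random(1)

# Hypothetical population of site-level abundances.
population = [rng.gammavariate(2.0, 5.0) for _ in range(200_000)]
pop_mean = sum(population) / len(population)
median = sorted(population)[len(population) // 2]


def biased_sample(rate):
    """Include each site independently; sites with above-median abundance
    are twice as likely to be included (an assumed selection mechanism)."""
    sample = []
    for y in population:
        inclusion_prob = rate * (2.0 if y > median else 1.0)
        if rng.random() < inclusion_prob:
            sample.append(y)
    return sample


bias_by_rate = {}
size_by_rate = {}
for rate in (0.01, 0.1, 0.4):
    sample = biased_sample(rate)
    size_by_rate[rate] = len(sample)
    bias_by_rate[rate] = sum(sample) / len(sample) - pop_mean
    print(len(sample), round(bias_by_rate[rate], 2))
```

The sample size grows by more than an order of magnitude across the three rates, yet the overestimate does not shrink, so an interval based on sample size alone would become narrower around the wrong value.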
In polling, and to a lesser extent in biodiversity monitoring (e.g., Fink et al., 2023; Van Swaay et al., 2008), analysts try to bestow the useful properties of random samples on nonrandom samples by adjusting them. The general idea is to weight each observation unit in the sample in such a way that the distribution of the variable of interest matches its distribution in the population (as is true in expectation under random sampling). What makes weighting challenging is that the distribution of the variable of interest in the population is not known (if it were, there would be no need for a survey in the first place). Instead, the usual strategy is to assemble ‘auxiliary variables,’ whose distributions (or at least totals) in the population are known, and to weight the sample in such a way that the distributions of those auxiliaries match their distributions (or totals) in the population (Valliant et al., 2018). A good auxiliary variable is a common cause of the variable of interest and sample inclusion; that is, whether each observation unit ended up in the sample (Thoemmes & Rose, 2014). If the auxiliary variable explains an appreciable portion of the variance in both, then including it in the weighting process will bring the distribution of the variable of interest in the sample closer to its distribution in the population (Collins et al., 2001), and sample-based statistics will be closer to their population analogues.
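One simple form of weighting, poststratification, can be sketched as follows. The example is hypothetical throughout: a single binary auxiliary variable (whether a site is intensively managed), invented abundance distributions, and invented visit probabilities. It also assumes the favorable case in which sample inclusion depends on the auxiliary variable only.

```python
import random

rng = random.Random(7)

# Hypothetical landscape: about 60% of sites are intensively managed,
# and abundance is lower at those sites.
population = []
for _ in range(100_000):
    intensive = rng.random() < 0.6
    abundance = rng.gammavariate(2.0, 2.0 if intensive else 6.0)
    population.append((intensive, abundance))

pop_mean = sum(y for _, y in population) / len(population)

# Auxiliary information: the population share of each land-use class.
share_intensive = sum(1 for intensive, _ in population if intensive) / len(population)
pop_share = {True: share_intensive, False: 1.0 - share_intensive}

# Assumed inclusion mechanism: volunteers visit intensive sites less often,
# and nothing else affects whether a site is visited.
visit_prob = {True: 0.01, False: 0.05}
sample = [(intensive, y) for intensive, y in population
          if rng.random() < visit_prob[intensive]]

naive_mean = sum(y for _, y in sample) / len(sample)

# Poststratification weight: population share divided by sample share.
sample_share = {
    k: sum(1 for intensive, _ in sample if intensive == k) / len(sample)
    for k in (True, False)
}
weights = [pop_share[intensive] / sample_share[intensive] for intensive, _ in sample]
weighted_mean = sum(w * y for w, (_, y) in zip(weights, sample)) / sum(weights)

print(round(pop_mean, 2), round(naive_mean, 2), round(weighted_mean, 2))
```

In this setup the unweighted mean overestimates average abundance, and the weighted mean lands close to the population value. That success rests entirely on the assumption that land use is the only driver of inclusion.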
As Bailey points out, however, it is highly unlikely that weighting will fully align the distributions of the variable of interest in the sample and population. The problem is that we do not know all of the relevant auxiliary variables (common causes of sample inclusion and the variable of interest), and even if we did, they might not be reflected in available data (Bailey, 2022; Boyd, Stewart, & Pescott, 2023). Readers familiar with missing data theory might recognize this situation, which is equivalent to saying that nonsampled people or sites are ‘missing not at random’ (MNAR; Rubin, 1976). It is very difficult to recover population parameters in an MNAR situation.
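The consequence of a missing auxiliary variable can be sketched in a few lines. In this hypothetical example, land use is observed and used for weighting, but a second variable (labeled "scenic" here, an invented stand-in for any unrecorded common cause) raises both abundance and the chance that a site is visited. All distributions and probabilities are assumptions for illustration.

```python
import random

rng = random.Random(11)


def make_site():
    """Return (intensive, scenic, abundance) for one hypothetical site."""
    intensive = rng.random() < 0.6  # observed auxiliary variable
    scenic = rng.random() < 0.3     # unobserved common cause (hypothetical)
    scale = (2.0 if intensive else 6.0) * (2.0 if scenic else 1.0)
    return intensive, scenic, rng.gammavariate(2.0, scale)


population = [make_site() for _ in range(100_000)]
pop_mean = sum(y for _, _, y in population) / len(population)


def included(intensive, scenic):
    """Assumed mechanism: inclusion depends on land use AND the unobserved variable."""
    prob = (0.01 if intensive else 0.05) * (3.0 if scenic else 1.0)
    return rng.random() < prob


sample = [site for site in population if included(site[0], site[1])]
naive_mean = sum(y for _, _, y in sample) / len(sample)


def stratum_mean(units, flag):
    values = [y for intensive, _, y in units if intensive == flag]
    return sum(values) / len(values)


# Poststratify on the observed auxiliary variable (land use) only.
share_intensive = sum(1 for intensive, _, _ in population if intensive) / len(population)
poststratified_mean = (share_intensive * stratum_mean(sample, True)
                       + (1.0 - share_intensive) * stratum_mean(sample, False))

print(round(pop_mean, 2), round(naive_mean, 2), round(poststratified_mean, 2))
```

Weighting on land use moves the estimate toward the population mean but leaves a clear overestimate, because the unobserved variable still links inclusion to abundance within each land-use class.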
Bailey’s vision is of a new paradigm in which analysts acknowledge that weighted nonrandom samples do not necessarily share the attractive properties of their random counterparts. It builds on the Meng (2018) equation, an algebraic reexpression of survey error for any sample—random or not (and weighted or not). The Meng equation is essentially an alternative to random sampling theory that remains applicable in the post–random sampling world of polling (although it has some practical limitations, as Bailey explains).
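For readers who have not met it, the identity in Meng (2018) writes the error of a sample mean as the product of three terms: the population correlation between sample inclusion and the variable of interest, the square root of (1 - f)/f, where f is the fraction of the population sampled, and the population standard deviation of the variable of interest. The sketch below checks that identity numerically on invented data. The population and the inclusion mechanism are assumptions, and the variable names are illustrative rather than anything from Bailey or Meng.

```python
import math
import random

rng = random.Random(3)

# Hypothetical population values and an assumed inclusion mechanism in which
# units with larger values are more likely to be sampled.
N = 20_000
y = [rng.gammavariate(2.0, 5.0) for _ in range(N)]
r = [1 if rng.random() < min(1.0, 0.02 * yi) else 0 for yi in y]  # inclusion indicator

n = sum(r)
f = n / N  # sampling fraction
mean_y = sum(y) / N
sample_mean = sum(yi for yi, ri in zip(y, r) if ri) / n

# Population (divide-by-N) standard deviations, covariance, and correlation.
sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / N)
sd_r = math.sqrt(f * (1.0 - f))
cov_ry = sum((yi - mean_y) * (ri - f) for yi, ri in zip(y, r)) / N
rho = cov_ry / (sd_r * sd_y)

actual_error = sample_mean - mean_y
meng_error = rho * math.sqrt((1.0 - f) / f) * sd_y

print(round(actual_error, 6), round(meng_error, 6))
```

The two quantities agree up to floating-point rounding because the identity is algebraic, not a model. The practical difficulty is that the correlation term cannot be computed from the sample alone, since it involves the values of the unsampled units.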
Bailey makes two general arguments for a paradigm shift in polling: random sampling theory is irrelevant given nonresponse and nonprobability samples, and weighting only restores its relevance if nonsampled observation units are ‘missing at random.’ I think these arguments are sound and apply to biodiversity monitoring. In terms of the new paradigm itself, I needed no convincing, having already found the Meng equation to be useful in the context of biodiversity monitoring (Boyd, Bowler, et al., 2023; Boyd, Powney, & Pescott, 2023). I strongly recommend ecologists read Bailey’s excellent article and book on the topic (Bailey, 2023, in press) and would be interested to hear his thoughts on the broader applicability of his new paradigm.
Thank you to Xiao-Li Meng for the opportunity to contribute to the discussion of Bailey’s article.
The author was supported by the NERC Exploring the Frontiers award number NE/X010384/1, “Biodiversity indicators from nonprobability samples: Interdisciplinary learning for science and society.”
August, T., Harvey, M., Lightfoot, P., Kilbey, D., Papadopoulos, T., & Jepson, P. (2015). Emerging technologies for biological recording. Biological Journal of the Linnean Society, 115(3), 731–749. https://doi.org/10.1111/bij.12534
Bailey, M. A. (2022). Comments on “Statistical inference with non-probability survey samples.” Survey Methodology, 48(12), 331–338.
Bailey, M. A. (2023). A new paradigm for polling. Harvard Data Science Review, 5(3). https://doi.org/10.1162/99608f92.9898eede
Bailey, M. A. (in press). Polling at a crossroads: Rethinking modern survey research. Cambridge University Press. https://www.cambridge.org/core/books/polling-at-a-crossroads/796BAA4A248EA3B11F2B8CAA1CD9E079
Boyd, R. J., Bowler, D. E., Isaac, N. J. B., & Pescott, O. L. (2023). On the trade-off between accuracy and spatial resolution when estimating species occupancy from biased samples. Ecoevorxiv. https://doi.org/10.32942/X2KK61
Boyd, R. J., Powney, G. D., & Pescott, O. L. (2023). We need to talk about nonprobability samples. Trends in Ecology & Evolution, 38(6), 521–531. https://doi.org/10.1016/j.tree.2023.01.001
Boyd, R. J., Stewart, G. B., & Pescott, O. L. (2023). Descriptive inference using large, unrepresentative nonprobability samples: An introduction for ecologists. Ecoevorxiv. https://doi.org/10.32942/X2359P
Bradley, V. C., Kuriwaki, S., Isakov, M., Sejdinovic, D., Meng, X. L., & Flaxman, S. (2021). Unrepresentative big surveys significantly overestimated US vaccine uptake. Nature, 600(7890), 695–700. https://doi.org/10.1038/s41586-021-04198-4
Collins, L. M., Schafer, J., & Kam, C. (2001). A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods, 6(4), 330–351. https://doi.org/10.1037/1082-989X.6.4.330
Fink, D., Johnston, A., Auer, M. T., Hochachka, W. M., Ligocki, S., Oldham, L., Robinson, O., Wood, C., Kelling, S., & Rodewald, A. D. (2023). A double machine learning trend model for citizen science data. Methods in Ecology and Evolution, 14(9), 2435–2448. https://doi.org/10.1111/2041-210X.14186
Meng, X.-L. (2018). Statistical paradises and paradoxes in big data (I): Law of large populations, big data paradox, and the 2016 US presidential election. Annals of Applied Statistics, 12(2), 685–726. https://doi.org/10.1214/18-AOAS1161SF
Nelson, G., & Ellis, S. (2019). The history and impact of digitization and digital data mobilization on biodiversity research. Philosophical Transactions of the Royal Society B: Biological Sciences, 374(1763), Article 20170391. https://doi.org/10.1098/rstb.2017.0391
Pescott, O. L., Walker, K. J., Harris, F., New, H., Cheffings, C. M., Newton, N., Jitlal, M., Redhead, J., Smart, S. M., & Roy, D. B. (2019). The design, launch and assessment of a new volunteer-based plant monitoring scheme for the United Kingdom. PLoS ONE, 14(4), Article e0215891. https://doi.org/10.1371/journal.pone.0215891
Pescott, O. L., Walker, K. J., Pocock, M. J. O., Jitlal, M., Outhwaite, C. L., Cheffings, C. M., Harris, F., & Roy, D. B. (2015). Ecological monitoring with citizen science: The design and implementation of schemes for recording plants in Britain and Ireland. Biological Journal of the Linnean Society, 115(3), 505–521. https://doi.org/10.1111/bij.12581
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592. https://doi.org/10.1093/biomet/63.3.581
Thoemmes, F., & Rose, N. (2014). A cautious note on auxiliary variables that can increase bias in missing data problems. Multivariate Behavioral Research, 49(5), 443–459. https://doi.org/10.1080/00273171.2014.931799
Valliant, R., Dever, J. A., & Kreuter, F. (2018). Practical tools for designing and weighting survey samples (2nd ed.). Springer Cham. https://doi.org/10.1007/978-3-319-93632-1
Van Swaay, C. A. M., Nowicki, P., Settele, J., & Van Strien, A. J. (2008). Butterfly monitoring in Europe: Methods, applications and perspectives. Biodiversity and Conservation, 17(14), 3455–3469. https://doi.org/10.1007/s10531-008-9491-4
©2023 Robin J. Boyd. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.