
What Are Statistical Assumptions About? An Answer From Perspectivism

Published on Feb 10, 2025

Abstract

This article presents a perspectivist framework for understanding and evaluating statistical assumptions. Drawing on the thesis of perspectivism from the philosophy of science, this framework treats statistical assumptions not as empirical hypotheses that are descriptively accurate or inaccurate about the world but as prescribing a particular perspective from which statistical knowledge is generated. What this means is that we ought not judge statistical models solely by how closely they correspond with the world as we independently understand it, but by whether they paint a picture of the world that is epistemically significant.

Keywords: modeling assumptions, philosophy of science, perspectivism


Media Summary

The application of statistical models depends on modeling assumptions. It is typically understood that these assumptions describe what the world must be like in order for us to legitimately apply these models. However, many assumptions cannot be empirically validated to the extent that we can consider them true, even though the models that rely on them remain in wide and accepted use. This article introduces two alternative ways of understanding the role of statistical modeling assumptions, both inspired by the thesis of perspectivism from the philosophy of science. Perspectivism holds that we can only make sense of scientific knowledge from a particular perspective. Accordingly, this article advocates understanding statistical assumptions as specifying the perspective from which we ought to understand the knowledge generated by the corresponding statistical models.


1. Introduction

On what grounds can we say that a statistical model is or is not applicable to a particular inferential context? One way to answer this question is to examine the inferential context in question and ask whether the modeling assumptions we invoke are descriptive of the data-generating process. Was the sample randomly chosen such that we can assume it is independent and identically distributed (IID)? Was the response variable constructed in such a way that we can assume it is continuous? This framework treats statistical modeling assumptions as empirical hypotheses: they express claims that are true or false of the world, and a model is applicable to a context just in case it is built on true or approximately true assumptions.

There is a limitation to this approach, however: it is not always straightforward to determine whether an assumption is true of the inferential context. To give a historical example, consider the dispute between Karl Pearson and G. Udny Yule over the interpretation of discrete outcome variables (Aldrich, 1995; Breen et al., 2018; Powers & Xie, 2008). While Pearson held that discrete variables ought to be interpreted as manifestations of continuous latent variables, Yule believed that this interpretation was empirically unfounded. In cases such as this, where modelers disagree over the empirical implications of the same modeling assumption, the dispute over whether a model applies in a context can persist even when both sides agree on the same characterization of that context. In other words, applying a model is often not simply a matter of getting the science right before statistical analysis.

Statisticians have responded to this worry in at least two ways. First, some modeling assumptions can be made true through intentional decisions in data gathering or experimental design. An example of this is design-based analysis, which uses experimental design choices, such as physical randomization, instead of modeling assumptions, to justify the use of certain analysis methods (cf. Cox, 2006). This strategy has the benefit that, when executed well, we can confidently say that our model applies to this problem not because we hope that the world is as we understand it to be, but because we have engineered the problem precisely so that models like this apply. In response, some authors (e.g., Goldenberg, 2006; Lawler & Zimmermann, 2021) worry that the success of this strategy in some research contexts has caused researchers to prefer ways of setting up the problem that may not align with their original research goals.

A second strategy is to develop statistical tools that rely on few modeling assumptions so that they can be widely applicable. An example of this is Breiman’s (2001) call to move away from modeling data and toward modeling algorithms. The idea here is that, instead of grounding statistical inference in a representation of the world (the data model), we should aim to ground statistical conclusions in the demonstrable success of decision procedures (the algorithm). A difficulty that sometimes arises here is that, once again, researchers do not always agree on the extent to which a method is assumption free. For example, the disagreement between Rouder (2014) and de Heide and Grünwald (2021) over whether optional stopping presents a problem for Bayesian analysis can be understood as a disagreement over whether Bayesian methods need to make assumptions about stopping rules.
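
To make this contrast concrete, here is a minimal sketch in the algorithmic spirit, using simulated data and a nearest-neighbor predictor of my own choosing (neither is Breiman’s example): the procedure is judged by its demonstrated out-of-sample success rather than by whether its assumptions describe the data-generating process.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = np.sin(x) + rng.normal(0, 0.3, size=200)

# Hold out data: the procedure is judged by out-of-sample success,
# not by whether its assumptions describe the data-generating process.
x_train, y_train = x[:150], y[:150]
x_test, y_test = x[150:], y[150:]

def knn_predict(x_tr, y_tr, x_new, k=5):
    """Predict each new point as the mean of its k nearest training responses."""
    preds = []
    for xn in x_new:
        nearest = np.argsort(np.abs(x_tr - xn))[:k]
        preds.append(y_tr[nearest].mean())
    return np.array(preds)

preds = knn_predict(x_train, y_train, x_test)
print("held-out RMSE:", np.sqrt(np.mean((preds - y_test) ** 2)))
```

Nothing in this sketch asserts what the data-generating process is; the held-out error is the entire justification offered.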

The current article proposes a different kind of answer. Instead of understanding statistical assumptions as empirical boundaries that constrain model building, this framework understands them as rules that prescribe a particular perspective from which statistical knowledge is generated. Instead of giving a description of the world that may or may not be true, modeling assumptions prescribe a perspective that specifies the kind of information we can hope to gain from the subsequent model. Consequently, these assumptions should not be evaluated solely by their empirical adequacy but also by whether they allow the model to serve the inferential role we expect it to serve, which, I will argue, does not rely on a faithful representation of the world.

This framework draws upon the thesis of perspectivism from the philosophy of science, a cluster of views holding that the hope for an ultimate scientific theory that describes everything everywhere all at once is both unattainable and unhelpful. Instead, scientific knowledge is necessarily partial and incomplete; it necessarily comes from one perspective or another. In this sense, statistical knowledge is no different from other kinds of scientific knowledge.

There is considerable disagreement over what such a scientific perspective involves, as well as over what it means to understand scientific knowledge through a perspectival lens. In this article, I review and apply two versions of perspectivism to the interpretation of statistical knowledge, one mild and one radical. The two versions differ in their philosophical ambitions but share many of the same interpretive implications about statistical knowledge and modeling assumptions.

The article is organized as follows. Section 2 introduces a mild perspectivism and applies it to the case of statistical knowledge. Section 3 does the same for a radical perspectivism. Section 4 concludes by discussing implications of this framework.

Before proceeding further, I would like to make a quick terminological note. Throughout this article I will use the term empirical adequacy to mean alignment with reality at the purely observational level, while truth means capturing the underlying (unobservable) data-generating process. It is generally assumed that empirical adequacy alone is insufficient for determining whether a model is true,1 and true models may not be empirically adequate. Nevertheless, assessing the empirical adequacy of a model remains an important step in establishing model truth. Similarly, an empirically adequate modeling assumption is one that is, as far as we can tell, consistent with the data at hand, whereas a true modeling assumption describes the world in a way that we may not be able to directly verify.

2. Statistics as a Language for Coordination

What is the point of mathematizing a scientific theory? One answer is to say that mathematizing a theory improves it in some way—perhaps because the fundamental laws of the universe must be expressed mathematically, or because mathematics makes a theory precise and a true theory must be precise. This is the answer I will set aside here.

Another answer is to point to the fact that mathematizing a theory translates it into the language of mathematics, which is something that has extrinsic value. This is the answer pursued by Eronen and Romeijn (2020) in the context of psychiatry. In particular, they argue that the point of mathematizing psychiatric theories is not because we think the mathematized form captures the essence of a theory, but because the mathematized forms of different theories can be compared in a way that the original forms of those theories cannot.

Philosophers have long noted the difficulty of comparing scientific theories, because successful theories often carry a rich worldview that justifies their own success. It is difficult—some argue impossible—to establish a theory-neutral comparison. What Eronen and Romeijn (2020) argue is that the language of mathematical and statistical modeling provides such a comparison by translating conceptual terms, which are only interpretable in the context of their theory of origin, into mathematical terms, which can be compared with other mathematical terms.

It is important to note that this view does not assume that the mathematical language holds any privilege or is capable of capturing any more essence than the conceptual language. Consider the case where a speaker of Korean and English and a speaker of German and English want to compare a sentence in Korean with a sentence in German. Both speakers translate their respective sentences into English for comparison, but neither is under any illusion that the English translation somehow captures the essence of the original sentences.

Similarly, Eronen and Romeijn (2020) argue that the usefulness of comparing theories in mathematical terms should not lead us to believe that the essence of those theories is mathematical. Instead, mathematical models do not correspond with nature in a straightforward way; “we have to actively co-ordinate the mathematical structure onto empirical reality. That is, we have to lay down definitions that connect key theoretical concepts to experimental procedures and measurements, so that claims cast in terms of these concepts are provided with empirical content” (p. 790, emphasis in original).

What does this mean for statistical assumptions? If the point of statistical models is to translate theories into a theory-neutral language for comparison, then the role of statistical assumptions is to coordinate this translation so that the resulting model is somewhat faithful to the theory. Nevertheless, since the theory is (according to this view) not statistical in essence, there is no reason to expect a single set of assumptions to accurately capture all that a theory says about the world. In other words, the statistical assumptions present but one particular perspective from which two distinct theories can be evaluated and compared.

The idea that the epistemic value of models may not reside in their correspondence with the data-generating process is a controversial one. For example, following McCullagh's (2002) category-theoretic analysis of statistical modeling, Besag (2002) argues that the word “model” should be reserved “for something that has a physical justification, such as the Poisson process in certain applications” (p. 1268), rather than an abstract mathematical object devoid of physical attachment. Huber (2002) similarly contends that “a statistical model ordinarily is supposed to model some real-world situation, and as such it either is adequate, or it is not” (p. 1290). Kalman (2002) states this sentiment most forcefully (p. 1293):

My critique is that the currently accepted notion of a statistical model is not scientific; rather, it is a guess at what might constitute (scientific) reality without the vital element of feedback, that is, without checking the hypothesized, postulated, wished-for, natural-looking (but in fact only guessed) model against that reality.

Indeed, if we understand the point of statistical modeling to be representing reality as it objectively exists, then the observation that we use many models without establishing the empirical adequacy of their assumptions first is certainly worrisome. However, if we take statistical modeling to serve a coordinative role between theory and reality, then the worry about empirical adequacy is not as straightforward.

The idea is this. Since we have no theory-independent access to reality, it is difficult to compare scientific theories on the basis of their empirical adequacy alone. This is because scientific theories speak different languages that lead us to see the world differently, so a direct comparison is likely to be unfair for one theory or another. What we can do instead is to translate scientific theories into the language of statistics and compare the resulting model with reality. This is a difficult translation because neither the scientific theory nor the world is essentially statistical.2 Consequently, scientists must constantly negotiate the interface between theory and model as well as between model and world in the form of statistical assumptions.

Importantly, on this view, the epistemic value of a statistical model does not rely on the empirical adequacy of its assumptions but on the empirical adequacy of its underlying theory, which is imperfectly and partially described by the assumptions. What this means is that, contra Huber (quoted earlier), statistical assumptions ought to be judged as more or less reasonable, rather than adequate or inadequate.

As an example, consider Parker’s (2006) analysis of modeling pluralism in climate science. Parker notes that climate scientists often employ multiple incompatible models in climate prediction, in a way that is inconsistent with the assumption that these models are competitors for being the true model. At the same time, climate scientists also debate the empirical adequacy of these models, which suggests that the models are not treated merely as predictive instruments. Instead, these models are treated as something in between possible descriptions of the world and predictive instruments.

Parker’s analysis of this modeling pluralism has several features that are consistent with my claim that we can understand these models as serving coordinative roles in Eronen and Romeijn’s (2020) sense. First, one source of this modeling pluralism is the fact that there are multiple ways to mathematically represent the same physical system—for example, the atmosphere can be represented as a grid of points or as a series of waves. Decisions like these are akin to translational decisions when one word in a language can be translated into multiple words in another. It is possible that one such translation is better than the others in a given context, but we would not know this to be the case by examining the words themselves alone. That is, while it is possible that one modeling assumption leads to better fitting models than another in some context, we would not be able to know this by comparing the modeling assumptions with their corresponding descriptive claims alone.

Second, another source of this modeling pluralism is that, even when models differ substantively in what they assume the world to be like, there is often no clear way to adjudicate the merits of these differences. For example, since there exists only one real climate history, the usefulness of retrodictive accuracy as a measure of model adequacy is quite limited. This highlights the difficulty of comparing multiple competing theories even when they are all empirical theories and, in principle, ought to be adjudicated on empirical grounds. Similarly, while it is important for statistical models to achieve some baseline level of fit to the data, our understanding of how the data came to be and of the theory behind the model may nevertheless make us trade some fit for other merits.

Third, translating physical theories about climate systems into mathematical models has practical benefits even if the translation can never be fully faithful to the original theories. As Parker points out, climate change predictions are typically made using an ensemble of models. Instead of establishing a single model as the correct one before using it for prediction, climate scientists opt to use a group of models, all of which have some claim to being close to the truth. Note that this strategy is only viable if the models ‘speak the same language’—that is, if they make predictions in a way that can be directly compared. This is the benefit of mathematizing highlighted by Eronen and Romeijn (2020).
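
As a toy illustration of this point (the model names and numbers below are invented for this sketch, not drawn from Parker’s case study), pooling becomes possible only once every model answers in the same mathematical language:

```python
import numpy as np

# Hypothetical warming predictions (degrees C) from three structurally
# different models; the names and numbers are invented for illustration.
model_predictions = {
    "grid_point_model": 2.1,
    "spectral_wave_model": 2.6,
    "regional_model": 1.9,
}

# Because every model answers in the same units, their outputs can be
# pooled and compared despite mutually incompatible assumptions.
values = np.array(list(model_predictions.values()))
print(f"ensemble mean: {values.mean():.2f} C")
print(f"ensemble spread: {values.max() - values.min():.2f} C")
```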

It is important to note that mathematical modeling is not the only platform where this kind of coordination happens, although it is perhaps the most popular one. Tal (2016) presents a case study of the International Bureau of Weights and Measures’ (BIPM) use of clock ensembles to keep universal time. As Tal explains, the precise theoretical definition of the standard second cannot be realized perfectly because it involves idealizations, such as that the atom is at rest at zero kelvin. Consequently, the BIPM keeps an ensemble of clocks, each of which approximates this ideal definition in its own way. In this example, the physical clocks serve the same role as statistical models—they interpret the theoretical definition in ways that, we all know, cannot be perfectly accurate. Nevertheless, their interpretations coordinate the definition with the practical goal of timekeeping.

The design decisions in clocks and in statistical models are treated in a way that, as both Parker (2006) and Eronen and Romeijn (2020) have noted, sits somewhere in between realism and instrumentalism. On the one hand, the design decisions are often made with reference to theories that give empirical descriptions of the world rather than mere predictive utility. Moreover, some design decisions are rejected for being empirically inadequate. This suggests that the models are not treated as mere tools for inference. On the other hand, the design decisions are often made with the explicit knowledge that they cannot be fully empirically adequate. Moreover, a plurality of mutually incompatible modeling decisions is tolerated or even encouraged. This suggests that the models are not in competition for the true model. The answer from perspectivism is that the models are representations of the world from a particular perspective. While the perspective brings about certain benefits such as the ability to integrate modeling results, it also comes with sacrifices, such as the inability to match statistical assumptions with physical hypotheses.

To end this section, I would also like to point out that the use of empirically dubious yet practically useful rules for comparison is not foreign to statistical modeling. Consider the use of information criteria in model selection. Theoretically speaking, information criteria estimate the information distance between the model at hand and the true model as a way to balance overfitting and underfitting. However, there is a limit to how seriously we should take this gloss as the actual justification for the use of these criteria. For example, in response to Shao’s (1997) asymptotic analysis of linear model selection, several commentators question the relevance of asymptotic justifications because they rest on a number of implausible assumptions, such as that the true model is also linear (Stone, 1997) or that a true model exists at all (Zhang, 1997). Again, if we take the epistemic value of information criteria to be grounded in the empirical picture they paint—namely, that there is a true linear model that is a certain distance away from the model under consideration—then these questions are detrimental to the justifiability of information-based model selection. However, what we can say instead is this: It is unilluminating to compare two models in terms of goodness of fit alone, since different models tend to involve different sets of parameters and paint different pictures of the data (the ‘Rashomon effect’). An information criterion such as the Akaike Information Criterion (AIC) provides a neutral language with which we can compare models without unfairly adopting one or another such picture. At the same time, the information criterion is not intended to capture the essence of why one model is the best description of the data, since sometimes we have nonstatistical reasons for preferring one picture over another. To the extent that this is an acceptable justification for the use of information criteria, what I have argued in this section is that we should apply the same attitude toward the adoption of statistical assumptions.
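
For concreteness, here is a small sketch of my own (with simulated data, unrelated to the Shao discussion) showing how the AIC places models with different parameter counts on a common scale, penalizing the quadratic model for its extra parameter when the truth is linear:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 1.5 * x + rng.normal(size=100)  # toy data; the truth is linear

def gaussian_aic(X, y):
    """AIC = 2k - 2 ln L for a least-squares fit with Gaussian errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n = len(y)
    sigma2 = resid @ resid / n               # MLE of the error variance
    loglik = -n / 2 * (np.log(2 * np.pi * sigma2) + 1)
    k = X.shape[1] + 1                       # coefficients plus the variance
    return 2 * k - 2 * loglik

X1 = np.column_stack([np.ones_like(x), x])          # linear model
X2 = np.column_stack([np.ones_like(x), x, x ** 2])  # quadratic model
print("AIC linear:   ", gaussian_aic(X1, y))
print("AIC quadratic:", gaussian_aic(X2, y))  # extra parameter is penalized
```

The two models paint different pictures of the data, but the AIC renders them comparable in a single currency without deciding which picture is the true one.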

3. Statistical Knowledge as Perspectival Knowledge

The perspectival framework introduced in the previous section values statistical assumptions instrumentally rather than empirically. I will call this the mild version of perspectivism, for two reasons. The first is that it applies attitudes that already exist in certain applications of statistical concepts, such as the use of information criteria for model selection. The second is that it is compatible with a nonpluralist, realist understanding of both the world and how science operates—namely, that there is a true and unique data-generating process behind any conscientiously gathered data set, and our goal as scientists is to describe and explain that process using tools such as statistical modeling. In this section, I present a stronger version of perspectivism that relies on a fundamentally pluralist, and hence more controversial, picture of science and reality. My aim here is not to give a conclusive argument for adopting this strong perspectivism but to illustrate how it may bring new insights into how statistical knowledge relates to other kinds of knowledge.

The most comprehensive exploration and defense of a thoroughgoing perspectivism in the philosophy of science has recently been given by Massimi (2022), who argues that we ought to understand scientific theories as giving inferential blueprints that guide activities. Because different epistemic communities face different challenges as they navigate the world, they naturally model reality differently, resulting in multiple perspectives on the same world (see Massimi, 2018a, 2018b, 2021). None of these perspectives presents a complete picture of the world, because a complete theory of everything is impossible (as per the premise of perspectivism). Yet all of these theories may be true in the sense that they carry the kind of information about the world we expect true theories to carry, such as instructions that guide successful action.

Nevertheless, Massimi’s (2021) perspectivism is not quite strong enough for my purpose here, because it still relies on the assumption that we need a plurality of perspectives only because having a single unifying perspective is practically impossible. Modelers know that, barring toy examples, we never really have enough information to build a model that is everywhere faithful to the target system and that, even if we did, the resulting model would not be computationally feasible to use. Consequently, we need multiple models of the same target that are useful for different tasks. This is not the perspectival pluralism I am interested in here, however.

Instead, I am interested in a kind of perspectivism that is not motivated by such limitations. It holds that there is something about the nature of knowledge itself that tethers it to one perspective as opposed to another. Since there is currently no full-blown defense of this kind of perspectivism in these terms, in what follows I draw inspiration from several different literatures to motivate this strong version of perspectivism and to illustrate how it applies to statistical knowledge.

An explicit attempt to push perspectivism further than Massimi’s account comes from Chang (2020), who argues that “[i]t is not only the truth conditions for a knowledge claim that are perspectival but the knowledge claims themselves” (p. 20). That is, in addition to holding that justifications for claiming something to be true (its ‘truth conditions’) must be perspectival because scientists are necessarily limited by their own points of view, we should also hold that truth itself is attached to certain perspectives. This is because knowledge claims are, for Chang, the answers the world gives to a scientific activity, and so cannot be made sense of outside of that activity. Which scientific activities are worthwhile, in turn, is dictated by pragmatic considerations rather than by their propensity to yield answers that can be seen as mirroring the world. In other words, a knowledge claim is true just in case it allows us to perform an activity successfully; because activities are always done within a certain cultural-historical context, and thus embody a perspective, knowledge claims are always perspectival.

Consider the kind of activity someone might engage in that can be described as ‘predicting.’ If I were given a table of numbers representing the starting salaries of recent alumni and asked to predict the starting salary of this year’s alumni, I would be engaged in a rather different activity than if I knew my cousin very well and wanted to help her decide which college major could support the lifestyle she wants. In both cases, I am using past information to predict the future. In both cases, I may be right or wrong in my prediction. However, if someone were to argue that analyzing a table of numbers alone is never going to work unless I also know the individuals behind those numbers personally, we would say that they have misunderstood the point of the task. It is in this sense that knowledge claims are tethered to the perspective that generates them.

With a slightly different emphasis, historians and sociologists have similarly noted how scientific activities define what information can be extracted from the world. For example, Porter (1995) documents the early development of actuarial science, in which assessing an applicant often involved an interview to judge his character. An interview was preferable because of a widespread belief that such judgments were necessarily personal and subjective. Modern actuarial science no longer holds this view. However, it is not the case that modern actuarial science improved upon the old by incorporating more information to achieve better accuracy. Instead, we now believe that subjective assessments of character should be ignored altogether. In fact, one benefit of using quantitative methods is precisely that we can more easily ignore subjective assessments of this type.

Once again, it is worth noting that the strong perspectivist thesis I present here is not unique to statistical modeling. Consider the phenotypic gambit in evolutionary game theory, which is the assumption that evolutionary dynamics in a population can be fruitfully studied without considering the genetic makeup of the competing strategies. Although some have expressed worries about the truth of this assumption (e.g., Rubin, 2016; van Oers & Sinn, 2011), knowledge claims coming out of evolutionary game theory generally do not reference genetics. In fact, I would argue that game theoretic explanations of behavior are valuable precisely because of the amount of information they are able to ignore.

Moreover, the fact that we sometimes reject the phenotypic gambit, and thereby reject the game theoretic perspective, does not undermine my claim that game theoretic knowledge claims gain their epistemic significance by ignoring certain kinds of information. As Chang (2020) has noted, even though each perspective generates its own knowledge claims and so is in some sense not epistemically accountable to other perspectives, a perspective can nevertheless be rejected for not fitting the pragmatic task at hand. This is why assumptions, as in the case of the mild perspectivism discussed in the previous section, play a dual descriptive and prescriptive role. Sometimes they are rejected as false or inadequate, which seems to suggest that they hold descriptive content. When they are accepted, however, it is often not because we have positive empirical evidence supporting their truth. Instead, it is often the epistemic ramifications of assuming their truth that motivate their adoption.

Similarly, statistical assumptions specify the perspective from which statistical knowledge ought to be interpreted, in a way that is both descriptive and prescriptive. Consider treatment assessment done on a sample of individuals in a clinical setting. If we want to learn anything at all about the population from this sample, we need to assume that the sample is drawn from the same distribution that describes the population. This is an assumption that is sometimes empirically shown to be false. The sample may differ from the target population in relevant ways, in which case we ought to reject models built on the assumption that the two share a distribution. When this assumption is not rejected, however, its acceptance reflects its prescriptive value rather than its descriptive truth. Any sample of individuals will differ from each other in some way, so the point of assuming that they are drawn from the same distribution is not to assume that they are interchangeable simpliciter, which is empirically false, but to assert that the differences between them do not matter. It is a prescriptive assertion that we ought to ignore individual differences.

While it is true that, practically speaking, we cannot expect a model to encompass all individual differences, the assertion that we ought to ignore them goes beyond practical limitations. In fact, it plays a central role in how we can make use of statistical knowledge claims. Suppose I am considering taking a drug that has been shown to be better than placebo in a trial, but I am concerned about the side effects that many trial participants have reported. Why would I think that any information gained about the drug from testing a group of people who are not me is in any way applicable to me? One answer is that the drug has been shown to be causally efficacious in curing this disease; although specific people were used to show this, the identity of those people is inessential. What is important is the causal pathway through their physiology and the fact that I have the same physiology. This answer has been disputed by Cartwright (2007), among others, who argue that the kind of experimental setup used in drug trials is insufficient to establish the kind of causal relations this claim would require (cf. Deaton & Cartwright, 2018; Lawler & Zimmermann, 2021).

Here is a better answer, offered by the strong perspectivism presented in this section. Patients who participate in drug trials are very different from each other. Even when they are diagnosed with the same disease, they are likely to have it at different degrees of severity, to manifest different symptoms, and to have different comorbidities. Nevertheless, a statistical analysis that ignores many of these individual differences suggests that the drug is still better than the placebo. This gives me reason to believe that the drug will be effective on me, even though I also differ from the test subjects in numerous ways.

In this case, the strength of the conclusion resides precisely in the fact that the assumption does not describe the world with perfect accuracy. Instead, the assumption prescribes a certain way of looking at the problem—where we view individuals as data points of the same phenomenon—that allows us to formulate certain kinds of knowledge claims.
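
A minimal simulation can make this concrete. The numbers below are entirely invented, assuming a hypothetical two-arm trial: the pooled comparison deliberately treats heterogeneous patients as draws from a single distribution per arm, and it is precisely this ignoring of individual differences that licenses a population-level conclusion.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 100  # patients per arm

# Simulated patients who differ in baseline severity: the individual
# differences the pooled analysis will deliberately ignore.
severity = rng.normal(50, 10, size=2 * n)
outcome = severity + rng.normal(0, 8, size=2 * n)
outcome[:n] -= 5.0  # hypothetical treatment effect in the drug arm

# The two-sample comparison asserts that, for this question, patients
# within an arm count as draws from one and the same distribution.
t, p = stats.ttest_ind(outcome[:n], outcome[n:])
print(f"t = {t:.2f}, p = {p:.4f}")
```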

Note also that this perspectivist picture is different from the idea that successful statistical inference must distinguish between signal and noise. By definition, the information we end up not using counts as ‘noise,’ but the point here goes beyond the usual observation that a generalizable conclusion must be drawn from a model that does not try to fit every idiosyncrasy within the data. The point is instead that, in order to perform the task we want to perform, which is to learn a generalizable lesson from a sample, we must take a particular epistemic stance, which means introducing elements such as randomness, which are not present in the world, and ignoring elements such as individual differences, which are present in the world (Craiu et al., 2023).

To further clarify this perspectivist picture, we can compare it with a related but different line of discussion—that of objectivity versus subjectivity in the interpretation of probability and statistical inference. At a basic level, objectivists interpret probability as a quantity that describes some aspects of the world, such as the relative frequency of an event or the physical propensity of an object. Subjectivists, on the other hand, interpret probability as a reflection of the uncertainty an agent has when facing a set of evidence. Not all aspects of this dichotomy affect the use and interpretation of statistical methods since, as long as the relevant consistency requirements are in place, the numbers can often be manipulated in the same way regardless of how we choose to interpret them (see, e.g., van Dongen et al., 2019; but also cf. Mayo & Spanos, 2011). The perspectivist picture I propose in this article similarly does not need to take a stance on this issue.

Nevertheless, considerations of objectivity in statistics go well beyond the issue of interpreting probability alone. Gelman and Hennig (2017) point out that statisticians’ preference for objectivity often manifests in their preference for methods that do not require user tuning, with the idea being that the absence of subjective tuning ensures the absence of subjective bias. This idea, argue Gelman and Hennig, is neither successful nor desirable. Instead, we ought to understand words like ‘objectivity’ and ‘subjectivity’ as standing in for several features desirable for community science, such as impartiality and context sensitivity. Statistical analysis, like all scientific methodologies, needs to balance these features in an intentional, rather than formulaic, way.

While I do not disagree with any specific proposal made by Gelman and Hennig (2017), I would like to take a closer look at the idea of balancing different virtues. Statistical modeling undoubtedly involves a great deal of balancing—between fit and simplicity, between interpretability and predictive power, between generalizability and domain specificity, and so on. It is natural to frame the fact that quantitative data sets do not capture everything that goes on in the world as yet another area that needs balancing. Consequently, it is easy to fall into the temptation of thinking that we must choose between utilizing the full power of statistics as a method of inference, which involves making a number of false or untested assumptions about data, and faithfully describing the world. It seems to me that Gelman and Hennig accept the setup of this dichotomy, even though they disagree with how people have been resolving it.

Under the perspectivist framework I have been sketching in this section, however, the goal of making these contentious modeling assumptions is not to faithfully describe the world but to prescribe a particular point of view. There is no tension that needs resolving. The description of the world underlying statistical models has to be understood from the perspective that is sketched by the statistical assumptions, which means that the same description may no longer be considered adequate when understood from a different perspective. This is not a limitation of the statistical perspective, but a property of all perspectives.

To briefly summarize, this section sketches a perspectival understanding of statistical knowledge that is more radical than that of the previous section. Here, I assume that it is not possible to have a unified scientific picture of the world that is descriptively accurate of everything everywhere all at once. Instead, a scientific picture needs to be understood and evaluated from a particular perspective. Statistical assumptions, I contend, define such a perspective, one that allows us to understand and evaluate statistical knowledge, even though it is only one of many scientific perspectives. This is what I meant when I called statistical assumptions “prescriptive.” It means that we ought not value these assumptions for the extent to which they present an accurate picture of the world by themselves, but for whether the perspective they sketch is scientifically significant. In the next section, I elaborate on what this entails.

4. Statistical Assumptions Under Perspectivism

In the previous two sections, I presented two ways of understanding the role modeling assumptions play in specifying the statistical perspective and defining statistical knowledge. To briefly review, the first way is to understand statistical modeling as an exercise in translating different scientific theories into the same language so that they can be directly compared and synthesized. On this view, models are not meant to be faithful representations of these theories. They are supposed to capture only the parts of their corresponding theories that are relevant to the comparison context at hand. The role of modeling assumptions is to provide the language for this task. Consequently, modeling assumptions describe the world insofar as they capture some aspect of the theory that is descriptive of the world. At the same time, modeling assumptions are not judged by whether they are literally true, because they are, by supposition, imperfect tools of translation. They also serve a prescriptive role in telling us how we ought to understand what the salient part of a theory is.

The second way to understand statistical modeling is as providing a kind of knowledge that derives its epistemic significance by not trying to represent the world in its entirety. To adopt a statistical perspective is to agree to look at the world in a certain way, which involves ignoring information that may be crucial to other perspectives. Statistical assumptions define the statistical perspective by specifying what information we are ignoring and what knowledge claims we receive in return. Insofar as statistical knowledge is valuable, its value gains strength from the fact that statistical models are not maximally descriptively accurate.

Hennig (2010, 2023) presents a framework very similar to the one I have been sketching here, arguing that mathematical modeling ought to be understood as “an autonomous domain that emerged from the pursuit of absolute agreement in communication” (2010, p. 46). In order to achieve this goal, mathematical models necessarily cannot respect personal reality to its fullest extent. What this means, Hennig argues, is that it is often counterproductive to test for certain modeling assumptions (such as normality) even if they appear to be foundational for model building. My position is similar to Hennig’s: because I also reject the view of statistical assumptions as empirical hypotheses, I cannot endorse evaluating them on grounds of empirical adequacy alone. Nevertheless, this does not mean that statistical assumptions cannot be evaluated.
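
To illustrate why such testing can be counterproductive, here is a sketch of my own (simulated data, not Hennig’s example): at large sample sizes, a formal normality test rejects for deviations far too small to matter for most modeling purposes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 100_000

# Nearly normal data: 99% standard normal with 1% mildly wider noise.
contaminated = rng.random(n) < 0.01
x = np.where(contaminated, rng.normal(0, 3, n), rng.normal(0, 1, n))

# With n this large, the D'Agostino-Pearson test flags this practically
# irrelevant deviation, 'rejecting' the normality assumption.
stat, p = stats.normaltest(x)
print(f"p-value: {p:.2e}")
```

The point is not that the test is wrong, but that its verdict answers a question the modeler may not be asking.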

If we understand statistics as a language for theory comparison, then it is successful just in case the theories we wish to compare are adequately expressible and comparable in this language. This means that the language needs to be flexible enough to capture sufficiently diverse details but also rigid enough to standardize the theories’ idiosyncrasies. Similarly, if we understand statistics as a perspective from which certain knowledge claims ought to be understood, then it is successful just in case it allows for the generation and interpretation of knowledge in much the same way that other scientific perspectives do. While it goes beyond the scope of this article to examine when and how the statistical language or perspective succeeds at this task, I will briefly discuss the consequences of understanding the role of statistics in this way.

First, similar to Gelman and Hennig (2017), I hold that modeling disagreements ought to be resolved through consensus between modelers rather than by appealing to an external ‘objectivity’ criterion such as the lack of user tuning. Contrary to Gelman and Hennig, however, I do not think transparency or impartiality is the key here. This is because I do not take the statistical language to be the only language of comparison we ought to consider in science, and consequently this language does not need to accomplish every task we set out to do. That is, the statistical rendition of scientific theories does not need to capture everything that is valuable in those theories, nor does it need to be able to give the final verdict on their comparison. When modelers disagree, what they need to do is to settle on one aspect of the theories that is important and worth investigating and model the theories on that aspect, instead of trying to find ways to faithfully represent theories on all aspects.

Second, similar to Cox (1990) and Hand (1994), I agree that a statistical model needs to be evaluated in reference to the specific purpose for which it is constructed. Nevertheless, to adhere to this principle too strictly is, in my opinion, to undercut the potential power of statistical analysis. When a predictive model is successful in unexpected ways, it is difficult to resist the temptation of looking at the predictive variables involved and wondering whether they bear causal relationships with the outcome variables. Although I reject the view that statistical models are justified by their empirical adequacy, the perspectivist picture I have been endorsing still treats statistical models as representational. That is, the point of statistical modeling here is to present a good representation of the world rather than merely to lead to useful predictions.

Finally, let me return to the question of false assumptions once more. Throughout this article, I have been talking about true and false assumptions in a commonsensical way. When we model a discrete variable as continuous, a heteroscedastic variable as homoscedastic, or a nonnormal distribution as normal, we have built a model on false assumptions. Yet the language of discrete variables and normal distributions is statistical language. It is not a straightforward empirical description that we can compare with the world to check for accuracy. We can know, for example, whether it is true that we have randomly sampled from a population under some operational definition of ‘random.’ But to link the truth or falsity of this claim to the modeling assumption that the resulting sample is IID requires an extra step of translation. Indeed, de Finetti’s notion of exchangeability can be seen as an alternative translation (Craiu et al., 2023).
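
As a toy illustration of this extra translational step (my own example, not drawn from Craiu et al.), consider draws from an urn without replacement, which are exchangeable, since every ordering is equally probable, yet not independent:

```python
from itertools import permutations

# An urn with 2 red (1) and 2 black (0) balls, drawn without replacement.
# The draws are exchangeable: every ordering of the four balls is equally
# likely. But they are not independent: each draw changes the urn.
sequences = list(permutations([1, 1, 0, 0]))  # 24 equally likely orderings

first_red = [s for s in sequences if s[0] == 1]
both_red = [s for s in first_red if s[1] == 1]
print("P(X1 = 1) =", len(first_red) / len(sequences))          # 1/2
print("P(X2 = 1 | X1 = 1) =", len(both_red) / len(first_red))  # 1/3
```

The statistical assumption (IID or exchangeable) and the operational fact (drawn without replacement from a finite population) are linked only through such a translation.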

What I hope to have shown in this article is that statistical assumptions are not justified by the empirical adequacy of their canonical empirical translations. What this also means is that there is room to approach this translation proactively and creatively. If we are to understand statistical assumptions as prescribing a perspective for theory comparison or knowledge generation, then we must take seriously their potential to change, rather than merely describe, how we understand the world.


Acknowledgments

The author wishes to thank the editor Xiao-Li Meng and several anonymous reviewers for helpful feedback and pointers to literature. The author also wishes to thank Kevin Kadowaki and Greg Lauro for helpful feedback on the draft and an audience at the 8th Bayesian, Fiducial and Frequentist Conference for engaging discussions.

Disclosure Statement

Kino Zhao has no financial or nonfinancial disclosures to share for this article.


References

Aldrich, J. (1995). Correlations genuine and spurious in Pearson and Yule. Statistical Science, 10(4), 364–376. https://doi.org/10.1214/ss/1177009870

Besag, J. (2002). Contribution to the discussion of McCullagh (2002). The Annals of Statistics, 30(5), 1267–1277.

Breen, R., Karlson, K. B., & Holm, A. (2018). Interpreting and understanding logits, probits, and other nonlinear probability models. Annual Review of Sociology, 44(1), 39–54. https://doi.org/10.1146/annurev-soc-073117-041429

Breiman, L. (2001). Statistical modeling: The two cultures. Statistical Science, 16(3), 199–215. https://doi.org/10.1214/ss/1009213726

Cartwright, N. (2007). Are RCTs the gold standard? BioSocieties, 2(1), 11–20. https://doi.org/10.1017/S1745855207005029

Chang, H. (2020). Pragmatism, perspectivism, and the historicity of science. In M. Massimi & C. D. McCoy (Eds.), Understanding perspectivism (pp. 10–27). Routledge.

Cox, D. R. (1990). Role of models in statistical analysis. Statistical Science, 5(2), 169–174.

Cox, D. R. (2006). Principles of statistical inference. Cambridge University Press. https://doi.org/10.1017/CBO9780511813559

Craiu, R. V., Gong, R., & Meng, X.-L. (2023). Six statistical senses. Annual Review of Statistics and Its Application, 10, 699–725. https://doi.org/10.1146/annurev-statistics-040220-015348

Davies, P. L. (1995). Data features. Statistica Neerlandica, 49(2), 185–245. https://doi.org/10.1111/j.1467-9574.1995.tb01464.x

de Heide, R., & Grünwald, P. D. (2021). Why optional stopping can be a problem for Bayesians. Psychonomic Bulletin & Review, 28(3), 795–812. https://doi.org/10.3758/s13423-020-01803-x

Deaton, A., & Cartwright, N. (2018). Understanding and misunderstanding randomized controlled trials. Social Science & Medicine, 210, 2–21. https://doi.org/10.1016/j.socscimed.2017.12.005

Eronen, M. I., & Romeijn, J.-W. (2020). Philosophy of science and the formalization of psychological theory. Theory & Psychology, 30(6), 786–799. https://doi.org/10.1177/0959354320969876

Gelman, A., & Hennig, C. (2017). Beyond subjective and objective in statistics. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180(4), 967–1033. https://doi.org/10.1111/rssa.12276

Goldenberg, M. J. (2006). On evidence and evidence-based medicine: Lessons from the philosophy of science. Social Science & Medicine, 62(11), 2621–2632. https://doi.org/10.1016/j.socscimed.2005.11.031

Hand, D. J. (1994). Deconstructing statistical questions. Journal of the Royal Statistical Society: Series A (Statistics in Society), 157(3), 317–356. https://doi.org/10.2307/2983526

Hennig, C. (2010). Mathematical models and reality: A constructivist perspective. Foundations of Science, 15(1), 29–48. https://doi.org/10.1007/s10699-009-9167-x

Hennig, C. (2023). Probability models in statistical data analysis: Uses, interpretations, frequentism-as-model. In B. Sriraman (Ed.), Handbook of the history and philosophy of mathematical practice (pp. 1–49). Springer. https://doi.org/10.1007/978-3-030-19071-2_105-1

Huber, P. J. (2002). Contribution to the discussion of McCullagh (2002). The Annals of Statistics, 30(5), 1289–1292.

Kalman, R. (2002). Contribution to the discussion of McCullagh (2002). The Annals of Statistics, 30(5), 1292–1294.

Lawler, I., & Zimmermann, G. (2021). Misalignment between research hypotheses and statistical hypotheses: A threat to evidence-based medicine? Topoi, 40(2), 307–318. https://doi.org/10.1007/s11245-019-09667-0

Massimi, M. (2018a). Four kinds of perspectival truth. Philosophy and Phenomenological Research, 96(2), 342–359. https://doi.org/10.1111/phpr.12300

Massimi, M. (2018b). Perspectival modeling. Philosophy of Science, 85(3), 335–359. https://doi.org/10.1086/697745

Massimi, M. (2021). Realism, perspectivism, and disagreement in science. Synthese, 198(S25), 6115–6141. https://doi.org/10.1007/s11229-019-02500-6

Massimi, M. (2022). Perspectival realism. Oxford University Press.

Mayo, D. G., & Spanos, A. (2011). Error statistics. In P. S. Bandyopadhyay & M. R. Forster (Eds.), Philosophy of statistics (pp. 153–198). Elsevier. https://doi.org/10.1016/B978-0-444-51862-0.50005-8

McCullagh, P. (2002). What is a statistical model? The Annals of Statistics, 30(5), 1225–1310. https://doi.org/10.1214/aos/1035844977

Parker, W. S. (2006). Understanding pluralism in climate modeling. Foundations of Science, 11(4), 349–368. https://doi.org/10.1007/s10699-005-3196-x

Porter, T. M. (1995). Trust in numbers: The pursuit of objectivity in science and public life. Princeton University Press.

Powers, D., & Xie, Y. (2008). Statistical methods for categorical data analysis. Emerald Group Publishing.

Rouder, J. N. (2014). Optional stopping: No problem for Bayesians. Psychonomic Bulletin & Review, 21(2), 301–308. https://doi.org/10.3758/s13423-014-0595-4

Rubin, H. (2016). The phenotypic gambit: Selective pressures and ESS methodology in evolutionary game theory. Biology & Philosophy, 31(4), 551–569. https://doi.org/10.1007/s10539-016-9524-4

Shao, J. (1997). An asymptotic theory for linear model selection. Statistica Sinica, 7(2), 221–264.

Stone, M. (1997). Contribution to the discussion of Shao (1997). Statistica Sinica, 7(2), 252–254.

Tal, E. (2016). Making time: A study in the epistemology of measurement. The British Journal for the Philosophy of Science, 67(1), 297–335. https://doi.org/10.1093/bjps/axu037

van Dongen, N. N. N., van Doorn, J. B., Gronau, Q. F., van Ravenzwaaij, D., Hoekstra, R., Haucke, M. N., Lakens, D., Hennig, C., Morey, R. D., Homer, S., Gelman, A., Sprenger, J., & Wagenmakers, E.-J. (2019). Multiple perspectives on inference for two simple statistical scenarios. The American Statistician, 73(sup1), 328–339. https://doi.org/10.1080/00031305.2019.1565553

van Oers, K., & Sinn, D. L. (2011). Toward a basis for the phenotypic gambit: Advances in the evolutionary genetics of animal personality. In M. Inoue-Murayama, S. Kawamura, & A. Weiss (Eds.), From genes to animal behavior: Social structures, personalities, communication by color (pp. 165–183). Springer Japan. https://doi.org/10.1007/978-4-431-53892-9_7

Zhang, P. (1997). Contribution to the discussion of Shao (1997). Statistica Sinica, 7(2), 254–258.


©2025 Kino Zhao. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.
