Editor-in-Chief’s Note: The value of hypothesis testing, and the frequent misinterpretation and misuse of p-values as a cornerstone of statistical methodology, continues to be debated and investigated, including in articles published in HDSR (e.g., Mayo, 2020; Benjamini 2020). In 2019, the President of the American Statistical Association (ASA), Karen Kafadar, convened a Task Force to write a succinct statement about the proper use of statistical methods in scientific studies, specifically hypothesis tests and p-values, and their connections to replicability. The Task Force spent a significant amount of time to consider and debate about a wide range of opinions and suggestions, especially those from esteemed scientists in multiple disciplines and from ASA Board. A unanimously-agreed Task Force statement was released recently and sent to multiple journals to inform the broad community of (data) scientific inquiries. It is therefore reprinted here with permission of the Institute of Mathematical Statistics, which published the statement first (with an accompanying editorial from Kafadar), in The Annals of Applied Statistics.

Over the past decade, the sciences have experienced elevated concerns about replicability of study results. An important aspect of replicability is the use of statistical methods for framing conclusions. In 2019 the President of the American Statistical Association (ASA), Karen Kafadar, established a task force to address concerns that a 2019 editorial in The American Statistician (an ASA journal) might be mistakenly interpreted as official ASA policy. (The editorial recommended eliminating the use of “p < 0.05” and "statistically significant" in statistical analysis.) This document is the statement of the task force, and the ASA invited us to publicize it. Its purpose is two-fold: to clarify that the use of P-values and significance testing, properly applied and interpreted, are important tools that should not be abandoned, and to briefly set out some principles of sound statistical inference that may be useful to the scientific community.

P-values are valid statistical measures that provide convenient conventions for communicating the uncertainty inherent in quantitative results. Indeed, P-values and significance tests are among the most studied and best understood statistical procedures in the statistics literature. They are important tools that have advanced science through their proper application.

Much of the controversy surrounding statistical significance can be dispelled through a better appreciation of uncertainty, variability, multiplicity, and replicability. The following general principles underlie the appropriate use of P-values and the reporting of statistical significance and apply more broadly to good statistical practice.

Capturing the uncertainty associated with statistical summaries is critical. Different measures of uncertainty can complement one another; no single measure serves all purposes. The sources of variation that the summaries address should be described in scientific articles and reports. Where possible, those sources of variation that have not been addressed should also be identified.

Dealing with replicability and uncertainty lies at the heart of statistical science. Study results are replicable if they can be verified in further studies with new data. Setting aside the possibility of fraud, important sources of replicability problems include poor study design and conduct, insufficient data, lack of attention to model choice without a full appreciation of the implications of that choice, inadequate description of the analytical and computational procedures, and selection of results to report. Selective reporting, even the highlighting of a few persuasive results among those reported, may lead to a distorted view of the evidence. In some settings this problem may be mitigated by adjusting for multiplicity. Controlling and accounting for uncertainty begins with the design of the study and measurement process and continues through each phase of the analysis to the reporting of results. Even in well-designed, carefully executed studies, inherent uncertainty remains, and the statistical analysis should account properly for this uncertainty.

The theoretical basis of statistical science offers several general strategies for dealing with uncertainty. P-values, confidence intervals and prediction intervals are typically associated with the frequentist approach. Bayes factors, posterior probability distributions and credible intervals are commonly used in the Bayesian approach. These are some among many statistical methods useful for reflecting uncertainty.

Thresholds are helpful when actions are required. Comparing P-values to a significance level can be useful, though P- values themselves provide valuable information. P-values and statistical significance should be understood as assessments of observations or effects relative to sampling variation, and not necessarily as measures of practical significance. If thresholds are deemed necessary as a part of decision-making, they should be explicitly defined based on study goals, considering the consequences of incorrect decisions. Conventions vary by discipline and purpose of analyses.

In summary, P-values and significance tests, when properly applied and interpreted, increase the rigor of the conclusions drawn from data. Analyzing data and summarizing results are often more complex than is sometimes popularly conveyed. Although all scientific methods have limitations, the proper application of statistical methods is essential for interpreting the results of data analyses and enhancing the replicability of scientific results.

“The most reckless and treacherous of all theorists is he who professes to let facts and figures speak for themselves, who keeps in the background the part he has played, perhaps unconsciously, in selecting and grouping them.” (Alfred Marshall, 1885)

Disclosure Statement

Yoav Benjamini, Richard De Veaux, Bradley Efron, Scott Evans, Mark Glickman, Barry I. Graubard, Xuming He, Xiao-Li Meng, Nancy Reid, Stephen M. Stigler, Stephen B. Vardeman, Christopher K. Wikle, Tommy Wright, Linda J. Young, and Karen Kafadar have no financial or non-financial disclosures to share for this article.

Supplementary File

Task Force Bios.docx

28 KB

©2021 Yoav Benjamini, Richard De Veaux, Bradley Efron, Scott Evans, Mark Glickman, Barry I. Graubard, Xuming He, Xiao-Li Meng, Nancy Reid, Stephen M. Stigler, Stephen B. Vardeman, Christopher K. Wikle, Tommy Wright, Linda J. Young, and Karen Kafadar. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.

ASA President’s Task Force Statement on Statistical Significance and Replicability

Disclosure Statement

Supplementary File