Why p values aren’t necessary: They don’t tell us whether hypotheses are true
Many scientists and researchers have recently called on the scientific community to stop using p values to support or refute scientific hypotheses. For example, the statisticians Amrhein, Greenland and McShane, along with 800 signatories, called for statistical significance to be retired, and one psychology journal banned null hypothesis significance testing (NHST) outright. These and other events generated much discussion, and not everyone agrees. Who is right?
In a great book chapter, psychologists Frank Schmidt and John Hunter describe common arguments in support of p values, and explain why these arguments are wrong. Perhaps the most important reason that p values are unnecessary is that they do not tell us what we want to know. Ultimately, we want to know the truth of a hypothesis based on the data, but the p value cannot tell us whether a hypothesis is true.
A p value only shows the probability of the data given a hypothesis, which is different from the probability of the hypothesis given the data
In doing research, we want to answer a research question based on the data. We run experiments to test ideas about what we expect to find. Ultimately, we want to know *how probable a hypothesis is, based on the data*.
In a test of statistical significance, the data are used to calculate a certain probability (the p value). The conventional way to interpret this probability is that a p value smaller than a threshold (e.g. 0.05, or 5%) indicates the finding is statistically significant. While it is tempting to think that the p value tells us the probability that our hypothesis is true given the data, it does not. Instead, the p value tells us the probability of observing data at least as extreme as ours, assuming the null hypothesis is true. These two probabilities are not the same, and need not behave in the same way; in fact, they can be wildly different, much like the probability that a man is the president versus the probability that the president is a man.
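To see how different the two probabilities can be, here is a quick simulation. It is only a sketch: the per-group sample size, the 0.5 SD effect size, and the assumption that half of all tested null hypotheses are true are made up for illustration. It runs many two-group experiments and asks: among results with p < .05, how often was the null hypothesis actually true?

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims = 5000      # number of simulated experiments
n = 20             # per-group sample size (assumed for illustration)

# Assumption for illustration: half of all tested null hypotheses are true
null_is_true = rng.random(n_sims) < 0.5

p_values = np.empty(n_sims)
for i in range(n_sims):
    effect = 0.0 if null_is_true[i] else 0.5   # true effect in SD units (assumed)
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(effect, 1.0, n)
    p_values[i] = stats.ttest_ind(a, b).pvalue

significant = p_values < 0.05
# P(null is true | p < .05): what researchers often think the 0.05 threshold controls
false_discovery = null_is_true[significant].mean()
print(f"P(null is true | p < .05) ~ {false_discovery:.2f}")
```

Under these assumptions, roughly one in eight "significant" results comes from a true null, not one in twenty (the exact figure depends on the assumed power and base rate). The 0.05 threshold fixes P(p < .05 | null is true); it says nothing directly about P(null is true | p < .05).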
Moreover, the p value is calculated by assuming the null hypothesis is true, so it is circular to use that calculation to decide whether the null hypothesis is true. Thinking that tests of significance tell us about the truth of hypotheses defeats the purpose of doing research at all.
Note. What we really want to know is whether an effect is large or small, and whether it was estimated with precision. This is the basis of estimation statistics, which many people have written about (e.g. Cumming and Calin-Jageman 2017). The Research Concepts series summarizes estimation statistics and how the precision of estimates is interpreted.
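As a minimal sketch of the estimation approach (the measurements below are made up purely for illustration), we can report the size of an effect together with a 95% confidence interval instead of a p value:

```python
import numpy as np
from scipy import stats

# Made-up measurements for two groups, purely for illustration
treatment = np.array([5.1, 6.2, 5.8, 6.5, 5.9, 6.1, 5.4, 6.0])
control = np.array([4.8, 5.0, 5.5, 4.9, 5.2, 5.1, 4.7, 5.3])

diff = treatment.mean() - control.mean()   # point estimate of the effect
# Standard error of the difference between means (unpooled variances)
se = np.sqrt(treatment.var(ddof=1) / len(treatment)
             + control.var(ddof=1) / len(control))
df = len(treatment) + len(control) - 2     # simple df; Welch's df is an alternative
t_crit = stats.t.ppf(0.975, df)            # two-sided 95% critical value
ci = (diff - t_crit * se, diff + t_crit * se)
print(f"difference = {diff:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval conveys both the size of the effect and the precision with which it was estimated, which is exactly what the note above says we actually want to know.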
How is the misconception of p values expressed?
Researchers commonly take the two probabilities above to mean the same thing. This is reflected in the mistaken view that tests of statistical significance are needed to know whether findings are real or just due to chance. Schmidt and Hunter provide quotes from researchers that express these mistaken ideas. For example:
> Null hypothesis significance testing does serve a useful purpose. In conducting experiments, the first hypothesis we need to rule out is that the difference or effect is zero. That is, we must, as the first step, rule out the possibility that the result is something that could commonly occur by chance when the null hypothesis that the difference is zero is true.
> We need to be able to separate findings that are real from those that are just chance events. How can a researcher know whether the two means (or two correlations) are really different if he or she does not test to see if they are significantly different?
Both these quotes express the mistaken belief that p values indicate the truth of hypotheses. This is perhaps the most common misunderstanding of tests of statistical significance, and the one that is most deeply rooted psychologically. As researchers, we do need to distinguish true effects from noise. But wanting the p value to tell us about the truth of hypotheses does not make it so. If you can appreciate this point, you already understand more about p values than many practicing researchers.
References

Cohen J (1994) The earth is round (p < .05). American Psychologist 49: 997-1003.

Herbert R (2019) Research note: Significance testing and hypothesis testing: meaningless, misleading and mostly unnecessary. Journal of Physiotherapy 65: 178-181.

Schmidt FL and Hunter JE (2016) Eight common but false objections to the discontinuance of significance testing in the analysis of research data. In Harlow LL, Mulaik SA and Steiger JH (Eds), What If There Were No Significance Tests? Routledge: New York.