## The limitations of p values

A recent Nature commentary highlighted a statement by the American Statistical Association on principles to guide the use of p values for interpretation of research findings. The statement was issued out of concern for the lack of understanding of p values and what they imply.

Specifically, the six principles of the statement are:

- P values can indicate how incompatible the data are with a specified statistical model.
- P values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
- Scientific conclusions and business or policy decisions should not be based only on whether a p value passes a specific threshold.
- Proper inference requires full reporting and transparency.
- A p value, or statistical significance, does not measure the size of an effect or the importance of a result.
- By itself, a p value does not provide a good measure of evidence regarding a model or hypothesis.

The biggest limitations of p values come from points 2 and 5. The formal definition of the p value is “the probability of observing results at least as extreme as those actually observed, given that the null hypothesis is true”. I have always found this definition a mouthful and difficult to interpret, and I am not alone: even scientists struggle to explain p values clearly.
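That formal definition can be made concrete with a simulation. Below is a minimal sketch (using made-up data, not from any real study) of a permutation test: we generate two groups from the *same* distribution, so the null hypothesis is true by construction, and the p value is simply the fraction of label-shuffled datasets whose group difference is at least as extreme as the one observed.

```python
# Sketch of what a p value measures: the probability, UNDER THE NULL,
# of seeing a test statistic at least as extreme as the observed one.
import numpy as np

rng = np.random.default_rng(42)

# Two groups drawn from the SAME distribution, so the null is true here
a = rng.normal(0, 1, 50)
b = rng.normal(0, 1, 50)
observed = abs(a.mean() - b.mean())

# Permutation test: shuffling group labels simulates the null distribution
pooled = np.concatenate([a, b])
n_perm = 10_000
count = 0
for _ in range(n_perm):
    rng.shuffle(pooled)
    diff = abs(pooled[:50].mean() - pooled[50:].mean())
    if diff >= observed:
        count += 1

p_value = count / n_perm
print(f"p = {p_value:.3f}")  # P(data at least this extreme | null is true)
```

Note what the simulation does *not* tell us: nothing here is the probability that the null hypothesis itself is true.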

The best way I can frame the problem is this: p gives the probability of *observing the data*, given that the null hypothesis is true. But after an experiment is done, what we really want is the probability that *the hypothesis is true*, given the data we observed. These are quite different quantities, rather like the probability that a man is the president versus the probability that the president is a man.
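The gap between the two probabilities can be shown with Bayes' rule. The numbers below are purely illustrative assumptions (a prior of 0.5 and made-up likelihoods), not estimates from any real study; the point is only that P(data | H0) and P(H0 | data) come out different.

```python
# Illustrating that P(data | H0) != P(H0 | data), via Bayes' rule.
# All numbers are made up for illustration.
p_h0 = 0.5               # assumed prior probability the null is true
p_data_given_h0 = 0.04   # the p-value-like quantity: P(data | H0)
p_data_given_h1 = 0.60   # assumed P(data | alternative hypothesis)

# Bayes' rule: P(H0 | data) = P(data | H0) * P(H0) / P(data)
p_data = p_data_given_h0 * p_h0 + p_data_given_h1 * (1 - p_h0)
p_h0_given_data = p_data_given_h0 * p_h0 / p_data

print(p_data_given_h0)            # 0.04
print(round(p_h0_given_data, 4))  # 0.0625 -- a different number
```

Here a p value of 0.04 corresponds to roughly a 6% posterior probability that the null is true, but only because of the prior and likelihood we assumed; change those inputs and the posterior changes, while the p value does not.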

The other problem with p values is that they do not indicate how large or how important an effect is (point 5). From a previous post, we know that sample size affects the precision of effect estimates. Reporting the effect size along with its precision (e.g., a 95% CI) forces the reader to consider whether the observed effect is big enough, sufficiently precise, and important enough to be worth caring about. This approach is known as *estimation*, because we are estimating the size of an effect and its precision.

Interpreting research findings therefore requires considering effect size and precision, together with information on how the study was conducted, in order to judge how much the findings should be believed.