Minimising false positives and false negatives in research
In December 2016, the Journal of Applied Physiology commenced the series Cores of Reproducibility in Physiology (CORP) to highlight the lack of reproducibility in physiology research and provide solutions.
In the first CORP article, statistician Douglas Curran-Everett explains that to improve reproducibility in research, experiments and analyses need to minimise false positive and false negative findings. The false positive rate is controlled by the significance level alpha: by choosing alpha, we declare that we are willing to reject a true null hypothesis 100 × alpha % of the time. In contrast, the false negative rate is controlled by study power, i.e. the probability that we reject the null hypothesis given that the null hypothesis is false (the false negative rate is 1 minus power). The statistical power of a study depends on the significance level alpha, the variability of the underlying population (standard deviation), the sample size, and the magnitude of the effect we want to detect.
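To make the four ingredients of power concrete, here is a minimal sketch in Python. It is not taken from the CORP article: it assumes a two-sided comparison of two independent groups of equal size, with a known common standard deviation and a normal (z) approximation. The function name `approx_power` and its parameters are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    Assumptions (illustrative, not from the source article): equal group
    sizes, a known common SD, and normally distributed sample means.
    """
    # Standard error of the difference between two group means.
    se = sd * sqrt(2.0 / n_per_group)
    # Critical value for a two-sided test at significance level alpha.
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # Expected value of the test statistic if the true effect is `effect`.
    shift = effect / se
    # Probability that the statistic falls in either rejection region.
    upper = 1.0 - _STD_NORMAL.cdf(z_crit - shift)
    lower = _STD_NORMAL.cdf(-z_crit - shift)
    return upper + lower


if __name__ == "__main__":
    # Same effect and SD; vary sample size and alpha to see power change.
    for n in (8, 16, 32):
        for alpha in (0.05, 0.005):
            power = approx_power(effect=1.0, sd=1.0, n_per_group=n, alpha=alpha)
            print(f"n per group = {n:2d}, alpha = {alpha}: power = {power:.2f}")
```

Under these assumptions, power rises with sample size and effect magnitude, and falls as the standard deviation grows or alpha is made more stringent. For small samples a t-based calculation would give slightly lower power than this approximation.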
These quantities matter when judging how well experiments replicate. When replication is judged using hypothesis testing, if the p value from an original experimental finding is 0.05, the probability that a replication experiment achieves p < 0.05 is only 50%. Only when the p value from an original experiment is 0.001 does the probability that a replication experiment achieves p < 0.05 exceed 90%. So false positive findings are minimised only by setting alpha to values more stringent than the traditional 0.05. In contrast, when replication is judged using estimation (i.e. the 95% CI shows how an effect varies in the long run), if an experiment with low power mistakenly rejects the null hypothesis, then the estimate of the magnitude of the effect and the precision about the effect will be exaggerated.
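One common way to arrive at figures like these is sketched below. This is an illustration, not necessarily the calculation used in the CORP article. It assumes that the true effect equals the effect observed in the original experiment, that the replication uses the same design and sample size, that the test statistic is approximately normal, and that only significant results in the same direction count as a replication. The function name `replication_probability` is hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def replication_probability(p_original, alpha=0.05):
    """Chance that an identical replication reaches p < alpha (two-sided).

    Assumes (illustratively) that the true effect equals the originally
    observed effect, so the replication z statistic is normal with mean
    equal to the original z statistic and SD 1.
    """
    # z statistic corresponding to the original two-sided p value.
    z_original = _STD_NORMAL.inv_cdf(1.0 - p_original / 2.0)
    # z statistic the replication must exceed to reach p < alpha.
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the replication statistic exceeds the critical value
    # in the same direction as the original finding.
    return 1.0 - _STD_NORMAL.cdf(z_crit - z_original)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.005, 0.001):
        prob = replication_probability(p)
        print(f"original p = {p}: P(replication p < 0.05) = {prob:.2f}")
```

With an original p value of 0.05, the observed statistic sits exactly on the critical value, so under these assumptions a replication is as likely to fall below it as above it. That is where the 50% figure comes from.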
Curran-Everett provides recommendations to scientists to improve reproducibility in research:
- When designing an experiment, estimate sample size so that power approaches 0.90. This minimises false negatives and minimises the probability of exaggerated true effects.
- Define the significance level alpha to be 0.005 or 0.001. This minimises false positives and increases the probability that replication experiments will find effects with p < 0.05.
- Think less about a simple p value and more about the scientific importance of the confidence interval bounds for an experimental result. Even a convincing p value like 0.005 can be associated with a confidence interval whose bounds indicate a scientific effect that is inconsequential.
- Control for multiple comparisons (i.e. testing more than one null hypothesis). This minimises the probability of getting false positives simply by making more than one comparison in a single experiment.
- Rely on repeated studies to accumulate evidence for biological phenomena.
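Two of the recommendations above lend themselves to a quick calculation: choosing a sample size for a target power, and controlling for multiple comparisons. The sketch below is illustrative and not taken from the CORP article. The sample size function assumes a two-sided comparison of two equal-sized independent groups with a known common standard deviation and a normal approximation. The family-wise error function assumes the tests are independent. The Bonferroni correction is used here as one simple, conservative option; the article summary above does not name a specific method. All function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def n_per_group(effect, sd, alpha=0.05, power=0.90):
    """Approximate sample size per group for a two-sided, two-sample z-test.

    Assumes equal group sizes, a known common SD and a normal approximation.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # critical value
    z_beta = _STD_NORMAL.inv_cdf(power)               # quantile for target power
    return ceil(2.0 * ((z_alpha + z_beta) * sd / effect) ** 2)


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Sample size needed to detect an effect of 1 SD with power 0.90.
    for alpha in (0.05, 0.005, 0.001):
        print(f"alpha = {alpha}: n per group = {n_per_group(1.0, 1.0, alpha=alpha)}")

    # False positive risk when testing 10 true null hypotheses at alpha = 0.05.
    print(f"family-wise error rate: {familywise_error_rate(0.05, 10):.2f}")
    print(f"Bonferroni per-test alpha: {bonferroni_alpha(0.05, 10):.4f}")
```

The sample size loop shows the cost of the stricter alpha levels recommended above: under these assumptions, holding power at 0.90 while lowering alpha requires more subjects per group.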
Curran-Everett D (2016) CORP: Minimising the chances of false positives and false negatives. J Appl Physiol doi:10.1152/japplphysiol.00937.2016.