Power failure in the neurosciences
There is evidence that many (and possibly most) of the conclusions drawn from biomedical research are probably false (Ioannidis, 2005). In a 2013 Nature Reviews Neuroscience paper entitled “Power failure: why small sample size undermines the reliability of neuroscience”, Button et al. explained how low statistical power is partly to blame for a similar issue in the field of neuroscience. Low statistical power, which results from small sample sizes, small effects, or both, reduces the likelihood that statistically significant findings actually reflect true effects.
The basic logic of this problem is fairly simple. Researchers must publish in order to succeed, and publishing is a competitive endeavour. Because there is a bias to publish significant and clean results, researchers (knowingly or unknowingly) use practices that often include flexible study designs, flexible statistical analyses and small sample sizes to obtain quick, publishable results. Unfortunately, these practices lead to increased false-positive findings.
Low statistical power
Low statistical power means that a study has a low probability of detecting effects that genuinely exist. As stated by Button et al., studies with low power produce more false negative results than high-powered studies. By convention, studies should be designed to have at least 80% power. If studies in a given field are instead designed with a power of 20% and there are 100 genuinely non-null effects to be discovered, these studies will on average detect only 20 of them; the other 80 non-null effects will be missed and reported as non-significant results (and likely end up in the file drawer). Additionally, the lower the power of a study, the lower the probability that an observed effect that passes the significance threshold actually reflects a true effect. When samples are small, only extreme estimates reach significance, and such extreme estimates can arise by chance from a population where there is no true effect. Because the rate of false positives is set by the significance threshold rather than by power, detecting fewer true effects means that false positives make up a larger share of the significant results. Finally, even when an underpowered study discovers a true effect, the magnitude of the effect is likely to be exaggerated.
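The arithmetic in this paragraph can be sketched in a few lines of Python. The function name and the choice of 400 null effects tested alongside the 100 true ones are assumptions made purely for illustration; they are not figures from this post. The share of significant findings that are true is usually called the positive predictive value (PPV).

```python
def expected_outcomes(n_true, n_null, power, alpha=0.05):
    """Expected results when testing n_true real effects and n_null null effects.

    Hypothetical helper for illustration only.
    """
    true_pos = n_true * power        # real effects that reach significance
    false_neg = n_true - true_pos    # real effects that are missed
    false_pos = n_null * alpha       # null effects that reach significance by chance
    ppv = true_pos / (true_pos + false_pos)  # share of significant findings that are true
    return true_pos, false_neg, false_pos, ppv


# Assumed scenario: 100 true effects and 400 null effects are tested.
for power in (0.20, 0.80):
    tp, fn, fp, ppv = expected_outcomes(100, 400, power)
    print(f"power={power:.2f}: detected={tp:.0f}, missed={fn:.0f}, "
          f"false positives={fp:.0f}, PPV={ppv:.2f}")
```

Under these assumed numbers, half of the significant findings are false at 20% power, compared with one in five at 80% power. The number of false positives is the same in both cases; only the number of true effects detected changes.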
In short, when statistical power is low:
- The probability of discovering a true effect is low.
- Significant effects may not reflect true effects.
- Estimates of true effects that reach significance are likely inflated.
Pretty scary stuff.
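The third point, effect inflation, is easy to see in a small simulation. The sketch below is illustrative only: the true effect (0.3 standard deviations), the group size (20) and the use of a z-test with a known standard deviation of 1 are all assumptions chosen to keep the code short, and the function name is hypothetical.

```python
import random
from statistics import NormalDist


def simulate_inflation(true_d=0.3, n_per_group=20, n_studies=5000,
                       alpha=0.05, seed=1):
    """Simulate many small two-group studies of the same true effect.

    Returns the observed power and the mean estimated effect among the
    studies that reached significance. Assumes a known SD of 1 (z-test).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5  # standard error of the mean difference
    significant = []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        diff = sum(treated) / n_per_group - sum(control) / n_per_group
        if abs(diff / se) > z_crit:  # two-sided test
            significant.append(diff)
    observed_power = len(significant) / n_studies
    mean_significant = sum(significant) / len(significant)
    return observed_power, mean_significant


power, mean_sig = simulate_inflation()
print(f"true effect: 0.30, observed power: {power:.2f}, "
      f"mean effect among significant studies: {mean_sig:.2f}")
```

In this setup a study can only reach significance if its estimate is roughly twice the true effect or more, so the significant studies overestimate the effect by construction.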
The consequences of underpowered studies do not end there. Some researchers choose the sample size of their own studies based on the sample sizes used in previous studies. This can be problematic if the original study was underpowered and the published effect was false or exaggerated; a follow-up study with the same sample size will, in most cases, find a smaller or absent effect. Also, when researchers perform sample size calculations, they often base these calculations on published effect sizes. Once again, this is problematic because published effects from studies with small sample sizes are often exaggerated.
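To make the replication problem concrete, here is a minimal sketch of the power of a follow-up study that copies the original sample size. It uses a normal approximation for a two-sided, two-group comparison. The published effect (0.7), the true effect (0.35) and the group size (20) are invented numbers, and the function name is hypothetical.

```python
from statistics import NormalDist


def power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate power to detect a standardized difference d between
    two groups of equal size (two-sided test, normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # expected value of the test statistic
    # Probability of landing beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


published_d = 0.7   # assumed (inflated) effect from the original paper
true_d = 0.35       # assumed real effect, half the published value
n = 20              # assumed per-group sample size copied from the original

print(f"power if the published effect were real: {power_two_groups(published_d, n):.2f}")
print(f"power for the assumed true effect:       {power_two_groups(true_d, n):.2f}")
```

Under these assumptions the replication looks adequately powered on paper, yet its real chance of detecting the effect is far lower.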
Does neuroscience have a problem with low statistical power?
Based on a review of meta-analyses covering various areas of neuroscience, Button et al. found that the median power of studies was about 20%, and that power across the field is probably no more than between 8% and 31%.
What can be done?
Although Button et al. paint a somewhat bleak picture, they do make suggestions of what researchers can do to remedy the problem of low statistical power:
- Perform an a priori sample size calculation. Use the existing literature to estimate the size of the effect, and try to power studies sufficiently to detect it. Remember that larger samples provide more precise estimates of population parameters. If time or financial constraints result in an underpowered study, state this clearly and acknowledge the limitation when interpreting the results.
- Disclose methods and findings transparently. If the intended analysis yields non-significant findings and you go on to explore the data, report the additional results as exploratory and do not hide the non-significant results in the file drawer.
- Pre-register your study protocol and analysis plan. Pre-registration clarifies whether analyses are confirmatory or exploratory.
- Make study materials and data available. This improves the quality of studies aimed at replicating and extending research findings. Providing raw data also allows for results to be confirmed and included in meta-analyses.
- Work collaboratively to increase power and replicate findings. Combining data increases the total sample size and improves the use of resources.
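The first and last suggestions above can be illustrated with a short calculation. The sketch below uses the standard normal-approximation formula for comparing two equal-sized groups; exact t-based calculations give slightly larger numbers. The function names, the effect sizes and the "five labs of 15 participants per group" scenario are assumptions for illustration.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate per-group sample size needed to detect a standardized
    difference d with the requested power (two-sided test)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


def ci_half_width(n, level=0.95):
    """Half-width of the confidence interval for the difference between
    two group means, in standard deviation units (known SD assumed)."""
    z_crit = Z.inv_cdf(1 - (1 - level) / 2)
    return z_crit * (2 / n) ** 0.5


# A priori sample sizes for small, medium and large assumed effects.
for d in (0.2, 0.5, 0.8):
    print(f"d={d}: about {n_per_group(d)} participants per group for 80% power")

# Pooling data: one lab with 15 per group versus five such labs combined.
print(f"single lab:  +/- {ci_half_width(15):.2f} SD")
print(f"five labs:   +/- {ci_half_width(75):.2f} SD")
```

Halving the assumed effect size roughly quadruples the required sample, which is why basing the calculation on an inflated published effect leaves a study badly underpowered.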
Ioannidis JP (2005). Why most published research findings are false. PLoS Med 2: e124.
Button KS, Ioannidis JP, Mokrysz C, Nosek BA, Flint J, Robinson ES, Munafò MR (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 14: 365-376.