Fixing statistics

Statistics are not broken. The problem lies with how scientists use, interpret and report them.
At the end of 2017, Nature asked a group of influential statisticians to each recommend one key change that would improve science.
Adjust for human cognition
The first change was proposed by Jeff Leek of the Johns Hopkins Bloomberg School of Public Health, and it rests on the idea that humans are at the root of the problem. Researchers, Leek argues, must study how scientists analyse and interpret data, and apply the results to prevent cognitive mistakes.
In Leek's view, we cannot simply blame the ever-increasing size and availability of datasets, or the lack of adequate statistical training. Nor would banning p-values solve the problem.
Leek believes we know very little about how scientists analyse and process information. Computer programs and statistical software can crunch the numbers, but in the end we, the scientists, play the decisive role in data analysis. Information is therefore needed on how researchers collect, manipulate, analyse, communicate and consume data. Armed with this knowledge, education in statistics and data analysis can be improved.
Abandon statistical significance
The second change was proposed by Blakeley McShane and Andrew Gelman, of Northwestern University and Columbia University respectively. In their view, publication bias and the hunt for statistical significance encourage scientists to explore so many analysis paths that whatever appears in papers is an unrepresentative selection of the results.
Rather than tightening the threshold, for example by setting the critical value to p = 0.001, McShane and Gelman believe thresholds should be dropped altogether. The idea is not to ban p-values, but to treat them as just one piece of evidence among many (e.g. prior knowledge, plausibility of mechanism, study design, data quality). According to them, it is time to move beyond binary declarations of "an effect" or "no effect" based on a fickle p-value. Scientists must instead accept uncertainty and acknowledge variation in their data and results.
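To see why hunting across many analysis paths is so corrosive, consider a minimal simulation sketch (not from the Nature piece; the sample sizes, number of analysis paths and threshold below are illustrative assumptions). Even when no true effect exists anywhere, reporting the best of many looks at the data yields "significant" findings most of the time:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_studies = 1000  # independent studies, each with no true effect
n_paths = 20      # analysis choices tried per study (subgroups, outcomes, ...)

false_positive_studies = 0
for _ in range(n_studies):
    # each analysis path compares two groups drawn from the same distribution
    p_values = [
        stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
        for _ in range(n_paths)
    ]
    if min(p_values) < 0.05:  # selective reporting: publish the "best" result
        false_positive_studies += 1

# with 20 independent paths, roughly 1 - 0.95**20, i.e. about 64%, of
# null studies end up reporting a significant result
print(f"null studies reporting p < 0.05: {false_positive_studies / n_studies:.0%}")

In real research the analysis paths are correlated rather than independent, so the inflation is smaller than in this toy example, but the direction of the bias is the same.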
State false-positive risk, too
David Colquhoun of University College London believes that what actually matters is the probability that a "statistically significant" result is in fact false. This probability, the false-positive risk, is always bigger than the associated p-value.
Colquhoun explains that the probability that a "significant" effect is in fact false depends in large part on the plausibility of the hypothesis before the experiment is done (the prior probability of there being a real effect). The problem is that most scientists have no way of knowing this prior probability.
His preferred solution is to report the p-value and confidence interval as usual, but also to specify the prior probability that would be required to achieve a false-positive risk of 0.05. Alternatively, scientists could assume a prior probability of 0.5 and compute the minimum false-positive risk for the observed p-value. Combining conventional statistics with Bayes's theorem in this way can be a powerful tool, but the added complexity may be too much for some.
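The false-positive risk follows from Bayes's theorem once a significance level, statistical power and prior probability are specified. The sketch below uses the simple "p < alpha" formulation of the calculation; the 0.8 power and the function name are assumptions made for illustration, not values prescribed by Colquhoun:

def false_positive_risk(alpha: float, power: float, prior: float) -> float:
    """P(no real effect | significant result), via Bayes's theorem."""
    sig_given_null = alpha * (1 - prior)  # rate of false positives
    sig_given_real = power * prior        # rate of true positives
    return sig_given_null / (sig_given_null + sig_given_real)

# even with even odds of a real effect, the risk exceeds the 0.05 level
print(false_positive_risk(alpha=0.05, power=0.8, prior=0.5))  # ~0.06

# for an implausible hypothesis, a "significant" result is often false
print(false_positive_risk(alpha=0.05, power=0.8, prior=0.1))  # ~0.36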
Share analysis plans and results
Michèle Nuijten of Tilburg University believes it will never be possible to come up with a set of rules that improves statistical practice, because there are too many situations to account for. In her view, scientists who hunt hard enough will always turn up a result that meets some statistical criterion, but the discovery will probably be a false positive.
Nuijten argues that planning and openness can help avoid these false positives. Key to this approach is the pre-registration of analysis plans. Importantly, pre-registration does not preclude exploratory analyses; it simply requires that they be reported as such when published. Increased transparency would also involve sharing all data and analyses, as well as all relevant computer code.
Change norms from within
The final change was proposed by Steven Goodman of Stanford University. In Goodman's view, many scientists want just enough statistical knowledge to run the software that gets their papers out quickly and makes them look like everyone else in their field. Norms are established within communities partly through such methodological mimicry. Quoting a notable paper on how systems change, Goodman states: "Culture will trump rules, standards and control strategies every single time".
Because each scientific field and sub-field is its own community, no single approach can address every problem. Change must come from all fronts: funders, journals and leaders in the various sub-disciplines. And change begets change: scientists will follow the practices they see in publications, and peer reviewers will demand what other reviewers demanded of them.
As Goodman points out, many young scientists are demanding change, and field leaders must encourage efforts to properly train the next generation and retrain the existing one.
Conclusion
Six influential statisticians, five different solutions. The recommendations are not mutually exclusive, and each has the potential to improve how scientists use statistics. Which will you implement?
Reference
Leek J, McShane B, Gelman A, Colquhoun D, Nuijten M, Goodman S. Five ways to fix statistics. Nature. 2017;551:557-559.