Poor statistical practices in a leading neuroscience journal

Earlier this year, I was asked to review a manuscript for the Journal of Neurophysiology. I was struck by the use of the standard error of the mean (SEM) to summarize data variability, the selective reporting of exact (e.g., p=0.067) and non-exact (e.g., p<0.05) p-values, and the interpretation of non-significant results (e.g., p=0.067) as statistically significant.
Because the Journal of Neurophysiology refers authors to an article by Curran-Everett & Benos (2004) that contains Guidelines for reporting statistics, I was curious how prevalent the questionable statistical practices I had noted were across the journal. So I took on the somewhat tedious task of auditing all papers published in the Journal of Neurophysiology in 2015.
Audit results
Sadly, 65% of papers published in 2015 reported the SEM to (presumably) summarize data variability, while another 12.5% of papers included figures with error bars that were not defined (!). I say presumably because the SEM does not summarize the variability of the sampled data; rather, it estimates how well the mean of the current sample approximates the population mean. In this way, the SEM is an inferential summary statistic that can be used to test hypotheses. Unfortunately, most authors do not report the SEM (or the closely related 95% CI) for this reason; they report it because the plotted error bars are smaller.
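The distinction is easy to see with a minimal sketch using hypothetical, simulated data (not drawn from any audited paper): the SD describes the spread of the observations themselves, whereas the SEM equals the SD divided by the square root of the sample size and therefore shrinks as more data are collected.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    sample = rng.normal(loc=50.0, scale=10.0, size=20)  # 20 hypothetical measurements

    n = sample.size
    sd = sample.std(ddof=1)                  # descriptive: spread of the sampled data
    sem = sd / np.sqrt(n)                    # inferential: precision of the sample mean
    ci95 = stats.t.ppf(0.975, n - 1) * sem   # half-width of the 95% CI of the mean

    print(f"mean = {sample.mean():.1f}")
    print(f"SD   = {sd:.1f}  (summarizes data variability)")
    print(f"SEM  = {sem:.1f}  (= SD / sqrt(n), so always smaller than the SD)")
    print(f"95% CI half-width = {ci95:.1f}")

For a sample of 20, SEM bars are roughly 4.5 times shorter than SD bars, which is precisely why the practice is so tempting.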
There is growing awareness that p-values are fickle and often uninformative. Nevertheless, as stated in Curran-Everett & Benos (2004), when p-values are reported they should be exact. In the manuscript I reviewed for the Journal of Neurophysiology, the authors reported non-exact p-values for most results (e.g., p<0.05). When authors report p<0.05, it is unclear whether the actual value is p=0.013, p=0.049 or p=0.00001. In my audit of published papers, only 58% consistently reported exact p-values.
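Reporting exact values requires no extra effort, because most statistical software returns them directly. A minimal sketch, again with hypothetical data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    group_a = rng.normal(50, 10, size=15)  # hypothetical control group
    group_b = rng.normal(58, 10, size=15)  # hypothetical treatment group

    t_stat, p_value = stats.ttest_ind(group_a, group_b)
    print(f"t(28) = {t_stat:.2f}, p = {p_value:.3f}")  # report the exact p-value, not "p < 0.05"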
Although the manuscript I reviewed for the Journal of Neurophysiology tended to report non-exact p-values, the authors did report exact p-values when results were nearing significance (I cringe just writing this expression). Even worse, the authors chose to present and interpret p=0.067 as statistically significant! Other similar p-values were presented as statistical trends (ugh!). Unfortunately, these authors are not the only ones to spin results in this way. Of the audited papers that reported p-values between 0.05 and 0.1, 57% interpreted these p-values as statistically significant or as statistical trends. A rather shocking finding. I summarized these audit results in a Letter to the Editor (Héroux, 2016).
Formal response from the editor
Professor Bill Yates was kind enough to pen a formal reply to my Letter to the Editor (Yates, 2016). In his reply, Professor Yates explains that the Guidelines provided by Curran-Everett & Benos (2004) were viewed as recommendations rather than requirements. That these Guidelines are not enforced is exemplified by the Instructions to Authors for the Journal of Neurophysiology, where the section describing how tables should be formatted states that “Statistical measures of variations, SD, SE, etc., must be identified. (Example: ‘Values are means ± SE.’)”.
It is unfortunate that Professor Yates did not take this opportunity to turn these Guidelines into formal requirements. As it stands, it remains at the discretion of the editor whether or not the Guidelines provided by Curran-Everett & Benos (2004) are enforced, and as my audit reveals, the majority of editors seem to be satisfied with inappropriate and undefined error bars, non-exact p-values, and non-significant results being discussed as if they were statistically significant.
Informal response from the editor and the fallacy that ignorance is to blame
In personal correspondence, Professor Yates indicated that poor statistical practices were likely not intentional, but rather the result of poor statistical training. However, this view is likely false. Several months after I received the initial review request from the Journal of Neurophysiology, the authors submitted a revised version of their manuscript, presumably addressing the comments of the reviewers.
In their response to my comments, the authors explained that they had changed SEM to SD in the text, but kept the SEM in figures because this was the convention for the Journal of Neurophysiology! Who cares about scientific and statistical correctness; others are doing it, so that is apparently a good enough reason to do it too! Similarly, the authors revised the manuscript to describe p-values in the 0.05–0.1 range as statistical trends rather than statistically significant, thus trading one statistical error for another. Furthermore, the authors still discussed these statistical trends as if they were significant. Truly frustrating. As always, in my review I invited the authors to cite a statistical textbook or article to justify such practices; as always, no reference was provided.
This is one of many examples I have encountered showing that the problem is not a lack of statistical training. The pressure to publish and succeed in science is great, and questionable statistical practices are used to make results look cleaner and more convincing. My audit revealed that poor statistical practices are not an isolated problem, and unless journals lead the way with simple, implementable requirements, such practices will continue to thrive.
References
Curran-Everett D, Benos DJ (2004). Guidelines for reporting statistics in journals published by the American Physiological Society. Am J Physiol Regul Integr Comp Physiol 287, R247–R249.
Héroux ME (2016). Inadequate reporting of statistical results. J Neurophysiol 116, 1536–1537.
Yates BJ (2016). Strategies to Increase Rigor and Reproducibility of Data in Manuscripts: Reply to Héroux. J Neurophysiol 116, 1538.