Common statistical mistakes when writing or reviewing manuscripts

Contributing to journal peer review is a good way to observe and help improve the research conducted in a scientific field, and to contribute to the growth of knowledge. I have peer reviewed for some years, and assessing manuscripts for publication now comes more easily. As a peer reviewer, I find it curious how common simple statistical oversights are at submission. As a reader, I find it even more curious how many such oversights seem to make it past peer and editorial review. How might reviewers and investigators spot and avoid common statistical oversights?

In a recent eLife paper, Makin and Orban de Xivry address this problem by highlighting common statistical mistakes to avoid when writing or reviewing manuscripts. The authors helpfully summarise each type of mistake, describe how to detect it, and outline potential solutions for researchers. Let’s summarise three key ones:

Absence of a control condition/group

This mistake often presents in pre-test, post-test studies. For example, investigators examine the effect of an intervention or exposure by repeatedly measuring an outcome over time in the same subjects, then attribute changes in the outcome to the intervention or exposure. The problem is that outcomes may change over time simply because people naturally recover, because outcome measures are noisy, or because of statistical artefacts such as regression to the mean. Without an appropriate control, conclusions about interventions or exposures from a pre-post design can therefore be inaccurate.

To prevent such errors, investigators should also measure changes over time in a control condition or group, then compare outcomes between the conditions or groups. A useful analysis here is analysis of covariance (ANCOVA), in which each subject’s baseline score serves as a covariate, so that follow-up outcomes are compared between groups while adjusting for where each subject started. If the experimental design does not permit inclusion of a control condition or group, then conclusions about effects of exposures need to be more conservative and nuanced.
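To make this concrete, here is a minimal ANCOVA sketch in Python on simulated data. The variable names (baseline, followup, group) and the simulated effect sizes are my own assumptions, not taken from the paper:

```python
# ANCOVA sketch: compare follow-up scores between groups while
# adjusting for baseline. Data and variable names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 20  # subjects per group

# Simulated data: both groups drift upward slightly; no true group effect.
baseline = rng.normal(50, 10, 2 * n)
followup = baseline + rng.normal(2, 5, 2 * n)
group = np.repeat(["control", "intervention"], n)
df = pd.DataFrame({"baseline": baseline, "followup": followup, "group": group})

# Outcome: follow-up score; factor: group; covariate: baseline score.
model = smf.ols("followup ~ C(group) + baseline", data=df).fit()
print(model.summary())  # the C(group) coefficient estimates the group effect
```

It is the coefficient on the group term, not the within-group change from baseline, that supports a conclusion about the intervention.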

Interpreting comparisons between two conditions/groups without directly comparing them

It’s bizarre how often this mistake occurs. Imagine a study where investigators obtain outcome measures before and after treatment, in an intervention group and a control group. They observe a change in outcome from baseline to follow-up in the intervention group, but no change in the control group. They then conclude that the intervention effectively changed the outcome, even though no between-group test was performed.
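A short simulation shows how easily this fallacy arises; the group labels and effect sizes below are assumptions for illustration only:

```python
# Sketch of the fallacy: two one-sample tests vs one between-group test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 20

# Pre-to-post change scores: same true mean change, different noise.
intervention = rng.normal(2.0, 3.0, n)
control = rng.normal(2.0, 6.0, n)

# Flawed approach: test each group against zero separately.
_, p_int = stats.ttest_1samp(intervention, 0.0)
_, p_con = stats.ttest_1samp(control, 0.0)
print(f"intervention vs 0: p = {p_int:.3f}")  # may fall below 0.05
print(f"control vs 0:      p = {p_con:.3f}")  # may not

# Correct approach: compare the two groups directly.
_, p_diff = stats.ttest_ind(intervention, control)
print(f"between-group test: p = {p_diff:.3f}")  # here, no group difference
```

With the same true mean change in both groups, the two one-sample tests can disagree purely because of differing noise, while the direct comparison correctly finds no group difference.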

Statisticians have written about this problem, and I have also summarised it here. I know of no introductory statistics class that teaches students to infer between-group effects without a between-group test. Where do people learn this stuff??

Makin and Orban de Xivry explain this problem in a different way: they show that a statistically significant correlation in one group, alongside a non-significant correlation in the other, does not imply that the correlations differ between the groups. Figure 1 from their paper and its accompanying legend are reproduced below (CC BY 4.0):

Figure 1. Interpreting comparisons between two effects without directly comparing them. (A) Two variables, X and Y, were measured for two groups A and B. It looks clear that the correlation between these two variables does not differ across these two groups. However, if one compares both correlation coefficients to zero by calculating the significance of the Pearson’s correlation coefficient r, it is possible to find that one group (group A; black circles; n = 20) has a statistically significant correlation (based on a threshold of p ≤ 0.05), whereas the other group (group B, red circles; n = 20) does not. However, this does not indicate that the correlation between the variables X and Y differs between these groups. Monte Carlo simulations can be used to compare the correlations in the two groups (Wilcox and Tian, 2008). (B) In another experimental context, one can look at how a specific outcome measure (e.g. the difference pre- and post-training) differs between two groups. The means for groups C and D are the same, but the variance for group D is higher. If one uses a one-sample t-test to compare this outcome measure to zero for each group separately, it is possible to find that this variable is significantly different from zero for one group (group C; left; n = 20), but not for the other group (group D, right; n = 20). However, this does not inform us whether this outcome measure is different between the two groups. Instead, one should directly compare the two groups by using an unpaired t-test (top): this shows that this outcome measure is not different for the two groups. Code (including the simulated data) available at github.com/jjodx/InferentialMistakes (Makin and Orban de Xivry, 2019; https://github.com/elifesciences-publications/InferentialMistakes). DOI: https://doi.org/10.7554/eLife.48175.002
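The paper points to Monte Carlo simulations (Wilcox and Tian, 2008) for comparing correlations. As a simpler illustration of the right kind of test, here is a sketch using the textbook Fisher r-to-z comparison of two independent correlations; the simulated data and the function name are my own assumptions:

```python
# Compare two independent Pearson correlations with a Fisher r-to-z test,
# a textbook alternative to the Monte Carlo approach the paper cites.
import numpy as np
from scipy import stats

def compare_correlations(r1, n1, r2, n2):
    """Two-sided Fisher r-to-z test for two independent correlations."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    se = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    return z, 2 * stats.norm.sf(abs(z))

rng = np.random.default_rng(7)
n = 20

# Group A: moderate correlation; group B: same slope, noisier data.
x_a = rng.normal(size=n)
y_a = 0.6 * x_a + rng.normal(scale=0.8, size=n)
x_b = rng.normal(size=n)
y_b = 0.6 * x_b + rng.normal(scale=1.2, size=n)

r_a, p_a = stats.pearsonr(x_a, y_a)  # may be significant
r_b, p_b = stats.pearsonr(x_b, y_b)  # may not be
z, p = compare_correlations(r_a, n, r_b, n)
print(f"r_A = {r_a:.2f} (p = {p_a:.3f}); r_B = {r_b:.2f} (p = {p_b:.3f})")
print(f"difference in correlations: z = {z:.2f}, p = {p:.3f}")
```

The point is the same as in the figure: only the direct comparison of the two correlations, not the pair of within-group significance tests, speaks to a between-group difference.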

Suffice it to say: if the aim is to draw conclusions about between-group differences, do between-group tests.

Over-interpreting non-significant results

This mistake can also be called “spin”, in which conclusions are presented in a way that misleadingly implies favourable effects when, in fact, there are none. For example, authors may describe an effect as “trending towards statistical significance” or similar. There is much discussion around the difficulties of interpreting statistically significant and non-significant findings, especially when studies are underpowered (as is common in neuroscience and physiology). Moreover, p values do not indicate how large or small differences in outcomes are, nor whether those differences are meaningful.

Investigators are encouraged to report the sizes of effects and their precision (e.g. mean differences and 95% CIs) so that reviewers and readers can judge how large effects are and how precisely they have been estimated. When you peer review and all you see are p values, ask the authors to report effect sizes and 95% CIs, at least for the key outcomes.
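For example, a mean difference and its 95% CI take only a few lines to compute; this sketch uses statsmodels on simulated data (the group names and values are assumptions):

```python
# Report a mean difference with a 95% CI, not just a p value.
import numpy as np
from statsmodels.stats.weightstats import CompareMeans, DescrStatsW

rng = np.random.default_rng(3)
treated = rng.normal(5.0, 2.0, 30)   # simulated outcome, treated group
control = rng.normal(4.0, 2.0, 30)   # simulated outcome, control group

cm = CompareMeans(DescrStatsW(treated), DescrStatsW(control))
diff = treated.mean() - control.mean()
low, high = cm.tconfint_diff(alpha=0.05, usevar="unequal")
print(f"mean difference = {diff:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

A CI that spans a range of clinically trivial values tells readers something a bare p value cannot.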

Summary

Statistical oversights are common and should be addressed by authors, peer reviewers and editors. Three common mistakes are highlighted here. The scientific community is encouraged to detect, address and avoid these and other common mistakes. If we act together, we can help improve the quality of published research.

Reference

Makin TR and Orban de Xivry JJ (2019) Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife 8:e48175.
