The smoke and mirrors of plotting summary statistics
Every scientist has, at one point or another in their career, plotted results using bar graphs or dot plots with error bars. As pointed out in a previous post, these kinds of summary graphs can be misleading, especially since a depressingly large number of scientists plot their error bars as the standard error of the mean (SEM) rather than the standard deviation (SD) or, when appropriate, the 95% confidence interval. As that same post noted, a key problem with summary graphs is that they conceal the nature of the underlying data. Thus, scientists are increasingly encouraged to plot the individual data points alongside the summary statistics computed from them (e.g., Drummond & Vowler, 2011).
Plotting and interpreting results from experiments with repeated measures
A plot of summary statistics calculated from related data points will reflect the average behaviour across subjects (or samples). However, when a plot reveals a pattern over time, readers may erroneously conclude, consciously or not, that each subject or sample used to compute these summary statistics follows the same overall pattern.
The figure below shows results where 10 subjects received an intervention at time point zero, and measures (presented in arbitrary units) were taken for 13 minutes after the intervention.
Notice that the mean value at each time point is the same in every subplot. Also, the error bars in figures A and B are the same, and those in figures C and D are also the same. Imagine coming across figures A and C in a scientific paper. What questions would you ask yourself? What would be your best guess at what the underlying data look like? What would be your best guess at the measure of variability used in the figures?
SEM vs SD.
The error bars are smaller in figures A and B because they are SEMs. The SEM corresponds to the SD divided by the square root of the sample size, so with a sample size of 10 the SEM will be roughly 3 times smaller than the SD (the square root of 10 is about 3.16). As you might expect, the error bars in figures C and D correspond to SDs. An alarming number of scientists report SEMs where SDs are called for, either because that is what they were taught to do, or because they are well aware that SEMs are much smaller than SDs and thus make the data appear less variable.
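The relationship between SD and SEM described above can be checked with a few lines of Python. This is only a minimal sketch: the ten measurements are made-up values for illustration, not data from the figures, and the helper name sd_and_sem is hypothetical.

```python
import math
import statistics


def sd_and_sem(values):
    """Return the sample SD and the SEM (SD divided by sqrt(n))."""
    sd = statistics.stdev(values)        # sample SD, n - 1 denominator
    sem = sd / math.sqrt(len(values))    # standard error of the mean
    return sd, sem


# Hypothetical measurements from 10 subjects (arbitrary units).
measurements = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.2, 4.9]

sd, sem = sd_and_sem(measurements)
ratio = sd / sem  # equals sqrt(n), whatever the data are

print(f"SD = {sd:.3f}, SEM = {sem:.3f}, SD/SEM = {ratio:.2f}")
```

Whatever values are used, the ratio of SD to SEM depends only on the sample size, which is why switching from SD to SEM shrinks every error bar by the same factor.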
Plotting individual sample data.
Once the data from each of the 10 subjects are also plotted, it becomes clear that figures A and C are from one experiment and figures B and D are from another. But remember, the mean values at each time point are the same in all figures, and the error bars are identical in figures A and B, as well as in figures C and D. Consequently, plotting only the means and error bars, without the individual data, masks the underlying patterns in the data.
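To see how two experiments can share identical means and error bars yet differ in their individual data, consider the following sketch. It does not reproduce the data behind the figures; it builds two hypothetical data sets with NumPy. In the first, every subject follows the same time course shifted by a fixed offset. In the second, the same values at each time point are randomly reassigned to subjects. The names exp1, exp2, summarize and wobble are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_times = 10, 13

# Hypothetical experiment 1: all subjects share one time course,
# each shifted by a constant per-subject offset (parallel lines).
time_course = np.linspace(0.0, -2.0, n_times)
offsets = rng.normal(0.0, 1.0, size=n_subjects)
exp1 = offsets[:, None] + time_course[None, :]   # shape (subjects, times)

# Hypothetical experiment 2: the same values at each time point,
# but shuffled across subjects, so individual traces criss-cross.
exp2 = np.column_stack(
    [rng.permutation(exp1[:, t]) for t in range(n_times)]
)


def summarize(data):
    """Mean, sample SD and SEM at each time point (across subjects)."""
    mean = data.mean(axis=0)
    sd = data.std(axis=0, ddof=1)
    sem = sd / np.sqrt(data.shape[0])
    return mean, sd, sem


def wobble(data):
    """Per-subject SD of the deviation from the group mean over time.

    Near zero when a subject tracks the group mean at a fixed distance.
    """
    deviation = data - data.mean(axis=0)
    return deviation.std(axis=1)


mean1, sd1, sem1 = summarize(exp1)
mean2, sd2, sem2 = summarize(exp2)

print("Same means:", np.allclose(mean1, mean2))
print("Same SDs:  ", np.allclose(sd1, sd2))
print("Same SEMs: ", np.allclose(sem1, sem2))
print("Mean wobble, experiment 1:", wobble(exp1).mean())
print("Mean wobble, experiment 2:", wobble(exp2).mean())
```

The summary statistics are indistinguishable by construction, yet only in the first data set does each subject consistently track the group mean. A plot of means and error bars alone cannot reveal that difference; the individual traces can.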
Let’s assume that the authors reported that there was a main effect of time and that the intervention led to a significant reduction in response amplitude at some of the time points. Would you interpret the results of figures C and D any differently? What additional information does plotting individual subject data provide? Which outcome measure do you think is more reliable, the measure reported in experiment A or the measure reported in experiment B? Which type of error bar best represents the underlying data?
Reporting results visually can be a highly effective way of transmitting lots of rich and informative details. However, the opposite is also true: reporting results visually can be a highly effective way of transmitting a small amount of uninformative generalities. Readers, reviewers and editors should hold authors of scientific papers to a higher standard.
Let the data tell the story, rather than have the authors spin it for us.
Drummond GB, Vowler SL (2011). Show the data, don’t conceal them. J Physiol 589:1861-3.