Why showing raw data matters

Bar and line graphs with standard deviation (SD) or standard error (SE) error bars are often used to present continuous data from laboratory studies with small sample sizes. Last year the journal PLOS Biology published a paper showing that presenting such data in this way is problematic, for the following reasons:

  1. bar and line graphs conceal how data are distributed
  2. summarising small datasets with mean and SD or SE error bars conceals outliers and forces readers to assume data are normally distributed
  3. paired or non-independent data cannot be accurately visualised with bar and line graphs
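To make reasons 2 and 3 concrete, here is a minimal Python sketch using only the standard library. All values are hypothetical, invented for illustration; they are not data from the paper. Groups A and B have exactly the same mean and SD, so their bar graphs with SD error bars would be identical, yet B consists of four identical values plus one outlier. The paired before/after example shows a mean increase even though half the participants actually decreased.

```python
import statistics

# Hypothetical values, invented for illustration (not from the paper).
# Group A is spread fairly evenly; group B is four identical values plus one outlier.
group_a = [7, 9, 10, 11, 13]
group_b = [9, 9, 9, 9, 14]

for name, values in [("A", group_a), ("B", group_b)]:
    # The summary that a bar graph with SD error bars would show.
    print(f"group {name}: mean = {statistics.mean(values)}, "
          f"SD = {statistics.stdev(values):.3f}, raw = {values}")

# Hypothetical paired (before/after) measurements on the same four participants.
before = [10, 12, 14, 16]
after = [16, 13, 13, 14]
changes = [a - b for a, b in zip(after, before)]

# A bar graph of the two means shows only an average increase...
mean_change = statistics.mean(after) - statistics.mean(before)
# ...while the within-participant changes show that half the participants decreased.
n_decreased = sum(1 for c in changes if c < 0)
print(f"mean before = {statistics.mean(before)}, mean after = {statistics.mean(after)}")
print(f"per-participant changes = {changes}, decreased: {n_decreased} of {len(changes)}")
```

Both groups report a mean of 10 and an SD of about 2.236; only the raw values reveal the outlier in group B.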

Figure 1 below, reproduced from the paper, illustrates some of these problems. Data points from four groups of participants are presented in bar graphs showing mean ± SE (panel A) or mean ± SD (panel B), or in a scatterplot (panel C). Panel A seems to show that participants in group 2 have higher data values than participants in the other groups; however, the SDs in panel B show considerable overlap in data values between groups. That is, SE error bars magnify the apparent visual difference between groups, because the SE (the SD divided by the square root of the sample size) measures the precision of the estimated mean, not the variation of the sample data. In contrast, a scatterplot of the raw data (panel C) clearly shows that sample sizes are small, that group 2 has the smallest sample size, that the data in group 1 are highly variable, and that group 3 has an outlier. None of these characteristics are evident in the bar graphs in panels A and B.
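The SE/SD distinction can be checked with a few lines of Python. This is a minimal sketch with two hypothetical groups of five values each (invented for illustration, not taken from Figure 1); the helper names `mean_sd_se` and `overlaps` are likewise illustrative. It computes SE as SD/√n and tests whether the mean ± SE bars and the mean ± SD bars of the two groups overlap.

```python
import math
import statistics

def mean_sd_se(values):
    """Return (mean, SD, SE) for one group, where SE = SD / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    se = sd / math.sqrt(n)         # precision of the mean; shrinks as n grows
    return mean, sd, se

def overlaps(lo1, hi1, lo2, hi2):
    """True if the intervals [lo1, hi1] and [lo2, hi2] share any point."""
    return lo1 <= hi2 and lo2 <= hi1

# Hypothetical raw data for two small groups (n = 5 each), invented for illustration.
group_1 = [7, 9, 10, 11, 13]
group_2 = [10, 12, 13, 14, 16]

m1, sd1, se1 = mean_sd_se(group_1)
m2, sd2, se2 = mean_sd_se(group_2)

se_overlap = overlaps(m1 - se1, m1 + se1, m2 - se2, m2 + se2)
sd_overlap = overlaps(m1 - sd1, m1 + sd1, m2 - sd2, m2 + sd2)

print(f"group 1: mean {m1}, SD {sd1:.2f}, SE {se1:.2f}")
print(f"group 2: mean {m2}, SD {sd2:.2f}, SE {se2:.2f}")
print(f"mean ± SE bars overlap: {se_overlap}")  # SE bars look well separated
print(f"mean ± SD bars overlap: {sd_overlap}")  # SD bars overlap, as the raw values do
```

With n = 5, each SE bar is the SD divided by √5, less than half the SD, so the SE bars do not overlap even though the raw values of the two groups clearly do. As n grows, the SE keeps shrinking while the SD does not.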


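Figure 1: Bar graphs of mean ± SE (panel A) and mean ± SD (panel B), and a scatterplot of the raw data (panel C), for four groups of participants.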

Showing raw data (e.g. with a scatterplot) lets readers critically assess the authors’ interpretation of the data and the conclusions drawn from it, and opens the findings to robust discussion. Showing raw data is important whether the sample size is large (such as the data used in the recent correlation analysis post) or small; inspecting the raw data is probably even more important when the sample size is small.
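For a quick look at raw values without a plotting library, here is a minimal, dependency-free sketch of a one-dimensional dot plot (strip plot) in Python. The function name `strip_plot` and its text layout are illustrative assumptions, not something from the paper; for published figures you would normally use a plotting library such as matplotlib. Columns holding several values show a count, so tied points are not hidden.

```python
def strip_plot(groups, width=40):
    """Return one line of text per group, placing each raw value on a shared axis.

    A column holding one value shows 'o'; a column holding several values shows
    their count (capped at 9), so tied or overlapping points are not hidden.
    """
    all_values = [v for values in groups.values() for v in values]
    lo, hi = min(all_values), max(all_values)
    span = (hi - lo) or 1  # avoid dividing by zero when every value is equal

    lines = []
    for name, values in groups.items():
        counts = [0] * width
        for v in values:
            # Map the value linearly onto a column index in [0, width - 1].
            col = round((v - lo) * (width - 1) / span)
            counts[col] += 1
        row = "".join(" " if c == 0 else "o" if c == 1 else str(min(c, 9)) for c in counts)
        lines.append(f"{name:>8} |{row}|")
    # Axis line: minimum at the left edge, maximum at the right edge.
    lines.append(f"{'':>8}  {lo:<{width // 2}}{hi:>{width - width // 2}}")
    return lines

# Usage with hypothetical values: group B's four tied values and its outlier stay visible.
for line in strip_plot({"group A": [7, 9, 10, 11, 13], "group B": [9, 9, 9, 9, 14]}, width=8):
    print(line)
```

Unlike a bar graph, every observation appears on the plot, so small sample sizes, ties and outliers are visible at a glance.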

Reference

Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4): e1002128.
