Implying “there’s a trend to statistical significance” is not trendy.

When investigators report a p value that narrowly fails to reach a significance threshold, they sometimes imply there is a “trend towards statistical significance”. This interpretation expresses the view that if more subjects had been tested, the p value would have become more significant.

Epidemiologists Wood and colleagues examined how the p value for a treatment effect in a randomised controlled trial is likely to change when more data are collected. Specifically, they estimated the percentage of p values that become less significant once extra data are added. It turns out that this percentage depends only on the observed p value and the relative amount of extra data. What did they find?

Figure 1: Percentage of times the p value becomes less significant as more data are collected, given the current p value and amount of extra data (Table 1 in paper).

Figure 1 shows how often the two-tailed p value becomes less significant as more subjects are tested, where the number of additional subjects is expressed as a percentage of the original sample size. Percentages are shown for several observed p values. When the observed p value is 0.08, which is the sort of marginal value where “trends” are often implied, increasing the sample size by 10% makes the result less significant (i.e. p > 0.08) about 39% of the time. Doubling the sample size gives p > 0.08 about 23% of the time.
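These percentages can be approximated with a short calculation. The sketch below is not taken from the paper. It assumes the test statistic is approximately normal and that, before the extra data arrive, the true effect is treated as uncertain around the observed estimate (a flat prior). Under those assumptions, if z is the observed z statistic and f is the extra data as a fraction of the original sample size, the updated z statistic is normally distributed with mean z·sqrt(1 + f) and standard deviation sqrt(f). The function name `prob_less_significant` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def prob_less_significant(p_observed: float, extra_fraction: float) -> float:
    """Approximate probability that a two-tailed p value becomes less
    significant after adding extra data.

    Assumptions (not from the paper): normal test statistic, flat prior
    on the true effect. `extra_fraction` is the extra sample size divided
    by the original sample size (0.1 means 10% more subjects).
    """
    if not 0.0 < p_observed < 1.0:
        raise ValueError("p_observed must lie strictly between 0 and 1")
    if extra_fraction <= 0.0:
        raise ValueError("extra_fraction must be positive")

    # z statistic corresponding to the observed two-tailed p value
    z = STD_NORMAL.inv_cdf(1.0 - p_observed / 2.0)

    # Predictive distribution of the z statistic after pooling the extra data
    updated = NormalDist(mu=z * sqrt(1.0 + extra_fraction),
                         sigma=sqrt(extra_fraction))

    # "Less significant" means the updated |z| is smaller than the observed z
    return updated.cdf(z) - updated.cdf(-z)


if __name__ == "__main__":
    for extra in (0.1, 1.0):
        prob = prob_less_significant(0.08, extra)
        print(f"p = 0.08, extra data = {extra:.0%}: {prob:.2f}")
```

For an observed p value of 0.08, this gives roughly 0.39 with 10% extra data and roughly 0.23 when the sample size is doubled, in line with the values quoted above. Note that the result uses only the observed p value and the relative amount of extra data, not the absolute sample size.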

The percentages of p values that become less significant are similar in magnitude when the observed p value is at the usual threshold of 0.05. That is, there is no dependable “trend to statistical significance” whether the p value just fails to reach a threshold or sits at it. The authors conclude that describing near significant p values as “trends towards statistical significance” or similar is not only inappropriate but actively misleading, as such p values are quite likely to become less significant if extra data are collected.

Wood and colleagues go on to examine (1) how likely it is that a p value reaches a given threshold (instead of simply becoming more or less significant), and (2) how likely it is that a significant result is obtained when an experiment is repeated and the repeat is analysed separately from the original experiment. Interested readers may consult the article for details.

Summary

Implying that nearly significant p values indicate “trends to significance” is not only inappropriate but actively misleading, because p values are quite likely to become less significant if extra data are collected.

Reference

Wood J, Freemantle N, King M, Nazareth I. Trap of trends to statistical significance: likelihood of near significant P value becoming more significant with extra data. BMJ 2014;348:g2215.

 
