Why we need confidence intervals
At Scientifically Sound, we have reviewed ongoing discussions on the benefits of confidence intervals (CIs) over p values for statistical analysis and reproducibility in research. In a short editorial, the statistician Doug Altman summarised why we need confidence intervals and showed how confidence intervals force investigators to consider sizes of effects. Here are the key points:
- Two different but complementary approaches to statistical analysis are hypothesis testing and estimation. Using the hypothesis testing approach, we first assume there is no difference between groups, then calculate the probability (p value) of observing a difference at least as big as the one we observed if that assumption were true. You could think of a p value as a measure of how surprised you should be at seeing the observed difference if there were truly no difference. In reality, a result where p < 0.05 provides only marginal evidence of an effect, i.e. you’re only slightly surprised.
- Instead, using the estimation approach, we aim to quantify a difference between groups as an estimate of a clinically or physiologically relevant quantity, and quantify how uncertain this estimate is using a CI. The 95% CI is a range of values on either side of the estimate between which we can be 95% sure that the true value lies. (Strictly, if the study were repeated many times, about 95% of the intervals calculated this way would contain the true value.) A narrower 95% CI indicates that the true value was estimated more precisely.
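To make the contrast between the two approaches concrete, here is a minimal sketch in Python. It assumes the estimate is approximately normally distributed, and the function name `summarise` and the numbers passed to it are hypothetical, chosen only for illustration. From the same estimate and standard error it returns both a 95% CI and a two-sided p value.

```python
from statistics import NormalDist

def summarise(estimate, se, level=0.95):
    """Return (lower, upper, p) for an estimate and its standard error.

    Assumes a normal sampling distribution (an assumption of this sketch).
    """
    std_normal = NormalDist()
    # Critical value, about 1.96 for a 95% interval
    z_crit = std_normal.inv_cdf(0.5 + level / 2)
    lower = estimate - z_crit * se
    upper = estimate + z_crit * se
    # Two-sided p value for the null hypothesis of no difference
    z_obs = abs(estimate) / se
    p = 2 * (1 - std_normal.cdf(z_obs))
    return lower, upper, p

# Hypothetical between-group difference of 2.0 units with standard error 1.5
lower, upper, p = summarise(2.0, 1.5)
print(f"95% CI: {lower:.2f} to {upper:.2f}, p = {p:.2f}")
```

The p value only says how surprising the difference would be if there were no effect. The interval also shows which effect sizes remain plausible, which is the information a reader needs to judge whether an effect could matter.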
Altman explains that in a study comparing two groups, such as a randomised controlled trial, a common mistake is to conclude from a non-significant result (i.e. p > 0.05) that the groups are the same. CIs can be used to show why this is a problem.
For example, investigators carried out a randomised controlled trial on 337 patients comparing laparoscopy-assisted resection versus open resection in patients with rectal cancer (Leung et al, 2004). They sought to detect a 15% between-group difference in probability of survival at 5 years. The between-group difference was not statistically significant (p = 0.61) and the investigators concluded that laparoscopic surgery did not affect survival. However, a confidence interval for the difference in survival was not presented in the paper.
Using the standard errors of survival probabilities provided for each group, the calculated 95% CI for the difference in 5-year survival ranged from -7.5% to 13.9%. In other words, compared to open resection, laparoscopic resection could plausibly have resulted in anything from 7.5% worse survival to 13.9% better survival on average. The 15% difference sought by the investigators was just outside the CI. So, even with 337 patients, there is still a lot of uncertainty about the relative survival associated with the two procedures. This uncertainty was not evident in the paper because results were presented only with p values, so it was not possible to know how much worse or better patients might fare. To conclude that the groups are the same based purely on p values does not take into account the precision of the estimate.
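The calculation behind such an interval can be sketched in a few lines of Python. This is an illustration under a normal approximation with independent groups, not necessarily the exact method Altman used. The helper `diff_ci` and its example inputs are hypothetical, because the per-group survival probabilities and standard errors are not reproduced here. The second part works backwards from the interval reported above.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def diff_ci(p1, se1, p2, se2, z=Z95):
    """Normal-approximation CI for p1 - p2 from two independent estimates."""
    diff = p1 - p2
    # Standard errors of independent estimates combine in quadrature
    se_diff = sqrt(se1 ** 2 + se2 ** 2)
    return diff - z * se_diff, diff + z * se_diff

# Hypothetical inputs (in %), only to show how the helper is called
print(diff_ci(50.0, 3.0, 40.0, 4.0))

# Working backwards from the interval reported in the text (in %)
lower, upper = -7.5, 13.9
estimate = (lower + upper) / 2         # midpoint of the interval, about 3.2
se_diff = (upper - lower) / (2 * Z95)  # implied standard error, about 5.5

def in_ci(value):
    """True if a candidate difference lies inside the reported interval."""
    return lower <= value <= upper

print(f"estimate = {estimate:.1f}, SE = {se_diff:.1f}")
print("no difference plausible:", in_ci(0.0))
print("15% difference plausible:", in_ci(15.0))
```

Under these assumptions, the reported interval implies a point estimate of about 3.2% in favour of laparoscopic resection with a standard error of roughly 5.5%. Zero lies inside the interval, and the 15% difference sought lies just outside it.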
Another point to consider is that confidence intervals only indicate uncertainty due to sampling variation, not uncertainty due to protocol violations, loss to follow-up, and so on. Consequently, true uncertainty is greater than what is indicated by confidence intervals.
Confidence intervals indicate how precisely an effect was estimated. Confidence intervals need to be used, together with other information on how the study was conducted, to judge efficacy of interventions.
Altman DG (2005) Why we need confidence intervals. World J Surg 29:554–556.
Leung KL, Kwok SP, Lam SC, Lee JF, Yiu RY, Ng SS, Lai PB, Lau WY (2004) Laparoscopic resection of rectosigmoid carcinoma: prospective randomized trial. Lancet 363:1187–1192.