## More on regression to the mean

Previously, we saw how regression to the mean can lead to misleading results. In a talk titled “Regression towards the mean, or Why was Terminator III such a disappointment?”, statistician Martin Bland explains this phenomenon and shows how it appears in a range of examples.

The Victorian statistician Francis Galton measured the heights of parents and their children. He found that the children of very tall parents tended to be shorter than their parents, and the children of very short parents tended to be taller than their parents. In other words, extreme values tend to “average out” when repeated measures are taken. Importantly, Galton showed that this is a statistical, not a physiological, phenomenon. He titled his paper “Regression towards mediocrity in hereditary stature”. Today, we call this phenomenon regression to the mean.

Regression to the mean is often most obvious when repeated measures are plotted over time. For example, Figure 1 shows dermatitis scores (SASSAD) from a randomised controlled trial in which participants with eczema received either borage oil or placebo. Over time, the average dermatitis score falls in both the treatment and the control groups. This is a classic example of regression to the mean.

### Examples of regression to the mean

The figure highlights how regression to the mean shows up in studies of treatments intended to reduce extreme outcomes. Improvement in a treated group cannot be attributed only to the treatment, because participants are typically enrolled when their outcome is extreme, and even without treatment the outcome tends to fall when it is measured a second time. This is what we observe in the control group. What we are really interested in is whether the outcome falls more in the treated group than in the control group. That extra fall is the true treatment effect.
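
To make this reasoning concrete, here is a minimal Python sketch of a hypothetical two-arm trial. It assumes a simple model in which each person has a stable true score, every measurement adds independent noise, and only people with a high baseline measurement are enrolled. The model, its numbers and the function name `simulate_trial` are illustrative assumptions, not data from the borage oil trial.

```python
import random
import statistics


def simulate_trial(n=20_000, cutoff=60.0, true_effect=0.0, seed=1):
    """Mean change from baseline in each arm of a simulated two-arm trial.

    Hypothetical model (not the trial in Figure 1): each person has a stable
    true score ~ Normal(50, 10), and every measurement adds independent
    error ~ Normal(0, 10). Only people whose baseline measurement exceeds
    `cutoff` are enrolled; they are then randomised to "treated" or "control".
    """
    rng = random.Random(seed)
    changes = {"treated": [], "control": []}
    enrolled = 0
    while enrolled < n:
        true_score = rng.gauss(50, 10)
        baseline = true_score + rng.gauss(0, 10)
        if baseline <= cutoff:
            continue  # not extreme enough to be enrolled
        arm = rng.choice(["treated", "control"])  # randomisation
        effect = true_effect if arm == "treated" else 0.0
        follow_up = true_score + effect + rng.gauss(0, 10)
        changes[arm].append(follow_up - baseline)
        enrolled += 1
    return {arm: statistics.mean(values) for arm, values in changes.items()}


no_effect = simulate_trial()
print(no_effect)  # both arms fall by a similar amount
print(no_effect["treated"] - no_effect["control"])  # close to 0
```

With the default `true_effect=0.0`, both arms fall by roughly the same amount even though nobody is treated, so the between-arm difference is close to zero. Calling `simulate_trial(true_effect=-5.0)` makes the treated arm fall about 5 points further than the control arm; only that extra fall reflects the (simulated) treatment.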

Regression to the mean can manifest in other ways and with other study designs:

• Measuring differences from baseline: Sometimes researchers try to control for differences between groups at baseline by analysing the difference between repeated measures (i.e. change scores). Because of regression to the mean, participants with low baseline measurements tend to have high change scores, and vice versa. This can reverse a baseline difference between groups: the group that starts lower tends to show the larger increase.
• Testing strength of association between two variables: When the predictor variable is measured with error, extreme observed values partly reflect that error and regress towards the mean on repeated measurement. As a result, the strength of association between the predictor and the outcome is underestimated (this is known as regression dilution bias).
• Publication bias: Referees do not always agree on which papers should be accepted. Because referees judge the quality of papers with error, their judgements cannot correlate perfectly with the true quality of the papers. So when an editor accepts the papers judged “best”, the true average quality of the accepted papers will be lower than the editor thinks, and the true average quality of the rejected papers will be higher than the editor thinks.
• Hollywood sequels: Sequels tend to be made only when the original film is a “high quality” success. Because the originals were selected for being extreme, the average “quality” of sequels will be closer to the mean than the average “quality” of the originals that spawned them. So sequels tend to be of lower “quality” than the originals. (They should have stopped at Terminator 2: Judgment Day.)
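
The referee and sequel examples share one mechanism: items are selected on a noisy measure of quality. The hypothetical sketch below assumes true quality and rating error are both normally distributed, and that the top 20% by rating are "accepted". The model, the acceptance fraction and the function name `accept_top` are illustrative assumptions.

```python
import random
import statistics


def accept_top(n=50_000, accept_fraction=0.2, rating_error_sd=1.0, seed=4):
    """Select the "best" items by a noisy rating; compare rating with true quality.

    Hypothetical model: true quality ~ Normal(0, 1); a rating is true quality
    plus independent error ~ Normal(0, rating_error_sd).
    """
    rng = random.Random(seed)
    items = []
    for _ in range(n):
        quality = rng.gauss(0, 1)
        rating = quality + rng.gauss(0, rating_error_sd)
        items.append((rating, quality))
    items.sort(reverse=True)  # highest ratings first
    k = int(n * accept_fraction)
    accepted, rejected = items[:k], items[k:]
    return {
        "accepted_rating": statistics.mean([r for r, _ in accepted]),
        "accepted_quality": statistics.mean([q for _, q in accepted]),
        "rejected_rating": statistics.mean([r for r, _ in rejected]),
        "rejected_quality": statistics.mean([q for _, q in rejected]),
    }


print(accept_top())
```

Accepted items have a lower mean true quality than their mean rating, and rejected items have a higher mean true quality than their mean rating. Replacing "paper" with "original film" and "accepted" with "gets a sequel" gives the Hollywood version of the same selection effect.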

### How to fix regression to the mean

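• Use control groups: In studies with repeated measures (e.g. “before and after” studies), a control group does not prevent regression to the mean, but it is subject to the same regression. Comparing the change in the treated group with the change in the control group separates the treatment effect from regression to the mean.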
• Duplicate baseline measurements: Take two baseline measurements. Use one to select or group subjects, and use the other as the baseline for calculating change scores or as the predictor variable in the analysis. The two baseline measurements are best collected on different occasions, because their measurement errors will then be less correlated, so the second measurement does not share the extreme error that led to a subject being selected.
• Use analysis of covariance (ANCOVA): An ANCOVA deals with regression to the mean by including the baseline score as a covariate when comparing follow-up (or change) scores between groups, so that groups are compared as if they had the same baseline. See this previous post for more details.
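
The duplicate-baseline fix can be sketched as follows. The model is hypothetical: a stable true score, independent measurement error on every occasion, and no treatment or real change at all. Subjects are selected on the first baseline only; the function name and cutoff are illustrative.

```python
import random
import statistics


def duplicate_baseline(n=100_000, cutoff=1.0, seed=5):
    """Mean change from each of two baselines, selecting on the first one.

    Hypothetical model: stable true score ~ Normal(0, 1); every measurement
    adds independent error ~ Normal(0, 1); nothing really changes over time.
    """
    rng = random.Random(seed)
    change_from_first, change_from_second = [], []
    for _ in range(n):
        true_score = rng.gauss(0, 1)
        first_baseline = true_score + rng.gauss(0, 1)
        second_baseline = true_score + rng.gauss(0, 1)
        follow_up = true_score + rng.gauss(0, 1)
        if first_baseline > cutoff:  # select on the first baseline only
            change_from_first.append(follow_up - first_baseline)
            change_from_second.append(follow_up - second_baseline)
    return statistics.mean(change_from_first), statistics.mean(change_from_second)


print(duplicate_baseline())  # (clearly negative, close to 0)
```

Change measured from the baseline used for selection shows a spurious fall; change measured from the independent second baseline does not, because that measurement was not chosen for being extreme.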
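
To compare change scores with ANCOVA, the sketch below assumes a hypothetical population split into "high" and "low" groups by their baseline measurement, so the groups differ at baseline. It fits the common-slope ANCOVA by hand: the pooled within-group slope of follow-up on baseline, then the difference in follow-up means adjusted to equal baselines. The model, group split and function name are illustrative assumptions, not a specific library's ANCOVA routine.

```python
import random
import statistics


def ancova_vs_change(n=100_000, true_effect=0.0, seed=6):
    """Return (change-score difference, ANCOVA difference) between two groups.

    Hypothetical model: stable true score ~ Normal(0, 1); each measurement adds
    error ~ Normal(0, 1). Group "high" has baseline > 0, group "low" the rest.
    Group "high" receives `true_effect` at follow-up.
    """
    rng = random.Random(seed)
    groups = {"high": ([], []), "low": ([], [])}  # (baselines, follow-ups)
    for _ in range(n):
        true_score = rng.gauss(0, 1)
        baseline = true_score + rng.gauss(0, 1)
        group = "high" if baseline > 0 else "low"
        effect = true_effect if group == "high" else 0.0
        groups[group][0].append(baseline)
        groups[group][1].append(true_score + effect + rng.gauss(0, 1))

    means = {g: (statistics.mean(x), statistics.mean(y)) for g, (x, y) in groups.items()}

    # Change-score analysis: difference in mean (follow-up - baseline).
    change_diff = ((means["high"][1] - means["high"][0])
                   - (means["low"][1] - means["low"][0]))

    # Common-slope ANCOVA: pooled within-group slope, then adjust the
    # follow-up difference for the difference in baseline means.
    sxy = sxx = 0.0
    for g, (x, y) in groups.items():
        mx, my = means[g]
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        sxx += sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    ancova_diff = ((means["high"][1] - means["low"][1])
                   - slope * (means["high"][0] - means["low"][0]))
    return change_diff, ancova_diff


print(ancova_vs_change())  # change scores: clearly negative; ANCOVA: close to 0
```

With no true effect, the change-score analysis wrongly suggests that the high-baseline group got worse relative to the low group (the reversal described under "Measuring differences from baseline"), while the ANCOVA estimate stays near zero. With `true_effect=0.5`, the ANCOVA estimate is close to 0.5.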

### Summary

Regression to the mean, the tendency of extreme values to move closer to the average when measured again, is a statistical phenomenon, not a physiological one or a real change in the underlying quantity. It appears in many study designs and in all sorts of other places. We can deal with regression to the mean in experimental design (e.g. by using control groups) and in analysis (e.g. ANCOVA).

• Would you say RTM is still an issue when the baseline differences are not extreme? For example, if we look at heart rate, one group may have 70bpm average, and another 65bpm. Both are clinically normal and the 5bpm is not very meaningful, but may be statistically significant.

Secondly, would you say that it is adequate to use ONE of your suggested fixes? (using control, duplicate baselines OR ANCOVA?)


• Joanna Diong

Hi Sara,