Cohen’s d: what standardiser to use?
In a previous post we learned about Cohen's d, a standardised measure of effect size. In this post we will learn why it is important to consider what value is used to standardise our effect size.
Cohen’s d and the standardiser
The basic formula to calculate Cohen’s d is:
d = [effect size / relevant standard deviation].
The denominator of this equation is the standardiser. As mentioned in the previous post, it is important to select the most appropriate standardiser for a given dataset because it can have a large influence on Cohen's d.
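As a minimal sketch, the formula translates directly into code; the numbers below are illustrative, not from a real study:

```python
def cohens_d(effect, standardiser):
    """Cohen's d: the effect size divided by the relevant standard deviation."""
    return effect / standardiser

# Illustrative values: a 5-point improvement standardised by an SD of 10
print(cohens_d(5, 10))  # prints 0.5
```

The same 5-point effect standardised by an SD of 20 would give d = 0.25, which is why the choice of standardiser matters so much.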
As pointed out by Geoff Cumming and Robert Calin-Jageman in their book Introduction to the New Statistics: Estimation, Open Science, and Beyond, we should ask ourselves if we know the standard deviation associated with the most relevant population for our outcome measure. If so, this is the standardiser we should use.
Standardiser for the studied population
Pretend we used the Purdue pegboard test to measure fine-motor dexterity in a group of elderly subjects undergoing an intervention aimed at improving hand function. Rather than use the standard deviation measured from our sample of elderly subjects, we should use the standard deviation calculated from normative values for elderly subjects. But why should we favour this population-based standardiser?
The standard deviation calculated from large normative studies will better estimate the standard deviation of the studied population. Using this population-based estimate, Cohen’s d will only depend on the size of the effect measured in our study; the variability of our studied sample will not influence the results.
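This distinction can be illustrated with a short sketch; every number here is hypothetical, chosen only to show how a sample-based standardiser makes d fluctuate from sample to sample while a population-based one does not:

```python
mean_improvement = 4.0          # hypothetical change in pegboard score
normative_sd = 8.0              # hypothetical SD from a large normative study
sample_sds = [6.0, 8.0, 12.0]   # SDs that three different samples might happen to show

# Population-based standardiser: d depends only on the measured effect
d_population = mean_improvement / normative_sd
print(d_population)  # 0.5, regardless of which sample we drew

# Sample-based standardiser: the same effect yields a different d for each sample
d_samples = [round(mean_improvement / sd, 2) for sd in sample_sds]
print(d_samples)  # [0.67, 0.5, 0.33]
```

With the population-based standardiser, the only quantity that can change d is the effect itself; with a sample-based standardiser, sampling variability leaks into the effect size.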
Standardiser for our studied sample
But what happens if there is no population-level estimate of the standard deviation? We will have to use the standard deviation of our studied sample as the standardiser. As highlighted in the previous section, this means that Cohen's d will be influenced by the size of the effect as well as the variability in our sample.
Let's pretend we conducted a study on the effect of reading scientificallysound.org for a week on the IQ of undergraduate students from a wide range of disciplines. Compared to a control week in which subjects watched YouTube cat videos, reading scientificallysound.org caused a 5-point increase on average (standard deviation = 20) on a newly developed IQ scale.
The BBC got wind of our amazing finding and decided to replicate our study, but this time with the BBC blog. Rather than study undergraduate students from a wide range of disciplines, the BBC only included students from the highly competitive astrophysics department. Reading the BBC blog for a week also resulted in a 5-point IQ increase in subjects, but this time the standard deviation was 10.
Because there are no normative values for this IQ test, we can’t use a population-based standardiser. Thus, Cohen’s d for our study is 0.25 (5/20), whereas it is 0.5 (5/10) for the BBC study. Both studies found a 5 point increase in IQ scores, but because there was less variability in the BBC study, Cohen’s d is considerably larger.
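Using the numbers reported by the two studies, the calculation is simply:

```python
effect = 5    # 5-point IQ increase found in both studies
sd_ours = 20  # SD in our sample of mixed-discipline undergraduates
sd_bbc = 10   # SD in the BBC's astrophysics students

print(effect / sd_ours)  # 0.25 for our study
print(effect / sd_bbc)   # 0.5 for the BBC study
```

The raw effect is identical, yet the standardised effect size doubles simply because the BBC's sample was more homogeneous.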
Cohen's d is a ratio. To properly interpret d we need to know what standardiser (i.e., denominator) was used to calculate it. If studies use the same outcome measure, it is always possible, and often preferable, to compare effect sizes in original units. For example, reading the BBC blog or scientificallysound.org leads to a 5-point increase in IQ.
Things are more complicated when different outcome measures are used. The only means we have to compare these effects is to compute a standardised effect size (i.e., Cohen's d), which as we have discovered is highly dependent on the choice of standardiser. Thus, when standardised effect sizes are reported, the choice of standardiser should always be specified.
In the next post we will learn about how to interpret Cohen’s d.