Small sample sizes and the bias of small numbers
As scientists, we have all received some level of training in statistics. A fundamental concept is that we are trying to make inferences about a specific population, but that we only have access to a sample of the people, dogs, amoebas, and so on that belong to that population. By randomly sampling amoebas, for example, we collect data and conduct statistical tests to learn something about the entire population, not just the amoebas we happen to have tested.
Because we are not able to collect data from all amoebas, our conclusions come with uncertainty. How well our conclusions apply to the entire population, that is, how generalizable they are, depends on how representative our sample is of that population. It might be, for instance, that the small number of amoebas we sampled were particularly aggressive. If this characteristic is not shared by the majority of amoebas in the population, but we did not include a measure of aggression in our study, we have no way of knowing that our sample is unrepresentative.
However, because our statistical analyses reveal an interesting finding, we draft a manuscript and submit it to the top amoeba journal. Importantly, we write the manuscript from the point of view that our sample is in fact representative of the overall population. Because our results were highly significant, we are convinced we have discovered something important. But is this in fact true?
On average, larger samples that are truly selected at random will be more representative of the entire population than a smaller sample. Yet, science is riddled with studies performed on small samples, which in most instances do not represent the overall population. Why are there so many small studies? As pointed out by Nobel Laureate Daniel Kahneman more than 40 years ago, part of the problem is that humans are running the show…
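This shrinking of sampling variability with sample size can be made concrete with a small simulation. The sketch below is illustrative only: the population of amoeba "aggression scores" and its distribution are invented assumptions, not data from any study.

```python
import random
import statistics

random.seed(42)

# Hypothetical population: aggression scores for 100,000 amoebas,
# drawn from a normal distribution (mean 50, sd 10). Purely illustrative.
population = [random.gauss(50, 10) for _ in range(100_000)]

def sample_means(n, trials=1000):
    """Mean of n randomly sampled individuals, repeated `trials` times."""
    return [statistics.mean(random.sample(population, n)) for _ in range(trials)]

# The spread of the sample mean shrinks as n grows (roughly as 1/sqrt(n)),
# so small samples wander much farther from the population mean.
for n in (5, 50, 500):
    sd = statistics.stdev(sample_means(n))
    print(f"n={n:3d}  sd of sample mean ~ {sd:.2f}")
```

With n = 5 the sample mean swings widely from one random sample to the next; with n = 500 it hugs the population value, which is exactly why small studies so often misrepresent the population.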
Belief in the law of small numbers
In a paper published in 1971 in Psychological Bulletin entitled Belief in the law of small numbers, Tversky & Kahneman argue that because scientists, who are human, have poor intuition about the laws of chance (i.e. probability), there is an overwhelming (and erroneous) belief that a randomly selected sample is highly representative of the population studied. The authors tested (and confirmed) this hypothesis by conducting a series of surveys on scientists.
“A confidence interval, however, provides a useful index of sampling variability, and it is precisely this variability that we tend to underestimate.”
The authors summarized their key findings as follows:
- Scientists gamble research hypotheses on small samples without realizing that the odds against them are unreasonably high. Scientists overestimate power.
- Scientists have unreasonable confidence in early trends and in the stability of observed patterns. Scientists overestimate significance.
- In evaluating replications, scientists have unreasonably high expectations about the replicability of significant results. Scientists underestimate the width of confidence intervals.
- Scientists rarely attribute a deviation of results from expectations to sampling variability, because they find a causal “explanation” for any discrepancy. Thus, they have little opportunity to recognize sampling variation in action. Scientists self-perpetuate the belief in small numbers.
Statistical power and sample sizes.
“[Tversky & Kahneman] refuse to believe that a serious investigator will knowingly accept a 50% risk of failing to confirm a valid research hypothesis.”
It is interesting to note that many of the topics currently being discussed in the context of reproducible science were also being discussed more than 40 years ago: for example, the prevalence of “ridiculously underpowered studies”, the importance of reproducing a key finding, the sample size to use in a replication study, the limitations of p-values, and the bias present in interpreting and reporting scientific results.
With such clear thinkers at the helm, why were these issues not resolved and their solutions implemented decades ago?
Reliance on p-values.
“The emphasis on statistical significance levels tends to obscure a fundamental distinction between the size of an effect and its statistical significance. Regardless of sample size, the size of an effect in one study is a reasonable estimate of the size of the effect in a replication. In contrast, the estimated significance level in a replication depends critically on sample size.”
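This distinction between a stable effect estimate and an unstable significance level can also be seen in simulation. The sketch below is a toy illustration, not a reconstruction of any analysis in the paper: the true effect, sample sizes, and the ~2.0 t-statistic cutoff are all invented assumptions.

```python
import math
import random
import statistics

random.seed(7)

def replicate(n, effect=0.5, trials=1000, crit=2.0):
    """Repeat a two-group experiment (n per group) many times. Return the
    mean estimated effect and the fraction of replications whose
    Welch-style t statistic exceeds ~2.0 (roughly p < .05)."""
    effects, sig = [], 0
    for _ in range(trials):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(effect, 1) for _ in range(n)]
        d = statistics.mean(b) - statistics.mean(a)
        se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
        effects.append(d)
        if abs(d / se) > crit:
            sig += 1
    return statistics.mean(effects), sig / trials

# The effect estimate centers on the truth at both sample sizes,
# but the rate of "significant" results depends heavily on n.
for n in (15, 60):
    d_hat, frac = replicate(n)
    print(f"n={n:2d}  mean effect estimate ~ {d_hat:.2f}  "
          f"'significant' in {frac:.0%} of replications")
```

Both sample sizes recover roughly the same effect size on average, yet the proportion of replications that cross the significance threshold differs dramatically, which is precisely the point of the quote above.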
The belief that results from small samples are representative of the overall population is a cognitive bias. As such, it is active without our even knowing it. Effort must be exerted to recognize it in ourselves, and precautions put in place to limit its impact. Examples of such precautions include focusing on the size and certainty of an observed effect, pre-registration of study protocols and analysis plans, and blinded data analyses.
Tversky A & Kahneman D (1971). Belief in the law of small numbers. Psychological Bulletin 76: 105-110.