Research concepts: From sample to population
In doing research, we apply the scientific method to answer questions. For example, does cigarette smoking cause lung cancer? What are the mechanisms of weakness after stroke? Why do cells become cancerous? What properties are specific to the poison of South American tree frogs?
We want to understand all the individuals being studied (e.g. people, cells or frogs), but it is often impossible to collect data from every individual in the population. Instead, we collect data from a sample (i.e. a smaller number of individuals) and use it to tell us about the population. What is important to understand about samples and populations?
Sampling from a population
Statisticians say we extrapolate from a sample to draw conclusions about a population. In other words, we use a sample to make inferences about a population. A sample needs to be representative of the population for those conclusions to be accurate. In practice, a sample can differ from the population it was drawn from, either by chance or systematically.
Sampling error and bias
In most statistical calculations, it is assumed that the data being analysed are randomly sampled from the population. Consequently, the statistical values calculated using the sample (e.g. the sample mean, variability, slope of line-of-best-fit in a linear regression, etc.) are estimates of the true population value. There are different reasons why the estimate from a sample is not the same as the true population value:
Sampling error. By chance, the sample may have a higher or lower value (e.g. mean, variability, slope of line, etc.) than the true population value. This is caused by random variation between individuals in the population.
Selection bias. The sample may consistently have a higher or lower value than the true population value. This is caused by systematic (i.e. always in one direction) differences between the sample and population. For example, the link between cigarette smoking and cancer may appear stronger than it really is if only heavy smokers were sampled, instead of sampling across light and heavy smokers.
Other biases. Any other bias that causes systematic differences between the sample and population. Systematic differences are worse than random differences because random differences tend to average out as the sample gets larger, whereas systematic differences do not.
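The difference between sampling error and selection bias can be illustrated with a small simulation. This is a minimal sketch only: the population of heights, its mean of 170 cm and standard deviation of 10 cm, the sample size and the function names are all assumptions made up for illustration, not real data.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 heights (cm). Values are invented.
population = [rng.gauss(170, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Individuals a biased sampling scheme would reach: only the taller ones.
tall_only = [h for h in population if h > 170]


def random_sample_mean(n):
    """Mean of a simple random sample of n individuals."""
    return statistics.mean(rng.sample(population, n))


def biased_sample_mean(n):
    """Mean of n individuals sampled only from the taller subgroup."""
    return statistics.mean(rng.sample(tall_only, n))


# Repeat each sampling scheme 500 times and record estimate minus truth.
random_errors = [random_sample_mean(50) - population_mean for _ in range(500)]
biased_errors = [biased_sample_mean(50) - population_mean for _ in range(500)]

avg_random_error = statistics.mean(random_errors)
avg_biased_error = statistics.mean(biased_errors)

print(f"Average error, random samples: {avg_random_error:.2f} cm")
print(f"Average error, biased samples: {avg_biased_error:.2f} cm")
```

With random sampling, individual samples land above or below the population mean, but the errors roughly cancel out over many samples. With the biased scheme, every sample is pushed in the same direction, so repeating the study or collecting more data does not remove the error.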
Big picture things
The mechanics of using a sample to make inferences about a population are made possible by describing samples and populations using mathematical distributions, which are defined by certain parameters. For example, the height of university students follows a bell-shaped Normal distribution with a mean (the average) and standard deviation (a measure of variability); any Normal distribution is completely described by these two parameters.
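As a rough sketch of this idea, Python's standard library provides statistics.NormalDist, which represents a Normal distribution using only a mean and a standard deviation. The parameter values below (170 cm and 10 cm) and the sample size are assumptions chosen for illustration.

```python
from statistics import NormalDist

# Assumed population parameters (illustrative values, not real data).
population = NormalDist(mu=170, sigma=10)

# The two parameters are enough to answer probability questions, e.g. the
# proportion of individuals within one standard deviation of the mean.
within_one_sd = population.cdf(180) - population.cdf(160)

# Draw a random sample of 200 heights and estimate both parameters from it.
sample = population.samples(200, seed=1)
estimated = NormalDist.from_samples(sample)

print(f"Proportion within 1 SD: {within_one_sd:.3f}")
print(f"Estimated mean: {estimated.mean:.1f} cm")
print(f"Estimated SD: {estimated.stdev:.1f} cm")
```

The estimated mean and standard deviation will be close to, but not exactly equal to, the population values; the gap is the sampling error described earlier.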
In research, the aim of data analysis is to draw the most accurate conclusions possible when data are limited. Statistics is used to extrapolate properties of samples to make inferences about populations. Conclusions will be biased if samples are biased, so make sure that samples accurately represent the population.
In the next post, we will discuss how confidence intervals are used to make inferences about the population.
Motulsky H (2018) Intuitive Biostatistics. A Nonmathematical Guide to Statistical Thinking. 4th Ed. Oxford University Press: Oxford, UK.