## Research concepts: From sample to population

In doing research, we apply the scientific method to answer questions. For example, does cigarette smoking cause lung cancer? What are the mechanisms of weakness after stroke? Why do cells become cancerous? What properties are specific to the poison of South American tree frogs?

We want to understand all the individuals being studied (e.g. people, cells, frogs), but it is often impossible to collect data from every individual in the population. Instead, we collect data from a *sample* (i.e. a smaller number of individuals) and use it to tell us about the *population*. What are the important things to understand about samples and populations?

### Sampling from a population

Statisticians say we extrapolate from a sample to draw conclusions about a population. In other words, we use a sample to *make inferences* about a population. A sample needs to be *representative* of the population for those conclusions to be accurate. In practice, obtaining a representative sample can be difficult.

### Sampling error and bias

In most statistical calculations, it is assumed that the data being analysed are randomly sampled from the population. Consequently, the statistical values calculated using the sample (e.g. the sample mean, variability, slope of the line-of-best-fit in a linear regression, etc.) are *estimates* of the true population values. An estimate from a sample can differ from the true population value for several reasons:

**Sampling error.** By chance, the sample may have a higher or lower statistic (e.g. mean, variability, slope of line, etc.) than the true population value. This is caused by *random variation* among individuals in the population.

**Selection bias.** The sample may consistently have a higher or lower statistic than the true population value. This is caused by *systematic* (i.e. always in one direction) differences between the sample and population. For example, cigarette smoking may appear to cause cancer if only heavy smokers were sampled, rather than both light and heavy smokers.

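**Other biases.** Any other bias that causes systematic differences between the sample and population. Systematic differences are worse than random differences because they do not average out: collecting a larger sample reduces sampling error, but it does not remove bias.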
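The difference between sampling error and selection bias can be illustrated with a small simulation. The sketch below is not from the reference; it assumes a hypothetical population of 100,000 heights (mean 170 cm, standard deviation 10 cm) and a hypothetical biased sampling scheme that only ever selects from the tallest half of the population. The names `errors`, `tallest_half` and `results` are made up for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the simulation is repeatable

# Hypothetical population: 100,000 heights in cm (assumed values).
population = [rng.gauss(170, 10) for _ in range(100_000)]
population_mean = statistics.fmean(population)

# Hypothetical biased sampling frame: only the tallest half can be selected.
tallest_half = sorted(population)[len(population) // 2:]


def errors(source, n, repeats=200):
    """Draw `repeats` samples of size n from `source` and return
    (sample mean - population mean) for each sample."""
    return [
        statistics.fmean(rng.sample(source, n)) - population_mean
        for _ in range(repeats)
    ]


results = {}
for n in (10, 1000):
    random_errors = errors(population, n)    # random sampling
    biased_errors = errors(tallest_half, n)  # selection bias
    results[n] = (
        statistics.fmean(random_errors),  # average error, random samples
        statistics.stdev(random_errors),  # spread of error, random samples
        statistics.fmean(biased_errors),  # average error, biased samples
    )
    print(n, [round(value, 2) for value in results[n]])
```

With random sampling, the errors scatter on both sides of zero and their spread shrinks as the sample size grows. With the biased scheme, the sample mean stays several centimetres too high regardless of sample size.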

### Big picture things

The mechanics of *using a sample to make inferences about a population* are made possible by describing samples and populations using mathematical *distributions*, which are defined by certain *parameters*. For example, the height of university students follows a bell-shaped Normal distribution with a mean (the average) and standard deviation (a measure of variability); any Normal distribution is completely described by these two parameters.
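As a rough illustration of how two parameters describe a Normal distribution, the sketch below uses `NormalDist` from Python's standard `statistics` module. The population values (mean 170 cm, standard deviation 10 cm) and the sample size of 200 are assumptions chosen for the example, not figures from the reference.

```python
from statistics import NormalDist

# Hypothetical population of heights, fully described by two parameters.
population = NormalDist(mu=170, sigma=10)

# Draw a random sample and estimate the two parameters from it.
sample = population.samples(200, seed=1)
estimate = NormalDist.from_samples(sample)

print("Estimated mean:", round(estimate.mean, 1))
print("Estimated standard deviation:", round(estimate.stdev, 1))

# Once the parameters are known, other properties follow from them,
# e.g. the proportion of individuals within one standard deviation
# of the mean (about 0.68 for any Normal distribution).
within_one_sd = population.cdf(180) - population.cdf(160)
print("Proportion within one SD:", round(within_one_sd, 2))
```

The estimated mean and standard deviation will be close to, but not exactly, 170 and 10. That gap is the sampling error described earlier.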

### Summary

In research, the aim of data analysis is to draw the most accurate conclusions possible from limited data. Statistics is used to extrapolate from the properties of samples to make inferences about populations. Conclusions can be biased if samples are biased, so make sure that samples accurately represent the population.

In the next post, we will discuss how confidence intervals are used to make inferences about the population.


### Reference

Motulsky H (2018) Intuitive Biostatistics. A Nonmathematical Guide to Statistical Thinking. 4th Ed. Oxford University Press: Oxford, UK.