Incidence and prevalence: why incidence estimates are better

Measuring how commonly a disease occurs is the first step in studying patterns and causes of disease. But questions about disease occurrence can be asked in different ways. We could ask "how many older adults have low back pain at one point in time?" Or, "how many older adults develop back pain in one year?" These questions sound similar, but their answers mean different things. How are they different?

Prevalence and incidence are both proportions of people with the disease, and both are defined with respect to time: prevalence at a single point in time, incidence over an interval of time. "Time" could mean lots of things: calendar time, age, time from diagnosis, time exposed to a risk factor, etc. So it’s important to define what "time" means before calculating either measure.

Suppose we have a group of people who can get a disease. To be "at risk" means a person does not have the disease, or had the disease and has since recovered and could get it again. The prevalence of the disease in this group is the proportion of people who have the disease at a specific point in time. In contrast, the incidence of the disease is the proportion of people at risk at the beginning of some interval of time who develop new cases of the disease by the end of that interval. In this sense, prevalence reflects the burden of disease whereas incidence reflects the force of disease.

Let’s visualise what this means. The figure shows horizontal timelines of 6 people from a group of 100 who develop (and recover from) a disease. Disease states of individuals are recorded at 3 time points (Times 0, 1, 2). At Time 1, Cases 1, 3, 4, and 6 have the disease, so the prevalence of disease at Time 1 is 4/100 = 0.04 (or 4/99, depending on whether Case 2 is at risk of the disease). In contrast, over the interval from Time 0 to Time 1, only Cases 3 and 4 develop new cases of disease. Cases 1, 2, and 6 are not at risk at the start of this interval because they already have the disease at Time 0, which leaves 100 - 3 = 97 people at risk. So the incidence of disease over the interval Time 0-1 is 2/97 ≈ 0.02.

Fig 1. Seven individuals with disease in a population. Horizontal lines represent duration of disease. Vertical lines represent points in time when disease occurrence is measured.
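
The arithmetic in the worked example can be reproduced with a short Python sketch. The case numbers and the group size of 100 come from the text above; the variable and function names are made up for illustration, and the code assumes that everyone who is free of disease at Time 0 is at risk over the interval.

```python
POPULATION = 100  # size of the group described in the text

# Case numbers taken from the worked example in the text.
diseased_at_t0 = {1, 2, 6}     # already have the disease at Time 0
diseased_at_t1 = {1, 3, 4, 6}  # have the disease at Time 1


def prevalence(diseased, population):
    """Proportion of the group with the disease at one point in time."""
    return len(diseased) / population


def incidence(diseased_start, diseased_end, population):
    """Proportion of people at risk at the start who are new cases by the end.

    Assumption: everyone without the disease at the start is at risk.
    """
    at_risk = population - len(diseased_start)
    new_cases = diseased_end - diseased_start  # set difference
    return len(new_cases) / at_risk


prev_t1 = prevalence(diseased_at_t1, POPULATION)
inc_0_1 = incidence(diseased_at_t0, diseased_at_t1, POPULATION)

print(f"Prevalence at Time 1:    {prev_t1:.2f}")
print(f"Incidence over Time 0-1: {inc_0_1:.3f}")
```

Comparing disease status only at the two time points is a simplification: someone who develops the disease and recovers between Time 0 and Time 1 would be missed by this count.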

The problem with using prevalence to understand causes of disease is that prevalence estimates depend both on when the disease starts and on how long the disease lasts. An at-risk population might appear to have a low prevalence of disease when (1) the disease rarely occurs, or (2) the disease occurs frequently but affected individuals do not stay diseased for long, either due to recovery or death. Conversely, prevalence of disease may be high because (1) the disease truly is common, or (2) the disease is rare but affected individuals stay diseased for a long time.
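
To make the point concrete, here is a minimal Python sketch of how occurrence and duration trade off. It relies on a standard textbook approximation that is not stated in this post: in a stable population, the prevalence odds are roughly the incidence rate multiplied by the mean duration of disease. The function name and the scenario numbers are invented for illustration only.

```python
def steady_state_prevalence(incidence_rate, mean_duration):
    """Approximate prevalence in a stable population.

    Assumption (not from the post): prevalence odds, P / (1 - P),
    equal incidence rate x mean duration of disease.
    """
    odds = incidence_rate * mean_duration
    return odds / (1 + odds)


# Hypothetical scenarios: (new cases per person-year, mean duration in years).
scenarios = {
    "common but short-lived": (0.50, 0.1),
    "rare but long-lasting": (0.01, 5.0),
}

results = {
    name: steady_state_prevalence(rate, duration)
    for name, (rate, duration) in scenarios.items()
}

for name, value in results.items():
    print(f"{name}: prevalence is about {value:.3f}")
```

Under these made-up numbers, both scenarios give nearly the same prevalence even though their incidence rates differ fifty-fold, which is why prevalence alone cannot tell us how often new disease occurs.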

If the aim is to understand causes of disease, estimates of incidence serve as better measures of disease occurrence than prevalence, because incidence counts only new cases and so does not depend on how long the disease lasts.


Jewell NP (2004) Statistics for epidemiology. (Chp 2) CRC Press: Florida, USA.
