## Survival analysis 2: Survivor and hazard functions

The aim of survival analysis is to analyse the time until an event occurs. In other words, survival analysis involves analysing time to event outcomes. As seen in the previous post, this analysis can involve non-parametric (Kaplan-Meier or Nelson-Aalen methods), semiparametric (Cox regression) or parametric methods.

As Cleves et al (2008) succinctly put it,

> The key to mastering survival analysis lies in grasping its jargon.

Sometimes, learning a new technique requires the hard work of mastering its language. Unfortunately, for survival analysis this is unavoidable. (I still need refreshers from time to time as well; pun not intended.)

This post describes statistical terms and concepts unique to survival analysis, and how they are related to each other. Use it as a reference guide when you come across a jargony word in future.

### Survivor and hazard functions

Suppose T is a random variable representing the time to an event. Survival analysis aims to analyse T.

We want to describe how T is distributed. Typically, we could describe the distribution of T with a *probability distribution function* or a *cumulative distribution function*. But for time to event outcomes, people prefer to talk about the *survivor function* or the *hazard function* of T. These four functions of T and a fifth function, the *cumulative hazard* of T, are described below.

The *survivor function S(t)*

- is the probability of surviving beyond time t, i.e. S(t) = P(T > t), or
- the probability that no failure event occurs up to and including time t.
- Since everyone is still surviving at time 0, the survivor function equals 1 at t = 0. It never increases, and decreases towards 0 as time approaches infinity.

The *probability distribution function f(t)*

- is also called the probability density function,
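- and describes how the probability of failure is spread over all possible values of t. It is the derivative of the cumulative distribution function F(t).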

The *cumulative distribution function F(t)*

- is the total amount of probability that has accumulated up to time t, i.e. F(t) = P(T ≤ t). It is the integral of the probability distribution function f(t) from 0 to time t.
- The complement of the cumulative distribution function is the survivor function: S(t) = 1 - F(t).

The *hazard function h(t)*

- is also called the hazard rate,
- and is the *rate* of failure at any instant (emphasis on "rate"), given survival up to that instant. This is the same as the rate at which risk is accumulated.
- The probability of surviving past a certain time is determined entirely by the amount of risk that has accumulated up to that time. So there is a one-to-one relationship between the survivor function and the hazard function: knowing one determines the other.
- Since the hazard function is a rate, it has units of 1/time. The hazard rate ranges from 0 (no risk of failure) to infinity (certain to fail at that instant). Over time, it can be constant, increase, decrease, or take some other pattern.
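
For readers who like to see the definition written out, the standard textbook form of the hazard is sketched below. The symbol Δt (a small interval of time) is introduced here for illustration and does not appear elsewhere in this post.

```latex
h(t) = \lim_{\Delta t \to 0}
       \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
     = \frac{f(t)}{S(t)}
```

Dividing by Δt is what turns a conditional probability into a rate, which is why the hazard has units of 1/time and can exceed 1.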

The *cumulative hazard function H(t)*

- is the total amount of risk that has been accumulated up to time t.
- It is the integral of the hazard function from 0 to time t.

These functions are mathematically related like so:

- S(t) = exp{ -H(t) }, where H(t) is the integral of h(t) from 0 to time t
- F(t) = 1 - exp{ -H(t) }
- f(t) = h(t) exp{ -H(t) } = h(t) S(t)

**Note,** I’m showing these relations so the more intrepid readers can see for themselves how they are related. In practice, the software you use takes these relations into account; you don’t have to memorise them. (I certainly don’t.) It’s just useful to keep in the back of your mind that modelling one function tells us about all the other functions.
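
If you would like to check the relations numerically, here is a minimal Python sketch. It assumes a Weibull-shaped hazard purely for illustration; the shape and scale values (`K`, `LAM`) and all function names are made up for this example and are not taken from any particular package.

```python
import math

# Hypothetical example: a Weibull hazard with shape K and scale LAM (assumed values).
K, LAM = 1.5, 2.0

def hazard(t):
    """h(t): the instantaneous failure rate."""
    return (K / LAM) * (t / LAM) ** (K - 1)

def cumulative_hazard(t, steps=10_000):
    """H(t): integral of h from 0 to t, approximated with the midpoint rule."""
    width = t / steps
    return sum(hazard((i + 0.5) * width) for i in range(steps)) * width

def survivor(t):
    """S(t) = exp{ -H(t) }"""
    return math.exp(-cumulative_hazard(t))

def cdf(t):
    """F(t) = 1 - exp{ -H(t) }"""
    return 1.0 - survivor(t)

def density(t):
    """f(t) = h(t) S(t)"""
    return hazard(t) * survivor(t)

t = 3.0
closed_form_H = (t / LAM) ** K  # known closed form of the Weibull cumulative hazard
print(f"H({t}): numeric = {cumulative_hazard(t):.4f}, closed form = {closed_form_H:.4f}")
print(f"S({t}) = {survivor(t):.4f}, F({t}) = {cdf(t):.4f}, f({t}) = {density(t):.4f}")
```

Everything here is built from `hazard` alone, which is the point: once one function is specified, the others follow.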

A survival analysis will typically produce a hazard ratio with its 95% confidence interval (95% CI). The hazard ratio is the ratio of the hazard in the exposed group to the hazard in the control group. The definitions above are intended to help you interpret the findings.

### Interpreting the hazard function

The hazard function (or hazard rate) is central to survival analysis. Its shape is determined by underlying processes (e.g. disease, mechanical wear), and reflects the rate at which risk is accumulated:

- When risk of an event (e.g. death) is zero, the hazard is zero.
- When risk increases with time, so does the hazard. The future looks bleak.
- When risk decreases with time, so does the hazard. The future looks better if we can survive past the present.

The hazard rate is a rate, with units 1/time. Multiplying a hazard rate by 1 unit of time gives the number of failure events expected over that time period, if the rate stayed constant and failures could recur. E.g. for a hazard rate of 2/year, if the rate remains constant over a year, we would expect 2 failures.

The reciprocal of a hazard rate, 1/(1/time) = time, produces the expected amount of time for a failure event to occur, for a constant hazard rate. E.g. for a hazard rate of 2/year, if the rate remains constant, we would expect to wait half a year for 1 failure to occur.
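
The two interpretations can be checked with a few lines of Python. This is only a sketch under the constant-hazard assumption: with a constant hazard the survivor function is S(t) = exp(-rate x t), and the mean waiting time equals the area under S(t). The function names are made up for this example.

```python
import math

HAZARD = 2.0  # constant hazard rate in failures per year (the example value used above)

def expected_failures(rate, duration):
    """Expected number of failures over `duration`, assuming a constant
    rate and that failure events could recur."""
    return rate * duration

def mean_time_to_failure(rate, horizon=50.0, steps=200_000):
    """Mean waiting time for one failure, computed as the area under
    S(t) = exp(-rate * t) from 0 to `horizon` with the midpoint rule."""
    width = horizon / steps
    return sum(math.exp(-rate * (i + 0.5) * width) for i in range(steps)) * width

print("Expected failures in 1 year:", expected_failures(HAZARD, 1.0))
print("Mean time to failure (numeric):", round(mean_time_to_failure(HAZARD), 4))
print("Reciprocal of the hazard rate:", 1.0 / HAZARD)
```

The numeric mean waiting time and the reciprocal of the hazard rate should agree to several decimal places.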

### Interpreting the cumulative hazard function

The hazard rate is a rate, much like a sampling rate, and measures the rate at which risk is accumulated. The cumulative hazard is the total amount of risk accumulated up to a time point. Let’s use an example of machine sampling rates to understand this:

For a constant sampling rate (e.g. 50 Hz), sampling for 4 sec accumulates 4 x 50 = 200 samples of data. Likewise, for a constant hazard rate (e.g. 2/year), observing for 4 years accumulates a cumulative hazard of 4 x 2 = 8, i.e. 8 expected failure events (if failures could recur).

If the hazard rate changes over time, the accumulated number of failures could be the same or different. E.g. if the hazard rate is 1/year for 3 years, then 5/year for 1 year, observing for the same 4 years still accumulates the same number of expected failures, (3 x 1) + (1 x 5) = 8 failures. However, note that individuals experienced different rates of failure at different points in the 4-year period.
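
The arithmetic above can be sketched in Python as follows. The representation of a hazard as a list of (duration, rate) segments, and the names `CONSTANT`, `CHANGING`, `cumulative_hazard` and `survivor`, are assumptions made for this illustration only.

```python
import math

# Each schedule is a list of (duration_in_years, hazard_per_year) segments.
CONSTANT = [(4.0, 2.0)]              # 2/year for 4 years
CHANGING = [(3.0, 1.0), (1.0, 5.0)]  # 1/year for 3 years, then 5/year for 1 year

def cumulative_hazard(schedule, t):
    """H(t) for a piecewise-constant hazard: the sum of
    rate x time spent in each segment, up to time t."""
    total, start = 0.0, 0.0
    for duration, rate in schedule:
        time_in_segment = max(0.0, min(t, start + duration) - start)
        total += rate * time_in_segment
        start += duration
    return total

def survivor(schedule, t):
    """S(t) = exp{ -H(t) }"""
    return math.exp(-cumulative_hazard(schedule, t))

for t in (1.0, 3.0, 4.0):
    print(f"t = {t}: H constant = {cumulative_hazard(CONSTANT, t)}, "
          f"H changing = {cumulative_hazard(CHANGING, t)}")
```

Both schedules reach the same cumulative hazard at 4 years, but at 3 years the changing schedule has accumulated less risk, so the probability of surviving to year 3 is higher under it.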

### Summary

The hazard function (or hazard rate) is the *rate* of failure at any instant, or the rate at which risk is accumulated. The shape of the hazard function is determined by underlying disease, physiological or physical processes. Knowing the hazard function determines the survivor and cumulative hazard functions. The hazard function is central to the analysis of time to event outcomes.

Next time, we’ll briefly discuss survival models and the phenomenon of data *censoring*.

### Reference

Cleves M, Gutierrez R, Gould W, Marchenko Y (2008) An Introduction to Survival Analysis Using Stata (2nd ed.), Chapter 2. Stata Press: Texas, USA.