Survival analysis 3: Modelling time to event
Recall that the aim of survival analysis is to analyse the time until an event occurs (see survival posts 1 and 2). Specifically, survival analysis asks how time to event changes as other variables change. Remember that there is a 1:1 relationship between the probability of surviving past a certain time, S(t), and the total risk accumulated up to that time, the cumulative hazard H(t): S(t) = exp(-H(t)). The rate at which risk is accumulated is the hazard rate h(t), the number of events per unit time among those still at risk. Because the hazard rate determines how risk accumulates, and accumulated risk determines survival, modelling the hazard rate allows us to model survival.
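To make the 1:1 relationship concrete, here is a minimal Python sketch that numerically accumulates a hazard into H(t) and converts it to survival with S(t) = exp(-H(t)). The function names and the two hazard functions are hypothetical choices for illustration, and the integration uses a simple trapezoidal rule rather than any particular package.

```python
import math

def cumulative_hazard(hazard, t, steps=10_000):
    """Approximate H(t), the integral of h(u) from 0 to t, with the trapezoidal rule."""
    dt = t / steps
    total = 0.0
    for i in range(steps):
        total += 0.5 * (hazard(i * dt) + hazard((i + 1) * dt)) * dt
    return total

def survival(hazard, t):
    """S(t) = exp(-H(t)): the probability of surviving past time t."""
    return math.exp(-cumulative_hazard(hazard, t))

# Hypothetical hazards, in events per unit time.
def constant_hazard(u):
    return 0.1

def rising_hazard(u):
    return 0.02 * u

s_const = survival(constant_hazard, 5.0)  # H(5) = 0.1 * 5 = 0.5
s_rise = survival(rising_hazard, 5.0)     # H(5) = 0.01 * 5**2 = 0.25
print(round(s_const, 4), round(s_rise, 4))

# Going the other way: H(t) = -log S(t) recovers the accumulated risk.
print(round(-math.log(s_const), 4))
```

The same hazard always gives the same survival curve and vice versa, which is why the rest of this post can focus on modelling h(t).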
Proportional hazard models
A popular way to model time to event is to model the hazard rate as a function of a baseline hazard and a linear predictor:
- h(t) = h0(t) exp(intercept + slope * x)
where h(t) is the hazard rate, and h0(t) is the baseline hazard.
This model says that the hazard a participant faces is a function of the baseline hazard that everyone faces, modified by the participant’s value of some variable x. The term intercept + slope * x is known as the linear predictor (see post on generalised linear models, part 2).
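This model is known as a proportional hazards model: the hazard a participant faces is the baseline hazard multiplied by exp(linear predictor), so it is proportional to the baseline hazard at every time point. As a result, the ratio of the hazards of two participants, exp(slope * (x1 - x2)), does not change over time. The exponential function exp() is used because it is always positive, so h(t) can never become negative, which would be mathematically impossible for a rate.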
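A minimal Python sketch of why the model is called "proportional": with a hypothetical Weibull-shaped baseline hazard and made-up coefficients, the hazard for x = 1 divided by the hazard for x = 0 equals exp(slope) at every time point, whatever shape h0(t) has. The baseline function, its parameters and the coefficient values are assumptions for illustration only.

```python
import math

def weibull_baseline(t, p=1.5, lam=0.05):
    """Hypothetical baseline hazard h0(t) = p * lam * t**(p - 1)."""
    return p * lam * t ** (p - 1)

def ph_hazard(t, x, intercept, slope, baseline=weibull_baseline):
    """Proportional hazards: h(t) = h0(t) * exp(intercept + slope * x)."""
    return baseline(t) * math.exp(intercept + slope * x)

# Hypothetical coefficients, for illustration only.
intercept, slope = 0.0, 0.7

for t in (0.5, 2.0, 10.0):
    ratio = ph_hazard(t, 1, intercept, slope) / ph_hazard(t, 0, intercept, slope)
    print(t, round(ratio, 4))  # the same ratio, exp(0.7), at every t

# exp() keeps the hazard positive even when the linear predictor is very negative.
print(ph_hazard(1.0, -10, intercept, slope) > 0)
```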
Proportional hazards models are common in survival analysis. Sometimes, however, it may be inappropriate to assume that variables change the hazard in a multiplicative way. In that case, we could instead use an additive hazards model, in which the term for the variables is added to the baseline hazard rather than multiplied by it:
- h(t) = h0(t) + exp(intercept + slope * x)
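For contrast, a minimal sketch of the additive model above, assuming a hypothetical increasing baseline h0(t) = 0.01 * t and made-up coefficients: the difference between the hazards for x = 1 and x = 0 stays the same over time, while their ratio shrinks as the baseline grows, so the hazards are no longer proportional.

```python
import math

def baseline(t):
    """Hypothetical increasing baseline hazard h0(t) = 0.01 * t."""
    return 0.01 * t

def additive_hazard(t, x, intercept, slope):
    """Additive model: h(t) = h0(t) + exp(intercept + slope * x)."""
    return baseline(t) + math.exp(intercept + slope * x)

intercept, slope = -3.0, 0.7  # hypothetical coefficients

for t in (1.0, 10.0, 50.0):
    h_x1 = additive_hazard(t, 1, intercept, slope)
    h_x0 = additive_hazard(t, 0, intercept, slope)
    # The difference is constant over time; the ratio moves towards 1.
    print(t, round(h_x1 - h_x0, 4), round(h_x1 / h_x0, 3))
```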
Proportional hazards models can be applied either by assuming that time to event follows a distribution described by a small number of parameters (parametric models) or without assuming such a distribution (semi- or non-parametric models).
Parametric vs semiparametric survival models
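Parametric survival models assume that time to event follows a particular distribution, which fixes the shape of the baseline hazard, and estimate that distribution’s parameters together with the coefficients in the linear predictor. The exponential and Weibull distributions are commonly used to model hazard functions because of their mathematical properties: the exponential implies a constant hazard, and the Weibull a hazard that steadily rises or falls over time (with the exponential as a special case).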
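A minimal Python sketch of the two distributions. It uses one common proportional-hazards parameterisation of the Weibull (an assumption: software packages parameterise it in different ways) and the closed-form maximum-likelihood estimate of an exponential hazard from right-censored data, which is the number of events divided by the total time at risk. The follow-up times are made-up toy values.

```python
import math

def exponential_hazard(t, lam):
    """Exponential model: constant hazard h(t) = lam."""
    return lam

def weibull_hazard(t, lam, p):
    """Weibull model (assumed PH parameterisation): h(t) = p * lam * t**(p - 1).
    p > 1: hazard rises over time; p < 1: it falls; p = 1: exponential."""
    return p * lam * t ** (p - 1)

def weibull_survival(t, lam, p):
    """S(t) = exp(-lam * t**p), since the cumulative hazard is lam * t**p."""
    return math.exp(-lam * t ** p)

def fit_exponential(times, events):
    """ML estimate of a constant hazard from right-censored data:
    number of events divided by total time at risk."""
    return sum(events) / sum(times)

# Made-up toy data: follow-up times and event indicators (1 = event, 0 = censored).
times = [2.0, 3.5, 4.0, 6.0, 7.5, 9.0]
events = [1, 0, 1, 1, 0, 1]

lam_hat = fit_exponential(times, events)
print(lam_hat)  # 4 events / 32 time units at risk
```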
Semiparametric survival models don’t assume that time to event follows a distribution, and leave the baseline hazard h0(t) unspecified, but they do assume a parametric form for how the other variables change the hazard (the exp() term in the model above). At each time an event occurs, these models compare the participant who had the event with everyone still at risk and calculate the probability that it was that participant who had the event; they then combine these separate probabilities into a single analysis (see survival post 1). The Cox proportional hazards model is a popular example of a semiparametric model.
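The event-by-event comparison can be sketched in Python as the Cox partial likelihood for a single covariate. This is a minimal illustration, not how any particular package implements it: it assumes no two events happen at the same time (tied event times need an approximation such as Breslow’s or Efron’s), finds the slope with plain Newton-Raphson, and runs on made-up toy data. The baseline hazard never appears, which is what makes the model semiparametric.

```python
import math

def risk_set(times, t):
    """Indices of participants still at risk (no event or censoring yet) at time t."""
    return [j for j, t_j in enumerate(times) if t_j >= t]

def cox_partial_loglik(beta, times, events, x):
    """Log partial likelihood for one covariate, assuming no tied event times.
    Each event adds log P(this participant has the event | someone at risk does)."""
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i:
            denom = sum(math.exp(beta * x[j]) for j in risk_set(times, t_i))
            loglik += beta * x[i] - math.log(denom)
    return loglik

def fit_cox(times, events, x, iterations=25):
    """Maximise the log partial likelihood with Newton-Raphson."""
    beta = 0.0
    for _ in range(iterations):
        score, information = 0.0, 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored participants only appear inside risk sets
            weights = [(x[j], math.exp(beta * x[j])) for j in risk_set(times, t_i)]
            total = sum(w for _, w in weights)
            mean_x = sum(xj * w for xj, w in weights) / total
            mean_x2 = sum(xj * xj * w for xj, w in weights) / total
            score += x[i] - mean_x
            information += mean_x2 - mean_x ** 2
        beta += score / information
    return beta

# Made-up toy data: time, event indicator (1 = event, 0 = censored), covariate x.
times = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
events = [1, 1, 0, 1, 1, 0, 1, 1]
x = [1, 0, 1, 1, 0, 0, 1, 0]

beta_hat = fit_cox(times, events, x)
print(round(beta_hat, 3), round(cox_partial_loglik(beta_hat, times, events, x), 3))
```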
In practice, your statistical program will have commands for different types of survival models. Understanding what these models do and their assumptions will help you choose a model and know how it should be specified.
Analysing time to event is not the same as analysing what caused the event
Regardless of whether parametric or semiparametric models are used, both types of models treat time as if it causes the outcome. But sometimes this doesn’t make sense. For example, an increasing viral load might be correlated with time to death and might increase the risk of death, but time itself is not the cause of death.
If we understood the mechanisms that caused an outcome, there would be no need to analyse time. But when we don’t understand the mechanism, we can use time in survival models as a proxy for these unknown mechanisms. We should therefore be careful about how time is measured and defined, so that it fulfils its role as a proxy. As a guide:
- For any survival model, ensure that two people with equal time values would face the same risk if they also had the same covariates. For example, if we analyse the association between smoking and lung cancer, it would not make sense to define time as age, because this implies that two people of the same age face the same risk of cancer, even if one had smoked for 20 years and the other for 2 months.
- When fitting parametric survival models, decide when the onset of risk occurs and define time zero as that point, because the assumed distribution fixes the shape of the hazard from time zero onwards.
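A minimal Python sketch of why time zero matters more for parametric models: the same made-up toy data are analysed with time measured from two hypothetical origins, 10 time units apart. Shifting every participant’s origin by the same amount leaves the Cox partial likelihood unchanged, because it depends only on the order in which events occur, but it changes the fitted exponential hazard, because the total time at risk changes. If origins shifted by different amounts for different participants, the ordering, and hence the Cox fit, could change too.

```python
import math

def exponential_rate(times, events):
    """ML estimate of a constant hazard: events / total time at risk."""
    return sum(events) / sum(times)

def cox_partial_loglik(beta, times, events, x):
    """Cox log partial likelihood (one covariate, no tied event times)."""
    total = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if d_i:
            denom = sum(math.exp(beta * x_j) for t_j, x_j in zip(times, x) if t_j >= t_i)
            total += beta * x_i - math.log(denom)
    return total

# Made-up toy data, with time measured from two hypothetical origins.
times = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
events = [1, 1, 0, 1, 1, 0]
x = [1, 0, 1, 1, 0, 0]
shifted = [t + 10.0 for t in times]  # origin moved 10 units earlier for everyone

# Cox: only the ordering of times matters, so the shift changes nothing.
print(cox_partial_loglik(0.5, times, events, x) == cox_partial_loglik(0.5, shifted, events, x))
# Exponential: 4/27 versus 4/87, because total time at risk changes.
print(exponential_rate(times, events), exponential_rate(shifted, events))
```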
Proportional hazards models are a common way to model time to event outcomes, and can be applied in both parametric and semiparametric models. In parametric models, the exponential and Weibull distributions are commonly used to model hazard functions. In contrast, the Cox proportional hazard model is a commonly used semiparametric model.
Cleves M, Gutierrez R, Gould W, Marchenko Y (2008) An Introduction to Survival Analysis Using Stata. Chp 3. (2nd Ed) Stata Press: Texas, USA.