The likelihood ratio test: relevance and application

Suppose you conduct a study to compare an outcome between two independent groups of people, but you realised later that the groups were unexpectedly different at baseline. This difference might affect how you interpret the findings.
For example, you measured muscle stiffness in people with stroke and in healthy people. At the end of the study, you realised that on average, the people with stroke were older than the healthy people. If older people tend to have stiffer muscles anyway, it is not fair to attribute the stiffer muscles to stroke alone, since the people with stroke were also older. What you need to know is whether older people do tend to have stiffer muscles. That is, is a potential confounder (age) associated with the outcome (muscle stiffness)? We can use a likelihood ratio test to answer this question.
The likelihood ratio test is used to compare how well two statistical models, one with a potential confounder and one without, fit a set of observations. For the example above, first, we fit a model to see how well having a stroke explains muscle stiffness (null model). Next, we fit a model to see how well having a stroke and being older explains muscle stiffness (alternative model). We then use the likelihood functions from both models to test whether the alternative model fits the data better than the null model. Mathematically, the comparison is made easier by using the log of the likelihoods. The statistic calculated from a likelihood ratio test follows a chi-square distribution with degrees of freedom equal to the difference in degrees of freedom of the two models.
Here is a function that performs a likelihood ratio test. We implement the function using an example dataset to test how well pig weight increases with time (null model) compared to how well pig weight increases with time given the litter the pig is born in (alternative model).
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats

def lrtest(llmin, llmax):
    # likelihood ratio statistic; p value from a chi-square distribution
    # with 1 degree of freedom (the alternative model has 1 extra parameter)
    lr = 2 * (llmax - llmin)
    p = stats.chi2.sf(lr, 1)
    return lr, p

# import example dataset
data = sm.datasets.get_rdataset("dietox", "geepack").data

# null model: fit time only to pig weight
md = smf.mixedlm("Weight ~ Time", data, groups=data["Pig"])
mdf = md.fit(reml=False)
print(mdf.summary())
llf = mdf.llf

# alternative model: fit time and litter to pig weight
mdlitter = smf.mixedlm("Weight ~ Time + Litter", data, groups=data["Pig"])
mdflitter = mdlitter.fit(reml=False)
print(mdflitter.summary())
llflitter = mdflitter.llf

lr, p = lrtest(llf, llflitter)
print('LR test, p value: {:.2f}, {:.4f}'.format(lr, p))
Mixed linear models are used to fit the data. The llf attribute of each fitted model holds its log likelihood. The likelihood ratio test then compares the log likelihood values and tests whether the alternative model fits the data significantly better than the null model. The code above generates the following output:
Mixed Linear Model Regression Results
========================================================
Model: MixedLM Dependent Variable: Weight
No. Observations: 861 Method: ML
No. Groups: 72 Scale: 11.3525
Min. group size: 11 Likelihood: -2402.9325
Max. group size: 12 Converged: Yes
Mean group size: 12.0
--------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
--------------------------------------------------------
Intercept 15.724 0.783 20.083 0.000 14.189 17.258
Time 6.943 0.033 208.071 0.000 6.877 7.008
groups RE 39.821 2.107
========================================================
Mixed Linear Model Regression Results
========================================================
Model: MixedLM Dependent Variable: Weight
No. Observations: 861 Method: ML
No. Groups: 72 Scale: 11.3525
Min. group size: 11 Likelihood: -2402.8752
Max. group size: 12 Converged: Yes
Mean group size: 12.0
--------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
--------------------------------------------------------
Intercept 16.140 1.458 11.073 0.000 13.283 18.997
Time 6.943 0.033 208.071 0.000 6.877 7.008
Litter -0.034 0.101 -0.339 0.735 -0.233 0.164
groups RE 39.756 2.103
========================================================
LR test, p value: 0.11, 0.7350
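As a check, the test statistic and p value can be reproduced directly from the log likelihoods reported in the two model summaries above (the small differences in the last decimal place come from rounding in the printed summaries):

```python
from scipy import stats

# log likelihoods reported in the two model summaries above
ll_null = -2402.9325   # Weight ~ Time
ll_alt = -2402.8752    # Weight ~ Time + Litter

# likelihood ratio statistic; the alternative model has 1 extra parameter,
# so the p value comes from a chi-square distribution with 1 degree of freedom
lr = 2 * (ll_alt - ll_null)
p = stats.chi2.sf(lr, df=1)
print('LR = {:.2f}, p = {:.4f}'.format(lr, p))  # LR ≈ 0.11, p ≈ 0.735
```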
Since the likelihood ratio test was not statistically significant, adding the litter the pig was born in does not significantly improve the fit. We therefore retain the simpler null model: the growth in pig weight is sufficiently explained by time.
Summary
The likelihood ratio test compares how well a model with a potential predictor explains an outcome against a model without the predictor. That is, the test indicates whether a potential predictor is associated with an outcome. If the predictor is not associated with the outcome, we retain the null model rather than the alternative model.
A likelihood ratio test similar to the one above is also available in the biosig Python package I wrote for biological and transducer signal processing. Users are invited to road test this and other functions.
Thank you!!!! You were better than chatGPT in this one 😉
Why, thank you!
But then in ChatGPT, no one is really there.. 🙂
Very informative tutorial, may I ask if there will be any difference in the approach if the outcomes are binary?
In this example I performed a likelihood ratio test to compare linear regression models for a continuous outcome. If the outcome is binary, run logistic regression models instead to compare them with a likelihood ratio test.
This UCLA tutorial using the Stata program shows how.
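To sketch what that looks like in Python, here is a minimal example using simulated data (the variable names and coefficients are made up for illustration): fit a null and an alternative logistic regression with statsmodels, then apply the same likelihood ratio test as in the post.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# simulated data (hypothetical): binary outcome y, predictors x1 and x2,
# where x2 genuinely influences the outcome
rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({'x1': rng.normal(size=n), 'x2': rng.normal(size=n)})
prob = 1 / (1 + np.exp(-(0.5 * data['x1'] + 1.0 * data['x2'])))
data['y'] = rng.binomial(1, prob)

# null model (x1 only) vs alternative model (x1 and x2)
null = smf.logit('y ~ x1', data).fit(disp=0)
alt = smf.logit('y ~ x1 + x2', data).fit(disp=0)

# same likelihood ratio test: 2 * difference in log likelihoods,
# chi-square with 1 degree of freedom (1 extra parameter)
lr = 2 * (alt.llf - null.llf)
p = stats.chi2.sf(lr, df=1)
print('LR test, p value: {:.2f}, {:.4f}'.format(lr, p))
```

Because x2 is built into the simulated outcome, the test here should come out significant, favouring the alternative model.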