From t-tests to linear mixed models

Introduction

Everyday statistics (the ones I use regularly to answer research questions) have evolved since my first statistics course over 20 years ago. I learned about the t-test, ANOVA, repeated-measures ANOVA, correlation, simple and multiple regression, and non-parametric tests such as the Wilcoxon test, the Mann-Whitney U-test and the chi-square test. While these tests are still commonplace in many disciplines, there is growing interest in Bayesian statistics, the New Statistics (i.e., the estimation approach), and other modern approaches, including linear mixed models (LMM).

Although LMM have been around for over three decades, they have only gained mainstream traction in the past decade.
This new series of blog posts will be a deep dive into LMM:
What are they?
How do they work?
How do you run them?
How do you interpret their output?
What diagnostic tests should you use?

Learning linear mixed models

As is often the case with new statistical approaches, the early books on LMM were written by experts for experts. Gradually, online tutorials appeared and more accessible books were published. However, no single book can serve as the perfect introduction to LMM, especially given the variety of research questions people want to answer and their different backgrounds and expertise. Moreover, what I found lacking, especially online, was a repository of accessible examples that apply LMM to research questions similar to my own. To me, a good example should include plots of the data, code showing how to run the analysis, the output of that analysis, and an explanation and interpretation of the results.

When learning something new, especially something as complex as statistics, I need to work through several examples before I really start to understand and feel comfortable with the subject matter. Simply reading books is not enough, and copy-pasting code snippets from the internet (or ChatGPT) will not lead to a deep, practical understanding of LMM. What is needed is a mix of theory and practice: reading about and understanding the concepts, and writing and running the code.

Statistical software and programming languages

Personally, I am most comfortable programming in Python. While there are some good statistical packages in Python, they are not as mature and complete as those available in R or in commercial software such as SPSS, Stata or SAS. Books and blog posts often focus on a single statistical programming language. While some people can follow along and translate code and examples from one language to another, I am not one of them. As a result, I sometimes get stuck when following tutorials, either because I can't follow the syntax or because a given option is only available in that specific language.

In this series, I will use both R and Python. Where possible, I will perform the analyses in both languages and compare the outputs; when something is not possible in Python, I will highlight it. Because Python is popular and easy to learn (and because it is my preferred programming language), I will use the rpy2 Python package to run the R examples whenever I can. rpy2 provides a robust interface between Python and R that allows us to run R code from within Python. For example, it lets us (a) call R functions and access R objects directly from Python, (b) run R scripts within Python code, and (c) convert data structures between R and Python (e.g., pandas DataFrame ↔ R data.frame).

Conclusion

I have used LMM for several years now. However, due to the nature of my work and the types of research questions I tackle, my use of LMM has been somewhat sporadic. I have started and restarted learning LMM on several occasions, and have accumulated many useful resources over this time.

This series is an attempt to consolidate this learning and understanding, and to provide my future self with a useful set of tutorials that I can consult when I need a refresher. And if it happens to help you along the way, that is an added bonus!
