## What are degrees of freedom in statistics? – Part 1

When we perform a t test or calculate confidence intervals about an effect for a small study, we specify a t value from one of a family of t distributions depending on the number of degrees of freedom. What are degrees of freedom in statistics?

The number of degrees of freedom refers to the number of separate, relevant pieces of information that are available. For example, you are at a dinner party of 6 when you become suspicious that everyone else in the room seems to be a lot younger than you! Your host tells you that the mean age of people in the room is 23, and also tells you the ages of 4 other people. Given these 5 pieces of information, and knowing your own age, you can work out the age of the 6th person. In other words, once the mean is fixed, only 5 of the 6 ages are free to vary; the last one is determined by the rest. So, given the mean age, there are 5 degrees of freedom in the set of 6 people at dinner.
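The dinner-party arithmetic can be sketched in a few lines of Python. The individual ages below are made up for illustration; only the mean of 23 and the party size of 6 come from the example.

```python
# Hypothetical ages: the example only states the mean (23) and the party size (6)
n_people = 6
mean_age = 23
your_age = 35                  # assumed
known_ages = [20, 21, 22, 19]  # assumed ages of the 4 guests the host tells you about

# The mean fixes the total, so the last age is not free to vary
total_age = mean_age * n_people
sixth_age = total_age - your_age - sum(known_ages)

# One value is pinned down by the mean, leaving n - 1 free values
degrees_of_freedom = n_people - 1

print('Age of the 6th person:', sixth_age)
print('Degrees of freedom:', degrees_of_freedom)
```

Changing any of the five assumed ages changes the sixth, but the sixth can never be chosen independently while the mean stays at 23.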

How does this relate back to research? Suppose we examine the effect of vodka vs beer on pain during a pain provocation test in 30 university students in Australia, and the mean between-conditions difference in our study shows that beer was better than vodka at dulling the pain response. Our colleagues in Russia might read our study and wonder whether the findings would be the same in university students in Russia. Since our Russian colleagues will never test the same subjects as the ones in our Australian study, they are more interested in how the between-conditions difference would vary if our study were repeated many times; that is, they are interested in the confidence interval about the mean difference.
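To get a feel for what "repeating the study many times" means, here is a minimal simulation sketch. The true mean difference, the standard deviation and the assumption of Normally distributed differences are all invented for illustration; only the sample size of 30 comes from the example.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

n_subjects = 30    # from the example study
true_diff = 1.0    # assumed true mean difference (arbitrary pain units)
sd_diff = 2.0      # assumed SD of the within-subject differences
n_studies = 10000  # number of hypothetical repeats of the study

# Each row is one repeat of the study: 30 simulated between-conditions differences
diffs = rng.normal(true_diff, sd_diff, size=(n_studies, n_subjects))
study_means = diffs.mean(axis=1)

# The spread of the study means should be close to the standard error, SD / sqrt(n)
observed_spread = study_means.std(ddof=1)
expected_spread = sd_diff / np.sqrt(n_subjects)
print('Spread of mean differences across studies: {:.3f}'.format(observed_spread))
print('SD / sqrt(n): {:.3f}'.format(expected_spread))
```

The mean difference from any single study is one draw from this spread, which is why a confidence interval is more informative to other researchers than the single observed mean.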

To calculate a 95% confidence interval about our mean difference, we specify a t value associated with a 95% cut-off. But the t value itself comes from a t distribution that depends on how many subjects were tested in the study. Testing more subjects provides more degrees of freedom (i.e. more pieces of information) for the study, and t distributions with more degrees of freedom approximate the Normal distribution more closely (Figure 1; the curves are theoretical probability density functions plotted with the code below):

```
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
# generate an array of linearly spaced values between -5 and 5
x = np.linspace(-5, 5, 1000)
# generate a list of different degrees of freedom
dof = [2, 5, 29]
colors = ['b', 'g', 'r']
plt.figure()
# plot the probability density function of the Normal distribution over x
plt.plot(x, st.norm.pdf(x), 'k', linewidth=4, label='norm')
# plot the probability density functions of t distributions over x, for different dof
for d, color in zip(dof, colors):
    plt.plot(x, st.t.pdf(x, d), color=color, linewidth=2, label='t (df={:.0f})'.format(d))
plt.xlim(-5, 5)
plt.ylim(0, 0.45)
plt.locator_params(axis = 'y', nbins=5)
plt.ylabel('Probability per unit of x')
plt.xlabel('x')
plt.legend()
plt.savefig('figure1.png', dpi=300)
```

Figure 1: Probability density functions of t distributions with 2, 5 or 29 degrees of freedom, plotted against the standard Normal distribution. t distributions with more degrees of freedom approximate the Normal distribution more closely.

Since we tested 30 subjects in our Australian study, we would specify a t value for a 95% cut-off from a t distribution with 30 – 1 = 29 degrees of freedom. In the next post, we will see how degrees of freedom affect t values for the same cut-off.
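As a rough sketch of that calculation, the t value for a two-sided 95% cut-off with 29 degrees of freedom can be looked up with `scipy.stats.t.ppf` and used to build the confidence interval. The individual differences below are simulated with made-up parameters, since the example does not give any data.

```python
import numpy as np
import scipy.stats as st

n_subjects = 30
df = n_subjects - 1  # 29 degrees of freedom

# Two-sided 95% cut-off: 2.5% in each tail, so look up the 97.5th percentile
t_crit = st.t.ppf(0.975, df)
z_crit = st.norm.ppf(0.975)  # Normal equivalent, for comparison

# Hypothetical between-conditions differences (assumed mean 1.0, SD 2.0)
rng = np.random.default_rng(seed=1)
diffs = rng.normal(1.0, 2.0, size=n_subjects)

mean_diff = diffs.mean()
sem = diffs.std(ddof=1) / np.sqrt(n_subjects)  # standard error of the mean
ci_low = mean_diff - t_crit * sem
ci_high = mean_diff + t_crit * sem

print('t value (df=29): {:.3f}, z value: {:.3f}'.format(t_crit, z_crit))
print('Mean difference: {:.2f}, 95% CI: {:.2f} to {:.2f}'.format(mean_diff, ci_low, ci_high))
```

With 29 degrees of freedom the t value is about 2.045, slightly larger than the Normal value of about 1.96, so the interval is a little wider than one based on the Normal distribution.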

### Summary

Degrees of freedom refers to the number of separate pieces of information that are available, and is determined by sample size. t distributions with more degrees of freedom approximate the Normal distribution more closely. For any given study, try to test more subjects to increase the precision of the estimate (i.e. to narrow the 95% CI). Next, we will see how degrees of freedom influence t values when calculating 95% CI.

### Reference

Cumming and Calin-Jageman (2017) Introduction to the new statistics: Estimation, open science, & beyond. Routledge: New York. p 107.