Calculating sample size for a 2 independent sample t-test

Scientists often plan studies by calculating how many subjects or units need to be tested to detect an effect of a given size. That is, they plan a study using statistical power, according to the principles of hypothesis testing. Sample size calculations are usually required in ethics applications and grant proposals to justify the study.

We previously learned how to calculate sample size for a 2 independent sample t-test in R. If you do most of your work in Python, you can instead use the statsmodels package to perform the same calculation. statsmodels is a Python module that provides functionality for conducting many statistical tests and analyses. It has been tested against R and other statistical packages, and lets you fit models using R-style formulas with pandas DataFrames, or using numpy arrays directly.

Calculating sample size for a 2 independent sample t-test in Python requires parameters similar to those used in R, but there are some differences. Here’s how to do it in statsmodels (output shown using >>> prompt; see the statsmodels documentation for details):

from statsmodels.stats.power import tt_ind_solve_power

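# Raw difference between the two group means, and the standard deviation
mean_diff, sd_diff = 0.5, 0.5
# Standardised effect size = mean difference / standard deviation
std_effect_size = mean_diff / sd_diff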

n = tt_ind_solve_power(effect_size=std_effect_size, alpha=0.05, power=0.8, ratio=1, alternative='two-sided')
print('Number in *each* group: {:.5f}'.format(n))

>>> Number in *each* group: 16.71472

The tt_ind_solve_power() function requires the following parameters to calculate sample size:

  • effect_size: The standardised effect size, i.e. the difference between the two means divided by the standard deviation; this value has to be positive. (This is different to R’s delta parameter, which takes the raw mean difference, with the standard deviation supplied separately.)
  • alpha: Significance level or probability of Type I error (false positives), usually set at 0.05.
  • power: Power of the test, or 1 – probability of Type II error (false negatives), usually set at 0.8.
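  • ratio: Ratio of sample size in sample 2 relative to sample 1, default set at 1. (This lets the function calculate sample size for unevenly-sized samples: it returns the size of sample 1, and sample 2 is ratio times that size.)
  • alternative: Direction of the effect the test is powered to detect. The default 'two-sided' powers the test to detect an effect in either direction (e.g. an increase or a reduction in outcome); 'larger' or 'smaller' power a one-sided test in a single direction only.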
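
As a rough sketch of how ratio and alternative change the answer, the snippet below reuses the standardised effect size of 1 from the example above. The variable names (d, n_equal, n1_unequal, n2_unequal, n_one_sided) are illustrative only, and the one-sided case assumes the effect is expected to be an increase, which is what alternative='larger' is taken to mean here.

```python
from statsmodels.stats.power import tt_ind_solve_power

d = 1.0  # standardised effect size, as in the example above

# Baseline: equal group sizes, two-sided test
n_equal = tt_ind_solve_power(effect_size=d, alpha=0.05, power=0.8,
                             ratio=1, alternative='two-sided')

# Sample 2 twice as large as sample 1; the function returns the size of sample 1
n1_unequal = tt_ind_solve_power(effect_size=d, alpha=0.05, power=0.8,
                                ratio=2, alternative='two-sided')
n2_unequal = 2 * n1_unequal

# One-sided test that is powered only to detect an increase
n_one_sided = tt_ind_solve_power(effect_size=d, alpha=0.05, power=0.8,
                                 ratio=1, alternative='larger')

print('Equal groups, two-sided: {:.2f} per group'.format(n_equal))
print('Unequal groups (1:2): {:.2f} and {:.2f}'.format(n1_unequal, n2_unequal))
print('Equal groups, one-sided: {:.2f} per group'.format(n_one_sided))
```

Unequal allocation needs fewer subjects in the smaller group but more subjects in total, and a one-sided test needs fewer subjects than a two-sided test at the same alpha.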

In the code above, we specified the difference between the two means (mean_diff) and the standard deviation (sd_diff) as 0.5 each, producing a standardised effect size of 1. For an independent-samples design, this standard deviation is the one assumed to be common to both groups. A standardised effect size of 1 means we are calculating sample size (or powering the study) to detect quite a big effect! Performing the sample size calculation in Python obtains the same answer, to 4 decimal places, as the output from R.
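
As a sanity check on the size of this number, the widely used normal approximation for two equal-sized groups, n ≈ 2 × (z for alpha/2 + z for power)² / d², can be evaluated by hand. This is only an approximation and is not how statsmodels arrives at its answer, since tt_ind_solve_power works with the t distribution. The variable names below are illustrative, and scipy is assumed to be available (it is installed as a dependency of statsmodels).

```python
from scipy.stats import norm

alpha, power, d = 0.05, 0.8, 1.0

# Standard normal quantiles for the two-sided significance level and the power
z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)

# Normal-approximation sample size per group for two equal-sized groups
n_approx = 2 * (z_alpha + z_beta) ** 2 / d ** 2
print('Approximate number in *each* group: {:.2f}'.format(n_approx))
```

The approximation comes out a little below the t-based result of 16.71, because it ignores the extra uncertainty from estimating the standard deviation in small samples.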

Changing the standardised mean difference we want to detect changes the required sample size. For example, for a fixed mean difference of 0.5, sample size increases as the standard deviation increases:

for sd in [0.4, 0.5, 0.6]:
    n = tt_ind_solve_power(effect_size=mean_diff/sd, alpha=0.05, power=0.8, ratio=1, alternative='two-sided')
    print('Number in *each* group when SD is {:<4.1f}: {:.2f}'.format(sd, n))

>>> Number in *each* group when SD is 0.4 : 11.09
>>> Number in *each* group when SD is 0.5 : 16.71
>>> Number in *each* group when SD is 0.6 : 23.60
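
The same approach can show how the tolerance for Type II error affects sample size. The sketch below holds the standardised effect size at 1 and alpha at 0.05 while varying power; the power values and the sizes dictionary are illustrative choices, not part of the analysis above. It also rounds each result up, on the assumption that a study has to recruit whole subjects.

```python
import math
from statsmodels.stats.power import tt_ind_solve_power

sizes = {}
for power in [0.8, 0.9, 0.95]:
    n = tt_ind_solve_power(effect_size=1.0, alpha=0.05, power=power,
                           ratio=1, alternative='two-sided')
    sizes[power] = float(n)
    # Round up, because a fractional subject cannot be recruited
    print('Power {:.2f}: {:.2f} -> recruit {} per group'.format(
        power, sizes[power], math.ceil(sizes[power])))
```

Asking for higher power (a smaller chance of a false negative) increases the number of subjects needed in each group.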

Summary

We used Python’s statsmodels module to calculate sample size for a 2 independent sample t-test. Sample size is sensitive to the size of the difference between groups, the variability of the data, and the tolerance for Type I and II errors.

 
