## Don’t repeat yourself: Python modules

We previously learned to create our own Python functions to reduce how much we repeat ourselves in our code. In this post we see another example of the DRY principle (don’t repeat yourself) and we will learn how to ensure we don’t repeat ourselves between the different programs we write.

### A typical (bad) script to process data

Below is an example of a script we might use to process data from five subjects. The data consist of each subject’s initials, weight, height, age, and systolic and diastolic blood pressure. We want to calculate each subject’s BMI, predicted maximum heart rate and blood pressure status, and then print out the results.

```python
# subject = [initials, weight (kg), height (m), age, systolic, diastolic]
subject1 = ['GA', 80, 1.62, 70, 120, 80]
subject2 = ['KT', 69, 1.53, 65, 136, 75]
subject3 = ['MN', 80, 1.66, 89, 113, 72]
subject4 = ['PW', 80, 1.79, 55, 141, 96]
subject5 = ['HJ', 72, 1.60, 61, 121, 78]

# Process data for subject1
initials, weight, height, age, systolic, diastolic = subject1

# Calculate BMI
bmi = int(weight / height**2)
# Calculate predicted maximum heart rate
max_HR = 208 - 0.7 * age
# Calculate blood pressure risk
if systolic >= 120 and systolic < 130 and diastolic < 80:
    bprisk = 'elevated BP'
elif (systolic >= 130 and systolic < 140) or (diastolic >= 80 and diastolic < 90):
    bprisk = 'stage 1 hypertension'
elif systolic >= 140 or diastolic >= 90:
    bprisk = 'stage 2 hypertension'
else:
    bprisk = 'invalid values'
# Print summary
print("\n\t" + initials)
print("\tweight = {}kg".format(weight))
print("\theight = {}m".format(height))
print("\tage = {} years old".format(age))
print("\tblood pressure = {}/{}".format(systolic, diastolic))
print("\n\tbmi = {}".format(bmi))
print("\tpredicted maximal heart rate = {} bpm".format(max_HR))
print("\tblood pressure = " + bprisk)
print("\n")

# Process data for subject2
initials, weight, height, age, systolic, diastolic = subject2

# [copy-and-paste code from above]
...
```

As you can see, we would need to copy and paste most of the above code another four times to process the data for every subject. If we later discovered a mistake in one of our formulas, we would have to fix it in no fewer than five locations in our code. Things get even more complicated if the code has also been copied and pasted into another program.

### Using functions to avoid repeating ourselves

```python
def bmi_calc(weight_kg, height_m):
    """Calculate BMI from weight in kg and height in meters"""
    bmi = int(weight_kg / height_m**2)
    return bmi

def predict_max_HR(age):
    """Age predicted maximal heart rate"""
    max_HR = 208 - 0.7 * age
    return max_HR

def bp_risk(systolic, diastolic):
    """Categorises whether blood pressure is elevated,
    stage 1 hypertension or stage 2 hypertension"""
    if systolic >= 120 and systolic < 130 and diastolic < 80:
        bprisk = 'elevated BP'
    elif (systolic >= 130 and systolic < 140) or (diastolic >= 80 and diastolic < 90):
        bprisk = 'stage 1 hypertension'
    elif systolic >= 140 or diastolic >= 90:
        bprisk = 'stage 2 hypertension'
    else:
        bprisk = 'invalid values'
    return bprisk

def print_results(initials, weight, height, age, systolic, diastolic):
    """Calculate and print the outcomes for one subject"""
    bmi = bmi_calc(weight, height)
    max_HR = predict_max_HR(age)
    bprisk = bp_risk(systolic, diastolic)
    print("\n\t" + initials)
    print("\tweight = {}kg".format(weight))
    print("\theight = {}m".format(height))
    print("\tage = {} years old".format(age))
    print("\tblood pressure = {}/{}".format(systolic, diastolic))
    print("\n\tbmi = {}".format(bmi))
    print("\tpredicted maximal heart rate = {} bpm".format(max_HR))
    print("\tblood pressure = " + bprisk)
    print("\n")

# subject = [initials, weight (kg), height (m), age, systolic, diastolic]
subject1 = ['GA', 80, 1.6, 70, 120, 80]
subject2 = ['KT', 69, 1.5, 65, 136, 75]
subject3 = ['MN', 80, 1.6, 89, 113, 75]
subject4 = ['PW', 80, 1.7, 55, 141, 96]

subjects = [subject1, subject2, subject3, subject4]

for sub in subjects:
    initials, weight, height, age, systolic, diastolic = sub
    print_results(initials, weight, height, age, systolic, diastolic)
```

The output for the first subject looks like this:

```
GA
weight = 80kg
height = 1.6m
age = 70 years old
blood pressure = 120/80

bmi = 31
predicted maximal heart rate = 159.0 bpm
blood pressure = stage 1 hypertension
```
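As a quick check of the arithmetic behind the `bmi = 31` line, the snippet below recomputes the BMI for the first two subjects by hand. It is only a sketch and the variable names are ours. Note that `int()` drops the fractional part rather than rounding, which is why the second subject comes out as 30 even though the unrounded value is closer to 31.

```python
# Recompute BMI (weight in kg divided by height in metres squared)
# for subjects GA and KT from the script above.
bmi_ga_exact = 80 / 1.6**2    # about 31.25
bmi_kt_exact = 69 / 1.5**2    # about 30.67

# int() truncates towards zero instead of rounding.
bmi_ga = int(bmi_ga_exact)
bmi_kt = int(bmi_kt_exact)

print(bmi_ga, bmi_kt)           # 31 30
print(round(bmi_kt_exact, 1))   # 30.7
```

If one decimal place matters for a study, `round(value, 1)` could be used in place of `int(value)`; that would be a change to the formula and is not what the functions in this post do.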

This is a big improvement over the previous version of our code. However, this is still a processing script: code that we copy and paste into a Python command line or run as a program from the command line. It has a single purpose, which is to process the manually entered data from our subjects.

What if we had a few studies that required us to calculate and print these outcomes? Should we copy-and-paste the code to other scripts? No! Don’t repeat yourself. The best thing to do is create a Python module.

### Creating a Python module to reuse code

Creating a Python module is simple. We put all of our functions (just the functions, nothing else) in a file and save it with a `.py` file extension.

For our current example, we can put all of our functions into a file called `fitness.py`.

```python
def bmi_calc(weight_kg, height_m):
    """Calculate BMI from weight in kg and height in meters"""
    bmi = int(weight_kg / height_m**2)
    return bmi

def predict_max_HR(age):
    """Age predicted maximal heart rate"""
    max_HR = 208 - 0.7 * age
    return max_HR

def bp_risk(systolic, diastolic):
    """Categorises whether blood pressure is elevated,
    stage 1 hypertension or stage 2 hypertension"""
    if systolic >= 120 and systolic < 130 and diastolic < 80:
        bprisk = 'elevated BP'
    elif (systolic >= 130 and systolic < 140) or (diastolic >= 80 and diastolic < 90):
        bprisk = 'stage 1 hypertension'
    elif systolic >= 140 or diastolic >= 90:
        bprisk = 'stage 2 hypertension'
    else:
        bprisk = 'invalid values'
    return bprisk

def print_results(initials, weight, height, age, systolic, diastolic):
    """Calculate and print the outcomes for one subject"""
    bmi = bmi_calc(weight, height)
    max_HR = predict_max_HR(age)
    bprisk = bp_risk(systolic, diastolic)
    print("\n\t" + initials)
    print("\tweight = {}kg".format(weight))
    print("\theight = {}m".format(height))
    print("\tage = {} years old".format(age))
    print("\tblood pressure = {}/{}".format(systolic, diastolic))
    print("\n\tbmi = {}".format(bmi))
    print("\tpredicted maximal heart rate = {} bpm".format(max_HR))
    print("\tblood pressure = " + bprisk)
    print("\n")
```
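To see that a module really is nothing more than a `.py` file that Python can find, here is a minimal, self-contained sketch. It writes a one-function file called `mini_fitness.py` (a hypothetical name, used so it does not clash with `fitness.py`) into a temporary folder, adds that folder to `sys.path`, and imports it. The temporary folder is only there to keep the example self-contained; in practice we would simply save `fitness.py` in the same folder as the script that imports it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source for a tiny, hypothetical module with a single function.
module_source = '''
def bmi_calc(weight_kg, height_m):
    """Calculate BMI from weight in kg and height in meters"""
    return int(weight_kg / height_m**2)
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    # Saving the text with a .py extension is all it takes to make a module.
    Path(tmp_dir, "mini_fitness.py").write_text(module_source)

    # Python searches the folders listed in sys.path when importing.
    sys.path.insert(0, tmp_dir)
    try:
        importlib.invalidate_caches()  # make sure the new file is noticed
        mini_fitness = importlib.import_module("mini_fitness")
        result = mini_fitness.bmi_calc(80, 1.6)
    finally:
        sys.path.remove(tmp_dir)

print(result)  # 31
```

When a script is run from the command line, the folder containing that script is automatically on `sys.path`, which is why a plain `import fitness` works when `fitness.py` sits next to the script.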

### Using a module

We have created a module called `fitness.py` that contains four functions. We can now use these functions in any project. Importantly, if we later find a bug in our code, we only have to fix it in one location.

There are a few ways to access (or import) the functions we placed in our module.

`import x`. The simplest way is to import our module by its name and access its functions using dot-notation. This approach is very transparent because someone reading our code will immediately see that the function comes from a specific module.

```python
import fitness

bmi = fitness.bmi_calc(80, 1.6)  # weight (kg), height (m)
max_HR = fitness.predict_max_HR(76)
bprisk = fitness.bp_risk(143, 91)
```

`from x import y`. If we only want to use one or two of the functions from our module, we can import them specifically. This allows us to use the functions without the dot-notation.

```python
from fitness import bmi_calc, predict_max_HR

bmi = bmi_calc(80, 1.6)  # weight (kg), height (m)
max_HR = predict_max_HR(76)
```

`import x as w`. It is also possible to import a module and give it an alias. This is often done to reduce the amount of typing. This type of import is common with numerical Python (numpy), `import numpy as np`, and pandas (for panel data; dataframes similar to R), `import pandas as pd`. For our current example:

```python
import fitness as fit

bmi = fit.bmi_calc(80, 1.6)  # weight (kg), height (m)
max_HR = fit.predict_max_HR(76)
bprisk = fit.bp_risk(143, 91)
```

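`import x.y as z`. It is also possible to give an alias to a sub-module. This approach is often used when importing matplotlib for plotting: `import matplotlib.pyplot as plt`. This is the same as `from matplotlib import pyplot as plt`; both give access to `pyplot` through the alias `plt`. Note that the `import x.y` form only works when `y` is a module, not a function. Because `fitness.py` is a single file with no sub-modules, the following fails with a `ModuleNotFoundError`:

```python
import fitness.predict_max_HR as HRmax  # fails: predict_max_HR is a function, not a sub-module
```

To give one of our functions an alias, we use the `from x import y as z` form instead:

```python
from fitness import predict_max_HR as HRmax

max_HR = HRmax(76)
```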
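The relationship between these import styles can be checked with the standard library, without needing `fitness.py` at all. The sketch below uses `os.path` (a sub-module of `os`) and `math.sqrt` (a function); the aliases `osp`, `osp2` and `square_root` are our own choices for illustration.

```python
import os
import os.path as osp                  # import x.y as z (y is a sub-module)
from os import path as osp2            # from x import y as z
from math import sqrt as square_root   # alias for a function

# Both aliases point to the very same module object.
print(osp is osp2)       # True
print(osp is os.path)    # True

# The aliased function behaves exactly like math.sqrt.
print(square_root(16))   # 4.0

# The dotted form expects a module after the dot, so it fails for a function.
try:
    import math.sqrt
    dotted_import_worked = True
except ImportError:
    dotted_import_worked = False

print(dotted_import_worked)  # False
```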

### Putting it all together

We now have a module called `fitness.py` that contains our four functions. We can now import and use these functions to process subject data from any study.

Returning to our original example, we can now write a short processing script that imports our functions and processes the data from our subjects:

```python
from fitness import print_results

# subject = [initials, weight (kg), height (m), age, systolic, diastolic]
subject1 = ['GA', 80, 1.6, 70, 120, 80]
subject2 = ['KT', 69, 1.5, 65, 136, 75]
subject3 = ['MN', 80, 1.6, 89, 113, 75]
subject4 = ['PW', 80, 1.7, 55, 141, 96]

subjects = [subject1, subject2, subject3, subject4]

for sub in subjects:
    initials, weight, height, age, systolic, diastolic = sub
    print_results(initials, weight, height, age, systolic, diastolic)
```
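Because each subject list holds its values in the same order as the function’s parameters, the list can also be unpacked directly in the call with `*`. The sketch below illustrates the idea with `summarise`, a hypothetical stand-in for `print_results` that returns a line of text instead of printing, so the example runs without `fitness.py`.

```python
def summarise(initials, weight, height, age, systolic, diastolic):
    """Hypothetical stand-in for print_results; returns one line of text."""
    return "{}: {}kg, {}m, {} years, {}/{}".format(
        initials, weight, height, age, systolic, diastolic)

# subject = [initials, weight (kg), height (m), age, systolic, diastolic]
subjects = [
    ['GA', 80, 1.6, 70, 120, 80],
    ['KT', 69, 1.5, 65, 136, 75],
]

# *sub spreads the six list items over the six parameters, in order.
lines = [summarise(*sub) for sub in subjects]
for line in lines:
    print(line)
```

The explicit unpacking used in the processing script is longer but names every value, which some readers may find clearer; either style works.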

### Summary

We have learned how to use Python functions and modules to avoid repeating ourselves in the code we write. In addition to conforming to the DRY principle (don’t repeat yourself), using functions and modules helps us write code that is easy to read. Consider our last example. It is clear what the code is doing. The details of how the `fitness` module and the `print_results` function work are hidden away from the user in a separate file (i.e., `fitness.py`). Once we have debugged the functions in our `fitness` module and ensured that they are correct, we don’t have to see the code each time we use it.

In our next post we will learn more about modules and how we can turn them into stand-alone programs.
