Don’t repeat yourself: Python modules

We previously learned to create our own Python functions to reduce how much we repeat ourselves in our code. In this post we see another example of the DRY principle (don’t repeat yourself) and we will learn how to ensure we don’t repeat ourselves between the different programs we write.

A typical (bad) script to process data

Below is an example of a script we might use to process data from five subjects. The data consists of subject initials, weight, height, age, and systolic and diastolic blood pressure. We want to calculate each subject’s BMI, predicted maximum heart rate and blood pressure status, and then print out the results.

# subject = [initials, weight, height, age, systolic, diastolic]
subject1 = ['GA', 80, 1.62, 70, 120, 80]
subject2 = ['KT', 69, 1.53, 65, 136, 75]
subject3 = ['MN', 80, 1.66, 89, 113, 72]
subject4 = ['PW', 80, 1.79, 55, 141, 96]
subject5 = ['HJ', 72, 1.60, 61, 121, 78]

# process data for subject1
initials, weight, height, age, systolic, diastolic = subject1 

# Calculate BMI
bmi = int(weight / height**2)
# Calculate predicted maximum heart rate
max_HR = 208 - 0.7 * age
# Calculate blood pressure risk
# Check the most severe category first so that, e.g., 145/85
# is not caught by the stage 1 diastolic range
if systolic >= 140 or diastolic >= 90:
    bprisk = 'stage 2 hypertension'
elif systolic >= 130 or diastolic >= 80:
    bprisk = 'stage 1 hypertension'
elif systolic >= 120:
    bprisk = 'elevated BP'
else:
    bprisk = 'normal BP'
# Print summary
print("\n\t" + initials)
print("\tweight = {}kg".format(weight))
print("\theight = {}m".format(height))
print("\tage = {} years old".format(age))
print("\tblood pressure = {}/{}".format(systolic, diastolic))
print("\n\tbmi = {}".format(bmi))
print("\tpredicted maximal heart rate = {} bpm".format(max_HR))
print("\tblood pressure = " + bprisk)
print("\n")

# process data for subject2
initials, weight, height, age, systolic, diastolic = subject2

# [copy-and-paste the code from above]
...

As you can see, we would need to copy-and-paste most of the above code another four times to process the data for every subject. If we later discovered a mistake in one of our formulas, we would have to fix it in no fewer than five locations in our code. Things get even more complicated if the code has also been copied-and-pasted into another program.
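To illustrate the kind of formula detail that is easy to overlook and would then need fixing in every copy, consider the BMI line. The script wraps the result in int(), which truncates rather than rounds. The sketch below uses subject4's weight and height from the listing above; whether truncation is intended is not stated here, so the rounded alternative is only an assumption about what might be preferred.

```python
# Weight (kg) and height (m) of subject4 (PW) from the first listing
weight_kg, height_m = 80, 1.79

raw_bmi = weight_kg / height_m**2   # roughly 24.97

truncated = int(raw_bmi)            # what the script does: drops the decimals
rounded = round(raw_bmi, 1)         # assumed alternative: keep one decimal place

print(truncated)  # 24
print(rounded)    # 25.0
```

If the formula is copied into five places, a change like this has to be made five times; once the formula lives in a single function, it is made once.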

Using functions to avoid repeating ourselves

def bmi_calc(weight_kg, height_m):
    """Calculate BMI from weight in kg and height in meters"""
    bmi = int(weight_kg / height_m**2)
    return bmi
    
def predict_max_HR(age):
    """Age predicted maximal heart rate"""
    max_HR = 208 - 0.7 * age
    return max_HR
    
def bp_risk(systolic, diastolic):
    """Categorises whether blood pressure is normal, elevated,
    stage 1 hypertension or stage 2 hypertension"""
    # Check the most severe category first so that, e.g., 145/85
    # is not caught by the stage 1 diastolic range
    if systolic >= 140 or diastolic >= 90:
        bprisk = 'stage 2 hypertension'
    elif systolic >= 130 or diastolic >= 80:
        bprisk = 'stage 1 hypertension'
    elif systolic >= 120:
        bprisk = 'elevated BP'
    else:
        bprisk = 'normal BP'
    return bprisk
    
def print_results(initials, weight, height, age, systolic, diastolic):
    bmi = bmi_calc(weight, height)
    max_HR = predict_max_HR(age)
    bprisk = bp_risk(systolic, diastolic)
    print("\n\t" + initials)
    print("\tweight = {}kg".format(weight))
    print("\theight = {}m".format(height))
    print("\tage = {} years old".format(age))
    print("\tblood pressure = {}/{}".format(systolic, diastolic))
    print("\n\tbmi = {}".format(bmi))
    print("\tpredicted maximal heart rate = {} bpm".format(max_HR))
    print("\tblood pressure = " + bprisk)
    print("\n")
    
# subject = [initials, weight, height, age, systolic, diastolic]
subject1 = ['GA', 80, 1.6, 70, 120, 80]
subject2 = ['KT', 69, 1.5, 65, 136, 75]
subject3 = ['MN', 80, 1.6, 89, 113, 75]
subject4 = ['PW', 80, 1.7, 55, 141, 96]
subject5 = ['HJ', 72, 1.6, 61, 121, 78]

subjects = [subject1, subject2, subject3, subject4, subject5]

for sub in subjects:
    initials, weight, height, age, systolic, diastolic = sub
    print_results(initials, weight, height, age, systolic, diastolic)

The output for the first subject looks like this:

GA
    weight = 80kg
    height = 1.6m
    age = 70 years old
    blood pressure = 120/80

    bmi = 31
    predicted maximal heart rate = 159.0 bpm
    blood pressure = stage 1 hypertension

This is a big improvement over the previous version of our code. However, this is still a processing script: code that we copy-and-paste into a Python command line or run as a program from the command line. It has a single purpose, which is to process the manually entered data from our five subjects.

What if we had a few studies that required us to calculate and print these outcomes? Should we copy-and-paste the code to other scripts? No! Don’t repeat yourself. The best thing to do is create a Python module.

Creating a Python module to reuse code

Creating a Python module is simple. We put all of our functions (just the functions, nothing else) in a file and save it with a .py file extension.

For our current example, we can put all of our functions into a file called fitness.py.

def bmi_calc(weight_kg, height_m):
    """Calculate BMI from weight in kg and height in meters"""
    bmi = int(weight_kg / height_m**2)
    return bmi
    
def predict_max_HR(age):
    """Age predicted maximal heart rate"""
    max_HR = 208 - 0.7 * age
    return max_HR
    
def bp_risk(systolic, diastolic):
    """Categorises whether blood pressure is normal, elevated,
    stage 1 hypertension or stage 2 hypertension"""
    # Check the most severe category first so that, e.g., 145/85
    # is not caught by the stage 1 diastolic range
    if systolic >= 140 or diastolic >= 90:
        bprisk = 'stage 2 hypertension'
    elif systolic >= 130 or diastolic >= 80:
        bprisk = 'stage 1 hypertension'
    elif systolic >= 120:
        bprisk = 'elevated BP'
    else:
        bprisk = 'normal BP'
    return bprisk
    
def print_results(initials, weight, height, age, systolic, diastolic):
    bmi = bmi_calc(weight, height)
    max_HR = predict_max_HR(age)
    bprisk = bp_risk(systolic, diastolic)
    print("\n\t" + initials)
    print("\tweight = {}kg".format(weight))
    print("\theight = {}m".format(height))
    print("\tage = {} years old".format(age))
    print("\tblood pressure = {}/{}".format(systolic, diastolic))
    print("\n\tbmi = {}".format(bmi))
    print("\tpredicted maximal heart rate = {} bpm".format(max_HR))
    print("\tblood pressure = " + bprisk)
    print("\n")

Using a module

We have created a module called fitness.py that contains four functions. We can now use these functions in any project. Importantly, if we later find a bug in our code, we only have to fix it in one location.
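For an import to work, Python has to be able to find the module file. By default it looks in the folder of the script being run and then in the other folders listed in sys.path, so the simplest arrangement is to keep fitness.py next to the script that uses it. The sketch below shows the mechanism in a self-contained way: it writes a cut-down, hypothetical stand-in module (named fitness_demo so it cannot clash with a real fitness.py) to a temporary folder, adds that folder to sys.path, and imports it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical, cut-down stand-in for fitness.py with a single function
MODULE_SOURCE = '''
def predict_max_HR(age):
    """Age predicted maximal heart rate"""
    return 208 - 0.7 * age
'''

with tempfile.TemporaryDirectory() as folder:
    # Save the module source as a .py file in the temporary folder
    Path(folder, "fitness_demo.py").write_text(MODULE_SOURCE)

    # Python searches the folders listed in sys.path for modules
    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    try:
        fitness_demo = importlib.import_module("fitness_demo")
    finally:
        # Leave sys.path as we found it
        sys.path.remove(folder)

# Once imported, the module stays loaded and its functions can be called
max_HR = fitness_demo.predict_max_HR(40)
print(max_HR)
```

In everyday use none of this path handling is needed: with fitness.py in the same folder as the script, a plain import statement is enough.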

There are a few ways to access (or import) the functions we placed in our module.

import. The simplest is to import our module by its name and access its functions using dot-notation. This approach is very transparent because someone reading our code will immediately see that the function comes from a specific module.

import fitness

bmi = fitness.bmi_calc(80, 1.6)  # weight (kg), height (m)
max_HR = fitness.predict_max_HR(76)
bprisk = fitness.bp_risk(143, 91)

from x import y. If we only want to use one or two of the functions from our module, we can import them specifically. This allows us to use the functions without the dot-notation.

from fitness import bmi_calc, predict_max_HR

bmi = bmi_calc(80, 1.6)  # weight (kg), height (m)
max_HR = predict_max_HR(76)

import x as w. It is also possible to import a module and give it an alias. This is often done to reduce the amount of typing. This type of import is common with NumPy (numerical Python), import numpy as np, and pandas (panel data; dataframes similar to those in R), import pandas as pd. For our current example:

import fitness as fit

bmi = fit.bmi_calc(80, 1.6)  # weight (kg), height (m)
max_HR = fit.predict_max_HR(76)
bprisk = fit.bp_risk(143, 91)

import x.y as z. It is also possible to give an alias to a sub-module. This approach is often used when importing matplotlib for plotting: import matplotlib.pyplot as plt. This is the same as from matplotlib import pyplot as plt; both give access to pyplot under the alias plt. Note that import x.y only works when y is a sub-module of a package. Our fitness.py is a single file and predict_max_HR is a function, so import fitness.predict_max_HR as HRmax would raise an error. To give a function an alias, we use the from x import y as z form instead. For our current example:

from fitness import predict_max_HR as HRmax

max_HR = HRmax(76)
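The alias forms can be tried out with the standard library alone. The sketch below relies only on os and math, which ship with Python, and is meant as an illustration rather than part of the fitness example. It checks which forms apply to a sub-module (os.path) and which to a function (math.sqrt).

```python
import importlib

import os.path as osp          # import x.y as z (y is a sub-module)
from os import path as osp2    # from x import y as z (same sub-module)
from math import sqrt as root  # from x import y as z (y is a function)

# Both sub-module forms bind the alias to the very same module object
same_module = osp is osp2

# The alias behaves exactly like the original function
root_of_16 = root(16)

# The dotted form is rejected for a function: math.sqrt is not a sub-module
try:
    importlib.import_module("math.sqrt")
    dotted_import_worked = True
except ImportError:
    dotted_import_worked = False

print(same_module, root_of_16, dotted_import_worked)  # True 4.0 False
```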

Putting it all together

We now have a module called fitness.py that contains our four functions. We can now import and use these functions to process subject data from any study.

Returning to our original example, we can now write a short processing script that imports our functions and processes the data from our subjects:

from fitness import print_results

# subject = [initials, weight, height, age, systolic, diastolic]
subject1 = ['GA', 80, 1.6, 70, 120, 80]
subject2 = ['KT', 69, 1.5, 65, 136, 75]
subject3 = ['MN', 80, 1.6, 89, 113, 75]
subject4 = ['PW', 80, 1.7, 55, 141, 96]
subject5 = ['HJ', 72, 1.6, 61, 121, 78]

subjects = [subject1, subject2, subject3, subject4, subject5]

for sub in subjects:
    initials, weight, height, age, systolic, diastolic = sub
    print_results(initials, weight, height, age, systolic, diastolic)
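The loop above unpacks each subject into six names only to pass them straight on to print_results. Because the items in each list are in the same order as the function's parameters, Python's argument unpacking (the * operator) can do this in one step. The sketch below uses a hypothetical stand-in function, summarise, that returns a string instead of printing, so the two calling styles can be compared directly.

```python
def summarise(initials, weight, height, age, systolic, diastolic):
    """Hypothetical stand-in for print_results; returns a one-line summary."""
    return "{}: {}kg, {}m, {} years, {}/{}".format(
        initials, weight, height, age, systolic, diastolic)

subject1 = ['GA', 80, 1.6, 70, 120, 80]

# Explicit unpacking into six names, as in the loop above
initials, weight, height, age, systolic, diastolic = subject1
explicit = summarise(initials, weight, height, age, systolic, diastolic)

# Argument unpacking: *subject1 passes the six list items as six arguments
unpacked = summarise(*subject1)

print(unpacked)              # GA: 80kg, 1.6m, 70 years, 120/80
print(explicit == unpacked)  # True
```

The explicit form is longer but documents what each value means; the unpacked form is shorter but depends on the list order matching the parameter order.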

Summary

We have learned how to use Python functions and modules to avoid repeating ourselves in the code we write. In addition to conforming to the DRY principle (don’t repeat yourself), using functions and modules helps us write easy-to-read code. Consider our last example. It is clear what the code is doing. The details of how the fitness module and the print_results function work are hidden away from the user in a separate file (i.e., fitness.py). Once we have debugged the functions in our fitness module and ensured that they are correct, we don’t have to see the code each time we use it.
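One simple way to gain that confidence is to compare each function against a value worked out by hand. The sketch below is only an illustration of the idea: it repeats two of the functions so that it runs on its own, whereas in practice they would be imported from fitness.py.

```python
# Copies of two functions from fitness.py, repeated here so the sketch is
# self-contained; normally: from fitness import bmi_calc, predict_max_HR
def bmi_calc(weight_kg, height_m):
    """Calculate BMI from weight in kg and height in meters"""
    return int(weight_kg / height_m**2)

def predict_max_HR(age):
    """Age predicted maximal heart rate"""
    return 208 - 0.7 * age

# By hand: 80 / 1.6**2 = 31.25, which int() truncates to 31
assert bmi_calc(80, 1.6) == 31

# By hand: 208 - 0.7 * 70 = 159 (compared with a small tolerance
# because the result is a floating-point number)
assert abs(predict_max_HR(70) - 159) < 1e-9

checks_passed = True
print("all checks passed")
```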

In our next post we will learn more about modules and how we can turn them into stand-alone programs.

 
