Structuring our Python packages

In the previous posts, we learned how to use Python scripts and modules to avoid repeating ourselves when we write programs. This is important: if you subsequently discover an error in your code (i.e. a bug), you have to fix it in only one place. Problems arise when you have to find and fix the same bug in multiple places. We also saw how to use the __name__ variable to run a Python module (a single Python file with useful functions, classes, and variables) as a stand-alone program.
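As a quick reminder of that __name__ idiom, here is a minimal sketch. The file name greet.py and the function greet are made up for illustration; they are not part of scipoly.

```python
# greet.py (hypothetical module, used only for illustration)
def greet(name):
    """Return a greeting for the given name."""
    return f"Hello, {name}!"


if __name__ == "__main__":
    # This branch runs only when the file is executed directly
    # (python greet.py), not when it is imported with `import greet`.
    print(greet("world"))
```

Importing the file gives access to greet() without printing anything; running it directly prints the greeting.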

In this post we will learn one way to organise several modules into folders, and provide a convenient way for users to call our various Python classes and functions when they import our package.

Package structure

Let’s pretend that we created a Python package called scipoly. The package allows us to model scientific institutes.

The top-most folder is called package_demo. It contains four folders that hold various types of files.

  1. docs: Documentation for our package.
  2. examples: Contains examples of how to use our package.
  3. tests: Contains unit tests, i.e. tests that verify our package and all its parts are working correctly and as expected.
  4. scipoly: Contains the files of our scipoly package.

package_demo/
├── docs
├── examples
├── tests
└── scipoly
    ├── __init__.py
    ├── actions
    │   ├── confuse.py
    │   ├── help.py
    │   ├── hinder.py
    │   └── __init__.py
    ├── people
    │   ├── __init__.py
    │   ├── person.py
    │   ├── scientist.py
    │   └── student.py
    └── places
        ├── __init__.py
        ├── lab.py
        ├── office.py
        └── room.py

The files related to this package are available from GitHub here. Note that only the scipoly package is provided in the GitHub repo; the docs, examples and tests are not included.

__init__.py files

You will have noticed that scipoly and the three folders it contains (actions, people, and places) all have an __init__.py file. At this point, these files are empty. But having them in these folders tells the Python interpreter to treat each folder as a package, and the files inside it as modules of that package; this means we will be able to import them into our Python programs.

As explained in a previous post, you will have to point the Python interpreter to scipoly so that it knows these files and folders exist. This can be done by adding the folder that contains scipoly (here, package_demo) to your PYTHONPATH environment variable, or by adding it at runtime with:

import sys
sys.path.insert(0, '/path/to/folder')  # the folder that contains scipoly
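To see both ideas at work without touching your own files, the sketch below builds a throwaway package in a temporary folder, adds the containing folder to sys.path, and imports a module from it. The package name demo_pkg is a hypothetical stand-in for scipoly, and the addition function simply mirrors the one used later in this post.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk: <tmp>/demo_pkg/actions/confuse.py
# (demo_pkg is a hypothetical stand-in for scipoly).
root = Path(tempfile.mkdtemp())
actions = root / "demo_pkg" / "actions"
actions.mkdir(parents=True)

# Empty __init__.py files mark each folder as a package.
(root / "demo_pkg" / "__init__.py").touch()
(actions / "__init__.py").touch()
(actions / "confuse.py").write_text(
    "def addition(a, b):\n    return a + b + 1\n"
)

# Add the folder that CONTAINS the package, not the package folder itself.
sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make sure the new files are noticed

from demo_pkg.actions import confuse

print(confuse.addition(2, 2))  # prints 5
```

Note that it is the temporary folder, not demo_pkg itself, that goes on sys.path; the interpreter then finds demo_pkg inside it.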

Using our package

Here is a simple example of our package in action:

>>> import scipoly.actions.hinder
>>> scipoly.actions.hinder.days_to_completion(56)
156
>>> from scipoly.actions import confuse
>>> confuse.addition(2, 2)
5
>>> import scipoly.actions.help as h
>>> h.dishes()
I will do the dishes.

As you can see, we were able to import modules from our package (remember that modules are simply individual Python files that contain useful code such as classes and functions) and call some of the functions. We used a different form of the import statement for each example. While the shortcut h in the last example results in much less typing in our program, it will be much less clear to someone else (or our future self) reading our code where h came from, especially if the import statement occurred a few hundred lines above.

To make sure we understand what is going on: the actions folder contains a module called confuse.py, and this module contains a function called addition:

def addition(a, b):
    return a + b + 1

That is why we were able to write the following code:

>>> from scipoly.actions import confuse
>>> confuse.addition(2, 2)
5

Populating one of our __init__.py files

Thus far, with empty __init__.py files, we were able to import modules via their full path, that is, by specifying scipoly, then the subfolder (for example actions), and then the module name (for example confuse). While this is completely transparent, we might like to shorten some of these import statements while keeping them informative.

Let’s add the following code to the __init__.py file in scipoly/people/:

from .person import Person
from .scientist import Scientist
from .student import Student

What are these lines of code doing? They are importing classes we have coded (Person, Scientist, Student) in the various modules located in the people folder: person.py, scientist.py, and student.py. Note that the . before person, scientist and student tells the Python interpreter to look for these modules in the current folder.

With this code now located in our scipoly/people/__init__.py file, we can use our package as follows:

>>> import scipoly.people as people
>>> w = people.Student(age=22, name="Willson", student_id=23434298)

Why is this better? It will now be very clear where Student comes from (from people). This also means that we will not run into issues with namespaces; that is, even if another package we use contains a class called Student, it will not conflict with our Student class because we will always be calling it using people.Student. This also means that if you create a variable called Student, it will not shadow (i.e. hide) our Student class.
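The namespace point can be checked with a small, self-contained sketch. It builds a throwaway package called demo_people (a hypothetical stand-in for scipoly.people) whose __init__.py uses the same relative import, and whose Student class is a simplified version that only takes a name.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway package: <tmp>/demo_people/ with student.py and __init__.py
# (demo_people is a hypothetical stand-in for scipoly.people).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_people"
pkg.mkdir()
(pkg / "student.py").write_text(
    "class Student:\n"
    "    def __init__(self, name):\n"
    "        self.name = name\n"
)
# The leading dot means "look inside this package".
(pkg / "__init__.py").write_text("from .student import Student\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_people as people

# A local variable that happens to share the class name.
Student = "just a string"

w = people.Student(name="Willson")
print(type(w).__name__)  # prints Student
print(Student)           # prints just a string
```

Because the class is always reached through people.Student, the local variable Student and the class never collide.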

Going all the way

What if we really wanted to provide all our functions and classes from a single scipoly import? We use various modules and folders to organise our code so that it is more logical to work with. But people using our package don’t necessarily need to know or worry about how we organised our package. Moreover, having everything accessible from a single scipoly import means that we can restructure our package as much as we like, as long as we keep the user interface the same. That is, the user will continue to use scipoly.Student() to create new students, but the actual code describing the Student class might be located in an entirely different module, or in a module of its own.

How can we achieve this? First, let’s delete the code we previously added to scipoly/people/__init__.py. Next, let’s add the following code to scipoly/__init__.py:

from .actions.confuse import addition
from .actions.confuse import all_caps
from .actions.confuse import picker

from .actions.help import dishes
from .actions.help import speedup

from .actions.hinder import days_to_completion
from .actions.hinder import guidance

from .people.person import Person
from .people.scientist import Scientist
from .people.student import Student

from .places.room import Room
from .places.lab import Lab
from .places.office import Office

With this in place, we can now use our package as follows:

>>> import scipoly
>>> scipoly.days_to_completion(56)
156
>>> scipoly.addition(2, 2)
5
>>> scipoly.dishes()
I will do the dishes.
>>> w = scipoly.Student(age=22, name="Willson", student_id=23434298)
>>> muscle = scipoly.Lab(number=132, capacity=32, equipment=['stimulator', 'pressure sensor'])
>>> muscle.accident()
Unfortunately, your graduate student just broke your stimulator.
>>> doug = scipoly.Scientist(name='Dr. Peters', age=62, discipline='Phrenology')
>>> room121 = scipoly.Office(number=121, capacity=1, person=doug)
>>> room121.clean()
True
>>> room121.person.name
'Dr. Peters'
>>> room121.person.interact('Julie McNeil')
Hi Julie McNeil, my name is Dr. Peters.
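To illustrate why this keeps the user interface stable, here is a minimal sketch using a throwaway package named demo_flat (a hypothetical stand-in for scipoly, with a simplified Student class). The user only ever writes demo_flat.Student, while the class itself lives two levels down.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway package (demo_flat is a hypothetical stand-in for scipoly).
root = Path(tempfile.mkdtemp())
people = root / "demo_flat" / "people"
people.mkdir(parents=True)
(people / "__init__.py").touch()
(people / "student.py").write_text(
    "class Student:\n"
    "    def __init__(self, name):\n"
    "        self.name = name\n"
)
# The top-level __init__.py re-exports the class under the package name.
(root / "demo_flat" / "__init__.py").write_text(
    "from .people.student import Student\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_flat

w = demo_flat.Student(name="Willson")
print(w.name)                        # prints Willson
print(demo_flat.Student.__module__)  # prints demo_flat.people.student
```

If student.py were later moved or renamed, only the import line in the top-level __init__.py would need to change; code that calls demo_flat.Student() would keep working.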

This also means that smart integrated development environments (IDEs) will give you hints about what is available from your package when it is imported. Below is a screenshot of an interactive session in PyCharm.

[Screenshot: interactive session in PyCharm]

Summary

Wow! That was a lot of code and a lot of new concepts. This level of complexity is not always required, but it is something to consider if you have created a large package that will be reused by yourself and others.

 

3 comments

  • Samuel D Gasster

    If my software uses other packages such as numpy, where should the import statements for these other packages go? Should they go in the main program, or in an __init__.py file at the project root? I’m working on cleaning up some code I have and my import statements are scattered across all the module files and main program. A mess. 😦

    BTW: Your site is great! I find your Python blogs really helpful.

    • Hi Samuel,

      Glad to hear you enjoy the blog.

      You probably don’t want your various imports in an __init__.py file. You want to import the various packages you need in each of your .py files. However, when you group functions/classes that are related into the same module (i.e. .py file), you often reduce how often you import a given package.

      It is often a good idea to look at other projects to see how they are structured. It might help you figure out if your current structure is on par with that of others.

      Marty
