Python virtual environments for scientists with conda part 4

In our previous post we learned how to verify what Python virtual environments were installed on our machine and what Python packages they contained. We also learned how to delete unwanted environments.
In this post we are going to learn how to share our virtual environment with others. This is incredibly useful in this day and age of research reproducibility. By creating a file that details our virtual environment – the version of Python, the packages, the versions of these packages – others will be able to recreate the virtual environment on their machine. With an environment file, they will be able to run our code on our data, using the same Python set-up that we used to generate the results from our publication.
Environment files
As mentioned in our first post, venv is the default Python package to create and manage virtual environments, and it uses simple text files for its environment files.
However, because of its ease of use and popularity amongst scientists, this series of posts has focused on conda to manage virtual environments. conda uses the YAML file format for its environment files. YAML, which stands for “YAML Ain’t Markup Language”, is a human-readable data-serialization language that is commonly used for configuration files.
Generating an environment file
Once we have our Python virtual environment set up with the correct version of Python and the required packages, we can create our environment file. After activating our virtual environment, we can run the following commands in a terminal window (Mac, Linux) or the Anaconda Prompt (Windows).
(base) /home/martin$ conda activate sci_sound
(sci_sound) /home/martin$ conda env export > environment.yml
Simple as that! We now have an environment.yml file that we can include in the top level of our project repository.
For those of you who are curious, the environment.yml file that we just created looks like this:
name: sci_sound
channels:
  - conda-forge
  - defaults
dependencies:
  - bzip2=1.0.6=h14c3975_1002
  - ca-certificates=2019.3.9=hecc5488_0
  - certifi=2019.3.9=py37_0
  - libblas=3.8.0=8_openblas
  - libcblas=3.8.0=8_openblas
  - libffi=3.2.1=he1b5a44_1006
  - libgcc-ng=8.2.0=hdf63c60_1
  - libgfortran-ng=7.3.0=hdf63c60_0
  - liblapack=3.8.0=8_openblas
  - libstdcxx-ng=8.2.0=hdf63c60_1
  - ncurses=6.1=hf484d3e_1002
  - numpy=1.16.3=py37he5ce36f_0
  - openblas=0.3.6=h6e990d7_1
  - openssl=1.1.1b=h14c3975_1
  - pip=19.1=py37_0
  - python=3.7.3=h5b0a415_0
  - readline=7.0=hf8c457e_1001
  - setuptools=41.0.1=py37_0
  - sqlite=3.26.0=h67949de_1001
  - tk=8.6.9=h84994c4_1001
  - wheel=0.33.1=py37_0
  - xz=5.2.4=h14c3975_1001
  - zlib=1.2.11=h14c3975_1004
  - pip:
    - pygame==1.9.6
prefix: /home/martin/anaconda3/envs/sci_sound
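Each conda entry in the exported file has the form name=version=build, and the last line records the environment's location on the exporting machine. Build strings are tied to a particular build of a package and can prevent the file from resolving on another operating system. The following is a minimal sketch, not part of conda itself, of how one might strip the build strings and the prefix line from an exported file. The function name make_portable and the shortened sample text are illustrative assumptions.

```python
import re

# Hypothetical helper (not part of conda): rewrite the text of an exported
# environment file so that it keeps "name=version" pins but drops the
# build strings and the machine-specific "prefix:" line.
def make_portable(env_text):
    kept = []
    for line in env_text.splitlines():
        # The prefix line is an absolute path on the exporting machine.
        if line.startswith("prefix:"):
            continue
        # conda pins look like "- name=version=build"; keep "- name=version".
        # pip pins ("- name==version") and channel names do not match.
        match = re.match(r"^(\s*- [^=\s]+=[^=\s]+)=[^=\s]+\s*$", line)
        if match:
            line = match.group(1)
        kept.append(line)
    return "\n".join(kept) + "\n"


# Shortened sample based on the file shown above.
sample = """\
name: sci_sound
channels:
  - conda-forge
dependencies:
  - numpy=1.16.3=py37he5ce36f_0
  - python=3.7.3=h5b0a415_0
  - pip:
    - pygame==1.9.6
prefix: /home/martin/anaconda3/envs/sci_sound
"""

print(make_portable(sample))
```

In practice, conda env export also accepts a --no-builds flag that leaves out the build strings at export time, so a script like this is mainly useful for files that have already been shared.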
Creating a virtual environment from a YAML environment file
It is easy to create a Python virtual environment from a YAML environment file.
(base) /home/martin$ conda env create -f environment.yml
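Once the environment has been recreated, a collaborator may want to confirm that it matches the original. One possible check, sketched below under stated assumptions, is to export the recreated environment with conda env export and compare its pinned versions against the file that was shared. The function names parse_pins and compare_environments are hypothetical, and the second sample (with a different numpy version) is made up for illustration.

```python
# Hypothetical helpers (not part of conda) that compare the pinned versions
# listed in two exported environment files.

def parse_pins(env_text):
    """Map package name -> version for conda ("name=version[=build]")
    and pip ("name==version") entries."""
    pins = {}
    for line in env_text.splitlines():
        entry = line.strip()
        # Only list items can be pins; "- pip:" opens the nested pip list.
        if not entry.startswith("- ") or entry.endswith(":"):
            continue
        spec = entry[2:]
        if "=" not in spec:
            continue  # channel names such as "conda-forge"
        name, _, rest = spec.partition("=")
        # Drop the second "=" of a pip pin, then any trailing build string.
        pins[name] = rest.lstrip("=").split("=")[0]
    return pins


def compare_environments(original_text, recreated_text):
    """Return {name: (original_version, recreated_version)} for every
    package whose version differs or that is missing on one side."""
    original = parse_pins(original_text)
    recreated = parse_pins(recreated_text)
    differences = {}
    for name in sorted(set(original) | set(recreated)):
        a, b = original.get(name), recreated.get(name)
        if a != b:
            differences[name] = (a, b)
    return differences


# Shortened version of the shared file.
ours = """\
dependencies:
  - numpy=1.16.3=py37he5ce36f_0
  - pip:
    - pygame==1.9.6
"""

# Made-up export from a recreated environment, for illustration only.
theirs = """\
dependencies:
  - numpy=1.16.4
  - pip:
    - pygame==1.9.6
"""

print(compare_environments(ours, theirs))
# {'numpy': ('1.16.3', '1.16.4')}
```

An empty dictionary means that every pinned version agrees. Build strings are deliberately ignored, since they can legitimately differ between platforms.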
Review of conda commands
Functionality | Command | Example |
---|---|---|
new virtual environment | conda create --name ENV_NAME python=VERSION | conda create --name sci_sound python=3.7 numpy pandas |
add package | conda install PACKAGE | conda install numpy pandas |
add package from channel | conda install -c CHANNEL PACKAGE | conda install -c cogsci pygame |
add package from PyPI | pip install PACKAGE | pip install pygame |
activate env | conda activate ENV_NAME | |
deactivate env | conda deactivate | |
list environments | conda info --envs | |
list env packages | conda list --name ENV_NAME | |
remove environment | conda remove --name ENV_NAME --all | |
create environment file | conda env export > environment.yml | |
create env from file | conda env create -f environment.yml | |
Summary
In this series we learned how to create new virtual environments and add packages to them, install packages from the wider Anaconda and Python communities via channels and pip, list our environments and their packages, and remove environments we no longer need. In this latest post we learned how to create environment files, which allow others to create the same virtual environment on their own computer.
It is now increasingly encouraged (or required) to include the data and code when publishing a scientific paper. The reproducibility of your analysis pipeline can be enhanced by including a conda environment file, as this allows others to create the same Python set-up on their own machines.