How to document scientific software

Posted on January 16, 2019 by Joanna Diong

Many people learn to write computer code by attending short courses or learning from supervisors or colleagues, then figuring things out as they go. For beginner or novice programmers, it is not always easy to work out whether code is well documented or efficient. So it is nice when more experienced users offer advice on good coding practices.

A recent PLOS Computational Biology editorial highlights good practices for documenting scientific software. Disclaimer: I’m no computational biologist, and not all points in the editorial will be relevant to everyone. But some are broadly applicable to all users, and especially to researchers in biology and applied sciences. Here is a short summary:

1. Write comments as you code

Comments are probably the most important part of software documentation. In the end, you comment so that others (most often, your future self!) can read and understand your code. Comments are like your lab notebook – they help you remember what you were thinking long after you were doing that thinking.

The best way to write comments is to do it as you write code, rather than at the end. Aim for comments that are helpful but not redundant: in general, don’t describe every detailed action that the code performs, but write just enough to give an overview. For example:

Redundant comments:

# iterate over the genes in the genome
for sequence in parsed_sequences:
    # call the analyze function, passing it each gene as its argument
    analyze(sequence)

Just enough comments:

# analyze the genome
for sequence in parsed_sequences:
    analyze(sequence)

I also use comments to break up long bits of code:

# --------
# Do stuff
# --------
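As a rough sketch of how such banner comments might divide a longer script into sections (the function names, the sequences and the GC-content calculation are made up for illustration and are not taken from the editorial):

```python
# ----------------
# Load the data
# ----------------
def load_sequences():
    """Return example DNA sequences (hard-coded, hypothetical data)."""
    return ["ATGC", "GGCC", "ATAT"]


# ----------------
# Analyse the data
# ----------------
def gc_content(sequence):
    """Return the fraction of bases in `sequence` that are G or C."""
    gc_count = sum(base in "GC" for base in sequence)
    return gc_count / len(sequence)


# ----------------
# Report results
# ----------------
def summarise(sequences):
    """Return a dict mapping each sequence to its GC content."""
    return {seq: gc_content(seq) for seq in sequences}


results = summarise(load_sequences())
for sequence, fraction in results.items():
    print(f"{sequence}: GC content = {fraction:.2f}")
```

Each banner marks one stage of the analysis, so a reader skimming the file can jump straight to the part they need.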

2. Include a README file with basic information

A README file acts like a homepage for your project’s code: it describes the overall aims and structure of your code (e.g. project title and investigators, input and output files, what actions the source code performs, etc.). README files should be saved in formats that can be read across operating systems; text files are a good option. The README file could be the only documentation users will read, so include all necessary and relevant information.
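One possible way to get into the habit is to generate a plain-text README skeleton whenever you start a project. The sketch below is only an illustration: the helper `write_readme`, the file name `README.txt` and the template layout are my own assumptions, with the fields taken from the list above.

```python
from pathlib import Path

# Hypothetical template; the fields follow the items listed above.
README_TEMPLATE = """\
Project title: {title}
Investigators: {investigators}

What the code does:
{description}

Input files:
{inputs}

Output files:
{outputs}
"""


def write_readme(folder, title, investigators, description, inputs, outputs):
    """Write a plain-text README.txt into `folder` and return its path."""
    text = README_TEMPLATE.format(
        title=title,
        investigators=", ".join(investigators),
        description=description,
        inputs="\n".join(f"- {name}" for name in inputs),
        outputs="\n".join(f"- {name}" for name in outputs),
    )
    readme_path = Path(folder) / "README.txt"
    readme_path.write_text(text, encoding="utf-8")
    return readme_path
```

For example, `write_readme(".", "My study", ["A. Researcher"], "Filters and plots the data.", ["raw.csv"], ["figure.png"])` would create `README.txt` in the current folder; all of these values are placeholders.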

3. Version control your code and documentation

This was a very different learning experience for me, but I can’t overstate how helpful it has been. A version control program records each version of your code and documentation, so you can keep track of changes over time and record contributions by different investigators. Sometimes I need to run code across more than one computer (e.g. lab computer to collect and post-process data, laptop to analyse data offsite). Version controlling code on a local computer and pushing it to, or pulling it from, a “cloud” repository allows me to maintain the same version of code seamlessly across multiple computers, as well as maintain the record of changes to the code over time; this provides greater functionality than simple synchronization. Git is a popular version control system and is often used with the cloud-based platform GitHub. Other systems include Subversion (SVN) and Mercurial.

See our tutorials on Git and GitHub for strategies to learn and use these tools.

Summary

These three strategies are broadly applicable and help make code more user-friendly and generalisable, both for other users and your future self.

The editorial also includes other good practice strategies for code documentation, such as including examples of what the code does, including a help command for programs run from the command line, and using automated documentation tools. Readers may follow up on these in time.
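To give a flavour of the first two of these, here is a minimal sketch in Python using only the standard library. The function `reverse_complement` and the command-line layout are hypothetical and are not taken from the editorial. The docstring holds a worked example, and `argparse` adds a `-h`/`--help` option automatically.

```python
import argparse


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence.

    The example below documents what the code does:

    >>> reverse_complement("ATGC")
    'GCAT'
    """
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(sequence))


def build_parser():
    """Create the command-line parser; argparse adds -h/--help itself."""
    parser = argparse.ArgumentParser(
        description="Print the reverse complement of a DNA sequence."
    )
    parser.add_argument("sequence", help="DNA sequence, e.g. ATGC")
    return parser


def main(argv=None):
    """Parse the arguments (sys.argv by default) and print the result."""
    args = build_parser().parse_args(argv)
    print(reverse_complement(args.sequence.upper()))


# Demonstration with an explicit argument list instead of the real command line
main(["atgc"])
```

If this were saved as a script, running it with `--help` would print the description and the argument help text, and `python -m doctest` followed by the file name would check that the docstring example still gives the stated result.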

Reference

Lee BD (2018) Ten simple rules for documenting scientific software. PLoS Comput Biol 14(12): e1006561.

