Good enough practices in scientific computing
Three years ago, Greg Wilson of Software Carpentry fame, along with fellow members of the Software Carpentry community, published an article entitled Best Practices for Scientific Computing. The article is a great reference that covers many aspects of scientific computing, but it can be intimidating to novices. To remedy this, Wilson et al. recently published a follow-up paper entitled Good Enough Practices in Scientific Computing, which presents a minimum set of “good enough” tools and techniques that they believe every researcher can and should adopt.
The authors emphasize that these recommendations are primarily meant to help your most important collaborator: your future self. They also acknowledge that change is hard, and that researchers who do not quickly see the benefits of adopting these recommendations will likely revert to their old practices. The early benefits therefore need to outweigh the pain that comes with learning a new way of doing things.
Wilson and colleagues make the following recommendations to researchers:
- Save the raw data.
- Create the data you wish to see in the world (file formats, variable names, filenames, etc.).
- Create analysis-friendly data.
- Make each column a variable.
- Make each row an observation.
- Record all the steps used to process data.
- Anticipate the need to use multiple tables.
- Submit data to a reputable DOI-issuing repository so that others can access and cite it.
- Place a brief explanatory comment at the start of every program.
- Decompose programs into functions.
- Be ruthless about eliminating duplication.
- Give functions and variables meaningful names.
- Make dependencies and requirements explicit.
- Do not comment and uncomment sections of code to control a program’s behaviour.
- Provide a simple example or test data set.
- Submit code to a reputable DOI-issuing repository.
- Create an overview of your project.
- Create a shared public “to do” list.
- Make the software license explicit.
- Make the project citable.
- Put each project in its own directory, which is named after the project.
- Put text documents associated with the project in the
- Put raw data and metadata in a data directory, and files generated during cleanup and analysis in a
- Put project source code in the
- Put external scripts, or compiled programs in the
- Name all files to reflect their content or function.
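Several of the data and software recommendations above can be seen together in one short script. The sketch below is only an illustration, not code from the paper: the column names (site, year, value), the example data, and the function names are all made up for this post. It opens with a brief explanatory comment, is split into small functions with meaningful names, ships with a tiny test data set, and turns a wide table into one where each column is a variable and each row is an observation. Behaviour is switched with an argument (skip_missing) rather than by commenting code in and out.

```python
"""Reshape a wide table of yearly measurements into tidy (long) form.

Input:  CSV text with one row per site and one column per year.
Output: one dictionary per observation, with keys site, year and value.
All names and data here are hypothetical and exist only for illustration.
"""
import csv
import io

ID_COLUMN = "site"  # hypothetical identifier column

# A small example data set, so the script can be tried without real data.
EXAMPLE_DATA = "site,2019,2020\nA,3.1,3.4\nB,,2.8\n"


def read_wide_table(text):
    """Parse CSV text into a list of dictionaries, one per input row."""
    return list(csv.DictReader(io.StringIO(text)))


def to_tidy_rows(wide_rows, id_column=ID_COLUMN, skip_missing=True):
    """Return one dictionary per (site, year) observation.

    skip_missing is an explicit switch: empty cells are either dropped
    or kept with a value of None, depending on the argument.
    """
    tidy_rows = []
    for row in wide_rows:
        for column_name, cell in row.items():
            if column_name == id_column:
                continue
            if cell == "" and skip_missing:
                continue
            value = float(cell) if cell != "" else None
            tidy_rows.append(
                {id_column: row[id_column], "year": int(column_name), "value": value}
            )
    return tidy_rows


if __name__ == "__main__":
    for observation in to_tidy_rows(read_wide_table(EXAMPLE_DATA)):
        print(observation)
```

Calling to_tidy_rows(read_wide_table(EXAMPLE_DATA), skip_missing=False) keeps the empty cell for site B as an observation whose value is None, which is one way to make missing data explicit instead of silently dropping it.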
Keeping Track of Changes
- Back up (almost) everything created by a human being as soon as it is created.
- Keep changes small.
- Share changes frequently.
- Create, maintain, and use a checklist for saving and sharing changes.
- Store each project in a folder that is mirrored off the researcher’s working machine.
- Add a file called CHANGELOG.txt to the project’s
- Copy the entire project whenever a significant change has been made.
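For researchers who version their work manually, the last two steps can be partly automated. The following is a minimal sketch under stated assumptions, not a procedure prescribed by Wilson et al.: the function names, the entry layout (a dated heading followed by a bullet), and the dated folder name for copies are all choices made here for illustration, and the location of the changelog is passed in by the caller.

```python
import shutil
from datetime import date
from pathlib import Path


def add_changelog_entry(changelog_path, message, today=None):
    """Put a dated entry at the top of the changelog, newest first.

    The entry layout is an assumption made for this sketch.
    """
    today = today or date.today()
    changelog_path = Path(changelog_path)
    if changelog_path.exists():
        previous = changelog_path.read_text(encoding="utf-8")
    else:
        previous = ""
    entry = f"## {today.isoformat()}\n\n* {message}\n\n"
    changelog_path.write_text(entry + previous, encoding="utf-8")


def snapshot_project(project_dir, archive_dir, today=None):
    """Copy the whole project to archive_dir/<project name>-<date>.

    archive_dir must live outside project_dir, otherwise the copy would
    try to include itself. copytree raises an error if the destination
    already exists, so an earlier snapshot is never overwritten.
    """
    today = today or date.today()
    project_dir = Path(project_dir)
    destination = Path(archive_dir) / f"{project_dir.name}-{today.isoformat()}"
    shutil.copytree(project_dir, destination)
    return destination
```

A typical use would be to call add_changelog_entry with a one-line description of what changed, then snapshot_project to copy the project into a folder that is itself mirrored or backed up. Because each snapshot name carries the date, taking two snapshots on the same day fails loudly rather than replacing the first one.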
Version Control System.
- Use a version control system.
Instead of an email-based workflow, try to make writing scalable, collaborative, and reproducible. Wilson et al. point out that the workflow you choose is less important than having all authors agree on the workflow before writing starts.
- Write manuscripts using online tools with rich formatting, change tracking, and reference management (e.g., Google Docs).
- Write the manuscript in a plain text format that permits version control.
Manuscript writing. The recommendation to write manuscripts in plain text formats will likely meet resistance from some collaborators. People are attached to word processors with graphical user interfaces and to tools such as reference managers. Wilson et al. make some good points on the issue of manuscript writing, and also include feedback provided to them by reviewers.
The recommendations put forth in this paper are simple and to the point. However, implementing them does require some effort and planning. Rather than getting overwhelmed by how many recommendations there are, consider reading through one section and implementing its recommendations in your next project. Alternatively, start with a few of the lowest-hanging fruit, that is, the recommendations that are easiest to implement.
At a broader level, institutions, universities, and granting agencies will, one hopes, recognize the importance of good computing practices and support their researchers with training opportunities.