GitHub as a scientific tool

There is much talk about open science lately. One aspect of this movement towards more transparent and reproducible science is the open sharing of data and computer code. As highlighted in a recent article in Nature, GitHub is increasingly being used by scientists to do just that!

GitHub in science

Git is a powerful distributed version control tool that can be used to track the evolution of computer code, manuscripts, study notes, etc. Think of it as a better, and more scientific, way to “Track Changes”. GitHub is a web-based Git repository hosting service that makes it easy to collaborate with colleagues and share your work with others.

File types. Because Git records, line by line, how files have changed, it works well with text files such as source code, manuscripts written in LaTeX, and CSV files. Git can still store binary files such as .docx documents and images, but it cannot show meaningful line-by-line changes for them, so it is far less useful for tracking how they evolve.
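To make "line by line" concrete, here is a minimal sketch that compares two versions of a small CSV file using Python's standard difflib module. It is only an illustration of line-based comparison, not Git itself, and Git's own diff machinery works differently under the hood. The file name measurements.csv, the column names and the values are all made up for the example.

```python
import difflib


def line_changes(old_text: str, new_text: str) -> list[str]:
    """Return only the removed ("-") and added ("+") lines between two versions."""
    diff = list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="measurements.csv (old)",  # hypothetical file name
            tofile="measurements.csv (new)",
            lineterm="",
        )
    )
    # When there are differences, the first two lines are the "---"/"+++"
    # file headers; skip them and keep only the changed lines.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Two versions of a small, made-up data set: one value corrected, one row added.
old_version = "subject,force_N\n1,102.5\n2,98.1\n3,110.4\n"
new_version = "subject,force_N\n1,102.5\n2,99.1\n3,110.4\n4,105.0\n"

for change in line_changes(old_version, new_version):
    print(change)
```

Only the corrected row and the new row are reported, which is why plain-text formats such as CSV are easy to review under version control. A .docx file saved with the same edits would not produce a readable line-level comparison like this.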

GitHub has over 15 million users and is increasingly popular among researchers for sharing, maintaining and updating scientific data sets and code (see 1). Many other websites also allow data to be shared, but GitHub is specifically designed for transparent, open collaboration because it uses version control software to track every change made to code, data, manuscripts, lab notebooks, etc. In this way, Git and GitHub provide a lasting record of who changed what, and when.


Figure 1: Copyright © 2016, Rights Managed by Nature Publishing Group

GitHub for data

It makes most sense for researchers to use GitHub for relatively small, text-based data sets that are actively being updated, curated and maintained by groups of scientists. Because data sets on GitHub can be changed or deleted, the site should not be used as a permanently citable archive. However, there are tools and services that allow researchers to generate snapshots of GitHub repositories with a citable Digital Object Identifier (DOI).

Summary

Sharing data and code is essential to open science, and scientists are encouraged to look to well-established tools such as Git and GitHub as a means of achieving this goal.

References

Perkel J (2016). Democratic databases: science on GitHub. Nature, 538:127–128.

 
