Going the extra yard: sharing scientific data and computer code
Some scientists have never written a line of code, while others have made a career out of it. Most of the rest of us live somewhere between these two extremes.
Podcaster Michael Kennedy often says that for people who do not have computer science degrees and do not write code for a living, the ability to code is a superpower. You don’t need to know how to code to do human neurophysiology research, but being able to code does give you a leg up: you can analyse more data, automate and reuse processing pipelines, and generate (and re-generate) figures or tables to your heart’s content.
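The point about automating and reusing a processing pipeline can be sketched in a few lines. Everything here is hypothetical (the column name, the data values); the idea is simply that a scripted analysis step can be rerun on demand and always gives the same answer for the same input:

```python
import csv
import io
import statistics

# Hypothetical raw data: trial-by-trial reaction times for one participant.
RAW = """participant,reaction_time_ms
p01,312
p01,298
p01,305
"""

def summarise(csv_text):
    """One reusable pipeline step: summarise reaction times from CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    times = [float(r["reaction_time_ms"]) for r in rows]
    return {
        "n": len(times),
        "mean": statistics.mean(times),
        "sd": statistics.stdev(times),
    }

summary = summarise(RAW)
print(summary)  # rerunnable: same input file, same summary, same figure downstream
```

Because the step is a plain function over the raw data, regenerating a table or figure after a data correction is a matter of rerunning the script rather than redoing anything by hand.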
Something I have noticed over the years is that many scientists do not share their code. I also suspect that many coding scientists have never had another capable coder review their code (I know I haven’t!). This is not by choice. Most (if not all) coding scientists would love to have their collaborators and supervisors ask to see their code. But this rarely happens, which is surprising given that code often underpins the results and figures that these collaborators and supervisors put their name (and scientific reputation) to.
Given just how prone we are as humans to making mistakes, especially with boring, repetitive tasks, it seems only logical that the data and statistical analyses that fuel science should be open for inspection and scrutiny. But who has time for this? Your collaborators and supervisors? Funders? Journal reviewers? Journal editors?
Research reproducibility, and to a lesser extent research quality, has gained attention in recent years. Unfortunately, the vast majority of metrics used by institutions and funders continue to assess, value and reward research quantity. Impact and novelty are also valued, but these too have no causal link with quality. Until there is a substantial and genuine change in the scientific incentive structure, assessing quality and implementing processes that improve it will, for the most part, continue to live in the ‘too hard’ category.
The pressure to publish an ever-growing number of papers, each expected to be important, impactful and to report statistically significant findings, pushes researchers to work fast (and loose) in the hunt for significant, interesting results. The merry-go-round of science seems to be spinning faster and faster. Who can afford to get off and dedicate time, funds and resources to quality?
This may seem depressing. And it is. I rarely hear a fellow researcher or academic speak positively about the future. In fact, senior scientists are telling students that going into medicine and becoming a doctor is the best decision they could make.
But people who get into science do so because they are curious and because they are passionate. They have an itch that needs to be scratched. Thus, for those of you who, like me, love good science and love being scientists, don’t forget why you became a scientist in the first place (it sure as hell isn’t for the money!). Quality is more important than quantity.
While somewhat circuitous, the message I wanted to communicate is that to do quality research, at a societal level, researchers need to open their books: share their code, share their data and share their processes. It may be humbling at first, because it also means potentially exposing your mistakes, but being open and transparent can only help. So the next time you are getting ready to submit a paper, consider tidying up your code and your data and making them available to others. And if, like me, your code and data are a mess, submit that paper and never look back. But the next time you plan a study, start by assuming that you will make your code and data public. You will be surprised by how your thinking changes at every step of the way!