Errors in science: I make them, do you?

Posted on August 6, 2019 by Martin Héroux

In this, the digital age, we rely heavily on computers.

We use them to store, share and analyse our data. We use them to write and submit our scientific manuscripts. We use them to make figures and present our results. We use them to access other people's research.

Yes, we scientists use computers a lot.

But are we infallible in our computer use? Do we make zero mistakes organizing, cleaning, and analysing our data?

Of course not. We are human, and that means we make mistakes.

My data analysis story

I conducted my first study almost 20 years ago. At the time, there were no USB thumb drives and no PDF articles. Since then, I have conducted dozens of studies at three universities and one research institution. Despite the thousands of lines of code I have written in that time, not a single supervisor or collaborator has asked to see my code.

To her credit, my PhD supervisor, Kathleen, who was the first person to teach me to code, did ask to see figures of all the data I collected. And when I was ready to run my statistical analyses, she always asked to see the data that would go into them: she wanted to see the dots. Visualizing the data in this way allowed us to identify outliers. It also allowed Kathleen to spot errors in my code, insofar as those errors produced nonsensical or oddly patterned data.

While this ‘look at the data’ lesson was an important one, it falls short of the mark when it comes to software testing and code improvement.

I have never participated in a code review. In fact, I only learned about code reviews a few years ago. I have also never written a formal test for my code. Again, I only learned about formal software testing this year.

That is not to say I have not tested my code. I always run each part of my code to make sure it runs as expected, and then I run larger parts of my code and plot the results. Also, when I am learning a new statistical analysis, I often generate simulated data sets to ensure the results reflect the pattern I injected into the data. When possible, I also ask a colleague or a supervisor to run the analyses in their statistical software of choice to make sure we get the same results. Nevertheless, I have never adopted a systematic approach to testing my software, and I have never written an automated test.
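The simulated-data check described above is also the natural starting point for an automated test. Below is a minimal sketch in Python, assuming the "pattern" injected into the data is a simple linear slope; the function names fit_slope and simulate, the noise level, the sample size and the tolerance are all hypothetical choices for illustration, not code from any of my studies. The test simulates data with a known slope, runs the analysis, and fails loudly if the injected value is not recovered.

```python
import random


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (the 'analysis' being tested)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(true_slope, n=500, noise_sd=1.0, seed=0):
    """Generate x, y data with a known slope plus Gaussian noise.

    Hypothetical example: intercept fixed at 2.0, x drawn uniformly from 0 to 10.
    A fixed seed makes the simulated data, and therefore the test, reproducible.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 + true_slope * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def test_fit_slope_recovers_injected_pattern():
    """Automated check: the analysis must recover each slope we injected."""
    for seed, true_slope in enumerate((-1.5, 0.0, 0.8)):
        xs, ys = simulate(true_slope, seed=seed)
        estimate = fit_slope(xs, ys)
        # Tolerance is an assumption; with n=500 and noise_sd=1.0 the
        # standard error of the slope is roughly 0.016, so 0.1 is generous.
        assert abs(estimate - true_slope) < 0.1, (true_slope, estimate)


if __name__ == "__main__":
    test_fit_slope_recovers_injected_pattern()
    print("all checks passed")
```

Because the test function name starts with test_, a test runner such as pytest would also pick it up automatically, so the check reruns every time the analysis code changes rather than only when I remember to do it by hand.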

Do I make mistakes? Do I ever! The good news is that by testing my code as I go, plotting all of the raw and processed data, and asking colleagues to scrutinise these figures, I have been able to catch many mistakes before the results were published. My only officially published error occurred with a relatively simple dataset that was analysed in a spreadsheet program similar to Excel. That is not to say that a similar mistake would not have occurred had I coded the analysis, nor does it mean that none of my mistakes have gone unnoticed and made their way into the published literature.

Does this worry me? Yes.

I would like to have 100% confidence in my results, but that might be unrealistic. Nevertheless, I feel there is room for improvement.

Summary

Making errors is part of life. Without them we can’t learn. But I am curious whether my story is unique, or whether others have also been fudging their way through, analysing data and hoping it is correct.

Some errors in science make front-page news, but those are errors that cost lives, money, or both. What I am curious about are the errors that go unnoticed. How prevalent are they in the current literature? How many of them are serious enough to change the findings of a study?

In my next post, I highlight some of the things I have seen along the way.

