The natural selection of bad science

As stated in a recent post, the majority of scientific findings are likely false (Ioannidis, 2005). Although there are multiple reasons for this, it has been suggested that the pressure to publish is the driving incentive behind scientists adopting poor research practices that lead to false findings. In a recent paper, Smaldino & McElreath (2016) argue that the pressure to publish is so strong that it is equivalent to the pressure to survive, in a Darwinian sense.
The natural selection of bad science
Bad research and statistical practices have plagued science for decades. Researchers have repeatedly highlighted these issues and suggested remedies, yet bad practices persist. Why? The blunt answer is that poor methods get results.
If there is to be a change, the incentive that rewards poor research methods needs to be removed. “Publish or perish” is a powerful incentive, one that Smaldino & McElreath (2016) argue has led to the natural selection of bad science. This process does not require conscious strategizing or cheating by researchers. It stems from the positive selection of methods and habits that lead to publications, and follows the same process of natural selection that Darwin used to explain, among other things, the varying shape of bird beaks. When researchers are rewarded for the number of papers they publish, habits which promote publication are naturally selected.
Institutional incentives for scientific researchers
The authors explain how institutional incentives for more papers have created a system in which, to get ahead and survive, researchers must learn to play the game of science. Authorship has thus become a coveted commodity, and the intellectual contribution required to be an author has, in some cases, shrunk to the point of non-existence (usually to the benefit of more politically savvy or senior researchers).
But is the number of published papers the sole means of judging the quality of a researcher? Over the years, various measures have been devised to try to quantify scientific impact. One such example is the H-index, which considers not only the number of papers a researcher has published, but also the number of times those papers have been cited. Unfortunately, “when a measure becomes a target, it ceases to be a good measure”. I, for one, have been forced by an editor to cite a paper because a reviewer demanded that I reference their work. A colleague was recently encouraged, quite strongly, by an editor to cite several of the editor’s own papers. This type of behaviour is fraudulent, yet raising the issue will get you nowhere (that was my experience). But can you blame these people? They are only trying to survive.
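For readers unfamiliar with it, the H-index is mechanical to compute: it is the largest number h such that a researcher has h papers each cited at least h times. Here is a minimal Python sketch of the calculation (the citation counts are made up for illustration):

```python
def h_index(citations):
    """Largest h such that h papers have been cited at least h times each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for seven papers.
print(h_index([10, 8, 5, 4, 3, 1, 0]))  # -> 4 (four papers cited at least 4 times)
```

A number this easy to compute is also easy to game, which is exactly the problem.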
When researchers learn to play the game of science, the metrics used to measure research quality become strong incentives. Researchers learn which boxes need to be ticked to be successful in grant competitions and promotion applications, even if this success is associated with a higher rate of false discoveries.
Low statistical power: an example of natural selection in science
Many of the effects scientists study are small, so reliable results require sufficiently large samples. Although scientists have repeatedly been told to aim for adequate statistical power, say 80%, low statistical power remains a persistent problem in science.
Smaldino & McElreath (2016) reviewed published surveys of statistical power in the psychological and social sciences. Overall, mean power was 24%, and it did not increase over six decades!
But why does this practice persist? Because low statistical power has its benefits for researchers who want to publish more papers: it increases the proportion of positive findings that are false, and it inflates the size of the effects that reach statistical significance.
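A quick simulation makes the second point concrete. The sketch below is my own illustration, not taken from the paper: it repeatedly runs an underpowered two-sample comparison of a small true effect and keeps only the “significant” results, the ones most likely to be published. Those results report an effect several times larger than the truth.

```python
import numpy as np
from scipy import stats

# Underpowered studies of a small true effect (invented numbers, for illustration).
rng = np.random.default_rng(0)

true_effect = 0.2        # small true effect (Cohen's d)
n_per_group = 20         # small samples -> low power
n_studies = 10_000

significant_d = []
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_group)
    treated = rng.normal(true_effect, 1.0, n_per_group)
    t, p = stats.ttest_ind(treated, control)
    if p < 0.05:
        # standardized effect size (Cohen's d) of the "publishable" result
        pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
        significant_d.append((treated.mean() - control.mean()) / pooled_sd)

print(f"empirical power: {len(significant_d) / n_studies:.2f}")   # well below 0.80
print(f"true effect: {true_effect}, mean 'published' effect: {np.mean(significant_d):.2f}")
```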
An evolutionary model of science
In their paper, Smaldino & McElreath (2016) developed and analyzed a dynamic population model in which labs compete for prestige and jobs, the currency of fitness is the number of publications, and more successful labs have more “offspring” labs that inherit their methods.
The model showed that when the main incentive is to publish more papers, studies with low statistical power are favoured. Even when statistical power was held constant, natural selection favoured labs that exerted less effort to ensure the quality of their work. These practices increase publication rates, but they also increase the number of false discoveries.
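To get an intuition for this dynamic, here is a deliberately crude toy simulation. It is my own sketch, not the authors’ model, and the functional forms and parameters are invented: labs differ only in the effort they put into their methods, lower effort yields more papers but more false discoveries, and each generation the least productive lab is replaced by a noisy copy of the most productive one. Mean effort drifts downward and the share of false discoveries rises, without any lab ever deliberately cheating.

```python
import numpy as np

# Toy selection dynamics: invented functional forms, for intuition only.
rng = np.random.default_rng(1)

N_LABS = 100
GENERATIONS = 200
MUTATION_SD = 0.02

# Each lab has an "effort" in (0, 1]. Higher effort means more careful
# methods: fewer papers per generation, but a lower false-discovery rate.
effort = rng.uniform(0.1, 1.0, N_LABS)

def one_generation(effort):
    papers = rng.poisson(10 * (1.2 - effort))       # low effort -> more papers
    false_rate = 0.05 + 0.8 * (1 - effort)          # low effort -> more false findings
    false_papers = rng.binomial(papers, false_rate)
    return papers, false_papers

history = []
for _ in range(GENERATIONS):
    papers, false_papers = one_generation(effort)
    # Selection and inheritance: the least productive lab is replaced by a
    # noisy copy of the most productive lab's methods.
    worst, best = np.argmin(papers), np.argmax(papers)
    effort[worst] = np.clip(effort[best] + rng.normal(0, MUTATION_SD), 0.05, 1.0)
    history.append((effort.mean(), false_papers.sum() / max(papers.sum(), 1)))

print(f"mean effort: start {history[0][0]:.2f} -> end {history[-1][0]:.2f}")
print(f"false-discovery share: start {history[0][1]:.2f} -> end {history[-1][1]:.2f}")
```

The point is not the particular numbers but the mechanism: as long as publication count is the sole measure of fitness, methods that maximize output are inherited, whatever they do to reliability.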
Interestingly, replication, that is, attempts by other researchers to reproduce published work, did not have a large effect on the continued success of labs that publish false discoveries. As Smaldino & McElreath (2016) explain, replication is not sufficient to stop the selection of bad science because the top-performing labs will always be those that cut corners. Replication penalizes bad science, but unless every published study is replicated several times, some researchers will avoid being caught. And because career opportunities are finite and researchers and institutions are highly connected, the payoff for being in the top tier of publication counts may be orders of magnitude greater than that of a merely respectable publication record. Thus, replication alone will not stop bad science.
So much for the idea that science is self-correcting!
What can be done?
The present study confirmed what many, including myself, have known intuitively: incentives that reward publication quantity will lead to the natural selection of methods that produce publishable results.
Unfortunately, reversing this evolutionary process is not simple. As pointed out by the authors, institutional change is difficult to accomplish and it is usually detrimental to early adopters. Furthermore, simply asking journals and peer reviewers to raise their standards is unlikely to be successful if success continues to be tied to publication numbers.
Finding a new way to assess researchers is not a simple task. Research quality is difficult to quantify. While I have no doubt that a room full of the top researchers would be able to distinguish the good from the bad, who has the time to evaluate each piece of published research with such rigour? Nevertheless, researchers and institutions are encouraged to resist bean counting. Quality research should be encouraged and the whole of the research process, not only the papers it produces, should be valued. For example, sharing data and computer code with the wider research community is an additional step that requires time, effort and planning. Such practices should be commended and valued. Maybe each publication should be weighted based on whether these additional steps are followed.
Unfortunately, researchers in positions of power likely got there as the result of this natural selection, and I am not sure these researchers will like hearing that the length of their beaks or the pinkness of their plumage is no longer the sole measure of scientific merit. Thus, it will likely take some time to change the rules of the game (of science).
References
Ioannidis JP (2005). Why most published research findings are false. PLoS Med 2: e124.
Smaldino PE, McElreath R (2016). The natural selection of bad science. arXiv:1605.09511.