The Natural Selection of Bad Science. Smaldino, P. E. & McElreath, R. Royal Society Open Science, 3(9):160384, September 2016.
Poor research design and data analysis encourage false-positive findings. Such poor methods persist despite perennial calls for improvement, suggesting that they result from something more than just misunderstanding. The persistence of poor methods results partly from incentives that favour them, leading to the natural selection of bad science. This dynamic requires no conscious strategizing – no deliberate cheating nor loafing – by scientists, only that publication is a principal factor for career advancement. Some normative methods of analysis have almost certainly been selected to further publication instead of discovery. In order to improve the culture of science, a shift must be made away from correcting misunderstandings and towards rewarding understanding. We support this argument with empirical evidence and computational modelling. We first present a 60-year meta-analysis of statistical power in the behavioural sciences and show that power has not improved despite repeated demonstrations of the necessity of increasing power. To demonstrate the logical consequences of structural incentives, we then present a dynamic model of scientific communities in which competing laboratories investigate novel or previously published hypotheses using culturally transmitted research methods. As in the real world, successful labs produce more 'progeny,' such that their methods are more often copied and their students are more likely to start labs of their own. Selection for high output leads to poorer methods and increasingly high false discovery rates. We additionally show that replication slows but does not stop the process of methodological deterioration. Improving the quality of research requires change at the institutional level.

[Excerpt: Discussion]

Incentives drive cultural evolution. In the scientific community, incentives for publication quantity can drive the evolution of poor methodological practices. We have provided some empirical evidence that this occurred, as well as a general model of the process. If we want to improve how our scientific culture functions, we must consider not only the individual behaviours we wish to change, but also the social forces that provide affordances and incentives for those behaviours. [...]

An incentive structure that rewards publication quantity will, in the absence of countervailing forces, select for methods that produce the greatest number of publishable results. This, in turn, will lead to the natural selection of poor methods and increasingly high false discovery rates. Although we have focused on false discoveries, there are additional negative repercussions of this kind of incentive structure. Scrupulous research on difficult problems may require years of intense work before yielding coherent, publishable results. If shallower work generating more publications is favoured, then researchers interested in pursuing complex questions may find themselves without jobs, perhaps to the detriment of the scientific community more broadly.

Good science is in some sense a public good, and as such may be characterized by the conflict between cooperation and free riding. We can think of cooperation here as the opportunity to create group-beneficial outcomes (i.e. quality research) at a personal cost (i.e. diminished 'fitness' in terms of academic success). To those familiar with the game theory of cooperative dilemmas, it might therefore appear that continued contributions to the public good – cooperation rather than free riding – could be maintained through the same mechanisms known to promote cooperation more generally, including reciprocity, monitoring and punishment. However, the logic of cooperation requires that the benefit received by cooperators can be measured in the same units as the pay-off to free riders: i.e. units of evolutionary fitness. It is possible that coalitions of rigorous scientists working together will generate greater output than less rigorous individuals working in isolation. And indeed, there has been an increase in highly collaborative work in many fields. Nevertheless, such collaboration may also be a direct response to incentives for publication quantity, as contributing a small amount to many projects generates more publications than does contributing a large amount to few projects. Cooperation in the sense of higher quality research provides a public good in the sense of knowledge, but not in the sense of fitness for the cultural evolution of methodology. Purely bottom-up solutions are therefore unlikely to be sufficient. That said, changing attitudes about the assessment of scientists is vital to making progress, and is a driving motivation for this presentation.

[...]

Whenever quantitative metrics are used as proxies to evaluate and reward scientists, those metrics become open to exploitation if it is easier to do so than to directly improve the quality of research. Institutional guidelines for evaluation at least partly determine how researchers devote their energies, and thereby shape the kind of science that gets done. A real solution is likely to be patchwork, in part because accurately rewarding quality is difficult. Real merit takes time to manifest, and scrutinizing the quality of another's work takes time from already busy schedules. Competition for jobs and funding is stiff, and reviewers require some means to assess researchers. Moreover, individuals differ on their criteria for excellence. Boiling down an individual's output to simple, objective metrics, such as number of publications or journal impacts, entails considerable savings in terms of time, energy and ambiguity. Unfortunately, the long-term costs of using simple quantitative metrics to assess researcher merit are likely to be quite great. If we are serious about ensuring that our science is both meaningful and reproducible, we must ensure that our institutions incentivize that kind of science.
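The abstract describes the selection dynamic only verbally. Its logic rests on a standard result: with base rate b of true hypotheses, power W, and false-positive rate a, the expected false discovery rate among published positives is a(1-b) / (a(1-b) + Wb), so anything that pushes a up pushes the FDR up. The sketch below is a minimal toy version of that dynamic, not the authors' model or code: labs inherit a methodological effort level, lower effort raises the false-positive rate but yields more studies per unit time, and labs reproduce in proportion to publication count. The functional forms (alpha, studies_per_generation) and all parameter values are illustrative assumptions.

import random

B = 0.1        # base rate of true hypotheses (assumed, not from the paper)
POWER = 0.8    # statistical power, held fixed for simplicity (assumed)
N_LABS = 100   # population size (assumed)
GENERATIONS = 201
MUT_SD = 0.05  # mutation scale on inherited effort (assumed)

def alpha(effort):
    # Higher methodological effort lowers the false-positive rate.
    # This functional form is an illustrative assumption.
    return 0.05 + 0.45 * (1.0 - effort)

def studies_per_generation(effort):
    # Rigorous work is slower, so high effort means fewer studies.
    return max(1, round(10 * (1.0 - 0.8 * effort)))

def analytic_fdr(power, a, base_rate):
    # Expected share of positive (publishable) results that are false.
    false_pos = a * (1.0 - base_rate)
    return false_pos / (false_pos + power * base_rate)

def run():
    efforts = [random.uniform(0.5, 1.0) for _ in range(N_LABS)]
    for gen in range(GENERATIONS):
        payoffs = []
        for e in efforts:
            pubs = 0
            for _ in range(studies_per_generation(e)):
                true_hyp = random.random() < B
                p_positive = POWER if true_hyp else alpha(e)
                if random.random() < p_positive:  # only positives get published
                    pubs += 1
            payoffs.append(pubs)
        # Selection: labs 'reproduce' in proportion to publication count,
        # and offspring inherit the parent's effort with a small mutation.
        weights = payoffs if any(payoffs) else None  # uniform fallback
        efforts = [
            min(1.0, max(0.0, random.choices(efforts, weights=weights)[0]
                         + random.gauss(0.0, MUT_SD)))
            for _ in range(N_LABS)
        ]
        if gen % 50 == 0:
            mean_e = sum(efforts) / N_LABS
            print(f"gen {gen:3d}  mean effort {mean_e:.2f}  "
                  f"expected FDR {analytic_fdr(POWER, alpha(mean_e), B):.2f}")

if __name__ == "__main__":
    run()

Run under any Python 3 interpreter: mean effort drifts downward and the expected false discovery rate climbs, mirroring the paper's qualitative result that selection on output alone degrades methods. The paper's fuller model additionally includes replication, which slows but does not stop this decline.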
@article{smaldinoNaturalSelectionBad2016,
  title = {The Natural Selection of Bad Science},
  author = {Smaldino, Paul E. and McElreath, Richard},
  year = {2016},
  month = sep,
  volume = {3},
  pages = {160384},
  issn = {2054-5703},
  doi = {10.1098/rsos.160384},
  abstract = {Poor research design and data analysis encourage false-positive findings. Such poor methods persist despite perennial calls for improvement, suggesting that they result from something more than just misunderstanding. The persistence of poor methods results partly from incentives that favour them, leading to the natural selection of bad science. This dynamic requires no conscious strategizing -- no deliberate cheating nor loafing -- by scientists, only that publication is a principal factor for career advancement. Some normative methods of analysis have almost certainly been selected to further publication instead of discovery. In order to improve the culture of science, a shift must be made away from correcting misunderstandings and towards rewarding understanding. We support this argument with empirical evidence and computational modelling. We first present a 60-year meta-analysis of statistical power in the behavioural sciences and show that power has not improved despite repeated demonstrations of the necessity of increasing power. To demonstrate the logical consequences of structural incentives, we then present a dynamic model of scientific communities in which competing laboratories investigate novel or previously published hypotheses using culturally transmitted research methods. As in the real world, successful labs produce more 'progeny,' such that their methods are more often copied and their students are more likely to start labs of their own. Selection for high output leads to poorer methods and increasingly high false discovery rates. We additionally show that replication slows but does not stop the process of methodological deterioration. Improving the quality of research requires change at the institutional level.

[Excerpt: Discussion]

Incentives drive cultural evolution. In the scientific community, incentives for publication quantity can drive the evolution of poor methodological practices. We have provided some empirical evidence that this occurred, as well as a general model of the process. If we want to improve how our scientific culture functions, we must consider not only the individual behaviours we wish to change, but also the social forces that provide affordances and incentives for those behaviours. [...]

[] An incentive structure that rewards publication quantity will, in the absence of countervailing forces, select for methods that produce the greatest number of publishable results. This, in turn, will lead to the natural selection of poor methods and increasingly high false discovery rates. Although we have focused on false discoveries, there are additional negative repercussions of this kind of incentive structure. Scrupulous research on difficult problems may require years of intense work before yielding coherent, publishable results. If shallower work generating more publications is favoured, then researchers interested in pursuing complex questions may find themselves without jobs, perhaps to the detriment of the scientific community more broadly.

[] Good science is in some sense a public good, and as such may be characterized by the conflict between cooperation and free riding. We can think of cooperation here as the opportunity to create group-beneficial outcomes (i.e. quality research) at a personal cost (i.e. diminished 'fitness' in terms of academic success). To those familiar with the game theory of cooperative dilemmas, it might therefore appear that continued contributions to the public good -- cooperation rather than free riding -- could be maintained through the same mechanisms known to promote cooperation more generally, including reciprocity, monitoring and punishment. However, the logic of cooperation requires that the benefit received by cooperators can be measured in the same units as the pay-off to free riders: i.e. units of evolutionary fitness. It is possible that coalitions of rigorous scientists working together will generate greater output than less rigorous individuals working in isolation. And indeed, there has been an increase in highly collaborative work in many fields. Nevertheless, such collaboration may also be a direct response to incentives for publication quantity, as contributing a small amount to many projects generates more publications than does contributing a large amount to few projects. Cooperation in the sense of higher quality research provides a public good in the sense of knowledge, but not in the sense of fitness for the cultural evolution of methodology. Purely bottom-up solutions are therefore unlikely to be sufficient. That said, changing attitudes about the assessment of scientists is vital to making progress, and is a driving motivation for this presentation.

[] [...]

[] Whenever quantitative metrics are used as proxies to evaluate and reward scientists, those metrics become open to exploitation if it is easier to do so than to directly improve the quality of research. Institutional guidelines for evaluation at least partly determine how researchers devote their energies, and thereby shape the kind of science that gets done. A real solution is likely to be patchwork, in part because accurately rewarding quality is difficult. Real merit takes time to manifest, and scrutinizing the quality of another's work takes time from already busy schedules. Competition for jobs and funding is stiff, and reviewers require some means to assess researchers. Moreover, individuals differ on their criteria for excellence. Boiling down an individual's output to simple, objective metrics, such as number of publications or journal impacts, entails considerable savings in terms of time, energy and ambiguity. Unfortunately, the long-term costs of using simple quantitative metrics to assess researcher merit are likely to be quite great. If we are serious about ensuring that our science is both meaningful and reproducible, we must ensure that our institutions incentivize that kind of science.},
  archivePrefix = {arXiv},
  eprint = {1605.09511},
  eprinttype = {arxiv},
  journal = {Royal Society Open Science},
  keywords = {competition,cooperation,emergent-property,epistemology,evolution,feedback,game-theory,publication-bias,publish-or-perish,replicability,reproducibility,reproducible-research,research-management,research-metrics,rewarding-best-research-practices,science-ethics},
  number = {9}
}
