Do social science research findings replicate?

An international collaborative team of five laboratories (including one from Innsbruck) published the results of 21 high-powered replications of social science experiments published in Science and Nature. Eight of the 21 replications failed to find significant evidence for the original finding, and the replication effect sizes were on average about 50% smaller than those of the original studies.

On 27 August 2018, in Nature Human Behaviour, a collaborative team of five laboratories published the results of 21 high-powered replications of social science experiments published in Science and Nature, two of the most prestigious journals in science. The team included Anne Dreber, Felix Holzmeister, Jürgen Huber, Michael Kirchler and Julia Rose from our research platform. The team tried to replicate one main finding from every eligible experimental social science paper published in these journals between 2010 and 2015. To extend and improve on prior replication efforts, the team obtained the original materials and, before conducting the studies, had the replication protocols reviewed and endorsed by almost all of the original authors.

The studies were preregistered, publicly declaring the design and analysis plan in advance, and they were designed with very high statistical power so that the replications would be likely to detect support for the findings even if the true effects were only half the size of the original results. “To ensure high statistical power, the average sample size of the replication studies was about five times larger than the average sample size of the original studies,” said Felix Holzmeister, one of the project leaders.
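Why does detecting a smaller effect require so much more data? A minimal sketch, using the textbook normal-approximation sample-size formula for comparing two equally sized groups, illustrates the point. The function name `n_per_group`, the standardized effect size d (Cohen's d), the two-sided test at α = 0.05 and the 90% power target are all assumptions chosen for illustration; this is not the replication team's actual power analysis.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Approximate sample size per group for a two-sided, two-sample test.

    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    where d is the standardized effect size (Cohen's d). Illustrative only:
    alpha=0.05 and power=0.90 are assumed defaults, not the project's settings.
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value of the two-sided test
    z_power = z.inv_cdf(power)            # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


if __name__ == "__main__":
    original_d = 0.5              # hypothetical original effect size
    half_d = original_d / 2       # replication should still detect half of it
    n_orig = n_per_group(original_d)
    n_half = n_per_group(half_d)
    print(f"d={original_d}: {n_orig} per group")
    print(f"d={half_d}: {n_half} per group")
    print(f"ratio: {n_half / n_orig:.2f}")
```

Under these assumed settings the sketch gives 85 participants per group for d = 0.5 and 337 per group for d = 0.25. Because the required sample size scales with 1/d², halving the effect size roughly quadruples the sample needed, which is why high-powered replications must be much larger than the originals. The real designs differed from study to study, so the project's "about five times larger" figure should not be read off this toy calculation.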

Caution with "statistical significance"

The team found that 13 of the 21 (62%) replications showed significant evidence consistent with the original hypothesis, and other methods of evaluating replication success indicated similar results (ranging from 57% to 67%). In addition, the replication studies showed effect sizes that were on average about 50% smaller than those of the original studies. Together, these results suggest that reproducibility is imperfect even among studies published in the most prestigious journals in science. “These results show that ‘statistically significant’ scientific findings need to be interpreted very cautiously until they have been replicated, even if published in the most prestigious journals,” said Michael Kirchler, another of the project leaders.

Wisdom of Crowds

Prior to conducting the replications, the team set up prediction markets in which other researchers could bet and earn (or lose) money based on whether they thought each of the findings would replicate. The markets were highly accurate: they correctly predicted the replication outcomes for 18 of the 21 replications, and market beliefs about replication were highly correlated with replication effect sizes. Jürgen Huber, another of the project leaders, noted, “The findings of the prediction markets suggest that researchers have advance knowledge about the likelihood that some findings will replicate.” It is not yet clear which knowledge is critical, but two possibilities are the plausibility of the original finding and the strength of the original statistical evidence. The apparent robustness of this phenomenon suggests that prediction markets could be used to help prioritize replication efforts for studies that have highly important findings but a relatively uncertain or low likelihood of replication success. Michael Kirchler added: “Using prediction markets could be another way for the scientific community to use resources more efficiently and accelerate discovery.”

Challenges with replication studies

This study provides additional evidence of the challenges in reproducing published results, and addresses some of the potential criticisms of prior replication attempts. For example, it is possible that higher-profile results would be more reproducible because of the high standards and prestige of the publication outlet; this study therefore selected papers from the most prestigious journals in science. Likewise, a critique of the Reproducibility Project: Psychology suggested that higher-powered research designs and greater fidelity to the original studies would result in high reproducibility. This study had very high-powered tests, original materials for all but one study, and endorsement of the protocols for all but two studies, yet it still failed to replicate some findings and found substantially smaller effect sizes in the replications. “The fact that some of the findings could not be reproduced, and that some of the replications showed substantially smaller effect sizes, shows that a substantial increase in statistical power is not sufficient to reproduce all published findings,” said Julia Rose, one of the co-authors.

That there were replication failures does not mean that those original findings are false. “It is possible that errors in the replication or differences between the original and replication studies are responsible for some failures to replicate, but the fact that the markets accurately predicted replication success and failure in advance reduces the plausibility of these explanations,” said Felix Holzmeister. Nevertheless, some original authors provided commentaries with potential reasons for the failures to replicate. These productive ideas are worth testing in future research to determine whether the original findings can be reproduced under some conditions.

The nature of science

Brian Nosek, executive director of the Center for Open Science, professor at the University of Virginia, and one of the co-authors, noted, “Someone observing these failures to replicate might conclude that science is going in the wrong direction. In fact, science’s greatest strength is its constant self-scrutiny to identify and correct problems and increase the pace of discovery.” This large-scale replication project is just one part of an ongoing reformation of research practices. Researchers, funders, journals, and societies are changing policies and practices to nudge the research culture toward greater openness, rigor, and reproducibility.
