A new method for testing reproducibility in systematic reviews was developed, but needs more testing
Pieper et al. BMC Med Res Methodol (2021) 21:157
https://doi.org/10.1186/s12874-021-01342-6
Open Access
RESEARCH ARTICLE
Dawid Pieper1*, Simone Heß1 and Clovis Mariano Faggion Jr.2
Abstract
Background: To develop and test an approach for testing the reproducibility of systematic reviews (SRs).
Methods: Case study. We developed an approach to test reproducibility retrospectively that focuses on the whole conduct of an SR rather than on single steps of it. We replicated the literature searches and drew a 25% random sample of the retrieved records, followed by study selection, data extraction, and risk of bias (RoB) assessments performed independently by two reviewers. The results were compared narratively with those of the original review.
Results: We were not able to fully reproduce the original search, which resulted in minor differences in the number of citations retrieved. The biggest disagreements were found in study selection. The most difficult step to reproduce was the RoB assessment, because the original review did not report clear criteria supporting its RoB judgements, although agreement was still satisfactory.
Conclusion: Our approach, as well as other approaches, needs to be tested and compared in the future, as testing the reproducibility of SRs is still in its infancy.
Keywords: Systematic reviews, Reproducibility of Results, Methodological quality, Data extraction, Risk of bias,
Information storage and retrieval
Introduction
Systematic reviews (SRs) are essential for informing evidence-based decision making in health care for groups such as clinicians, patients and policy makers. Despite this importance, and the potential implications for patient-relevant outcomes, it has been argued that there is currently a massive production of unnecessary, misleading, and conflicted systematic reviews and meta-analyses [1]. Among other recommendations, the Lancet series on reducing waste in research recommended that research studies undergo rigorous independent replication and reproducibility checks [2]. In short, replication means that independent people collect new data to answer the same question, whereas reproducibility means that independent people analyze the same data [3]. Given these definitions, replicability should be the ultimate goal and can be regarded as ranking above reproducibility. However, full and independent replication may not be feasible due to resource constraints. In this case, reproducibility can serve as a minimum standard for judging scientific claims [4].
*Correspondence:
1 Institute for Research in Operative Medicine, Faculty of Health, School of Medicine, Witten/Herdecke University, Ostmerheimer Str. 200, 51109 Cologne, Germany
Full list of author information is available at the end of the article
Reproducible research practices have been found to be uncommon in SRs, which limits the possibility of testing SRs for reproducibility [5]. Other studies have dealt with single steps of conducting an SR. For example, studies found the reproducibility of search strategies to be poor [6, 7]. Others indicated that study selection, data extraction, risk of bias assessments and meta-analyses may also lead to different results depending on the author group involved [8–11]. This implies that these steps may also be difficult to reproduce fully. Gaps in reproducibility across several steps of an SR may, in turn, result in a lack of replicability.
© The Author(s) 2021. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Some initial ideas on how testing for reproducibility in SRs could work have been presented [12]. However, to the best of our knowledge, no test of a whole SR, rather than of single steps, has been conducted. Therefore, we set out to develop and execute a strategy to test the reproducibility of an SR. Our strategy covered the reproducibility of the following SR steps: search, selection, data extraction and risk of bias (RoB) assessment.
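For the search step, one simple way to quantify how closely a re-run search matches the original is to compare the two sets of retrieved records. The sketch below is a minimal illustration, not the authors' procedure: it assumes each record carries an identifier (such as a DOI or PubMed ID) that can be matched after light normalization, and the names `compare_searches` and `SearchComparison` as well as the example identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SearchComparison:
    only_original: set[str]    # records retrieved only by the original search
    only_reproduced: set[str]  # records retrieved only by the re-run search
    shared: set[str]           # records retrieved by both searches

    @property
    def overlap(self) -> float:
        """Jaccard index: shared records divided by all distinct records."""
        union = len(self.only_original) + len(self.only_reproduced) + len(self.shared)
        return len(self.shared) / union if union else 1.0


def normalize(record_id: str) -> str:
    # Hypothetical normalization: strip whitespace and ignore letter case.
    return record_id.strip().lower()


def compare_searches(original: list[str], reproduced: list[str]) -> SearchComparison:
    orig = {normalize(r) for r in original}
    repro = {normalize(r) for r in reproduced}
    return SearchComparison(orig - repro, repro - orig, orig & repro)


if __name__ == "__main__":
    # Made-up identifiers for illustration only.
    original_hits = ["10.1000/a1", "10.1000/B2", "10.1000/c3", "10.1000/d4"]
    reproduced_hits = ["10.1000/a1", "10.1000/b2", "10.1000/c3", "10.1000/e5"]
    result = compare_searches(original_hits, reproduced_hits)
    print(len(result.shared), sorted(result.only_original), sorted(result.only_reproduced))
    print(f"overlap = {result.overlap:.2f}")
```

Reporting the records gained and lost, and not only the totals, matters because two searches can retrieve the same number of citations while differing in which citations they retrieve.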
Methods
The Methods section is divided into two parts. The first part (2.1) describes the approach we developed, proportional testing for reproducibility in systematic reviews (PTRSR). The second part (2.2) describes how we tested this approach on a single SR.
Proportional testing for reproducibility in systematic reviews (PTRSR)
One of the main ideas of the PTRSR is that it can be conducted at any time after an SR has been published (retrospectively). This also allows older SRs to be tested for reproducibility. Moreover, more than one reproduction of an SR can be conducted (e.g. by several author groups), which strengthens the reproducibility test if they all reach the same result. Other approaches to testing reproducibility could also include prospective elements (e.g. two independent pairs of researchers working in parallel).
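In this approach, reproduction results are compared with the original review narratively. Where a numerical summary of agreement is also wanted, for instance when several author groups reproduce the same sample, one common choice (a swapped-in technique, not part of the PTRSR as described here) is observed agreement plus Cohen's kappa between the original decisions and each reproduction in turn. The sketch below assumes the decisions are available as parallel lists of labels for the same records; the function name `agreement_and_kappa` and the data are hypothetical.

```python
from collections import Counter


def agreement_and_kappa(original: list[str], reproduced: list[str]) -> tuple[float, float]:
    """Return (observed agreement, Cohen's kappa) for two sets of decisions on the same records."""
    if len(original) != len(reproduced) or not original:
        raise ValueError("both lists must hold decisions on the same, non-empty set of records")
    n = len(original)
    observed = sum(a == b for a, b in zip(original, reproduced)) / n
    # Chance agreement: for each label, multiply the two marginal proportions, then sum.
    counts_o, counts_r = Counter(original), Counter(reproduced)
    labels = counts_o.keys() | counts_r.keys()
    expected = sum(counts_o[label] * counts_r[label] for label in labels) / n ** 2
    if expected == 1.0:
        # Both sides used one and the same label throughout: kappa is undefined.
        return observed, float("nan")
    return observed, (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Made-up screening decisions for eight sampled records.
    original = ["include", "exclude", "exclude", "include", "exclude", "exclude", "exclude", "exclude"]
    reproduced = ["include", "exclude", "include", "include", "exclude", "exclude", "exclude", "exclude"]
    obs, kappa = agreement_and_kappa(original, reproduced)
    print(f"agreement = {obs:.3f}, kappa = {kappa:.2f}")  # agreement = 0.875, kappa = 0.71
```

Kappa discounts the agreement expected by chance; because most screened records are usually excluded, raw agreement can look high even when agreement on inclusions is more modest.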
The general idea of the PTRSR is that the published SR is not reproduced in full, but only for a given proportion of it. This may increase feasibility, given that obtaining funding or any other reward for such work may be difficult. According to Page et al. (2016), therapeutic non-Cochrane SRs include a median of 14 studies [13]. We therefore suggest starting with a 25% proportion test, i.e. only 25% of the SR undergoes the reproducibility test. This would yield approximately 3.5 included studies per SR (25% of 14), which we considered the minimum allowing for a meaningful test. However, this choice is arbitrary, and the proportion needs to be adjusted when an SR includes only a few studies. Note that the 25% refers to the number of hits obtained from the literature search, not to the final number of included studies.
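The 25% draw can be sketched as a simple random sample of the search records. In this illustration, the function names, the recorded seed, and the rule of rounding the sample size up are our assumptions; the text above does not specify them. Recording the seed makes the draw itself repeatable, and `expected_included` reproduces the arithmetic behind the figure of about 3.5 studies (0.25 × 14).

```python
import math
import random
from typing import Optional


def draw_proportional_sample(record_ids: list[str], proportion: float = 0.25,
                             seed: Optional[int] = None) -> list[str]:
    """Draw a simple random sample (without replacement) of search records.

    Assumption: the sample size is rounded up, so small searches still yield a record.
    """
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    # round() first to absorb floating-point noise such as 0.07 * 100 = 7.000000000000001
    k = math.ceil(round(proportion * len(record_ids), 9))
    rng = random.Random(seed)  # a recorded seed makes the draw itself repeatable
    return rng.sample(record_ids, k)


def expected_included(n_included: int, n_records: int, sample_size: int) -> float:
    """Expected number of included studies in the sample (hypergeometric mean)."""
    return sample_size * n_included / n_records


if __name__ == "__main__":
    hits = [f"rec{i:04d}" for i in range(1, 401)]  # 400 made-up search records
    sample = draw_proportional_sample(hits, 0.25, seed=2021)
    print(len(sample))                                     # 100
    print(expected_included(14, len(hits), len(sample)))   # 3.5
```

Because sampling is done on the search hits, the number of included studies that actually lands in the sample varies around this expected value; for an SR with few included studies the sample may contain none, which is why the proportion may need to be raised.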
In a first step, the reproducibility team (RT) (...truncated)