A new method for testing reproducibility in systematic reviews was developed, but needs more testing

Pieper et al. BMC Med Res Methodol (2021) 21:157
https://doi.org/10.1186/s12874-021-01342-6

RESEARCH ARTICLE, Open Access

Dawid Pieper1*, Simone Heß1 and Clovis Mariano Faggion Jr.2

Abstract

Background: To develop and test an approach for testing the reproducibility of systematic reviews (SRs).

Methods: Case study. We developed an approach to test reproducibility retrospectively while focusing on the whole conduct of an SR instead of single steps of it. We replicated the literature searches and drew a 25% random sample, followed by study selection, data extraction, and risk of bias (RoB) assessments performed by two reviewers independently. These results were compared narratively with the original review.

Results: We were not able to fully reproduce the original search, which resulted in minor differences in the number of citations retrieved. The biggest disagreements were found in study selection. The most difficult section to reproduce was the RoB assessment, owing to the lack of clearly reported criteria supporting the RoB judgements, although agreement was still found to be satisfactory.

Conclusion: Our approach, as well as other approaches, needs to undergo testing and comparison in the future, as the area of testing for reproducibility of SRs is still in its infancy.

Keywords: Systematic reviews, Reproducibility of Results, Methodological quality, Data extraction, Risk of bias, Information storage and retrieval

Introduction

Systematic reviews (SRs) are essential to inform evidence-based decision making in health care across different groups such as clinicians, patients and policy makers. Despite this importance and the potential implications for patient-related outcomes, it has been argued that there is currently a massive production of unnecessary, misleading, and conflicted systematic reviews and meta-analyses [1].
Among others, the Lancet series on reducing waste in research recommended that research studies undergo rigorous independent replication and reproducibility checks [2]. In short, replication means that independent people collect new data while answering the same question. In contrast, reproducibility means that independent people analyze the same data [3]. Given these definitions, it becomes clear that replicability should be the ultimate goal and can be regarded as ranking above reproducibility. However, full and independent replication might not be feasible due to resource constraints. In this case, reproducibility can serve as a minimum standard for judging scientific claims [4].

It was found that reproducible research practices are uncommon in SRs, which limits the possibility of testing for reproducibility [5]. Others dealt with single steps of conducting SRs. For example, studies found the reproducibility of search strategies to be poor [6, 7]. Others indicated that study selection, data extraction, risk of bias assessments and meta-analyses might also lead to different results depending on the author group involved [8–11]. This implies that these steps might also be difficult to reproduce fully. Gaps in reproducibility in several steps of an SR potentially result in a lack of replicability.

Some initial ideas have been presented on how testing for reproducibility in SRs could work [12]. However, to the best of our knowledge, no test of a whole SR, as opposed to single steps, has been conducted. Therefore, we set out to develop and execute a strategy to test for reproducibility in an SR. Our strategy covered the reproducibility of the following steps of an SR: search, selection, data extraction and risk of bias (RoB) assessment.

*Correspondence: 1 Institute for Research in Operative Medicine, Faculty of Health, School of Medicine, Witten/Herdecke University, Ostmerheimer Str. 200, 51109 Cologne, Germany. Full list of author information is available at the end of the article.

© The Author(s) 2021. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Methods

The methods section is divided into two parts. The first part (2.1) describes the approach we developed for proportional testing for reproducibility in systematic reviews (PTRSR). The second part (2.2) describes how this approach was tested on a single SR.

Proportional testing for reproducibility in systematic reviews (PTRSR)

One of the main ideas of the PTRSR is that it can be conducted at any time after an SR has been published (retrospective). This will allow for testing older SRs for reproducibility as well.
At the same time, more than one reproduction of an SR can be conducted (e.g. by several author groups), thus giving more power to the reproducibility test, assuming that they come to the same result. Other approaches to test reproducibility could also include prospective elements (e.g. two independent pairs of researchers working in parallel).

The general idea of the PTRSR is that the formerly published SR is not reproduced in full, but only for a given proportion of it. This might increase feasibility, given that obtaining funding and being rewarded in any way might be difficult to achieve. According to Page et al. 2016, a therapeutic non-Cochrane SR includes a median of 14 studies [13]. Thus, we suggest starting with a 25% proportion test, i.e. only 25% of the SR will undergo the reproducibility test. This would result in approximately 3.5 studies per SR, which we considered to be the minimal value allowing for a meaningful test. However, this is an arbitrary choice. This number needs to be adjusted when the SR includes only a few studies. It should be noted that the 25% refers to the number of hits obtained from the literature search, not to the number of studies finally included. In a first step, the reproducibility team (RT) (...truncated)
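
The 25% proportion described above can be illustrated with a short sketch. It is not the authors' implementation: the function name `draw_proportional_sample`, the citation identifiers, the seed, and the choice to round the sample size up are all assumptions made for illustration. It draws a random sample from the search hits (not from the included studies) and repeats the arithmetic behind the figure of approximately 3.5 studies.

```python
import math
import random


def draw_proportional_sample(hits, proportion=0.25, seed=None):
    """Return a random sample covering `proportion` of the search hits.

    Hypothetical helper: the text specifies a 25% proportion of the hits,
    but not how the draw is carried out.
    """
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    if not hits:
        return []
    # Rounding up is an assumption of this sketch, not a rule from the text.
    sample_size = min(len(hits), math.ceil(proportion * len(hits)))
    rng = random.Random(seed)  # a fixed seed makes the draw itself repeatable
    return rng.sample(list(hits), sample_size)


# Hypothetical citation identifiers standing in for the search hits.
hits = [f"citation-{i:04d}" for i in range(1, 401)]
sample = draw_proportional_sample(hits, proportion=0.25, seed=2021)
print(len(sample))  # 100

# Expected number of included studies in a 25% sample, given a median of 14 [13].
print(0.25 * 14)  # 3.5
```

Recording the seed alongside the sample would let a second team redraw exactly the same subset, which fits the retrospective character of the approach.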