Towards reproducible computational drug discovery

Schaduangrat et al. J Cheminform (2020) 12:9
https://doi.org/10.1186/s13321-020-0408-x
Journal of Cheminformatics, Open Access, REVIEW

Nalini Schaduangrat1†, Samuel Lampa2†, Saw Simeon3†, Matthew Paul Gleeson4*, Ola Spjuth2* and Chanin Nantasenamat1*

Abstract

The reproducibility of experiments has been a long-standing impediment to further scientific progress. Computational methods have been instrumental in drug discovery efforts owing to their multifaceted use in data collection, pre-processing, analysis and inference. This article provides in-depth coverage of the reproducibility of computational drug discovery. The review explores the following topics: (1) the current state of the art in reproducible research, (2) research documentation (e.g. electronic laboratory notebooks, Jupyter notebooks, etc.), (3) the science of reproducible research (i.e. comparison and contrast with related concepts such as replicability, reusability and reliability), (4) model development in computational drug discovery, (5) computational issues in model development and deployment, and (6) use case scenarios for streamlining the computational drug discovery protocol. In computational disciplines, it has become common practice to share the data and code used for numerical calculations, not only to facilitate reproducibility but also to foster collaboration (i.e. to drive the project further by introducing new ideas, growing the data, augmenting the code, etc.). It is therefore inevitable that the field of computational drug design will adopt an open approach towards the collection, curation and sharing of data and code.
Keywords: Reproducibility, Reproducible research, Drug discovery, Drug design, Open science, Open data, Data sharing, Data science, Bioinformatics, Cheminformatics

† Nalini Schaduangrat, Samuel Lampa and Saw Simeon contributed equally to this work
1 Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, 10700 Bangkok, Thailand
2 Department of Pharmaceutical Biosciences, Uppsala University, 751 24 Uppsala, Sweden
4 Department of Biomedical Engineering, Faculty of Engineering, King Mongkut’s Institute of Technology Ladkrabang, 10520 Bangkok, Thailand
Full list of author information is available at the end of the article

Introduction

Traditional drug discovery and development is well known to be time-consuming and cost-intensive, taking an average of 10 to 15 years before a drug is ready to reach the market, with an estimated cost of 58.8 billion USD as of 2015 [1]. These numbers represent a dramatic 10% increase over previous years for both biotechnology and pharmaceutical companies. Of a library of 10,000 screened chemical compounds, only about 250 move on to further clinical testing. In addition, typically no more than 10 compounds are tested in humans [2]. Furthermore, a study conducted from 1995 to 2007 by the Tufts Center for the Study of Drug Development revealed that, of all the drugs that make it to Phase I of clinical trials, only 11.83% were eventually approved for market [3]. In addition, from 2006 to 2015, the success rate of drugs undergoing clinical trials was only 9.6% [4]. The escalating cost and high failure rate of this traditional path of drug discovery and development have prompted the use of computer-aided drug discovery (CADD), which encompasses ligand-based, structure-based and systems-based drug design (Fig. 1).
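The attrition figures quoted above can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the original article: the stage labels and the names `funnel`, `stage_rates` and `PHASE1_APPROVAL_RATE` are illustrative assumptions, and applying the Tufts approval rate to the "10 compounds" figure combines numbers from different sources, so the last result is only an order-of-magnitude illustration.

```python
# Attrition figures quoted in the text; the stage labels are illustrative only.
funnel = [
    ("screened compounds", 10_000),
    ("advance to further testing", 250),
    ("tested in humans (upper bound)", 10),
]

# Phase I-to-approval rate from the 1995-2007 Tufts study cited in the text.
PHASE1_APPROVAL_RATE = 0.1183


def stage_rates(stages):
    """Fraction of compounds retained between each pair of consecutive stages."""
    return [
        (src, dst, n_dst / n_src)
        for (src, n_src), (dst, n_dst) in zip(stages, stages[1:])
    ]


rates = stage_rates(funnel)
overall = funnel[-1][1] / funnel[0][1]

# ASSUMPTION: chaining the Tufts rate onto the 10-compound figure mixes two
# sources and treats "tested in humans" as Phase I entry.
expected_approved = funnel[-1][1] * PHASE1_APPROVAL_RATE

for src, dst, frac in rates:
    print(f"{src} -> {dst}: {frac:.2%}")
print(f"overall retention: {overall:.2%}")
print(f"expected approvals per 10,000 screened: {expected_approved:.2f}")
```

Under these assumptions, about 2.5% of screened compounds advance, at most 4% of those reach human testing, and roughly one compound per 10,000 screened would be expected to gain approval.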
Moreover, the major side effects of drugs resulting in severe toxicity have motivated the screening of ADMET (absorption, distribution, metabolism, excretion and toxicity) properties at an early stage of drug development in order to increase the success rate as well as reduce the time spent screening candidates [5].

© The Author(s) 2020. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Fig. 1 Schematic summary of the drug discovery process overlaid with corresponding computational approaches. Pipeline stages shown: target discovery (identify disease-modulating target protein), hit (screen for hit compounds to inhibit target protein), lead optimization (hit-to-lead conversion and lead optimization), pre-clinical trials (evaluate pharmacokinetic properties) and clinical trials (evaluate safety, dosage, efficacy and adverse effects). Computational approaches shown: ligand-based, structure-based and systems-based, with labels for QSAR modeling, computational chemistry, chemical space, cheminformatics, molecular modeling, protein structure prediction, molecular docking, molecular dynamics, network pharmacology, proteochemometric modeling and pathway analysis.

The CADD process begins with the identification of a target or hit compound using wet-lab experiments and, subsequently, high-throughput screening (HTS). In particular, the typical role of CADD is to screen a library of compounds against the target of interest, thereby narrowing the candidates to a few smaller clusters [6]. However, the high resource requirements of CADD, coupled with its extensive costs, open the door to virtual screening methods such as molecular docking, in which the known target of interest is screened against a virtual library of compounds. Although this method is highly effective, a crystal structure of the target of interest remains the main criterion this approach requires for generating an in silico binding model. However, in the absence of a crystal structure, homology models or de novo predicted models can still be obtained and screened against the large library of compounds to find compounds with good binding affinity to the target [7]; these are identified as hits and could be further developed as lead compounds [8]. A conceptual map of the experimental and computational methodologies applied to the drug discovery process is summarized in Fig. 2. In recent years, the expansion of data repositories including those with chemical and phar (...truncated)