Improving protein-ligand docking and screening accuracies by incorporating a scoring function correction term.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9116214/pdf/

Briefings in Bioinformatics, 2022, 23(3), 1–15
https://doi.org/10.1093/bib/bbac051
Problem Solving Protocol

Improving protein–ligand docking and screening accuracies by incorporating a scoring function correction term

Liangzhen Zheng, Jintao Meng, Kai Jiang, Haidong Lan, Zechen Wang, Mingzhi Lin, Weifeng Li, Hongwei Guo, Yanjie Wei and Yuguang Mu

Corresponding authors: Yanjie Wei, Institute of Advanced Computing and Digital Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, P.R. 518055, China. Tel: +86-021-26907048; Yuguang Mu, School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, 637551, Singapore. Tel: +65-63162885

Abstract

Scoring functions are important components in molecular docking for structure-based drug discovery. Traditional scoring functions, generally empirical or force field-based, are robust and have proven useful for hit identification and lead optimization. Although multiple highly accurate deep learning- or machine learning-based scoring functions have been developed, their direct applications for docking and screening are limited. We describe a novel strategy to develop a reliable protein–ligand scoring function by augmenting the traditional scoring function, Vina score, with a correction term (OnionNet-SFCT). The correction term is based on an AdaBoost random forest model that uses multiple layers of contacts formed between protein residues and ligand atoms. When added to the Vina score, the model considerably improves the performance of AutoDock Vina in docking and screening tasks on different benchmarks (such as a cross-docking dataset, CASF-2016, DUD-E and DUD-AD). Furthermore, our model could be combined with multiple docking applications to increase pose selection accuracy and screening ability, indicating its broad applicability to structure-based drug discovery.
Furthermore, in a reverse screening test, the combined scoring strategy successfully identified multiple known receptors of a plant hormone. In summary, the results show that combining a data-driven model (OnionNet-SFCT) with an empirical scoring function (Vina score) is a good scoring strategy that could be useful for structure-based drug discovery and, potentially, target fishing in the future.

Keywords: scoring function, machine learning, molecular docking, reversal virtual screening, virtual screening

Liangzhen Zheng is a postdoctoral research fellow at Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences. He is also a research scientist at Shanghai Zelixir Biotech Company Ltd. He is working on protein modeling and protein-ligand interaction modeling.
Jintao Meng is an associate researcher at Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences and National Supercomputing Center in Shenzhen. His research focuses on high performance computing, deep learning and bioinformatics.
Kai Jiang is a research associate professor at Department of Biology, Southern University of Science and Technology (SUSTech). He is working on plant hormone signal transduction and development of plant growth regulators with chemical genetics.
Haidong Lan is a research scientist at Tencent AI Lab. He is working on high performance computing.
Zechen Wang is a master's student at School of Physics, Shandong University. He is working on protein-ligand interaction prediction and drug molecule screening.
Mingzhi Lin is a senior engineer at Shanghai Zelixir Biotech Company Ltd. He works on high performance computing and infrastructure design.
Weifeng Li is a full professor at School of Physics, Shandong University. He is interested in multi-scale simulations of biological systems and drug development.
Hongwei Guo is a chair professor at Department of Biology and the director of the Institute of Plant and Food Science at Southern University of Science and Technology (SUSTech). He is an expert in plant hormone signaling and small RNA biology.
Yanjie Wei is the executive director of Center for High Performance Computing at Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. His research area is bioinformatics and high performance computing.
Yuguang Mu is an associate professor at School of Biological Sciences, Nanyang Technological University, Singapore. He is working on protein dynamics, molecular docking and drug discovery.

Received: October 26, 2021. Revised: January 30, 2022. Accepted: January 31, 2022
© The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction

Molecular docking is a useful tool for structure-based drug discovery [1–4]. The scoring function is one of the most important components of a docking application [1]. A well-balanced scoring function should have the following three abilities. First, it should be fast and accurate in scoring and ranking the large number of docking poses generated in docking simulations [1, 5, 6]. From the application aspect, accurate ligand pose selection would guide inhibitor design, agonist design and enzyme–substrate catalytic mechanism exploration [7–10]. Second, a good scoring function should be able to screen compound libraries, select the right binding poses and identify the active molecules, which usually make up an extremely small portion of a library, within affordable computational resources and calculation times [5, 6, 11]. This ability would facilitate the identification of ‘hit’ molecules via high-throughput virtual screening at an early stage of a small-molecule drug discovery project. Finally, a good scoring function should predict strong binding between a small molecule and its native receptor but weak binding toward other off-target proteins [12]. Naturally, by adopting the reverse virtual screening pipeline, the capacity to identify the native receptor for a small molecule would be extremely useful in small molecule off-target evaluation, drug safety evaluation and target fishing [12–15]. The identification of native targets for natural products [15] or hormones (in both animals and plants) is an important area, provided there exists a reliable reverse screening pipeline that would help shortlist possible binding proteins for the query compound with a relatively low false positive rate. Moreover, only limited wet lab experiments would then be required to locate the true target protein.

Current scoring functions are generally quite good at the first task but not the last two. Some of the most successful traditional scoring functions for docking and screening tasks are Vina score (in AutoDock Vina [16]), ChemPLP (in GOLD [17]) and Glide sco (...truncated)