Improving protein-ligand docking and screening accuracies by incorporating a scoring function correction term.
Briefings in Bioinformatics, 2022, 23(3), 1–15
https://doi.org/10.1093/bib/bbac051
Problem Solving Protocol
Liangzhen Zheng, Jintao Meng, Kai Jiang, Haidong Lan, Zechen Wang, Mingzhi Lin, Weifeng Li, Hongwei Guo, Yanjie Wei and Yuguang Mu
Corresponding authors: Yanjie Wei, Institute of Advanced Computing and Digital Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, P.R. 518055, China. Tel: +86-021-26907048; E-mail: ; Yuguang Mu, School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, 637551, Singapore. Tel: +65-63162885 E-mail:
Abstract
Scoring functions are important components of molecular docking for structure-based drug discovery. Traditional scoring functions,
generally empirical or force field-based, are robust and have proven useful for hit identification and lead optimization. Although
multiple highly accurate deep learning- or machine learning-based scoring functions have been developed, their direct application
to docking and screening remains limited. We describe a novel strategy for developing a reliable protein–ligand scoring function by
augmenting the traditional Vina score with a correction term (OnionNet-SFCT). The correction term is based on an AdaBoost random
forest model that uses multiple layers of contacts formed between protein residues and ligand atoms. When added to the Vina score,
the correction term considerably improves AutoDock Vina's performance in docking and screening tasks on several benchmarks, including
a cross-docking dataset, CASF-2016, DUD-E and DUD-AD. Furthermore, our model can be combined with multiple docking applications to
increase pose-selection accuracy and screening power, suggesting broad applicability in structure-based drug discovery. In a reverse
screening application, the combined scoring strategy also successfully identified multiple known receptors of a plant hormone. In
summary, the results show that combining a data-driven model (OnionNet-SFCT) with an empirical scoring function (Vina score) is a
good scoring strategy that could be useful for structure-based drug discovery and, potentially, for target fishing in the future.
Keywords: scoring function, machine learning, molecular docking, reverse virtual screening, virtual screening
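The abstract names two ingredients: features built from multiple layers (distance shells) of contacts between protein residues and ligand atoms, and a correction term added to the Vina score. The sketch below illustrates both under stated assumptions and is not the authors' implementation. The shell boundaries (`SHELLS`), the weight `w` and all function names are hypothetical. The trained AdaBoost random forest is not reproduced; its prediction is simply passed in as a number. Lower combined scores are assumed to be better, following the Vina convention.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

# Hypothetical shell boundaries in angstroms; not taken from the paper.
SHELLS: Tuple[float, ...] = (0.0, 3.0, 4.0, 5.0, 6.0)

Point = Tuple[float, float, float]


def shell_contact_counts(protein_atoms: Sequence[Tuple[str, Point]],
                         ligand_atoms: Sequence[Tuple[str, Point]],
                         shells: Sequence[float] = SHELLS) -> Counter:
    """Count contacts keyed by (shell index, residue type, ligand element).

    Simplifying assumption: each protein atom carries the type of its
    residue, and every protein-atom/ligand-atom pair whose distance lies in
    [shells[k], shells[k + 1]) is counted in shell k. Pairs beyond the
    outermost boundary are ignored.
    """
    counts: Counter = Counter()
    for res_type, p in protein_atoms:
        for element, q in ligand_atoms:
            d = math.dist(p, q)
            for k in range(len(shells) - 1):
                if shells[k] <= d < shells[k + 1]:
                    counts[(k, res_type, element)] += 1
                    break
    return counts


def combined_score(vina: float, correction: float, w: float = 0.5) -> float:
    """Vina score plus a weighted correction term (hypothetical weight w)."""
    return vina + w * correction


def best_pose(poses: Dict[str, Tuple[float, float]], w: float = 0.5) -> str:
    """Return the pose name with the lowest combined score.

    `poses` maps a pose name to (Vina score, correction-term prediction).
    """
    return min(poses, key=lambda name: combined_score(*poses[name], w=w))


if __name__ == "__main__":
    protein = [("ALA", (0.0, 0.0, 0.0)), ("GLY", (10.0, 0.0, 0.0))]
    ligand = [("C", (3.5, 0.0, 0.0)), ("N", (0.0, 2.0, 0.0))]
    print(shell_contact_counts(protein, ligand))

    poses = {"pose1": (-9.0, 1.0), "pose2": (-8.5, -1.0), "pose3": (-7.0, -0.5)}
    print(best_pose(poses, w=0.0))  # Vina score alone -> pose1
    print(best_pose(poses, w=0.5))  # with the correction term -> pose2
```

In this toy example the correction term changes which pose is selected, which is the kind of re-ranking effect the abstract describes for pose selection. In a real pipeline the contact counts would be flattened into a fixed-length feature vector and fed to the trained model that produces the correction value.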
Introduction
Molecular docking is a useful tool for structure-based
drug discovery [1–4], and the scoring function is one of
the most important components of a docking application
[1]. A well-balanced scoring function should have the
following three abilities. First, it should score and rank
the large number of poses generated in docking
simulations quickly and accurately [1, 5, 6]. In practice,
accurate ligand pose selection guides inhibitor design,
agonist design and the exploration of enzyme–substrate
catalytic mechanisms [7–10]. Second, a good scoring
function should be able to screen compound libraries with
affordable computational resources and time, selecting
the correct binding poses and identifying the active
molecules, which usually make up an extremely small
portion of a library [5, 6, 11]. This ability facilitates the
identification of hit molecules via high-throughput
virtual screening at an early stage of a drug discovery
project for small
molecules. Finally, a good scoring function should distinguish strong binding between a small molecule and its native receptor
from weak binding to off-target proteins [12].
Liangzhen Zheng is a postdoctoral research fellow at Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences. He is also a research scientist at Shanghai Zelixir Biotech Company Ltd. He works on protein modeling and protein–ligand interaction modeling.
Jintao Meng is an associate researcher at Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences and National Supercomputing Center in Shenzhen. His research focuses on high-performance computing, deep learning and bioinformatics.
Kai Jiang is a research associate professor at the Department of Biology, Southern University of Science and Technology (SUSTech). He works on plant hormone signal transduction and the development of plant growth regulators using chemical genetics.
Haidong Lan is a research scientist at Tencent AI Lab. He works on high-performance computing.
Zechen Wang is a master's student at the School of Physics, Shandong University. He works on protein–ligand interaction prediction and drug molecule screening.
Mingzhi Lin is a senior engineer at Shanghai Zelixir Biotech Company Ltd. He works on high-performance computing and infrastructure design.
Weifeng Li is a full professor at the School of Physics, Shandong University. He is interested in multiscale simulations of biological systems and drug development.
Hongwei Guo is a chair professor at the Department of Biology and the director of the Institute of Plant and Food Science at Southern University of Science and Technology (SUSTech). He specializes in plant hormone signaling and small RNA biology.
Yanjie Wei is the executive director of the Center for High Performance Computing at Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. His research areas are bioinformatics and high-performance computing.
Yuguang Mu is an associate professor at the School of Biological Sciences, Nanyang Technological University, Singapore. He works on protein dynamics, molecular docking and drug discovery.
Received: October 26, 2021. Revised: January 30, 2022. Accepted: January 31, 2022
© The Author(s) 2022. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
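The second ability described above, recovering the rare active molecules in a large library, is commonly quantified on benchmarks such as DUD-E with the enrichment factor (EF): the fraction of actives among the top-ranked compounds divided by the fraction of actives in the whole library. The sketch below implements that standard definition; the function name and the rounding of the top-N cutoff (`math.ceil`) are our assumptions, and this section does not state which screening metrics the paper reports.

```python
import math
from typing import Sequence


def enrichment_factor(scores: Sequence[float],
                      is_active: Sequence[bool],
                      fraction: float = 0.01) -> float:
    """Enrichment factor of the top `fraction` of a ranked library.

    EF = (actives in top N / N) / (all actives / library size), where the
    library is ranked by ascending score (lower = better, as for Vina).
    Ties keep their input order because Python's sort is stable.
    """
    if len(scores) != len(is_active) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n_total = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n_actives == 0:
        raise ValueError("the library contains no actives")
    n_top = max(1, math.ceil(fraction * n_total))  # assumed cutoff rounding
    ranked = sorted(range(n_total), key=lambda i: scores[i])
    hits = sum(1 for i in ranked[:n_top] if is_active[i])
    return (hits / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    scores = [-10.0, -9.0, -8.0, -7.0, -6.0, -5.0, -4.0, -3.0, -2.0, -1.0]
    labels = [True, False, False, False, False, False, False, False, False, True]
    print(enrichment_factor(scores, labels, fraction=0.2))  # 2.5
```

A random ranking gives an EF close to 1, and the maximum possible value is the library size divided by the number of actives (5 in this toy library), so EF at small fractions rewards exactly the early recognition of actives that virtual screening needs.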
In a reverse virtual screening pipeline, the capacity
to identify the native receptor of a small molecule
would be extremely useful for off-target evaluation,
drug safety evaluation and target fishing [12–15].
Identifying the native targets of natural products [15]
or hormones (in both animals and plants) is an important
application: a reliable reverse screening pipeline would
help shortlist candidate binding proteins for a query
compound with a relatively low false-positive rate, so
that only limited wet-lab experiments would be required
to locate the true target protein. Current scoring
functions generally perform well on the first task but
not on the latter two. Some of the most successful
traditional scoring functions for docking and screening
tasks are Vina score (in AutoDock Vina [16]), ChemPLP (in
GOLD [17]) and Glide sco (...truncated)