HawkRank: a new scoring function for protein–protein docking based on weighted energy terms
Feng et al. J Cheminform (2017) 9:66
https://doi.org/10.1186/s13321-017-0254-7
RESEARCH ARTICLE
Open Access
Ting Feng1, Fu Chen1, Yu Kang1, Huiyong Sun1, Hui Liu1, Dan Li1, Feng Zhu1 and Tingjun Hou1,2*
Abstract
Deciphering the structural determinants of protein–protein interactions (PPIs) is essential to gain a deep understanding of many important biological functions in living cells. Computational approaches for the structural modeling
of PPIs, such as protein–protein docking, are needed to complement existing experimental techniques. The
reliability of a protein–protein docking method is dependent on the ability of the scoring function to accurately
distinguish the near-native binding structures from a huge number of decoys. In this study, we developed HawkRank,
a novel scoring function designed for the sampling stage of protein–protein docking, which sums the contributions
of several energy terms, including van der Waals potentials, electrostatic potentials and desolvation potentials. First,
based on the solvation free energies predicted by the Generalized Born model for ~800 proteins, a SASA (solvent
accessible surface area)-based solvation model was developed, which gives the aqueous solvation free energies
of proteins by summing the contributions of 21 atom types. Then, the van der Waals potentials and electrostatic
potentials based on the Amber ff14SB force field were computed. Finally, the HawkRank scoring function was derived
by determining the optimal weights for five energy terms based on the training set. Here, MSR (modified
success rate), a novel protein–protein scoring quality index, was used to assess the performance of HawkRank and
three other popular protein–protein scoring functions, including ZRANK, FireDock and dDFIRE. The results show that
HawkRank outperformed the other three scoring functions according to the total number of hits and MSR. HawkRank
is available at http://cadd.zju.edu.cn/programs/hawkrank.
Keywords: Protein–protein interaction, Docking, Scoring, HawkRank
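The two ideas summarized in the abstract, a solvation free energy obtained by summing per-atom-type SASA contributions and a final score obtained as a weighted sum of energy terms, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the term names, the atom-type names, and all numerical values are placeholders, not the 21 fitted atom-type parameters or the five fitted weights of HawkRank, and lower scores are assumed to be better.

```python
from typing import Mapping


def sasa_solvation(sasa_by_type: Mapping[str, float],
                   sigma_by_type: Mapping[str, float]) -> float:
    """Solvation free energy as a sum over atom types of sigma_t * SASA_t."""
    return sum(sigma_by_type[t] * area for t, area in sasa_by_type.items())


def weighted_score(terms: Mapping[str, float],
                   weights: Mapping[str, float]) -> float:
    """Linear combination of energy terms using one weight per term."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing energy terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)


# Hypothetical per-atom-type parameters and exposed areas (placeholders).
placeholder_sigma = {"C_aliphatic": 0.5, "O_carbonyl": -0.25}
exposed_area = {"C_aliphatic": 10.0, "O_carbonyl": 8.0}
solvation = sasa_solvation(exposed_area, placeholder_sigma)

# Hypothetical split into five terms with placeholder weights.
placeholder_weights = {
    "vdw_attr": 1.0, "vdw_rep": 0.5,
    "elec_attr": 0.25, "elec_rep": 0.25,
    "desolv": 2.0,
}
decoy_terms = {
    "vdw_attr": -40.0, "vdw_rep": 8.0,
    "elec_attr": -12.0, "elec_rep": 4.0,
    "desolv": solvation,
}
score = weighted_score(decoy_terms, placeholder_weights)
print(f"solvation = {solvation:.2f}, score = {score:.2f}")
```

In a sketch like this, a desolvation term for docking would more plausibly be computed from the change in SASA upon complex formation rather than from the total exposed area; the single-structure sum is used here only to keep the example short.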
Background
Protein–protein interactions (PPIs) are involved in a
wide variety of biological processes, such as signal transduction [1, 2], transmembrane transport [3, 4], and
antibody-antigen pairing [5, 6]. Deciphering structural
and energetic determinants of PPIs is a prerequisite
to understanding the PPI-mediated functions in living cells. Unfortunately, only a tiny fraction of protein–
protein complex structures have been characterized by
high-resolution experimental techniques, such as X-ray
crystallography, solution nuclear magnetic resonance
1 College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
(NMR) spectroscopy and cryo-electron microscopy
(cryo-EM), which cannot keep pace with the growing
demand in structure-based interactome analysis. Moreover, many weak and/or transient PPIs that play essential
roles in regulating dynamic networks in bio-systems cannot be easily captured by experiments due to their unstable nature. Therefore, computational approaches,
especially protein–protein docking, are expected to
provide an efficient alternative for predicting the
structures of binding complexes from the unbound protein structures
and for understanding the recognition mechanisms at the atomic level [7–9].
The ultimate goal of protein–protein docking is to
identify a near-native structure of the complex among
many docking decoys. The procedure generally falls into two
stages: sampling and refinement. In the sampling stage, a
large number of docking poses are generated and scored
by various scoring functions; and in the refinement stage,
the top-hit poses (or decoys) given by the first stage are
re-scored and re-ranked by more rigorous scoring functions. Clearly, the success of protein–protein docking
depends, to a large degree, on the ability of the scoring function to score and rank the decoys accurately. So
far, a large number of scoring functions have been developed, ranging from force field-based scoring functions
such as ZRANK and FireDock [10–13], to knowledge-based ones such as dDFIRE and InterEvScore [14–16]
and machine-learning scoring functions [17, 18]. However, recognizing near-native structures in a huge pool
of alternatives remains challenging because most scoring functions are not yet accurate enough.
Besides, the ease of use, efficiency and general utility of
the scoring functions should also be taken into account.
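The two-stage procedure described above (score all sampled poses with a fast function, then re-score only the top-ranked ones with a more rigorous function) can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the protocol of any particular docking program: the function name `two_stage_rank`, the toy decoys and their scores are all hypothetical, and lower scores are assumed to be better.

```python
from typing import Callable, List, Sequence, TypeVar

Decoy = TypeVar("Decoy")


def two_stage_rank(
    decoys: Sequence[Decoy],
    fast_score: Callable[[Decoy], float],
    rigorous_score: Callable[[Decoy], float],
    keep: int,
) -> List[Decoy]:
    """Shortlist by a cheap score, then re-rank the shortlist by a costlier one.

    Lower scores are assumed to be better for both functions.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Sampling stage: rank every decoy with the fast function, keep the best.
    shortlisted = sorted(decoys, key=fast_score)[:keep]
    # Refinement stage: re-rank only the shortlisted decoys.
    return sorted(shortlisted, key=rigorous_score)


# Toy decoys: (name, fast score, rigorous score); all values are made up.
toy_decoys = [
    ("pose_a", -30.0, -5.0),
    ("pose_b", -28.0, -9.0),
    ("pose_c", -10.0, -20.0),
    ("pose_d", -25.0, -7.0),
]
reranked = two_stage_rank(
    toy_decoys,
    fast_score=lambda d: d[1],
    rigorous_score=lambda d: d[2],
    keep=3,
)
print([name for name, _, _ in reranked])
```

In this toy data, `pose_c` has the best rigorous score but is discarded by the fast score before refinement, which illustrates why the accuracy of the sampling-stage scoring function limits what the refinement stage can recover.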
Since its establishment in 2001, the Critical Assessment of
PRedicted Interactions (CAPRI) campaign [19]
has offered a community-wide platform for assessing the accuracy of protein–protein docking approaches. Related
scoring functions and algorithms are evaluated by
comparing the structures submitted by a wide range of participants,
including predictors, servers and scorers, with unpublished crystal structures. In 2010, Kastritis and Bonvin assessed the performances of 9 commonly
used scoring functions and a free energy prediction
algorithm on their ability to predict the binding affinities for 81 complexes [20]. They found that none of the tested
scoring functions could provide reliable predictions,
because none of the predicted scores correlated well with the experimental binding affinities (pKd), with the strongest correlation
being only −0.32. Recently, our group analyzed the prediction results for the 24 targets tested from Round 14 to
Round 28 of CAPRI [21], and we found that, although
the scorers performed better than the uploaders and predictors, they achieved relatively high success rates (>50%)
for only two targets. Therefore, more approaches should
be explored in order to improve the prediction accuracy
of scoring functions for more reliable protein–protein
docking.
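To make the kind of correlation analysis mentioned above concrete, the following Python sketch computes a Pearson correlation coefficient between experimental affinities and predicted scores. It is only an illustration: the text does not state which correlation measure was used in the cited assessment, so the choice of Pearson's r is an assumption, and the helper name `pearson_r` and the toy values are made up and are not data from any benchmark.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Made-up values: higher pKd means tighter binding, lower score means
# a more favorable prediction, so an ideal scoring function gives r near -1.
toy_pkd = [6.0, 7.0, 8.0, 9.0]
toy_scores = [-10.0, -20.0, -30.0, -40.0]
r = pearson_r(toy_pkd, toy_scores)
print(f"r = {r:.2f}")
```

Because more favorable (more negative) scores should accompany higher pKd values, a useful scoring function would yield a strongly negative coefficient; a value close to zero indicates that the scores carry little information about binding affinity.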
In the past decade, more theoretically rigorous free
energy calculation methods, such as Molecular Mechanics/Poisson–Boltzmann Surface Area (MM/PBSA) and
Molecular Mechanics/Generalized Born Surface Area
(MM/GBSA), have been employed to predict binding
affinities and identify correct binding structures for protein–protein systems [22–29]. For exam (...truncated)