HawkRank: a new scoring function for protein–protein docking based on weighted energy terms

Feng et al. J Cheminform (2017) 9:66
https://doi.org/10.1186/s13321-017-0254-7

RESEARCH ARTICLE, Open Access

Ting Feng1, Fu Chen1, Yu Kang1, Huiyong Sun1, Hui Liu1, Dan Li1, Feng Zhu1 and Tingjun Hou1,2*

Abstract
Deciphering the structural determinants of protein–protein interactions (PPIs) is essential for a deep understanding of many important biological functions in living cells. Computational approaches for the structural modeling of PPIs, such as protein–protein docking, are needed to complement existing experimental techniques. The reliability of a protein–protein docking method depends on the ability of its scoring function to accurately distinguish near-native binding structures from a huge number of decoys. In this study, we developed HawkRank, a novel scoring function designed for the sampling stage of protein–protein docking, which sums the contributions from several energy terms, including van der Waals potentials, electrostatic potentials and desolvation potentials. First, based on the solvation free energies predicted by the Generalized Born model for ~800 proteins, a SASA (solvent accessible surface area)-based solvation model was developed, which gives the aqueous solvation free energies of proteins by summing the contributions of 21 atom types. Then, the van der Waals potentials and electrostatic potentials based on the Amber ff14SB force field were computed. Finally, the HawkRank scoring function was derived by determining the optimal weights for five energy terms based on the training set. Here, MSR (modified success rate), a novel protein–protein scoring quality index, was used to assess the performance of HawkRank and three other popular protein–protein scoring functions: ZRANK, FireDock and dDFIRE.
The results show that HawkRank outperformed the other three scoring functions in terms of both the total number of hits and MSR. HawkRank is available at http://cadd.zju.edu.cn/programs/hawkrank.

Keywords: Protein–protein interaction, Docking, Scoring, HawkRank

1 College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China. Full list of author information is available at the end of the article.

© The Author(s) 2017. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background
Protein–protein interactions (PPIs) are involved in a wide variety of biological processes, such as signal transduction [1, 2], transmembrane transport [3, 4], and antibody-antigen pairing [5, 6]. Deciphering the structural and energetic determinants of PPIs is a prerequisite to understanding PPI-mediated functions in living cells. Unfortunately, only a tiny fraction of protein–protein complex structures have been characterized by high-resolution experimental techniques, such as X-ray crystallography, solution nuclear magnetic resonance (NMR) spectroscopy and cryo-electron microscopy (cryo-EM), which cannot keep pace with the growing demand in structure-based interactome analysis. Moreover, many weak and/or transient PPIs that play essential roles in regulating dynamic networks in bio-systems cannot be easily captured by experiments due to their unstable nature. For this reason, computational approaches, especially protein–protein docking, are expected to provide an efficient alternative that starts from the unbound protein structures to predict the binding complexes and to explain the recognition mechanisms at the atomic level [7–9].

The ultimate goal of protein–protein docking is to predict a near-native structure of the complex from many docking decoys. The process generally falls into two stages: sampling and refinement. In the sampling stage, a large number of docking poses are generated and scored by various scoring functions; in the refinement stage, the top-hit poses (or decoys) given by the first stage are re-scored and re-ranked by more rigorous scoring functions. Clearly, the success of protein–protein docking depends, to a large degree, on the ability of the scoring function to score and rank the decoys accurately. So far, a large number of scoring functions have been developed, ranging from force field-based scoring functions such as ZRANK and FireDock [10–13], to knowledge-based ones such as dDFIRE and InterEvScore [14–16] and machine-learning scoring functions [17, 18]. However, recognizing near-native structures in a huge pool of alternatives is still quite challenging because the accuracy of most scoring functions needs to be improved. In addition, the ease of use, efficiency and general utility of the scoring functions should also be taken into account.

Since its establishment in 2001, the Critical Assessment of PRedicted Interactions (CAPRI) campaign [19] has offered a community-wide platform for assessing the accuracy of protein–protein docking approaches: scoring functions and algorithms are evaluated by comparing the structures submitted by a wide range of participants, including predictors, servers and scorers, with unpublished crystal structures.
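The two-stage workflow described above (score every sampled pose, then re-score only the top hits) can be sketched as follows. This is a minimal illustration under stated assumptions, not the HawkRank implementation: the term names, weights, energy values and the `refine` callback are all hypothetical, and lower scores are assumed to be better, as for energies.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical term names and weights, for illustration only; the actual
# energy terms and fitted weights are not given in this part of the text.
WEIGHTS: Dict[str, float] = {
    "vdw": 1.0,
    "electrostatic": 0.5,
    "desolvation": 0.8,
}


def weighted_score(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of energy terms (lower is assumed to be better)."""
    return sum(w * terms[name] for name, w in weights.items())


def rank_decoys(
    decoys: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Sampling stage: score every decoy and sort from best to worst."""
    scored = [(name, weighted_score(terms, weights)) for name, terms in decoys.items()]
    return sorted(scored, key=lambda pair: pair[1])


def rerank_top(
    ranked: List[Tuple[str, float]], n: int, refine: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Refinement stage: re-score only the n best decoys with a second function."""
    top_names = [name for name, _ in ranked[:n]]
    return sorted(((name, refine(name)) for name in top_names), key=lambda pair: pair[1])


# Made-up energy terms for three decoys.
decoys = {
    "pose_a": {"vdw": -10.0, "electrostatic": -4.0, "desolvation": 2.0},
    "pose_b": {"vdw": -6.0, "electrostatic": -8.0, "desolvation": 1.0},
    "pose_c": {"vdw": -2.0, "electrostatic": -1.0, "desolvation": 0.5},
}
ranked = rank_decoys(decoys, WEIGHTS)

# Made-up scores standing in for a more rigorous refinement-stage function.
refined_energy = {"pose_a": -7.0, "pose_b": -9.0}
reranked = rerank_top(ranked, 2, refined_energy.__getitem__)
```

In this toy data the sampling stage ranks `pose_a` first, while the stand-in refinement function prefers `pose_b`, which shows why the ordering of the top hits can change between the two stages.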
In 2010, Kastritis and Bonvin assessed the performance of 9 commonly used scoring functions and a free energy prediction algorithm in predicting the binding affinities of 81 complexes [20]. They found that none of the tested scoring functions provided reliable predictions: the predicted scores correlated poorly with the experimental binding affinities (pKd), with the strongest correlation being only −0.32. Recently, our group analyzed the prediction results for the 24 targets tested from ROUND 14 to ROUND 28 of CAPRI [21], and we found that, although the scorers performed better than the uploaders and predictors, they achieved relatively high success rates (>50%) for only two targets. Therefore, more approaches should be explored to improve the prediction accuracy of scoring functions for more reliable protein–protein docking.

In the past decade, more theoretically rigorous free energy calculation methods, such as Molecular Mechanics/Poisson–Boltzmann Surface Area (MM/PBSA) and Molecular Mechanics/Generalized Born Surface Area (MM/GBSA), have been employed to predict binding affinities and identify correct binding structures for protein–protein systems [22–29]. For exam (...truncated)