MSPocket: an orientation-independent algorithm for the detection of ligand binding pockets
Hongbo Zhu
M. Teresa Pisabarro
Associate Editor: Burkhard Rost
Structural Bioinformatics, BIOTEC Technical University of Dresden, Tatzberg 47-51, 01307 Dresden, Germany
Motivation: Identification of ligand binding pockets on proteins is crucial for the characterization of protein functions. It provides valuable information for protein-ligand docking and rational engineering of small molecules that regulate protein functions. Most current algorithms for predicting ligand binding pockets are based on a cubic grid representation of proteins, so their results often depend on the orientation of the protein.
Results: We present the MSPocket program for detecting pockets on the solvent excluded surface of proteins. The core algorithm of the MSPocket approach does not use any cubic grid system to represent proteins and is therefore independent of protein orientation. We demonstrate that MSPocket achieves an accuracy of 75% in predicting ligand binding pockets on a test dataset used for evaluating several existing methods. The accuracy is 92% if the top three predictions are considered. Comparison with one of the best performing recently published methods shows that MSPocket reaches similar performance with the additional feature of being independent of protein orientation. Interestingly, some of the predictions differ, meaning that the two methods can be considered complementary and combined to achieve better prediction accuracy. MSPocket also provides a graphical user interface for interactive investigation of the predicted ligand binding pockets. In addition, we show that the overlap criterion is a better strategy for evaluating predicted ligand binding pockets than the single point distance criterion.
Availability: The MSPocket source code can be downloaded from http://appserver.biotec.tu-dresden.de/MSPocket/. MSPocket is also available as a PyMOL plugin with a graphical user interface.
Contact: ;
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2010. Published by Oxford University Press. All rights reserved. For Permissions, please email:
1 INTRODUCTION
The prediction of ligand binding sites on proteins provides
important information for protein-ligand docking and
structure-based rational engineering of small molecules that modulate protein
functions (Campbell et al., 2003; Sotriffer and Klebe, 2002).
Furthermore, comparative analysis of ligand binding pockets is
found to provide valuable information for the understanding of
protein-ligand binding specificity (Chen and Honig, 2010).
It has been observed that ligand binding sites are often located in
the largest pockets on protein surfaces (London et al., 2010;
Nayal and Honig, 2006). Thus, the identification of pockets on
protein surfaces plays a key role in the prediction of protein
functional sites, in particular, ligand binding sites. A variety of
computational approaches have been proposed for the prediction
of ligand binding pockets. These methods can be divided into two
categories according to the information they utilize to detect pockets:
geometric approaches that are purely based on the geometric
characteristics of proteins, and comprehensive approaches that
not only consider geometric criteria but also take into account
evolutionary information, interaction energy or chemical properties
of proteins. Most of these methods, in both categories,
are based on the cubic grid representation of protein structures.
Geometric methods like POCKET (Levitt and Banaszak, 1992),
LIGSITE (Hendlich et al., 1997) and LIGSITEcs (Huang and
Schroeder, 2006) generate 3D grids for proteins and identify surface
pockets as the set of solvent grid points that are situated between
protein grid points. PocketPicker (Weisel et al., 2007) uses grids to
represent proteins and searches the environment of each surface grid point
along 30 directions to define pockets. Tripathi and Kellogg (2010)
introduced the VICE program as part of the HINT toolkit (Kellogg
et al., 2005). Similar to PocketPicker, VICE scans along paths in
various directions from each grid point and defines pocket grid
points as those with at least half of the scan directions blocked. The
VICE program represents proteins as binary grid maps, in which grid
points occupied by atoms are set to one and the rest to zero, so that
the VICE algorithm operates on integers only and is thus very
efficient. Yu et al. (2010) suggested the Roll algorithm, in which a
probe sphere of radius 2 Å is rolled over each slice of the 3D
grid representation of proteins. Pockets are defined to be the regions
between the probe sphere and the protein surface.
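The direction-scanning idea described above for grid-based methods can be illustrated with a minimal Python sketch. It is not the VICE or PocketPicker implementation: the function names, the dictionary-based grid, and the choice of 26 scan directions (all neighbour offsets on a cubic lattice) are assumptions made for illustration. Only the binary occupancy map and the "at least half of the scan directions blocked" rule are taken from the text.

```python
from itertools import product

# Assumption: 26 scan directions (axes, face diagonals, cube diagonals).
# The published methods use their own direction sets.
DIRECTIONS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def binary_grid(atoms, radius, spacing, size):
    """Return {index: 0 or 1}; 1 if the grid point lies within `radius` of an atom centre."""
    grid = {}
    for idx in product(range(size), repeat=3):
        point = tuple(i * spacing for i in idx)
        occupied = any(
            sum((p - a) ** 2 for p, a in zip(point, atom)) <= radius ** 2
            for atom in atoms
        )
        grid[idx] = 1 if occupied else 0
    return grid


def is_blocked(grid, start, direction, size):
    """Walk from `start` along `direction`; True if an occupied point is hit before leaving the grid."""
    pos = tuple(s + d for s, d in zip(start, direction))
    while all(0 <= c < size for c in pos):
        if grid[pos]:
            return True
        pos = tuple(p + d for p, d in zip(pos, direction))
    return False


def pocket_points(grid, size, min_fraction=0.5):
    """Return empty grid points for which at least `min_fraction` of the scans are blocked."""
    pockets = []
    for idx, occupied in grid.items():
        if occupied:
            continue
        blocked = sum(is_blocked(grid, idx, d, size) for d in DIRECTIONS)
        if blocked >= min_fraction * len(DIRECTIONS):
            pockets.append(idx)
    return pockets
```

Because both the occupancy map and the scan directions are tied to the lattice axes, rotating the input coordinates before calling `binary_grid` can change which points are reported, which is the orientation dependence discussed in this section.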
The grid representation of proteins is dependent on the orientation
of proteins in the coordinate system. Inconsistent results may
be observed for grid-based methods if the atomic coordinates of
proteins are transformed. One solution to address the problem
of inconsistent results is to incr (...truncated)