Fpocket: An open source platform for ligand pocket detection
Vincent Le Guilloux (2), Peter Schmidtke (1), Pierre Tuffery (0, 3)
0: Molecules Therapeutiques in silico, INSERM, UMR-S 973, University Paris Diderot - Paris 7, Paris, France
1: Dpto Fisicoquimica, Fac Farmacia, Univ Barcelona, Barcelona, Spain
2: ICOA - Institut de chimie organique et analytique - UMR CNRS 6005, Div. of chemoinformatics and molecular modeling, University of Orleans, Orleans, France
3: Ressource Parisienne en Bioinformatique Structurale, University Paris-Diderot, Paris, France
Background: Virtual screening methods are becoming well established as effective approaches to identify hits, candidates and leads for drug discovery research. Among these, structure based virtual screening (SBVS) approaches aim at docking collections of small compounds into the target structure to identify potent compounds. For SBVS, the identification of candidate pockets in protein structures is a key step, and recent years have seen increasing interest in developing methods for pocket and cavity detection on protein surfaces.
Results: Fpocket is an open source pocket detection package based on Voronoi tessellation and alpha spheres, built on top of the publicly available package Qhull. The modular source code is organised around a central library of functions, which serves as a basis for three main programs: (i) Fpocket, to perform pocket identification, (ii) Tpocket, to organise pocket detection benchmarking on a set of known protein-ligand complexes, and (iii) Dpocket, to collect pocket descriptor values on a set of proteins. Fpocket is written in the C programming language, which makes it a platform well suited to members of the scientific community wishing to develop new scoring functions and extract various pocket descriptors on a large scale. Fpocket 1.0, relying on a simple scoring function, is able to detect 94% and 92% of the pockets within the best three ranked pockets from the holo and apo proteins respectively, outperforming the standards of the field while being faster.
Conclusion: Fpocket provides a rapid, open source and stable basis for further developments related to protein pocket detection, efficient pocket descriptor extraction, or druggability prediction. Fpocket is freely available under the GNU GPL license at http://fpocket.sourceforge.net.
Background
In recent years, in silico structure based ligand design
(SBLD) has become a major approach for the exploration
of protein function and drug discovery. It has proven
efficient in the identification of molecular probes, in
the investigation of molecular recognition, and in the
identification of candidate therapeutic compounds (see for
instance [1,2]). Whereas SBLD encompasses a wide range
of aspects, one approach of importance is structure based
virtual screening (SBVS). In SBVS, given the structure of
a protein, one docks candidate compounds in order to
identify those likely to bind in a candidate ligand
binding site (see for instance [3] and references included).
The identification and characterization of pockets and
cavities of a protein structure is a key issue in such a process
and has been the subject of an increasing number of
studies in the last decade. Several difficult aspects have to be
considered, among which: (i) The candidate pocket
identification itself [4-26]. Here, one needs methods to identify
and delimit cavities at the protein surface that are likely to
bind small compounds. (ii) Pocket ranking according to
their likelihood of accepting a small drug-like compound as
ligand, for instance. Since several pockets are often
detected at a protein surface, some characterization of them
is necessary in order to select the relevant
ones. Although the largest pocket frequently
corresponds to the observed ligand binding site (e.g. [18]),
this rule cannot be generalised. Different studies have
tackled this problem, see for instance [18,19,21,27,28]. In
particular, it has been shown that the use of evolutionary
information such as residue conservation helps in re-ranking
the pockets [19,21]. (iii) Last, but not least, there is often
an adaptation (the so-called induced fit) of the pocket
geometry upon formation of a complex with the ligand
(see for instance [29-32]). This last point creates several
issues, both in terms of pocket detection (the pocket may or
may not be properly detected in the absence of ligand) and
in terms of scoring, since scoring functions are strongly
dependent on the quality of the pocket identification and
delimitation, and are also sensitive to conformational
changes. Here, we focus on the primary but central aspect
of candidate pocket identification from structure.
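To make the ranking issue of point (ii) concrete, the following minimal Python sketch ranks pockets by size and measures how often the ligand binding pocket is recovered within the top n ranked pockets. It is an illustration only and not the Fpocket scoring function: the `Pocket` record, its `volume` and `binds_ligand` fields, the helper functions and the toy values are all hypothetical and invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Pocket:
    name: str            # hypothetical pocket identifier
    volume: float        # hypothetical size descriptor (arbitrary units)
    binds_ligand: bool   # True if this pocket overlaps the known ligand site


def rank_by_size(pockets):
    """Return pockets sorted from largest to smallest volume."""
    return sorted(pockets, key=lambda p: p.volume, reverse=True)


def found_in_top_n(ranked, n):
    """True if the ligand binding pocket is among the first n ranked pockets."""
    return any(p.binds_ligand for p in ranked[:n])


def top_n_success_rate(proteins, n):
    """Fraction of proteins whose ligand site is recovered within the top n."""
    hits = sum(found_in_top_n(rank_by_size(pockets), n) for pockets in proteins)
    return hits / len(proteins)


# Toy data (invented for illustration only): two proteins, three pockets each.
proteins = [
    [Pocket("A1", 900.0, True), Pocket("A2", 400.0, False), Pocket("A3", 150.0, False)],
    [Pocket("B1", 1200.0, False), Pocket("B2", 700.0, True), Pocket("B3", 300.0, False)],
]

top1 = top_n_success_rate(proteins, 1)  # largest-pocket rule only
top3 = top_n_success_rate(proteins, 3)  # ligand site anywhere in the best three
print(top1, top3)
```

In this toy set the largest pocket is the binding site for only one of the two proteins, so the top-1 rate is lower than the top-3 rate, which mirrors the observation that the largest-pocket rule cannot be generalised.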
It is not easy to summarise the diversity of approaches that
have been proposed so far for candidate pocket
identification. Roughly, some are based on purely geometric analysis
of the surface of the protein [4-15,18,20,22-26], whereas
others involve energy calculations [16,17]. Another
way of distinguishing between the various appr (...truncated)