Nucleos: a web server for the identification of nucleotide-binding sites in protein structures
Luca Parca, Fabrizio Ferre, Gabriele Ausiello, Manuela Helmer-Citterich
Department of Biology, Centre for Molecular Bioinformatics, University of Rome 'Tor Vergata', Via della Ricerca Scientifica snc, 00133 Rome, Italy
Present address: Luca Parca, Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
Nucleos is a web server for the identification of nucleotide-binding sites in protein structures. Nucleos compares the structure of a query protein against a set of known template 3D binding sites representing nucleotide modules, namely the nucleobase, carbohydrate and phosphate. Structural features, clustering and conservation are used to filter and score the predictions. The predicted nucleotide modules are then joined to build whole nucleotide-binding sites, which are ranked by their score. The server takes as input either the PDB code of the query protein structure or a user-submitted structure in PDB format. The output of Nucleos consists of ranked lists of predicted nucleotide-binding sites divided by nucleotide type (e.g. ATP-like). For each ranked prediction, Nucleos provides detailed information about the score, the template structure and the structural match for each nucleotide module composing the nucleotide-binding site. The predictions on the query structure and the template-binding sites can be viewed directly on the web through a graphical applet. In 98% of cases, the modules composing correct predictions belong to proteins with no homology relationship to one another, meaning that brand-new nucleotide-binding sites can be identified using information from non-homologous proteins. Nucleos is available at http://nucleos.bio.uniroma2.it/nucleos/.
Most key cellular processes involve a transfer
of energy and genetic information. These processes have in
common the same biological currency, represented by
nucleotides. Different types of nucleotides exist, but all share
the same chemical groups, or modules: the nucleobase, the
carbohydrate and the phosphate group. Given the
ubiquitous nature of nucleotides, it is not surprising that
they were among the earliest cofactors bound by proteins
during evolution (1). The interaction between nucleotides
and proteins has been extensively studied so that many
features that proteins must possess to interact with a
nucleotide have been discovered (2–4), such as the P-loop
and the Walker A motifs. Some structural features have
also been derived, such as the acceptor-donor-acceptor
environment necessary for the binding of the nucleobase
group (5) and several phosphate-binding structural motifs
(6,7). However, the binding site of a nucleotide cannot be
simply reduced to these features, as some studies have
highlighted the large number of possible conformations, including
energetically unfavorable ones, that nucleotides can adopt
when bound by proteins (8). Therefore, the
identification of binding sites for nucleotides in protein
structures is not an easy task. Different web servers are
available for the identification of nucleotide-interacting
residues in protein sequences, mostly based on machine
learning approaches, like ATPint, GTPbinder,
NADbinder and NsitePred (9–12). From the structural
point of view, no web server has been dedicated to the
identification of nucleotide-binding sites in protein
structures. Some methods have been developed for the
identification of carbohydrate- and nucleobase-binding sites
(13,14), but no related web services have been produced.
In past years, we developed a method and a web server,
called Phosfinder, for the identification of phosphate-binding
sites in protein structures (15,16). Given this scenario,
we decided to build a web server for the identification of
nucleotide-binding sites based on the concept of
nucleotide modularity, described by Gherardini et al. (17) and
used to predict nucleotide-binding sites in protein
structures (18). This concept is based on the observation that
nucleotides, and their binding sites, are composed of
modules shared by evolutionarily unrelated proteins and
combinable in different ways to form binding sites even
for different types of nucleotides. This web server, called
Nucleos, searches for structural similarities between the
query protein structures and a dataset of template
binding sites for nucleotide modules: the nucleobase, the
carbohydrate and the phosphate. Each similarity identifies
a putative binding site for a nucleotide module, evaluated
according to its position in space with respect to the
protein surface and taking into account the conservation
of the involved residues. Complete nucleotide-binding
sites are built combining predicted nucleotide modules
following distance thresholds observed in crystallized
structures of bound nucleotides. Nucleos allows the biologist
user to scan protein structures of interest for binding sites
for different types of nucleotides directly on the web, at
the address http://nucleos.bio.uniroma2.it/nucleos/.
MATERIALS AND METHODS
The Nucleos web server is based on a previously
developed methodology (18) for the identification of
nucleotide-binding sites in protein structures based on the
concept of nucleotide modularity. Binding sites for
nucleotide modules (the nucleobase, the carbohydrate and
the phosphate) are predicted independently; subsequently,
they are joined together to build complete
nucleotide-binding sites.
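The joining step can be illustrated with a minimal Python sketch. It is not the Nucleos implementation: it assumes each predicted module is reduced to a single 3D point (for example, its centroid), and the function name `join_modules` and both distance thresholds are hypothetical placeholders rather than the values derived from crystallized nucleotide complexes.

```python
import math
from itertools import product


def join_modules(bases, sugars, phosphates,
                 max_base_sugar=5.0, max_sugar_phosphate=5.0):
    """Combine independently predicted modules into candidate sites.

    Each argument is a list of (x, y, z) points, one per predicted module
    (an assumed simplification). The two thresholds (in angstroms) are
    hypothetical, not the ones used by Nucleos.
    Returns (base_index, sugar_index, phosphate_index) triples.
    """
    sites = []
    for (i, base), (j, sugar), (k, phos) in product(
            enumerate(bases), enumerate(sugars), enumerate(phosphates)):
        # Keep a triple only if consecutive modules are close enough
        # to belong to the same nucleotide.
        if (math.dist(base, sugar) <= max_base_sugar
                and math.dist(sugar, phos) <= max_sugar_phosphate):
            sites.append((i, j, k))
    return sites


if __name__ == "__main__":
    bases = [(0.0, 0.0, 0.0)]
    sugars = [(3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    phosphates = [(6.0, 0.0, 0.0)]
    print(join_modules(bases, sugars, phosphates))
```

In this toy input only the first carbohydrate prediction lies within both thresholds, so a single candidate site is produced; the second carbohydrate is too far from the nucleobase to be joined.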
The Superpose3D (19) structural comparison algorithm is
used to find structural similarities between the query
protein structure and a dataset of template-binding sites
for nucleobase, carbohydrate and phosphate modules
(4657, 3073 and 10 185, respectively). The
template-binding sites are composed of at least three residues of a
binding pocket interacting with at least one atom of the
ligand. Structural similarities are evaluated by the Root
Mean Square Deviation (RMSD) of the matching residue
atoms and by the BLOSUM62 substitution value of the
residues involved in the similarity. Whenever a structural
similarity is found, the nucleotide module bound by the
template-binding site is transposed onto the query protein
structure following the structural match with the residues
of the query protein. Any predicted module-binding site
placed inside the protein or at less than a specified distance
from the solvent-accessible surface of the protein is
discarded. A distance threshold for each nucleotide module
was derived by analyzing the minimum distances between
nucleotide modules and the protein surface observed in
nucleotide-protein complexes.
The remaining predictions of the same type are
clustered together with a hierarchical clustering procedure.
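Two of the operations described above, the RMSD of matched atoms and the clustering of predictions, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Superpose3D or Nucleos code: the RMSD is computed on coordinates assumed to be already superposed, each prediction is reduced to one 3D point, and the clustering is a single-linkage scheme cut at a hypothetical distance `cutoff` (the text only states that a hierarchical procedure is used).

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of
    matched (x, y, z) atom coordinates, assumed already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def single_linkage_clusters(points, cutoff):
    """Group points so that two points share a cluster when a chain of
    pairwise distances <= cutoff connects them. This equals cutting a
    single-linkage hierarchical tree at height `cutoff` (an assumed
    linkage choice). Returns sorted lists of point indices."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)]))
    print(single_linkage_clusters([(0, 0, 0), (1, 0, 0), (10, 0, 0)], cutoff=2.0))
```

With the toy points above, the first two predictions fall into one cluster and the distant third forms its own; the size of each returned list gives the number of predictions in that cluster.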
Scoring of predicted binding sites
A clustering score is assigned to each prediction, equal to the
number of predictions in its cluster. A conservation
score is assigned to each prediction as the sum of the
conservation value of the query pr (...truncated)