Nucleos: a web server for the identification of nucleotide-binding sites in protein structures

Article PDF: https://nar.oxfordjournals.org/content/41/W1/W281.full.pdf

Luca Parca, Fabrizio Ferre, Gabriele Ausiello, Manuela Helmer-Citterich

Department of Biology, Centre for Molecular Bioinformatics, University of Rome 'Tor Vergata', Via della Ricerca Scientifica snc, 00133 Rome, Italy

Present address: Luca Parca, Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany

Nucleos is a web server for the identification of nucleotide-binding sites in protein structures. Nucleos compares the structure of a query protein against a set of known template 3D binding sites representing nucleotide modules, namely the nucleobase, carbohydrate and phosphate. Structural features, clustering and conservation are used to filter and score the predictions. The predicted nucleotide modules are then joined to build whole nucleotide-binding sites, which are ranked by their score. The server takes as input either the PDB code of the query protein structure or a user-submitted structure in PDB format. The output of Nucleos consists of ranked lists of predicted nucleotide-binding sites divided by nucleotide type (e.g. ATP-like). For each ranked prediction, Nucleos provides detailed information about the score, the template structure and the structural match for each nucleotide module composing the nucleotide-binding site. The predictions on the query structure and the template binding sites can be viewed directly on the web through a graphical applet. In 98% of cases, the modules composing correct predictions come from proteins with no homology relationship to one another, meaning that brand-new nucleotide-binding sites can be identified using information from non-homologous proteins. Nucleos is available at http://nucleos.bio.uniroma2.it/nucleos/.

Most key cellular processes involve a transfer of energy or genetic information. These processes share the same biological currency: nucleotides.
Different types of nucleotides exist, but all share the same chemical groups, or modules: the nucleobase, the carbohydrate and the phosphate group. Given the ubiquitous nature of nucleotides, it is not surprising that they were among the earliest cofactors bound by proteins during evolution (1). The interaction between nucleotides and proteins has been studied extensively, and many features that proteins must possess to interact with a nucleotide have been discovered (2–4), such as the P-loop and the Walker A motifs. Some structural features have also been derived, such as the acceptor-donor-acceptor environment necessary for binding the nucleobase group (5) and several phosphate-binding structural motifs (6,7). However, the binding site of a nucleotide cannot be reduced to these features alone: some studies have highlighted the large number of conformations, including energetically unfavorable ones, that nucleotides can adopt when bound by proteins (8). The identification of nucleotide-binding sites in protein structures is therefore not an easy task.

Several web servers are available for the identification of nucleotide-interacting residues in protein sequences, mostly based on machine learning approaches, such as ATPint, GTPbinder, NADbinder and NsitePred (9–12). On the structural side, no web server has been dedicated to the identification of nucleotide-binding sites in protein structures. Some methods have been developed for the identification of carbohydrate- and nucleobase-binding sites (13,14), but no related web services have been produced. In past years we developed a method and a web server, called Phosfinder, for the identification of phosphate-binding sites in protein structures (15,16). Given this scenario, we decided to build a web server for the identification of nucleotide-binding sites based on the concept of nucleotide modularity, described by Gherardini et al.
(17) and used to predict nucleotide-binding sites in protein structures (18). This concept is based on the observation that nucleotides, and their binding sites, are composed of modules that are shared by evolutionarily unrelated proteins and can be combined in different ways to form binding sites, even for different types of nucleotides. This web server, called Nucleos, searches for structural similarities between the query protein structure and a dataset of template binding sites for nucleotide modules: the nucleobase, the carbohydrate and the phosphate. Each similarity identifies a putative binding site for a nucleotide module, which is evaluated according to its position in space with respect to the protein surface and to the conservation of the residues involved. Complete nucleotide-binding sites are built by combining predicted nucleotide modules according to distance thresholds observed in crystallized structures of bound nucleotides. Nucleos lets biologists scan protein structures of interest for binding sites of different nucleotide types directly on the web, at http://nucleos.bio.uniroma2.it/nucleos/.

MATERIALS AND METHODS

The Nucleos web server is based on a previously developed methodology (18) for the identification of nucleotide-binding sites in protein structures, which relies on the concept of nucleotide modularity. Binding sites for nucleotide modules (the nucleobase, the carbohydrate and the phosphate) are predicted independently; subsequently, they are joined together to build complete nucleotide-binding sites. The Superpose3D (19) structural comparison algorithm is used to find structural similarities between the query protein structure and a dataset of template binding sites for nucleobase, carbohydrate and phosphate modules (4657, 3073 and 10 185 templates, respectively). Each template binding site is composed of at least three residues of a binding pocket, each interacting with at least one atom of the ligand.
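The joining step described above can be illustrated with a minimal Python sketch. It is not the Nucleos implementation: the `ModulePrediction` type, the `join_modules` function, the 6.0 Å thresholds and the summed site score are all hypothetical choices made for illustration, since the text only states that modules are combined according to distance thresholds observed in crystallized complexes.

```python
from dataclasses import dataclass
from itertools import product
from math import dist


@dataclass(frozen=True)
class ModulePrediction:
    """A predicted module-binding site (hypothetical type, for illustration)."""
    kind: str      # "nucleobase", "carbohydrate" or "phosphate"
    center: tuple  # (x, y, z) position of the transposed module, in angstroms
    score: float   # score of the module prediction


# Assumed thresholds; the real values come from crystallized complexes
# and are not given in the text.
MAX_BASE_SUGAR = 6.0
MAX_SUGAR_PHOSPHATE = 6.0


def join_modules(predictions,
                 max_base_sugar=MAX_BASE_SUGAR,
                 max_sugar_phosphate=MAX_SUGAR_PHOSPHATE):
    """Combine module predictions into candidate nucleotide-binding sites.

    A candidate site is one nucleobase, one carbohydrate and one phosphate
    prediction whose centers respect both distance thresholds. Sites are
    returned best first, using the summed module score (an assumption).
    """
    by_kind = {"nucleobase": [], "carbohydrate": [], "phosphate": []}
    for p in predictions:
        by_kind[p.kind].append(p)

    sites = []
    for base, sugar, phos in product(by_kind["nucleobase"],
                                     by_kind["carbohydrate"],
                                     by_kind["phosphate"]):
        if (dist(base.center, sugar.center) <= max_base_sugar
                and dist(sugar.center, phos.center) <= max_sugar_phosphate):
            total = base.score + sugar.score + phos.score
            sites.append((total, base, sugar, phos))

    # Sort on the score only, so module objects are never compared.
    sites.sort(key=lambda site: site[0], reverse=True)
    return sites
```

Enumerating every triple is quadratic-to-cubic in the number of predictions; a spatial index would be a natural refinement for large prediction sets, but the exhaustive form keeps the distance logic explicit.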
Structural similarities are evaluated by the Root Mean Square Deviation (RMSD) of the matching residue atoms and by the BLOSUM62 substitution value of the residues involved in the similarity. Whenever a structural similarity is found, the nucleotide module bound by the template binding site is transposed onto the query protein structure following the structural match with the residues of the query protein. Any predicted module-binding site placed inside the protein, or at less than a specified distance from the solvent-accessible surface of the protein, is discarded. The distance thresholds were derived by analyzing the minimum distances between nucleotide modules and the protein surface observed in nucleotide-protein complexes, yielding one threshold per nucleotide module. The remaining predictions of the same type are clustered together with a hierarchical clustering procedure.

Scoring of predicted binding sites

A clustering score is assigned to each prediction as the number of predictions in its cluster. A conservation score is assigned to each prediction as the sum of the conservation value of the query pr (...truncated)