PDBe: Protein Data Bank in Europe

https://nar.oxfordjournals.org/content/42/D1/D285.full.pdf

Aleksandras Gutmanas, Younes Alhroub, Gary M. Battle, John M. Berrisford, Estelle Bochet, Matthew J. Conroy, Jose M. Dana, Manuel A. Fernandez Montecelo, Glen van Ginkel, Swanand P. Gore, Pauline Haslam, Rowan Hatherley, Pieter M.S. Hendrickx, Miriam Hirshberg, Ingvar Lagerstedt, Saqib Mir, Abhik Mukhopadhyay, Thomas J. Oldfield, Ardan Patwardhan, Luana Rinaldi, Gaurav Sahni, Eduardo Sanz-García, Sanchayita Sen, Robert A. Slowley, Sameer Velankar, Michael E. Wainwright, Gerard J. Kleywegt

Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

The Protein Data Bank in Europe (pdbe.org) is a founding member of the Worldwide PDB consortium (wwPDB; wwpdb.org) and as such is actively engaged in the deposition, annotation, remediation and dissemination of macromolecular structure data through the single global archive for such data, the PDB. Similarly, PDBe is a member of the EMDataBank organisation (emdatabank.org), which manages the EMDB archive for electron microscopy data. PDBe also develops tools that help the biomedical science community to make effective use of the data in the PDB and EMDB for their research. Here we describe new or improved services, including updated SIFTS mappings to other bioinformatics resources, a new browser for the PDB archive based on Gene Ontology (GO) annotation, updates to the analysis of Nuclear Magnetic Resonance-derived structures, redesigned search and browse interfaces, and new or updated visualisation and validation tools for EMDB entries.

INTRODUCTION

For over four decades, scientists working in structural biology and related disciplines have been benefiting from a single global archive of 3D structures of biological macromolecules, the Protein Data Bank (PDB) (1–3).
Since 2003, the PDB has been jointly managed by the members of the Worldwide PDB consortium (4–6): the Research Collaboratory for Structural Bioinformatics (RCSB PDB) (7), the Protein Data Bank Japan (PDBj) (8) and the Protein Data Bank in Europe (9–11). In 2006, the Biological Magnetic Resonance Bank (BMRB) (12) joined the wwPDB organization (13). Together, the four partners act as deposition and annotation sites for 3D structures and associated experimental data. They also jointly distribute and remediate the PDB archive, and engage with the wider scientific community on issues of formats, policy and validation criteria. Simultaneously, the wwPDB partners are engaged in friendly competition to develop tools for presenting structural data to their users. PDBe's motto is 'Bringing structure to biology', i.e. to make the complex and rich structural and functional information in the PDB more accessible and useful to the biomedical community (9–11,14,15).

In 2002, PDBe pioneered the archiving of electron microscopy (3DEM) volume maps by launching the Electron Microscopy Data Bank (EMDB) (16). Since 2007, the EMDB archive has been jointly managed by PDBe, RCSB PDB and the National Center for Macromolecular Imaging (NCMI) at Baylor College of Medicine, Texas, USA, under the aegis of the EMDataBank organization (17). The relationship between EMDB and EMDataBank is analogous to that between the PDB archive and the wwPDB organization, which manages it.

In this article, we describe updates to the resources, services and tools provided by PDBe since our last publication (9).

SIFTS: STRUCTURE INTEGRATION WITH FUNCTION, TAXONOMY AND SEQUENCE

The SIFTS resource (http://pdbe.org/sifts) (18,19) provides up-to-date cross-reference information between protein structures (i.e. in PDB entries) and other bioinformatics resources. In the past 2 years, Gene Ontology (GO) (20) and InterPro (21) assignments have been improved substantially.
These mappings now apply directly to the protein sequences in PDB entries. Previously, such mappings could pertain to domains that belonged to the mapped UniProt (22) entry, but were not actually present in the protein sample used to determine the structure. Other improvements include the handling of chimeric constructs, microheterogeneity, sequence conflicts between the natural protein and the construct used in the structural study and more up-to-date information on enzymes. It is now also possible to download SIFTS information (in XML format) for proteins in PDB entries with no UniProt mapping.

Improvements in the SIFTS infrastructure have resulted in more accurate representation of sequence and cross-reference information on the PDB entry pages of the PDBe website (9–11). As SIFTS mappings are used by a number of large bioinformatics resources worldwide, the information they use and display has also benefited from the improvements.

GO BROWSER

PDBeXplore (10) is a PDBe service comprising a set of browsers that allow users to explore and analyse the available structure data in the PDB archive for subsets of entries. The user can select subsets of interest based on well-known chemical and biological classification systems such as enzyme classification (EC numbers) (23), NCBI taxonomy database (24), Pfam sequence family data (25) and CATH structure domain classification (26). We have added a new browser module based on GO assignments (20) (http://pdbe.org/go; Figure 1) (15,18). This browser allows the user to find and analyse relevant structures in the PDB based on the three GO categories:

• Molecular function terms, which describe functional activities at the molecular level (e.g. catalytic activity, binding activities);
• Cellular component terms, which describe the cellular location of biomacromolecules (e.g. outer membrane-bounded periplasmic space); and
• Biological process terms, which describe operations or sets of molecular events carried out by macromolecules (e.g. vesicle-mediated transport).

Similarly to other PDBeXplore modules, the GO browser provides a facility to search for a particular GO term, or the user can browse a tree-like representation of the terms. On selection of a GO term, the browser presents the set of PDB entries matching that term via a number of views, designed to answer questions such as (i) Which small molecules are found in the PDB entries matching the GO term? (ii) Which CATH (26) folds are observed (Figure 1A)? (iii) What types of quaternary structures as determined by PISA (27) are found? (iv) Which Pfam (25) sequence families are present? (v) Which taxa do the source organisms of these proteins belong to (Figure 1B)? and (vi) Who has determined these structures and where were they published?

PDBeXplore is a prime example of how the information provided through SIFTS can be used to develop entirely new ways of accessing and analysing the contents of the structure archive.

NEW BIOLOGY IN THE PDB

Improvements to the SIFTS infrastructure (see earlier in text) now allow the identification of newly released PDB entries that map to a Pfam sequence family, a GO term or a UniProt en (...truncated)