PDBe: Protein Data Bank in Europe
Aleksandras Gutmanas
Younes Alhroub
Gary M. Battle
John M. Berrisford
Estelle Bochet
Matthew J. Conroy
Jose M. Dana
Manuel A. Fernandez Montecelo
Glen van Ginkel
Swanand P. Gore
Pauline Haslam
Rowan Hatherley
Pieter M.S. Hendrickx
Miriam Hirshberg
Ingvar Lagerstedt
Saqib Mir
Abhik Mukhopadhyay
Thomas J. Oldfield
Ardan Patwardhan
Luana Rinaldi
Gaurav Sahni
Eduardo Sanz-García
Sanchayita Sen
Robert A. Slowley
Sameer Velankar
Michael E. Wainwright
Gerard J. Kleywegt
Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
The Protein Data Bank in Europe (pdbe.org) is a founding member of the Worldwide PDB consortium (wwPDB; wwpdb.org) and as such is actively engaged in the deposition, annotation, remediation and dissemination of macromolecular structure data through the single global archive for such data, the PDB. Similarly, PDBe is a member of the EMDataBank organisation (emdatabank.org), which manages the EMDB archive for electron microscopy data. PDBe also develops tools that help the biomedical science community to make effective use of the data in the PDB and EMDB for their research. Here we describe new or improved services, including updated SIFTS mappings to other bioinformatics resources, a new browser for the PDB archive based on Gene Ontology (GO) annotation, updates to the analysis of Nuclear Magnetic Resonance-derived structures, redesigned search and browse interfaces, and new or updated visualisation and validation tools for EMDB entries.
INTRODUCTION
For over four decades, scientists working in structural
biology and related disciplines have benefited from
a single global archive of 3D structures of biological
macromolecules: the Protein Data Bank (PDB) (1–3).
Since 2003, the PDB has been jointly managed by the
members of the Worldwide PDB consortium (4–6): the
Research Collaboratory for Structural Bioinformatics
(RCSB PDB) (7), the Protein Data Bank Japan (PDBj)
(8) and the Protein Data Bank in Europe (9–11). In 2006,
the Biological Magnetic Resonance Bank (BMRB) (12)
joined the wwPDB organization (13). Together, the four
partners act as deposition and annotation sites for 3D
structures and associated experimental data. They also
jointly distribute and remediate the PDB archive, and
engage with the wider scientific community on issues of
formats, policy and validation criteria. Simultaneously,
the wwPDB partners are engaged in friendly competition
to develop tools for presenting structural data to their
users. PDBe's motto is "Bringing structure to biology",
i.e. to make the complex and rich structural and functional
information in the PDB more accessible and useful to the
biomedical community (9–11,14,15).
In 2002, PDBe pioneered the archiving of electron
microscopy (3DEM) volume maps by launching the Electron
Microscopy Data Bank (EMDB) (16). Since 2007, the
EMDB archive has been jointly managed by PDBe,
RCSB PDB and the National Center for Macromolecular
Imaging (NCMI) at Baylor College of Medicine, Texas,
USA, under the aegis of the EMDataBank organization
(17). The relationship between EMDB and EMDataBank
is analogous to that between the PDB archive and the
wwPDB organization, which manages it.
In this article, we describe updates to the resources,
services and tools provided by PDBe since our last
publication (9).
SIFTS: STRUCTURE INTEGRATION WITH FUNCTION, TAXONOMY AND SEQUENCE
The SIFTS resource (http://pdbe.org/sifts) (18,19) provides
up-to-date cross-reference information between protein
structures (i.e. in PDB entries) and other bioinformatics
resources. In the past 2 years, Gene Ontology (GO) (20) and
InterPro (21) assignments have been improved substantially.
These mappings now apply directly to the protein sequences
in PDB entries. Previously, such mappings could pertain to
domains that belonged to the mapped UniProt (22) entry,
but were not actually present in the protein sample used
to determine the structure. Other improvements include
the handling of chimeric constructs, microheterogeneity,
sequence conflicts between the natural protein and the
construct used in the structural study and more up-to-date
information on enzymes. It is now also possible to download
SIFTS information (in XML format) for proteins in PDB
entries with no UniProt mapping. Improvements in the
SIFTS infrastructure have resulted in more accurate
representation of sequence and cross-reference information on
the PDB entry pages of the PDBe website (9–11). As
SIFTS mappings are used by a number of large
bioinformatics resources worldwide, the information they use and
display has also benefited from the improvements.
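The change in how GO and InterPro assignments are applied can be illustrated with a minimal sketch. It assumes, hypothetically, that the construct and each annotated domain are reduced to inclusive residue ranges in UniProt numbering. The names `Range`, `overlaps` and `annotations_in_sample`, and all example values, are illustrative only and are not part of SIFTS.

```python
from typing import NamedTuple


class Range(NamedTuple):
    """Inclusive residue range in UniProt numbering."""
    start: int
    end: int


def overlaps(a: Range, b: Range) -> bool:
    """Return True if two inclusive ranges share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def annotations_in_sample(observed, domains):
    """Keep only annotations that overlap an observed segment of the construct."""
    return [
        name
        for name, dom in domains.items()
        if any(overlaps(dom, seg) for seg in observed)
    ]


# Hypothetical example: the construct covers residues 25-180 of a longer protein.
observed_segments = [Range(25, 180)]
domain_annotations = {
    "domain_A": Range(30, 150),   # fully inside the construct
    "domain_B": Range(160, 240),  # partially inside the construct
    "domain_C": Range(300, 390),  # not present in the sample
}

print(annotations_in_sample(observed_segments, domain_annotations))
```

In this sketch a partial overlap is enough to retain an annotation; that threshold is an assumption of the example, and a stricter rule (for instance, a minimum covered fraction) could be substituted in `overlaps`.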
GO BROWSER
PDBeXplore (10) is a PDBe service comprising a set of
browsers that allow users to explore and analyse the
available structure data in the PDB archive for subsets of
entries. The user can select subsets of interest based on
well-known chemical and biological classification systems
such as enzyme classification (EC numbers) (23), NCBI
taxonomy database (24), Pfam sequence family data (25)
and CATH structure domain classification (26). We have
added a new browser module based on GO assignments
(20) (http://pdbe.org/go; Figure 1) (15,18). This browser
allows the user to find and analyse relevant structures in
the PDB based on the three GO categories:
• Molecular function terms, which describe functional
activities at the molecular level (e.g. catalytic activity,
binding activities);
• Cellular component terms, which describe the cellular
location of biomacromolecules (e.g. outer membrane-bounded
periplasmic space); and
• Biological process terms, which describe operations or
sets of molecular events carried out by macromolecules
(e.g. vesicle-mediated transport).
Similarly to other PDBeXplore modules, the GO
browser provides a facility to search for a particular GO
term, or the user can browse a tree-like representation of
the terms. On selection of a GO term, the browser presents
the set of PDB entries matching that term via a number of
views, designed to answer questions such as (i) Which
small molecules are found in the PDB entries matching
the GO term? (ii) Which CATH (26) folds are observed
(Figure 1A)? (iii) What types of quaternary structure,
as determined by PISA (27), are found? (iv) Which Pfam
(25) sequence families are present? (v) Which taxa do
the source organisms of these proteins belong to
(Figure 1B)? and (vi) Who has determined these structures
and where were they published?
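The browsing pattern described above (select a term, collect the matching entries, then summarise them in a view) can be sketched as follows. This is a simplified illustration and not the PDBeXplore implementation: the term hierarchy is reduced to a small tree with a single parent per term, whereas GO itself allows multiple parents, and the entry identifiers and their annotations are made up for the example.

```python
from collections import Counter, defaultdict

# Child -> parent links for a tiny, simplified fragment of the
# molecular-function branch (assumption: one parent per term).
PARENT = {
    "GO:0003824": "GO:0003674",  # catalytic activity -> molecular_function
    "GO:0005488": "GO:0003674",  # binding -> molecular_function
    "GO:0016787": "GO:0003824",  # hydrolase activity -> catalytic activity
}

# Made-up annotations: entry id -> (GO term, source-organism taxon).
ENTRIES = {
    "entry1": ("GO:0016787", "Homo sapiens"),
    "entry2": ("GO:0003824", "Escherichia coli"),
    "entry3": ("GO:0005488", "Homo sapiens"),
    "entry4": ("GO:0016787", "Homo sapiens"),
}


def descendants(term):
    """Return the term itself plus every term below it in the tree."""
    children = defaultdict(list)
    for child, parent in PARENT.items():
        children[parent].append(child)
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children[current])
    return found


def entries_for(term):
    """Return sorted entry ids annotated with the term or any descendant."""
    wanted = descendants(term)
    return sorted(e for e, (go, _) in ENTRIES.items() if go in wanted)


def taxon_view(term):
    """Count source organisms among the entries matching the term."""
    return Counter(ENTRIES[e][1] for e in entries_for(term))


print(entries_for("GO:0003824"))
print(taxon_view("GO:0003824"))
```

The other views listed above (small molecules, CATH folds, quaternary structure, Pfam families, publications) would follow the same shape as `taxon_view`, each tallying a different attribute of the matching entries.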
PDBeXplore is a prime example of how the information
provided through SIFTS can be used to develop entirely
new ways of accessing and analysing the contents of the
structure archive.
NEW BIOLOGY IN THE PDB
Improvements to the SIFTS infrastructure (see earlier in
text) now allow the identification of newly released PDB
entries that map to a Pfam sequence family, a GO term or
a UniProt en (...truncated)