WeNMR: Structural Biology on the Grid
T. Herrmann, Centre de RMN à très Hauts Champs, Institut des Sciences Analytiques, Université de Lyon, UMR-5280 CNRS, ENS Lyon, UCB Lyon 1, 5 rue de la Doua, 69100 Villeurbanne, France
G. W. Vuister, Department of Biochemistry, School of Biological Sciences, Henry Wellcome Building, University of Leicester, Lancaster Road, Leicester LE1 9HN, UK
G. Vriend, CMBI, Radboud University Nijmegen Medical Centre, Geert Grooteplein 26-28, Nijmegen, The Netherlands
J. F. Doreleijers, Protein Biophysics/IMM, Radboud University Nijmegen, Geert Grooteplein 26-28, Nijmegen, The Netherlands
W. F. Vranken, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
Present address: W. F. Vranken, Department of Structural Biology, VIB, and Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
Present address: N. Loureiro-Ferreira, European Grid Infrastructure (EGI), 140 Science Park, 1098 XG Amsterdam, The Netherlands
Present address: T. A. Wassenaar, Biocomputing Group, Department of Biological Sciences, University of Calgary, 2500 University Drive NW, Calgary, AB T2N 1N4, Canada
The WeNMR (http://www.wenmr.eu) project is a European Union-funded international effort to streamline and automate the analysis of Nuclear Magnetic Resonance (NMR) and Small Angle X-Ray Scattering (SAXS) data for atomic and near-atomic resolution
molecular structures. Conventional structure
calculation requires the use of various software
packages, considerable user expertise and ample
computational resources. To facilitate the use of
NMR spectroscopy and SAXS in life sciences
the WeNMR consortium has established standard
computational workflows and services through
easy-to-use web interfaces, while still retaining
sufficient flexibility to handle more specific
requests. Thus far, a number of programs often used
in structural biology have been made available
through application portals. The implementation
of these services, in particular the distribution
of calculations to a Grid computing
infrastructure, involves a novel mechanism for submission
and handling of jobs that is independent of the
type of job being run. With over 450 registered
users (September 2012), WeNMR is currently the
largest Virtual Organization (VO) in life sciences.
With its large and worldwide user community,
WeNMR has become the first Virtual Research
Community officially recognized by the European
Grid Infrastructure (EGI).
1 Introduction
1.1 NMR Spectroscopy
NMR spectroscopy is one of the two main techniques
that allow the determination of three-dimensional (3D)
structures of biomacromolecules, such as proteins,
RNA, DNA, and their complexes, at atomic
resolution. Knowledge of their 3D structures is vital
for understanding functions and mechanisms of
action of macromolecules, and for elucidating and
predicting the effect of mutations. 3D structures
are also important as guides for the design of new
experimental studies and as starting points for
rational drug design. An advantage of NMR over
X-ray crystallography is that it also allows
investigation of time-dependent chemical and
conformational phenomena, including reaction and folding
kinetics and intramolecular dynamics. For these
reasons, NMR plays an important role within the
life sciences.
NMR is based on manipulating the natural magnetic moments of atomic nuclei and measuring how the system relaxes back to its initial state [1, 2]. The signal thus obtained, the Free Induction Decay (FID), is a decaying wave composed of many individual frequency contributions. At the highest magnetic fields available today, up to 27000 different frequencies can be resolved. Because the signal-to-noise ratio of a single measurement is low, such measurements have to be repeated many times to characterize the frequency contributions and their decays. Obtaining structural information requires many more, and more complex, measurements, which yield substantial amounts of data that need processing.
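The relation between the FID and the spectrum can be illustrated with a short Python sketch that simulates an FID as a sum of exponentially decaying oscillations and recovers the individual frequencies with a discrete Fourier transform. This is a toy model under stated assumptions: the frequencies, the single decay constant (T2), the spectral width and the number of points are made-up illustrative values, the complex-valued signal stands in for quadrature-detected data, and the naive O(N²) transform is used only for readability. Real processing software uses the FFT together with steps such as apodization, zero filling and phase correction.

```python
import cmath
import math


def simulate_fid(freqs_hz, t2_s, sw_hz, n_points):
    """Toy complex FID: a sum of exponentially decaying complex oscillations."""
    dt = 1.0 / sw_hz  # dwell time between successive samples
    fid = []
    for k in range(n_points):
        t = k * dt
        decay = math.exp(-t / t2_s)  # assumption: the same T2 decay for every signal
        fid.append(sum(cmath.exp(2j * math.pi * f * t) for f in freqs_hz) * decay)
    return fid


def dft(signal):
    """Naive O(N^2) discrete Fourier transform (real software uses an FFT)."""
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def peak_frequencies(spectrum, sw_hz, n_peaks):
    """Frequencies (Hz) of the n_peaks largest local maxima of |spectrum|."""
    n = len(spectrum)
    mag = [abs(c) for c in spectrum]
    # bin m is a local maximum if it is larger than its left and right neighbours
    maxima = [m for m in range(n) if mag[m] > mag[m - 1] and mag[m] >= mag[(m + 1) % n]]
    maxima.sort(key=lambda m: mag[m], reverse=True)
    # bin m corresponds to m * sw_hz / n (valid here: all signals lie below sw_hz / 2)
    return sorted(m * sw_hz / n for m in maxima[:n_peaks])


if __name__ == "__main__":
    fid = simulate_fid([125.0, 312.5], t2_s=0.05, sw_hz=1000.0, n_points=256)
    print(peak_frequencies(dft(fid), sw_hz=1000.0, n_peaks=2))  # [125.0, 312.5]
```

Both example frequencies were chosen to fall exactly on frequency bins (bin width 1000/256 Hz), so each decaying oscillation appears as a single Lorentzian-shaped peak centred on its own bin.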
Obtaining a 3D structure from NMR data typically involves the following steps, summarized graphically in Fig. 1. First, the raw data have to be processed, more specifically Fourier-transformed, to obtain spectra revealing the different frequency contributions and their relations. These frequencies are the resonances of the measured atomic nuclei; to infer structural information from them, the resonances subsequently have to be assigned to individual contributors (atoms/residues). If the assignment is sufficiently complete, structural restraints can be determined from the spectra, including inter-atomic distance restraints, dihedral angle restraints, and orientation restraints. These structural restraints are then used to calculate a number of structures using a variety of molecular modeling approaches, after which structure validation checks are performed to assess the quality of the results.

Fig. 1 NMR data processing from signal to 3D structure. After acquisition of the primary NMR data, these are Fourier transformed to obtain spectra in which the individual frequency contributions or resonances of spin systems, and their relations, are revealed. The resonances subsequently have to be assigned to individual atoms. If sufficient resonances have been assigned, restraints can be inferred from the data, pertaining to distances between atoms, dihedral angles, domain orientations, etc. When an adequate number of restraints is available, these can be used to calculate a set of three-dimensional structures optimally satisfying these restraints. The resulting structures represent the structure of the protein in solution, which is validated against the available experimental data. Although the process is depicted here linearly, intermediate stages may involve iterative cycles of refinement.
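The restraint-based calculation and validation steps can be illustrated with a small Python sketch that checks a structure against distance restraints given as lower and upper bounds, and scores it with a flat-bottom harmonic penalty (zero inside the bounds, quadratic outside), a common functional form for NOE-derived distance restraints. The atom names, coordinates, bounds, the 0.5 Å violation tolerance and the force constant `k` are all hypothetical; structure calculation programs combine more elaborate restraint potentials with a full molecular force field.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]
# Hypothetical restraint layout: (atom_a, atom_b, lower_bound, upper_bound), in angstrom
Restraint = Tuple[str, str, float, float]


def bound_excess(d: float, lower: float, upper: float) -> float:
    """How far distance d lies outside [lower, upper]; 0.0 when inside."""
    if d > upper:
        return d - upper
    if d < lower:
        return lower - d
    return 0.0


def violations(coords: Dict[str, Coord], restraints: List[Restraint],
               tolerance: float = 0.5) -> List[Tuple[str, str, float]]:
    """List restraints violated by more than `tolerance` angstrom."""
    result = []
    for a, b, lower, upper in restraints:
        excess = bound_excess(math.dist(coords[a], coords[b]), lower, upper)
        if excess > tolerance:
            result.append((a, b, excess))
    return result


def flat_bottom_energy(coords: Dict[str, Coord], restraints: List[Restraint],
                       k: float = 1.0) -> float:
    """Flat-bottom harmonic penalty: zero inside the bounds, k * excess**2 outside."""
    return sum(
        k * bound_excess(math.dist(coords[a], coords[b]), lower, upper) ** 2
        for a, b, lower, upper in restraints
    )


if __name__ == "__main__":
    coords = {"H1": (0.0, 0.0, 0.0), "H2": (3.0, 0.0, 0.0), "H3": (0.0, 7.0, 0.0)}
    restraints = [("H1", "H2", 1.8, 5.0), ("H1", "H3", 1.8, 5.0), ("H2", "H3", 1.8, 8.0)]
    print(violations(coords, restraints))          # [('H1', 'H3', 2.0)]
    print(flat_bottom_energy(coords, restraints))  # 4.0
```

The flat bottom reflects the fact that NMR-derived distances are imprecise: any distance within the bounds is treated as equally consistent with the data, and only deviations beyond them are penalized.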
For each of the steps involved, specialized
computer programs are available, each with its own
characteristics and often with its own data
format. Processing of NMR data has thus become a
task for specialists who understand the data
and their formats, as well as the programs with
their installation requirements and usage details.
Furthermore, NMR data processing requires considerable data storage and computational resources. Together, these factors currently keep many life-science groups from exploiting the full power of NMR. Against this background, the eNMR project was run as a European initiative, funded under the Framework 7 e-Infrastructure programme, to considerably facilitate this process [3]. Since November 2010, this work has been continued by the WeNMR project (a Worldwide e-Infrastructure for NMR and structural biology). The project aims to enable groups lacking the resources to add NMR to their toolbox, and to help dedicated NMR groups advance from standard practice towards cutting-edge research.
1.2 Small Angle X-Ray Scattering
Small Angle X-Ray Scattering (SAXS) is a widely
used (...truncated)