Evol and ProDy for bridging protein sequence evolution and structural dynamics
Ahmet Bakan† 0
Anindita Dutta† 0
Wenzhi Mao 0
Ying Liu 0
Chakra Chennubhotla 0
Timothy R. Lezon 0
Ivet Bahar 0
Associate Editor: Anna Tramontano
0 Department of Computational and Systems Biology, and Clinical & Translational Science Institute, School of Medicine, University of Pittsburgh , Pittsburgh, PA 15213 , USA
Correlations between sequence evolution and structural dynamics are of utmost importance in understanding the molecular mechanisms of function and their evolution. We have integrated Evol, a new package for fast and efficient comparative analysis of evolutionary patterns and conformational dynamics, into ProDy, a computational toolbox designed for inferring protein dynamics from experimental and theoretical data. Using information-theoretic approaches, Evol coanalyzes conservation and coevolution profiles extracted from multiple sequence alignments of protein families with their inferred dynamics.
Availability and implementation: ProDy and Evol are open-source and freely available under the MIT License from http://prody.csb.pitt.edu/.
The Author 2014. Published by Oxford University Press. All rights reserved.
1 INTRODUCTION
The significance of protein dynamics in a wide range of
biological functions, including cell signaling, regulation and
machinery, is widely established (Bahar et al., 2010; Bhabha et al.,
2011; Marsh et al., 2012). In many cases, sequence variability goes
hand in hand with structural dynamics (Glembo et al., 2012; Liu and
Bahar, 2012; Marks et al., 2011; Micheletti, 2012; Worth et al.,
2009; Zheng et al., 2005). Structural dynamics correlates with
evolvability (Tokuriki and Tawfik, 2009) or sequence and
conformational diversity (Friedland et al., 2009) and enables
adaptation to substrate binding while maintaining specificity
(Liu et al., 2010). To our knowledge, existing software tools usually
either relate evolutionary properties to static structures
(Ashkenazy et al., 2010; Morgan et al., 2006; Wainreb et al., 2011),
or are exclusively dedicated to either sequence analysis
(Waterhouse et al., 2009) or structural dynamics (Eyal et al., 2006;
Suhre and Sanejouand, 2004). There is a need for methods that allow
combined analysis of sequence (co)evolution and structural
dynamics. Such analyses would be particularly useful if they could
be performed and visualized in a versatile, integrated computing
environment.
*To whom correspondence should be addressed.
†The authors wish it to be known that, in their opinion, the first two
authors should be regarded as Joint First Authors.
Toward addressing this need, we introduce v1.5 of ProDy
(Bakan et al., 2011) with Evol applications. Highlights of the new
version are rich methods for coevolutionary analysis and
extensions for analyzing and interpreting structural dynamics.
These follow the approach adopted in our recent comparative
study of sequence conservation and coevolution patterns versus
structure/dynamics properties for a representative set of protein
families (Liu and Bahar, 2012), which has been validated in
detailed case studies (e.g. General et al., 2014; Liu et al., 2010).
A distinctive feature of ProDy is its capability to extract
mechanistic information from principal component analysis (PCA) of
ensembles of structures (e.g. drug targets) (Bakan and Bahar, 2009).
The new release has several new modules and command-line
applications, named ‘evol’, to evaluate sequence conservation
and coevolution using information-theoretic and statistical
approaches. To our knowledge, this is the only package that
enables comparative analysis of protein dynamics with sequence
evolution data extracted from multiple sequence alignments
(MSAs) for protein families.
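As an illustration of the information-theoretic quantities referred to above, the following minimal sketch computes per-column Shannon entropy (a measure of conservation) and pairwise mutual information (a measure of coevolution) for a toy alignment. It is not the Evol implementation: the function names `column_entropy` and `mutual_information` and the four-sequence alignment are hypothetical, only the Python standard library is used, and gap characters, if present, would simply be counted as an additional symbol.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in nats) of the residue distribution in one MSA column."""
    total = len(column)
    counts = Counter(column)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def mutual_information(col_i, col_j):
    """Mutual information (in nats) between two MSA columns of equal length."""
    total = len(col_i)
    count_i = Counter(col_i)               # marginal counts for column i
    count_j = Counter(col_j)               # marginal counts for column j
    count_ij = Counter(zip(col_i, col_j))  # joint counts for residue pairs
    mi = 0.0
    for (a, b), n in count_ij.items():
        p_ab = n / total
        p_a = count_i[a] / total
        p_b = count_j[b] / total
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment (assumption): 4 sequences, 3 columns.
toy_msa = ["ACD", "ACE", "GTD", "GTE"]
columns = list(zip(*toy_msa))  # transpose rows (sequences) into columns

entropies = [column_entropy(c) for c in columns]
mi_01 = mutual_information(columns[0], columns[1])  # perfectly covarying pair
mi_02 = mutual_information(columns[0], columns[2])  # statistically independent pair
```

In this toy case, columns 0 and 1 always change together, so their mutual information equals the entropy of either column (ln 2), whereas columns 0 and 2 vary independently and their mutual information is zero. Real alignments would additionally call for sequence weighting and finite-sample corrections, which this sketch omits.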
2 DESCRIPTION AND FUNCTIONALITY
2.1 Input for ProDy and Evol
The input for ProDy is a set of protein coordinates in PDB
format, or simply the PDB ID or protein sequence. The speed
of the PDB parser and AtomGroup classes has been increased in the
current version, such that parsing coordinates is 4.5–40 times
faster than with the Biopython PDB module (Hamelryck and
Manderick, 2003), and atomic data storage occupies 10 times
less memory. We implemented efficient and flexible
features for handling MSAs. Notably, the new MSA parser
can evaluate various formats at a rate of 700 MB/s (on a
3.6 GHz Intel Xeon CPU, 16 GB RAM and Samsung SSD)
and is up to 80 times faster than the alignment parser of
Biopython (Cock et al., 2009). Flexible classes store MSA data
parsimoniously in memory and provide ways of subsampling.
Sequences can be filtered based on their labels to retain those in
certain categories (e.g. human) and sliced to retain specific
regions or sequences (e.g. regions matching structurally resolved
amino acids). Such refinements, performed in a fraction of a
second, allow for real-time processing of large MSAs and
systematic analyses (...truncated)