EvoDesign: de novo protein design based on structural and evolutionary profiles
Pralay Mitra, David Shultis and Yang Zhang
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA
Protein design aims to identify new protein sequences with desirable structures and biological functions. Most current de novo protein design methods rely on physics-based force fields to search for low free-energy states, following Anfinsen's thermodynamic hypothesis. A major obstacle of such approaches is the inaccuracy of the force fields, which cannot accurately describe atomic interactions or distinguish correct folds. We developed a new web server, EvoDesign, to design optimal protein sequences for given scaffolds, along with multiple sequence- and structure-based features to assess the foldability and quality of the designs. EvoDesign uses an evolutionary profile-based Monte Carlo search, with the profiles constructed from homologous structure families in the Protein Data Bank. A set of local structural features, including secondary structure, torsion angles and solvation, are predicted by neural networks trained on single sequences and are used to smooth the sequence motifs and accommodate physicochemical packing. The EvoDesign algorithm has been extensively tested in large-scale protein design experiments, which demonstrate enhanced foldability and structural stability of the designed sequences compared with physics-based design methods. The EvoDesign server is freely available at http://zhanglab.ccmb.med.umich.edu/EvoDesign.
INTRODUCTION
The number of possible amino acid sequences is huge (20^L, with L being the sequence length), but only a tiny fraction of them fold into real proteins in nature with a unique folded state and physiological activity. The driving forces of such natural protein design include both physicochemical interactions and evolutionary pressure (1,2). Computer-based rational protein design aims to engineer novel sequences with stable folded states, in particular those with desirable physiological functions. Technically, it can be considered the reverse of protein folding, which critically challenges our understanding of the fundamental principles of protein folding and stability (3-5). Protein design also has significant biomedical implications in its own right. Successful protein design and engineering have been shown to generate novel catalytic activities (6,7) and lead to new therapeutic developments (8,9).
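To make the size of this sequence space concrete: for a 100-residue protein, 20^100 = 10^(100 log10 20) ≈ 1.3 × 10^130. The short Python sketch below (the function names are illustrative and not part of EvoDesign) computes the exact count with Python's arbitrary-precision integers, together with its base-10 logarithm.

```python
import math


def sequence_space_size(length, alphabet=20):
    """Exact number of distinct sequences of `length` residues over `alphabet` letters."""
    return alphabet ** length


def log10_sequence_space(length, alphabet=20):
    """Base-10 logarithm of the sequence-space size, i.e. L * log10(20)."""
    return length * math.log10(alphabet)


print(len(str(sequence_space_size(100))))   # 131 (digits in 20**100)
print(round(log10_sequence_space(100), 1))  # 130.1
```

Even a modest protein therefore has far more candidate sequences than could ever be enumerated, which is why a guided stochastic search is needed.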
Most computer-based protein design efforts are based on Anfinsen's thermodynamic hypothesis (10) and aim to identify new sequences with the lowest free energy under various designed force fields. One obstacle in using physics-based approaches is the inaccuracy of the force field potentials for the structural and thermodynamic optimization of protein stability. Motivated by the superiority of template-based approaches in protein structure prediction, which construct structural models using evolutionarily related proteins as templates (11,12), we have developed an evolutionary profile-based method for de novo protein design (13), in which the sequence space search is constrained by amino acid sequence profiles computed from homologous structure families. The physicochemical features of the designed sequences are smoothed by neural-network predictions of local structural features, including secondary structure, backbone torsion angles and solvation. The evolutionary profile-guided simulation search has the advantage of allowing the design and engineering of proteins of larger size and more complex topology than searches based on physical force fields.
Here, we describe EvoDesign, an evolutionary profile-based web server for de novo protein design, which is developed based on our recent protein design method (13). The server offers several options that allow users to select different guiding force fields, structural thresholds for profile construction and residue conservation. The server is fast, with execution times on the order of hours, because the simulation search converges quickly under the profile restraints. EvoDesign is established as an automated yet reliable online facility, most useful for protein engineering and drug discovery studies.
MATERIALS AND METHODS
Figure 1 depicts a flow chart of the EvoDesign server, which is divided into three stages: (i) pre-processing: generation of scaffold-specific evolutionary profile restraints; (ii) simulation: Monte Carlo search in sequence space; and (iii) clustering and selection: sequence clustering for design selection.
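Stage (ii) can be illustrated with a generic Metropolis Monte Carlo search over sequences. The Python sketch below is a minimal illustration under stated assumptions: single-residue random mutations, a fixed temperature and an arbitrary energy callable. The actual EvoDesign move set, temperature schedule and energy are not specified here, and all names (`metropolis_sequence_search`, `toy_energy`, `TARGET`) are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def metropolis_sequence_search(energy, length, steps=5000, temperature=1.0, seed=0):
    """Metropolis Monte Carlo over sequences using single-residue mutations.

    `energy` maps a sequence string to a float (lower is better).
    Returns the lowest-energy sequence visited and its energy.
    """
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    current = energy("".join(seq))
    best_seq, best_e = "".join(seq), current
    for _ in range(steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)  # propose a point mutation
        trial = energy("".join(seq))
        delta = trial - current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = trial  # accept the move
            if current < best_e:
                best_seq, best_e = "".join(seq), current
        else:
            seq[pos] = old  # reject: restore the previous residue
    return best_seq, best_e


# Hypothetical toy energy: number of mismatches to a fixed target sequence.
TARGET = "MKTAYIAKQR"


def toy_energy(s):
    return sum(a != b for a, b in zip(s, TARGET))


best, e = metropolis_sequence_search(toy_energy, len(TARGET), temperature=0.2)
print(best, e)
```

The toy mismatch energy only stands in for a real scoring function; in a design setting it would be replaced by the profile- and feature-based score described under Force field design.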
Pre-processing
Starting from a scaffold protein structure, EvoDesign first collects a set of proteins with similar folds from the PDB library using the structural alignment program TM-align (14). By default, a high structural similarity cut-off (TM-score >0.7) is used, which is gradually reduced until the number of structural homologs is >10 or the TM-score threshold reaches 0.5. Depending on the preferred degree of structural variation, users can control the diversity of the collected structures by specifying a different lower limit for the fold cut-off.
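The adaptive cut-off rule can be sketched as follows, assuming pre-computed TM-scores between the scaffold and each PDB entry. The decrement `step` (0.05 here) is a hypothetical value, since the text states only that the threshold is reduced gradually; `floor` plays the role of the user-specified lower limit, and the function name is illustrative.

```python
def select_templates(tm_scores, start=0.7, floor=0.5, step=0.05, min_hits=10):
    """Relax the TM-score cut-off until more than `min_hits` homologs pass
    or the cut-off reaches `floor`.

    tm_scores: dict mapping PDB identifier -> TM-score against the scaffold.
    Returns (cut-off used, list of identifiers passing it).
    """
    cutoff = start
    while True:
        hits = [pdb for pdb, tm in tm_scores.items() if tm > cutoff]
        if len(hits) > min_hits or cutoff <= floor:
            return cutoff, hits
        # round() suppresses floating-point drift such as 0.6499999...
        cutoff = max(floor, round(cutoff - step, 10))


scores = {"1abc": 0.82, "2def": 0.66, "3ghi": 0.58, "4jkl": 0.41}
print(select_templates(scores))  # (0.5, ['1abc', '2def', '3ghi'])
```

With only four candidates the rule never reaches more than ten hits, so the cut-off relaxes all the way to the floor; a user wanting more structurally diverse templates would lower `floor`.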
An evolutionary profile is then constructed from the multiple sequence alignment built from the TM-align structural alignments. This profile is used to guide the search of the amino acid sequence space in the subsequent Monte Carlo simulation, where the physicochemical packing of side-chain and backbone atoms is accommodated by neural-network-based solvation, torsion angle and secondary structure predictions (13).
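As an illustration of how a position-specific profile can be derived from such an alignment, the sketch below computes log-odds scores with additive pseudocounts against a uniform background. The pseudocount, the uniform background and the gap handling are assumptions made for illustration; EvoDesign's actual profile construction is described in (13) and may differ.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_profile(alignment, pseudocount=1.0, background=None):
    """Return one {amino_acid: log-odds} dict per alignment column.

    Gaps and non-standard characters are ignored when counting.
    """
    if background is None:
        # Assumption: uniform background frequencies.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    profile = []
    for i in range(len(alignment[0])):
        counts = Counter(seq[i] for seq in alignment if seq[i] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log((counts[aa] + pseudocount) / total / background[aa])
            for aa in AMINO_ACIDS
        })
    return profile


def profile_score(profile, seq):
    """Sum of position-specific log-odds for a candidate sequence (higher = better match)."""
    return sum(profile[i][aa] for i, aa in enumerate(seq))


msa = ["MKV-", "MKIL", "MRVL"]
prof = log_odds_profile(msa)
print(profile_score(prof, "MKVL") > profile_score(prof, "WWWW"))  # True
```

Pseudocounts keep unseen residues from receiving a score of minus infinity, so the search can still explore substitutions absent from the homolog alignment, only at a penalty.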
Force field design
The EvoDesign force field is a linear combination of four terms: (i) the log-odds match between the decoy sequence and the structure profile of the target scaffold; (ii) the secondary structure (SS) match between decoy and target scaffold; (iii) the backbone torsion angle (TA) match between decoy and target; and (iv) the match of residue solvent accessibility (SA) between decoy and target. If the target structure provided by the user is full-atomic, the SS, TA and SA of the target are pre-assigned by the DSSP program (15). If the scaffold contains Cα atoms only, an atomic model including backbone and side-chain heavy atoms is quickly constructed using statistical parameters collected from the PDB (16), and is then fed into DSSP to assign the structural features.
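A minimal sketch of such a linear combination is shown below. The weights, the per-term definitions (mismatch counts for SS, periodic angle differences normalized by 180 degrees for TA, absolute differences in relative accessibility for SA) and all names are hypothetical; the text specifies only that the four terms are combined linearly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Hypothetical term weights; the server's actual values are not given here."""
    profile: float = 1.0
    ss: float = 1.0
    ta: float = 1.0
    sa: float = 1.0


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (handles wrap-around)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def design_energy(profile_score, ss_decoy, ss_target, ta_decoy, ta_target,
                  sa_decoy, sa_target, w=Weights()):
    """Linear combination of four match terms; lower energy = better design.

    ss_*: secondary-structure strings (e.g. 'H', 'E', 'C' per residue)
    ta_*: lists of (phi, psi) backbone torsion angles in degrees
    sa_*: lists of relative solvent accessibilities in [0, 1]
    """
    e_profile = -profile_score  # a higher log-odds match lowers the energy
    e_ss = sum(a != b for a, b in zip(ss_decoy, ss_target))
    e_ta = sum((angle_diff(p1, p2) + angle_diff(s1, s2)) / 180.0
               for (p1, s1), (p2, s2) in zip(ta_decoy, ta_target))
    e_sa = sum(abs(a - b) for a, b in zip(sa_decoy, sa_target))
    return w.profile * e_profile + w.ss * e_ss + w.ta * e_ta + w.sa * e_sa


e = design_energy(5.0, "HHC", "HHE",
                  [(-60, -45)] * 3, [(-60, -45), (-60, -45), (-120, 130)],
                  [0.1, 0.5, 0.9], [0.1, 0.4, 0.9])
print(round(e, 3))  # -2.594
```

Using the periodic `angle_diff` matters for torsion angles: a naive difference between 170 and -170 degrees would report 340 instead of the true 20.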
The SS, TA and SA values of decoy sequences are predicted by neural networks. Such predictors are mostly trained on PSI-BLAST position-specific scoring matrices (PSSMs) (17). Because new decoys are generated at each step of the Monte Carlo movement, we trained separate predictors on single sequences, which are much faster than the PSSM-based predictors (<1 s versus ~5 min) while achieving comparable prediction accuracy.
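Single-sequence predictors avoid the PSI-BLAST search entirely, because their input features can be computed directly from each decoy. The sketch below shows one common input representation, a sliding-window one-hot encoding; the window size of 15 and the extra padding channel are assumptions, and this is not necessarily the encoding used by EvoDesign's networks. The function name `window_features` is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def window_features(seq, window=15):
    """One-hot encode a sliding window centred on each residue.

    Each residue yields window * 21 values: 20 amino-acid channels plus one
    channel flagging padding beyond the termini or non-standard residues.
    """
    half = window // 2
    features = []
    for i in range(len(seq)):
        row = []
        for j in range(i - half, i + half + 1):
            vec = [0] * 21
            if 0 <= j < len(seq) and seq[j] in INDEX:
                vec[INDEX[seq[j]]] = 1
            else:
                vec[20] = 1  # padding or unknown residue
            row.extend(vec)
        features.append(row)
    return features


feats = window_features("MKTAYIAKQR")
print(len(feats), len(feats[0]))  # 10 315
```

Because the encoding needs only the decoy sequence itself, it can be recomputed at every Monte Carlo step at negligible cost, which is the property the single-sequence training exploits.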
As an option, the EvoDesign server also allows users to select a physics-based potential, which is linearly combined with the evolution-based energy terms. FoldX (version 3.0b5) is used to account for the physics-based energy terms, including hydrogen bonding, electrostatics, van der Waals, steric, solvation (...truncated)