EvoDesign: de novo protein design based on structural and evolutionary profiles

Nucleic Acids Research, Jul 2013

Protein design aims to identify new protein sequences of desirable structure and biological function. Most current de novo protein design methods rely on physics-based force fields to search for low free-energy states following Anfinsen’s thermodynamic hypothesis. A major obstacle of such approaches is the inaccuracy of the force field design, which cannot accurately describe the atomic interactions or distinguish correct folds. We developed a new web server, EvoDesign, to design optimal protein sequences of given scaffolds along with multiple sequence and structure-based features to assess the foldability and goodness of the designs. EvoDesign uses an evolution-profile–based Monte Carlo search with the profiles constructed from homologous structure families in the Protein Data Bank. A set of local structure features, including secondary structure, torsion angle and solvation, are predicted by single-sequence neural-network training and used to smooth the sequence motif and accommodate the physicochemical packing. The EvoDesign algorithm has been extensively tested in large-scale protein design experiments, which demonstrate enhanced foldability and structural stability of designed sequences compared with the physics-based designing methods. The EvoDesign server is freely available at http://zhanglab.ccmb.med.umich.edu/EvoDesign.

Pralay Mitra, David Shultis, Yang Zhang

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA

INTRODUCTION

The number of possible amino acid sequences is huge (20^L, with L being the sequence length), but only a few of them have folded into real proteins in nature with a unique folded state and physiological activity. The driving forces of such natural protein design include both physicochemical interactions and evolutionary pressure (1,2).
Computer-based rational protein design aims to engineer novel sequences with stable folded states, in particular those with desirable physiological functionality. Technically, it can be considered the reverse of protein folding, and it critically tests our understanding of the fundamental principles of protein folding and stability (3-5). Protein design also has significant biomedical implications of its own. Successful protein design and engineering have been shown to generate novel catalytic activities (6,7) and to result in new therapeutic developments (8,9).

Most computer-based protein design efforts are based on Anfinsen's thermodynamic hypothesis (10) and aim to identify new sequences of lowest free energy under various designed force fields. One obstacle in using physics-based approaches is the inaccuracy of the force field potentials used for structural and thermodynamic optimization of protein stability. Motivated by the superiority of template-based approaches in protein structure prediction, which construct structural models using evolutionarily related proteins as templates (11,12), we have developed an evolutionary profile-based method for de novo protein design (13), in which the sequence space search is constrained by amino acid sequence profiles computed from homologous structure families. The physicochemical features of the designed sequence are smoothed by neural-network predictions of local structural features, including secondary structure, backbone torsion angle and solvation. Compared with searches on physical force fields, the evolutionary profile-guided simulation search has the advantage of allowing the design and engineering of proteins of larger size and more complex topology.

Here, we describe EvoDesign, an evolutionary profile-based web server for de novo protein design, which is built on our recent protein design method (13).
The server offers several options that let users select different guiding force fields, structural thresholds for profile construction and residue conservation. The server runs quickly, with execution times on the order of hours, because the simulation search converges rapidly under the profile restraints. EvoDesign provides an automated yet reliable on-line facility, most useful for protein engineering and drug discovery studies.

MATERIALS AND METHODS

Figure 1 depicts a flow chart of the EvoDesign server, which is divided into three stages: (i) pre-processing: generation of scaffold-specific evolutionary profile restraints; (ii) simulation: Monte Carlo search of the sequence space; and (iii) clustering and selection: sequence clustering for design selection.

Pre-processing

Starting from a scaffold protein structure, EvoDesign first collects a set of proteins of similar fold from the PDB library using the structural alignment program TM-align (14). By default, a high structural similarity cut-off (TM-score >0.7) is used; the cut-off is then gradually reduced until the number of structural homologs is >10 or the TM-score threshold reaches 0.5. Depending on the structural variation they prefer, users can control the diversity of the collected proteins by specifying a different lower limit for the fold cut-off. An evolutionary profile is then constructed from the multiple sequence alignment derived from the TM-align structural alignments. This profile is used to guide the search of the amino acid sequence space in the subsequent Monte Carlo simulation, where the physicochemical packing of side-chain and backbone atoms is accommodated by neural-network-based solvation, torsion angle and secondary structure predictions (13).
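The pre-processing step can be illustrated with a minimal Python sketch. It is not the server's implementation: the function names, the 0.05 relaxation step, the pseudocount and the uniform background frequencies are all assumptions made for illustration, and the TM-scores are taken as already computed (for example by TM-align).

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def select_homologs(tm_scores, start=0.7, floor=0.5, step=0.05, min_count=10):
    """Relax the TM-score cut-off until more than `min_count` structures pass
    or the cut-off reaches `floor`.

    `tm_scores` maps a structure identifier to its TM-score against the
    scaffold. The step size is an assumption; the text only says the cut-off
    is "gradually reduced".
    """
    cutoff = start
    while True:
        hits = [name for name, tm in tm_scores.items() if tm > cutoff]
        if len(hits) > min_count or cutoff <= floor:
            return cutoff, hits
        # round() keeps repeated subtraction from drifting (0.6000000001 etc.)
        cutoff = max(floor, round(cutoff - step, 10))


def build_profile(aligned, pseudocount=1.0):
    """Build a per-position log-odds profile from equal-length aligned
    sequences ('-' marks a gap).

    Log-odds are taken against a uniform background with additive
    pseudocounts; both choices are illustrative assumptions.
    """
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for i in range(len(aligned[0])):
        # Count only standard residues; gaps are skipped
        counts = Counter(seq[i] for seq in aligned if seq[i] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile
```

Calling select_homologs on a dictionary of TM-scores returns the cut-off that was finally used together with the identifiers that passed it; build_profile then turns the aligned homolog sequences into one log-odds dictionary per scaffold position.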
Force field design

The EvoDesign force field is a linear combination of four terms: (i) the log-odds match between the decoy sequence and the structure profile of the target scaffold; (ii) the secondary structure (SS) match between decoy and target scaffold; (iii) the backbone torsion angle (TA) match between decoy and target; and (iv) the match in solvent accessibility (SA) of residues between decoy and target. If the target structure supplied by the user is full-atomic, the SS, TA and SA of the target are pre-assigned by the DSSP program (15). If the scaffold is Cα only, an atomic model including backbone and side-chain heavy atoms is quickly constructed using statistical parameters collected from the PDB (16) and then fed into DSSP to assign the structural features. The SS, TA and SA values of decoy sequences are predicted by neural networks, which are typically trained on the PSI-BLAST position-specific scoring matrix (PSSM) (17). Because new decoys are generated at each step of the simulation, we instead trained the predictors on single sequences, which is much faster than PSSM-based prediction (<1 s versus 5 min) with comparable accuracy. As an option, the EvoDesign server also allows users to select a physics-based potential, which is linearly combined with the evolution-based energy terms. FoldX (version 3.0b5) is used to account for the physics-based energy terms, including hydrogen bonding, electrostatics, van der Waals, steric, solvation (...truncated)
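The four evolution-based terms and the sequence-space search can be sketched as follows. This is an illustrative sketch only: the equal weights, the mismatch and absolute-difference penalties, the plain Metropolis acceptance rule and single-residue mutation move, and the `predict` callback (a stand-in for the single-sequence neural-network predictors) are assumptions, not details taken from the paper. Torsion angles are reduced to one number per residue and their periodicity is ignored.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def energy(seq, profile, target, predict, weights=(1.0, 1.0, 1.0, 1.0)):
    """Linear combination of profile, SS, TA and SA terms (lower is better).

    profile : list of dicts, residue -> log-odds score, one per position
    target  : dict with per-residue lists under 'ss', 'ta' and 'sa'
    predict : hypothetical callable, sequence -> dict shaped like `target`
    weights : assumed weights; the real values are not given in the text
    """
    w_prof, w_ss, w_ta, w_sa = weights
    pred = predict(seq)
    # A high log-odds match should lower the energy, hence the minus sign
    e_prof = -sum(profile[i][aa] for i, aa in enumerate(seq))
    e_ss = sum(p != t for p, t in zip(pred["ss"], target["ss"]))
    e_ta = sum(abs(p - t) for p, t in zip(pred["ta"], target["ta"]))
    e_sa = sum(abs(p - t) for p, t in zip(pred["sa"], target["sa"]))
    return w_prof * e_prof + w_ss * e_ss + w_ta * e_ta + w_sa * e_sa


def monte_carlo(seq, score, steps=2000, temperature=1.0, rng=None):
    """Metropolis search over sequences using single-residue mutations.

    `score` maps a sequence string to its energy. Returns the lowest-energy
    sequence visited and its energy.
    """
    rng = rng or random.Random(0)
    current = list(seq)
    e_cur = score("".join(current))
    best, e_best = "".join(current), e_cur
    for _ in range(steps):
        pos = rng.randrange(len(current))
        old = current[pos]
        current[pos] = rng.choice(AMINO_ACIDS)
        e_new = score("".join(current))
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-(e_new - e_cur) / temperature)
        if e_new <= e_cur or rng.random() < math.exp((e_cur - e_new) / temperature):
            e_cur = e_new
            if e_cur < e_best:
                best, e_best = "".join(current), e_cur
        else:
            current[pos] = old  # reject: restore the previous residue
    return best, e_best
```

In this sketch an optional physics-based term would simply enter `energy` as a fifth weighted summand; how the server weights and evaluates that term is not covered here.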


This is a preview of a remote PDF: https://nar.oxfordjournals.org/content/41/W1/W273.full.pdf
Article home page: http://nar.oxfordjournals.org/content/41/W1/W273.abstract

Pralay Mitra, David Shultis, Yang Zhang. EvoDesign: de novo protein design based on structural and evolutionary profiles, Nucleic Acids Research, 2013, pp. W273-W280, 41/W1, DOI: 10.1093/nar/gkt384