Validating a Coarse-Grained Potential Energy Function through Protein Loop Modelling

https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0065770&type=printable

Citation: MacDonald JT, Kelley LA, Freemont PS. Validating a Coarse-Grained Potential Energy Function through Protein Loop Modelling.

Authors: James T. MacDonald, Lawrence A. Kelley, Paul S. Freemont (Division of Molecular Biosciences, Imperial College London, London, United Kingdom)

Editor: Narcis Fernandez-Fuentes, Aberystwyth University, United Kingdom

Coarse-grained (CG) methods for sampling protein conformational space have the potential to increase computational efficiency by reducing the degrees of freedom. This gain in efficiency often comes at the cost of non-protein-like local conformational features, which can cause problems when transitioning to full atom models in a hierarchical framework. Here, a CG potential energy function was validated by applying it to the problem of loop prediction. A novel method to sample the conformational space of backbone atoms was benchmarked using a standard test set consisting of 351 distinct loops. This method used a sequence-independent CG potential energy function that represents the protein using α-carbon positions only, and sampled conformations with a Monte Carlo simulated annealing based protocol. Backbone atoms were added using a previously described method and then gradient minimised in the Rosetta force field. Despite the CG potential energy function being sequence-independent, the method performed similarly to methods that explicitly use either fragments of known protein backbones with similar sequences or residue-specific φ/ψ-maps to restrict the search space. The method was also able to predict with sub-Angstrom accuracy two out of seven loops from recently solved crystal structures of proteins with low sequence and structure similarity to previously deposited structures in the PDB.
The ability to sample realistic loop conformations directly from a potential energy function enables the incorporation of additional geometric restraints and the use of more advanced sampling methods in a way that is not easily possible with fragment replacement methods. It also enables multi-scale simulations for protein design and protein structure prediction. These restraints could be derived from experimental data or could be design restraints in the case of computational protein design. C++ source code is available for download from http://www.sbg.bio.ic.ac.uk/phyre2/PD2/.

Funding: JTM was funded by the EPSRC (EP/H019154/1). LAK is funded by the BBSRC (BB/J019240/1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

The prediction of protein structure to atomic level resolution and the design of de novo proteins with large scale backbone sampling are largely unsolved problems, although there has been a great deal of progress in recent years. Both problems require the ability to rapidly sample a large number of backbone conformations. Sampling protein conformational space using full atom models can be prohibitively computationally expensive, so a variety of different approaches have been developed to reduce the search space. This can be achieved by using coarse-grained (CG) protein models, by assembling backbone models from short fragments taken from known protein structures, or by a combination of both of these methods.

Coarse-grained models have been increasingly used for modelling large biomolecules over long time scales due to the computational efficiency provided by these methods [1–3].
These models vary in the degree of coarse-graining, with some models representing multiple amino acid residues with one interaction centre [4], some representing each amino acid residue with a small number of interaction centres [5–13], and others that are intermediate between minimal and full atom models [14–16]. Potential energy functions for CG models have most commonly been derived using statistics from the Protein Data Bank (PDB) together with a suitable reference state [2]. Potential energy functions derived this way are known as knowledge-based or statistical potentials. It is also possible to derive CG potential energy functions from physical principles [17].

While CG models in the past were mostly used as toy models to study the general principles of protein folding [18,19], they are now becoming sufficiently accurate and transferable to be used for more directly useful applications. For example, CG models are widely and successfully used in protein structure prediction methods with both lattice models [6,8] and off-lattice methods [20–22].

CG models coupled with fragment replacement methods have been particularly successful. Backbone fragments are generally assembled in a Monte Carlo based procedure to build a new overall fold. As well as reducing the search space, these methods also have the advantage of guaranteeing models that have protein-like local conformational features. When these techniques are used for modelling loops, however, a loop closure method is required to ensure that the ends of the loop connect to the anchor residues in a geometrically correct way. Another disadvantage is that it is not easy to sample conformations using fragment replacement with additional restraints, which could come from experimental information or from protein design applications. Fragment replacements are inherently non-local and highly disruptive moves, so acceptance rates can be very low with additional restraints.
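
The effect of disruptive moves on acceptance rates can be illustrated with a minimal sketch, assuming the standard Metropolis criterion, in which a move with energy change ΔE is accepted with probability min(1, exp(-ΔE/kT)). This is not the implementation used in this study: the function names, the energy distributions, the restraint penalty of 5.0 and the temperature `kt` are all hypothetical values chosen only for illustration.

```python
import math
import random


def metropolis_accept(delta_e, kt, rng=random):
    """Accept a move with energy change delta_e under the Metropolis criterion."""
    if delta_e <= 0.0:
        return True  # downhill (or neutral) moves are always accepted
    return rng.random() < math.exp(-delta_e / kt)


def acceptance_rate(delta_es, kt, rng=random):
    """Fraction of proposed moves that are accepted."""
    accepted = sum(metropolis_accept(d, kt, rng) for d in delta_es)
    return accepted / len(delta_es)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical energy changes in arbitrary units: small local moves versus
# large moves that additionally violate a restraint (assumed penalty 5.0).
local_moves = [rng.gauss(0.5, 1.0) for _ in range(10000)]
disruptive_moves = [rng.gauss(0.5, 1.0) + 5.0 for _ in range(10000)]

local_rate = acceptance_rate(local_moves, kt=1.0, rng=rng)
disruptive_rate = acceptance_rate(disruptive_moves, kt=1.0, rng=rng)
print(f"local: {local_rate:.3f}  disruptive: {disruptive_rate:.3f}")
```

Under these assumed distributions, the moves carrying the restraint penalty are accepted far less often than the local moves, which is the behaviour described above for fragment replacement under additional restraints.
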
It is also harder to use more advanced sampling techniques such as metadynamics [23] or umbrella sampling [24], as fragment replacement violates detailed balance in most common implementations [25]. This would be even more difficult when coupled with loop closure methods, as is necessary in loop modelling. The ability to sample loop conformations with protein-like local structural features directly from a CG potential energy function could be one way of avoiding these problems.

The accuracy of full-atom reconstruction depends on the level of coarse-graining [16]. A number of methods have been developed to rapidly reconstruct mainchain atoms from Cα atoms [26–29]. Sidechains can then be added to the mainchain using fast rotamer-based methods [30,31]. When transitioning between CG and full atom models it is important to retain good model structure quality. However, even in many full atom molecular mechanics force fields the modelling of backbone torsion angles has been problematic, although recent efforts have been made to address this [32,33]. A key feature of the Cα CG potential used in this study is its emphasis on protein-like local structure [11].

For most protein sequences, experimentally determined structures of homologous sequences are available and can be used as templates for accurate modelling [34,35]. These homology models often have missing sections of the peptide chain where new residues have been inserted during the course of evolution. In these (...truncated)