Validating a Coarse-Grained Potential Energy Function through Protein Loop Modelling
Citation: MacDonald JT, Kelley LA, Freemont PS (
James T. MacDonald 0
Lawrence A. Kelley 0
Paul S. Freemont 0
Editor: Narcis Fernandez-Fuentes, Aberystwyth University, United Kingdom
0 Division of Molecular Biosciences, Imperial College London, London, United Kingdom
Coarse-grained (CG) methods for sampling protein conformational space have the potential to increase computational efficiency by reducing the degrees of freedom. This gain in efficiency often comes at the cost of non-protein-like local conformational features, which could cause problems when transitioning to full atom models in a hierarchical framework. Here, a CG potential energy function was validated by applying it to the problem of loop prediction. A novel method to sample the conformational space of backbone atoms was benchmarked using a standard test set consisting of 351 distinct loops. This method used a sequence-independent CG potential energy function representing the protein using α-carbon positions only and sampled conformations with a Monte Carlo simulated annealing based protocol. Backbone atoms were added using a previously described method and then gradient minimised in the Rosetta force field. Despite the CG potential energy function being sequence-independent, the method performed similarly to methods that explicitly use either fragments of known protein backbones with similar sequences or residue-specific φ/ψ-maps to restrict the search space. The method was also able to predict with sub-Angstrom accuracy two out of seven loops from recently solved crystal structures of proteins with low sequence and structure similarity to previously deposited structures in the PDB. The ability to sample realistic loop conformations directly from a potential energy function enables the incorporation of additional geometric restraints and the use of more advanced sampling methods in a way that is not easily possible with fragment replacement methods, and also enables multi-scale simulations for protein design and protein structure prediction. These restraints could be derived from experimental data or could be design restraints in the case of computational protein design.
C++ source code is available for download from http://www.sbg.bio.ic.ac.uk/phyre2/PD2/.
Funding: JTM was funded by the EPSRC (EP/H019154/1). LAK is funded by the BBSRC (BB/J019240/1). The funders had no role in study design, data collection
and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
The prediction of protein structure to atomic level resolution
and the design of de novo proteins with large scale backbone
sampling are largely unsolved problems although there has been a
great deal of progress in recent years. Both problems require the
ability to rapidly sample a large number of backbone
conformations. Sampling protein conformational space using full atom
models can be prohibitively computationally expensive so a variety
of different approaches have been developed to reduce the search
space. This can be achieved by using coarse-grained (CG) protein
models, by assembling backbone models from short fragments
taken from known protein structures or by a combination of both
of these methods.
Coarse-grained models have been increasingly used for
modelling large biomolecules over long time scales due to the
computational efficiency provided by these methods [1–3]. These
models vary in the degree of coarse-graining with some models
representing multiple amino acid residues with one interaction
centre [4], some representing each amino acid residue with a small
number of interaction centres [5–13], and others that are
intermediate between minimal and full atom models [14–16].
Potential energy functions for CG models have been most
commonly derived using statistics from the Protein Data
Bank (PDB) together with a suitable reference state [2]. Potential
energy functions derived this way are known as knowledge-based
or statistical potentials. It is also possible to derive CG potential
energy functions from physical principles [17].
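
As an illustration of how a knowledge-based potential can be built from structural statistics and a reference state, the sketch below applies the inverse Boltzmann relation E = -kT ln(p_obs / p_ref) to a binned geometric feature. It is a minimal example under stated assumptions and not the derivation used for the potential in this study: the function name, the pseudocount scheme, the toy histograms and the choice of kT = 1 are all hypothetical.

```python
import math


def statistical_potential(observed_counts, reference_counts, kT=1.0, pseudocount=1.0):
    """Return per-bin energies E_i = -kT * ln(p_obs_i / p_ref_i).

    observed_counts: histogram of a feature (e.g. a distance) in known structures.
    reference_counts: histogram of the same feature in the reference state.
    A pseudocount is added to every bin so that empty bins do not give log(0).
    """
    if len(observed_counts) != len(reference_counts):
        raise ValueError("histograms must have the same number of bins")
    n_bins = len(observed_counts)
    obs_total = sum(observed_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies


# Toy data (invented for illustration): the feature is over-represented in the
# middle bin and under-represented in the first bin relative to a flat reference.
observed = [5, 40, 120, 60, 25]
reference = [50, 50, 50, 50, 50]
energies = statistical_potential(observed, reference)
```

Bins that are more populated in the observed data than in the reference state receive negative (favourable) energies, and bins that are less populated receive positive energies. The choice of reference state strongly affects the resulting potential, which is why it is named explicitly in the text above.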
While CG models in the past were mostly used as toy models to
study the general principles of protein folding [18,19] they are now
becoming sufficiently accurate and transferable to be used for
more directly useful applications. For example, CG models are
widely and successfully used in protein structure prediction
methods with both lattice models [6,8] and off-lattice methods
[20–22]. CG models coupled with fragment replacement methods
have been particularly successful. Backbone fragments are
generally combined in a Monte Carlo based procedure to
build a new overall fold. As well as reducing the search space,
these methods also have the advantage of guaranteeing models
that have protein-like local conformational features. One
disadvantage is that, when these techniques are used for
modelling loops, a loop closure method is
required to ensure that the ends of the loop connect the anchor
residues in a geometrically correct way. Another disadvantage is
that it is not easy to sample conformations using fragment
replacement with additional restraints, such as those derived from
experimental information or imposed in protein design applications.
Fragment replacements are inherently non-local and highly
disruptive moves, so acceptance rates can be very low when
additional restraints are applied. It is also harder to use more advanced
sampling techniques such as metadynamics [23] or umbrella
sampling [24], as fragment replacement violates detailed balance in
most common implementations [25]; this would be even more
difficult when coupled with the loop closure methods that are necessary in
loop modelling. The ability to sample loop conformations with
protein-like local structural features directly from a CG potential
energy function could be one way of avoiding these problems.
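
To make the contrast with fragment replacement concrete, the sketch below samples a short Cα trace between two fixed anchor positions using small local moves and the Metropolis criterion with a geometric cooling schedule. It is only a toy under stated assumptions: `toy_energy` is a hypothetical stand-in (a harmonic restraint on consecutive Cα distances of about 3.8 Å) and is not the CG potential validated in this study, and the move set, schedule and parameter values are invented for illustration. Because the anchors are never moved, the loop stays closed without a separate loop closure step, and because each proposal is symmetric, the Metropolis rule satisfies detailed balance at any fixed temperature.

```python
import math
import random

CA_CA = 3.8  # approximate distance between consecutive C-alpha atoms, in angstroms


def toy_energy(coords, k_bond=10.0):
    """Hypothetical stand-in energy: harmonic restraint on consecutive distances."""
    total = 0.0
    for a, b in zip(coords, coords[1:]):
        total += k_bond * (math.dist(a, b) - CA_CA) ** 2
    return total


def metropolis_accept(delta_e, temperature, rng):
    """Accept downhill moves always, uphill moves with probability exp(-dE/T)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def anneal_loop(coords, n_steps=20000, t_start=5.0, t_end=0.05, max_shift=0.5, seed=0):
    """Simulated annealing over interior atoms; the first and last atoms stay fixed."""
    rng = random.Random(seed)
    coords = [tuple(p) for p in coords]
    energy = toy_energy(coords)
    best_coords, best_energy = list(coords), energy
    cooling = (t_end / t_start) ** (1.0 / max(n_steps - 1, 1))
    temperature = t_start
    for _ in range(n_steps):
        i = rng.randrange(1, len(coords) - 1)  # interior atoms only
        old = coords[i]
        # Symmetric proposal: uniform random displacement of one atom.
        coords[i] = tuple(x + rng.uniform(-max_shift, max_shift) for x in old)
        new_energy = toy_energy(coords)
        if metropolis_accept(new_energy - energy, temperature, rng):
            energy = new_energy
            if energy < best_energy:
                best_coords, best_energy = list(coords), energy
        else:
            coords[i] = old  # reject: restore the previous position
        temperature *= cooling
    return best_coords, best_energy


# Six atoms spaced 2 angstroms apart: anchors 10 angstroms apart, four loop atoms between.
start = [(2.0 * i, 0.0, 0.0) for i in range(6)]
best, final_energy = anneal_loop(start)
```

Additional restraints could be added to such a scheme simply as extra energy terms, since every move is small and local. The full energy is recomputed at each step here for clarity; a practical implementation would update only the terms affected by the moved atom.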
The accuracy of full-atom reconstruction depends on the level
of coarse-graining [16]. A number of methods have been
developed to rapidly reconstruct mainchain atoms from Cα atoms
[26–29]. Sidechains can then be added to the mainchain using fast
rotamer-based methods [30,31]. When transitioning between CG
and full atom models it is important to retain good model structure
quality. Modelling backbone torsion angles has been
problematic even in many full atom molecular mechanics
force fields, although efforts have recently been made to address this
[32,33]. A key feature of the Cα CG potential used in this study is
its emphasis on protein-like local structure [11].
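
One common way to describe local structure in a Cα-only representation is through the pseudo-bond angle formed by three consecutive Cα atoms and the pseudo-torsion angle formed by four. The sketch below computes both from Cartesian coordinates. It is offered as an illustration of these descriptors only; whether the potential of [11] is parameterised in exactly these terms is not stated in this section, and the function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pseudo_bond_angle(p0, p1, p2):
    """Angle in degrees at p1 between the virtual bonds p1->p0 and p1->p2."""
    u, v = _sub(p0, p1), _sub(p2, p1)
    cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def pseudo_torsion(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four consecutive points (cis = 0)."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    y = _dot(_cross(n1, n2), b1) / math.sqrt(_dot(b1, b1))
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def local_geometry(ca_coords):
    """Return lists of pseudo-bond angles and pseudo-torsions along a C-alpha trace."""
    angles = [pseudo_bond_angle(*ca_coords[i:i + 3]) for i in range(len(ca_coords) - 2)]
    torsions = [pseudo_torsion(*ca_coords[i:i + 4]) for i in range(len(ca_coords) - 3)]
    return angles, torsions
```

A trace of N Cα atoms yields N - 2 pseudo-bond angles and N - 3 pseudo-torsions, so these values give a compact description of local backbone shape that can be compared against distributions observed in known structures.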
For most protein sequences, experimentally determined
structures of homologous sequences are available and can be used as
templates for accurate modelling [34,35]. These homology models
often have missing sections of the peptide chain where new
residues have been inserted during the course of evolution. In
these (...truncated)