Quantifying side-chain conformational variations in protein structure
www.nature.com/scientificreports
OPEN
Received: 29 March 2016
Accepted: 24 October 2016
Published: 15 November 2016
Zhichao Miao1,2,3 & Yang Cao4
Protein side-chain conformations are closely related to biological function. Side-chain prediction
is a key step in protein design, protein docking and structure optimization. However, side-chain
polymorphism occurs widely in proteins in several forms and has long been overlooked by
side-chain prediction methods. Such conformational variations have not been quantitatively studied, and the
correlations between these variations and residue features remain vague. Here, we performed statistical
analyses on large-scale data sets and found that side-chain conformational flexibility is closely
related to solvent exposure, degrees of freedom and hydrophilicity. These analyses allowed us
to quantify different types of side-chain variability in the PDB. The results underscore that protein side-chain conformation prediction is not a single-answer problem, leading us to reconsider how
side-chain prediction programs are assessed.
Protein side-chain conformations have been shown to be closely related to protein mutations1. Protein interactions with other proteins, RNA/DNA or ligands are mainly mediated by side-chain contacts. During these functional
steps, some critical side-chains may change their conformations to adapt to the shape and character of their interaction partner. The ‘induced-fit’ model2 provides many examples that point to the importance of side-chain
conformational change. Therefore, side-chain conformational changes, or side-chain polymorphism, could be
closely related to protein functions. Nevertheless, side-chain variations have not yet been quantitatively analyzed, and the understanding of side-chain conformational variation remains intuitive rather than systematic. Side-chain
conformation prediction, or side-chain packing, is a well-established problem in computational biology
and is involved in diverse applications, such as protein folding3, docking4, design5, engineering6 and structure optimization7. Various new programs8–12, which do not consider conformational change, have been proposed in
recent years. Laleh et al.13 tried to predict side-chain conformation with polymorphism, but covered only a simple
type of polymorphism in small-scale data. Quantifying side-chain variations can directly contribute to our
understanding of the structural essence of side-chain conformation and to improving side-chain packing programs.
A protein structure solved by X-ray crystallography or cryo-EM is normally considered a unique 3D conformation of the molecule under the defined conditions. Molecules deposited in the Protein Data Bank (PDB)14 generally
include only a single set of coordinates to represent the 3D structure, which leads to a unique answer being used as the
‘gold standard’ in structure prediction, such as protein structure prediction (CASP)15 and side-chain packing16.
Protein side-chain conformations are usually clustered into rotamers, which are rigid conformations represented
by discrete side-chain dihedral angles. However, some clues have led us to consider the variation and polymorphism17,18 of protein side-chains: (i) the temperature factor (or B factor) of an atom describes the attenuation
of X-ray scattering caused by thermal motion, and thus can be taken to indicate the relative vibrational motion
of the atom; (ii) the alternate location and atom occupancy columns in PDB files describe the probability of possible
conformational states. Although a molecule keeps its global topology in a defined environment, the side-chain
conformations are not constrained and could be flexible. With more and more crystal structures being solved,
1Architecture et Réactivité de l’ARN, Université de Strasbourg, Institut de biologie moléculaire et cellulaire du CNRS,
67000 Strasbourg, France. 2European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome
Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. 3Wellcome Trust Sanger Institute, Wellcome Trust
Genome Campus, Hinxton, Cambridge CB10 1SA, UK. 4Center of Growth, Metabolism and Aging, Key Laboratory
of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences and State Key Laboratory
of Biotherapy, Sichuan University, Chengdu, 610014, China. Correspondence and requests for materials should be
addressed to Z.M. (email: ) or Y.C. (email: )
Scientific Reports | 6:37024 | DOI: 10.1038/srep37024
we can now find structural differences in the same protein, either within the same crystal or between different crystals. Hence,
side-chain conformational variations can be detected by comparing these structures of the same protein.
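
As an illustration of clue (ii) above, the following minimal sketch reads the alternate location, occupancy and B factor columns from fixed-column PDB ATOM records and summarizes them per residue. It is not the procedure used in this work; the function names `parse_atom_record` and `alternate_conformations` are hypothetical, and the sketch assumes well-formed ATOM lines in the legacy PDB format.

```python
from collections import defaultdict

def parse_atom_record(line):
    """Slice one fixed-column PDB ATOM record into the fields used below."""
    return {
        "atom": line[12:16].strip(),
        "altloc": line[16].strip(),    # empty when no alternate location is given
        "resname": line[17:20].strip(),
        "chain": line[21],
        "resseq": int(line[22:26]),
        "occupancy": float(line[54:60]),
        "bfactor": float(line[60:66]),
    }

def alternate_conformations(pdb_lines):
    """Return {(chain, resseq, resname): {altloc: (mean occupancy, mean B factor)}}
    for residues that carry at least one alternate-location label."""
    grouped = defaultdict(lambda: defaultdict(list))
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, REMARK and all other record types
        rec = parse_atom_record(line)
        if rec["altloc"]:
            key = (rec["chain"], rec["resseq"], rec["resname"])
            grouped[key][rec["altloc"]].append((rec["occupancy"], rec["bfactor"]))
    summary = {}
    for key, by_altloc in grouped.items():
        summary[key] = {
            alt: (sum(o for o, _ in vals) / len(vals),
                  sum(b for _, b in vals) / len(vals))
            for alt, vals in by_altloc.items()
        }
    return summary
```

Calling `alternate_conformations(open(path))` on a PDB file (where `path` is a placeholder) lists every residue modelled with more than one conformation, together with the mean occupancy and mean B factor of each alternate.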
The first step towards understanding side-chain conformation is to know its inherent conformational variability
in the unbound state. In this work, we focus on understanding the conformational variation of protein side-chains
that do not bind nucleic acid chains. First, we described four different classes of side-chain conformations
observed in crystal structures. Then, we carefully curated and analyzed several datasets of protein structures
to quantify: (1) the reliability of the atom coordinates according to electron density; (2) the side-chain conformation variations defined by alternate locations; (3) side-chain conformation variations in either the same or
different crystal structures and (4) the influence of backbone structure deviations and sequence mutations on
side-chain variations. This work is the first quantitative analysis of side-chain conformational variations based
on experimentally reliable data, providing useful knowledge of side-chain flexibility for side-chain prediction and
its assessment.
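
Comparing side-chain conformations between structures, as in items (2) and (3) above, generally comes down to comparing side-chain dihedral (chi) angles. The sketch below computes a signed dihedral angle from four atom positions and uses it to obtain chi1 for one residue. It is an illustrative helper, not the code used in this work; `dihedral`, `chi1` and `CHI1_FOURTH_ATOM` are names introduced here, and coordinates are assumed to be supplied as a mapping from atom name to (x, y, z).

```python
import math

# Fourth atom of chi1 (N-CA-CB-X) for residues whose gamma atom is not named CG.
CHI1_FOURTH_ATOM = {"CYS": "SG", "SER": "OG", "THR": "OG1", "VAL": "CG1", "ILE": "CG1"}

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in (-180, 180], about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components of b0 and b2 that lie along the central bond.
    d0, d2 = _dot(b0, b1), _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def chi1(resname, coords):
    """chi1 in degrees from {atom_name: (x, y, z)}, or None if it is undefined."""
    fourth = CHI1_FOURTH_ATOM.get(resname, "CG")
    names = ("N", "CA", "CB", fourth)
    if any(n not in coords for n in names):
        return None  # e.g. GLY and ALA have no chi1
    return dihedral(*(coords[n] for n in names))
```

Applying `chi1` to the same residue in two structures, or to two alternate locations of one residue, gives a pair of angles whose difference indicates whether the side-chain occupies the same rotamer in both.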
Until now, protein side-chain packing methods have been widely benchmarked, considering the crystal environment19 or residue environments20. Based on the conclusions of this analysis, we realized that protein side-chain conformation prediction is not a single-answer problem and that there is an urgent need to reconsider the
problem of side-chain packing and the assessment of side-chain prediction. Therefore, we propose novel approaches
and large-scale datasets for benchmarking side-chain packing programs that consider different types of side-chain
conformational variations. Looking forward, this work can help us (1) to understand the flexibility of protein
side-chains; (2) to optimize side-chain prediction programs and help optimize cryo-EM structures and (3)
to relate side-chain conformational changes to protein functions in future research.
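
To make the idea of a non-single-answer assessment concrete, the sketch below counts a predicted side-chain as correct if its chi angles fall within a tolerance of any experimentally observed conformation, rather than of one reference only. This is a hypothetical illustration and not the benchmarking scheme proposed in this work; the function names and the 40° default tolerance are assumptions made for the example.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees, in [0, 180]."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d

def matches(pred_chis, ref_chis, tol=40.0):
    """True if every predicted chi angle is within tol degrees of the reference."""
    return all(angle_diff(p, r) <= tol for p, r in zip(pred_chis, ref_chis))

def matches_any(pred_chis, observed_conformations, tol=40.0):
    """True if the prediction agrees with at least one observed conformation.

    observed_conformations is a list of chi-angle lists, for example one per
    alternate location or one per crystal structure of the same protein.
    """
    return any(matches(pred_chis, ref, tol) for ref in observed_conformations)
```

With a single entry in `observed_conformations` this reduces to the usual single-reference check, so the two assessment styles can be compared on the same predictions.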
Results
Side-chain conformation models.
The coordinates of crystal structures are determined according to the
electron density maps. However, some electron density maps tend to cover a larger region than a unique position.
Side-chain conformations in proteins may adopt more than on (...truncated)