Maximum Allowed Solvent Accessibilites of Residues in Proteins
Citation: Tien MZ, Meyer AG, Sydykova DK, Spielman SJ, Wilke CO (
Maximum Allowed Solvent Accessibilites of Residues in Proteins
Matthew Z. Tien
Austin G. Meyer
Dariya K. Sydykova
Stephanie J. Spielman
Claus O. Wilke
Editor: Alexey Porollo, Cincinnati Children's Hospital Medical Center, United States of America
1 Department of Biochemistry & Molecular Biology, The University of Chicago, Chicago, Illinois, United States of America, 2 School of Medicine, Texas Tech University Health Sciences Center, Lubbock, Texas, United States of America, 3 Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America, 4 Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America, 5 Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, United States of America
The relative solvent accessibility (RSA) of a residue in a protein measures the extent of burial or exposure of that residue in the 3D structure. RSA is frequently used to describe a protein's biophysical or evolutionary properties. To calculate RSA, a residue's solvent accessibility (ASA) needs to be normalized by a suitable reference value for the given amino acid; several normalization scales have previously been proposed. However, these scales do not provide tight upper bounds on ASA values frequently observed in empirical crystal structures. Instead, they underestimate the largest allowed ASA values, by up to 20%. As a result, many empirical crystal structures contain residues that seem to have RSA values in excess of one. Here, we derive a new normalization scale that does provide a tight upper bound on observed ASA values. We pursue two complementary strategies, one based on extensive analysis of empirical structures and one based on systematic enumeration of biophysically allowed tripeptides. Both approaches yield congruent results that consistently exceed published values. We conclude that previously published ASA normalization values were too small, primarily because the conformations that maximize ASA had not been correctly identified. As an application of our results, we show that empirically derived hydrophobicity scales are sensitive to accurate RSA calculation, and we derive new hydrophobicity scales that show increased correlation with experimentally measured scales.
Funding: This work was supported by National Institutes of Health (http://nih.gov/) grant R01 GM088344 to COW. The funders had no role in study design, data
collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
Relative solvent accessibility (RSA) has emerged as a commonly
used metric describing protein structure in computational
molecular biology, with the particular application of identifying buried
or exposed residues. It is defined as a residue's solvent accessibility
(ASA) normalized by a suitable maximum value for that residue.
RSA was first introduced in the context of hydrophobicity scales
derived by computational means from protein crystal structures
[1–5]. More recently, RSA has been shown to correlate with
protein evolutionary rates and has been incorporated as a
parameter into models that determine these rates [6–13]. As
RSA straightforwardly characterizes the local environment of
residues in protein structures, many studies have developed
computational methods to predict RSA from protein primary
and/or secondary structure [14–20]. Further applications of RSA
include identification of surface, interior, and interface regions in
proteins [21], protein-domain prediction [22], and prediction of
deleterious mutations [23].
To derive a residue's RSA from its surface area, an ASA
normalization factor is needed for each amino acid. By
convention, these normalization values have been derived by
evaluating the surface area around a residue of interest X when
placed between two glycines, to form a Gly-X-Gly tripeptide. Most
commonly, the normalization values utilized are those previously
calculated by either Rose et al. [2] or Miller et al. [3]. The primary
distinction between these two sets of normalization values lies in
the different φ and ψ dihedral backbone angles chosen when
evaluating Gly-X-Gly tripeptide conformations. Rose et al. [2]
considered tripeptides with backbone angles representing an
average of observed φ and ψ angles, whereas Miller et al. [3]
considered tripeptides in the extended conformation (φ = −120°,
ψ = 140°).
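The normalization described above amounts to a single division per residue. The following Python sketch illustrates it under stated assumptions: the function name and the reference values in EXAMPLE_SCALE are hypothetical placeholders chosen for illustration and are not taken from the Rose, Miller, or any other published scale.

```python
from typing import Mapping


def relative_solvent_accessibility(
    residue: str, asa: float, max_asa: Mapping[str, float]
) -> float:
    """Return RSA = ASA / reference ASA for the given amino acid."""
    reference = max_asa[residue]
    if reference <= 0:
        raise ValueError(f"reference ASA for {residue} must be positive")
    return asa / reference


# Hypothetical reference values in square angstroms, for illustration only.
EXAMPLE_SCALE = {"ALA": 100.0, "ARG": 200.0}

# A mostly buried alanine.
rsa_buried = relative_solvent_accessibility("ALA", 25.0, EXAMPLE_SCALE)

# An arginine whose observed ASA exceeds the reference value: the
# resulting RSA is larger than one, which signals that the reference
# value is not a true upper bound.
rsa_over = relative_solvent_accessibility("ARG", 240.0, EXAMPLE_SCALE)
```

With a reference value that is too small, the second call returns an RSA above one, so the quality of the normalization scale directly determines whether RSA stays within the interval from zero to one.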
As the number of empirically determined 3D protein crystal
structures has grown over the years, it has become apparent that
neither the Rose [2] nor the Miller [3] scale accurately identifies
the true upper bound for a residue's ASA. In fact, virtually all
amino acids display, on occasion, ASA values in excess of the
normalization ASA values provided by either scale. Some do so
quite frequently (e.g. R, D, G, K, P), reaching RSA values of up to
1.2. This discrepancy, which leads to RSA values > 1, is generally
known in the field though rarely acknowledged in print. One
exception is a recent study that carried out an extensive empirical
survey of ASA values in PDB structures [20]. That study found
that the most accessible conformations are generally found in loops
and turns, not in the extended conformation, and it suggested
using conformation-dependent maximum ASA values for
normalization [20].
Here, we derive a new set of ASA normalization values that
provide a tight upper bound on ASA values observed in
biophysically realistic tripeptide conformations. To calculate these
normalization values, we pursue two complementary strategies,
one empirical and one theoretical. For the empirical approach, we
mined thousands of 3D crystal structures and recorded the
maximum ASA values we found for each amino acid across all
structures. For the theoretical approach, we computationally built
Gly-X-Gly tripeptides and systematically evaluated all
biophysically allowed conformations to determine a maximum theoretical
ASA value. These two strategies yield congruent results and
ultimately produce comparable normalization scales that tightly
bound ASA for all 20 amino acids. We then return to the historic
motivation for RSA and investigate the implications of our results
for hydrophobicity scales. We find that ASA normalization affects
the performance of empirically derived hydrophobicity scales, and
we propose new scales that show improved correlation with
experimentally measured scales.
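The two strategies outlined above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in this study: the function names, the 10-degree grid step, the callbacks asa_of_conformation and is_allowed, and all toy numbers are assumptions introduced here. A real analysis would obtain ASA values from a structure-analysis program and would define allowed conformations from biophysical criteria.

```python
from typing import Callable, Dict, Iterable, Tuple


def empirical_max_asa(observations: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Empirical strategy: largest ASA seen for each amino acid."""
    maxima: Dict[str, float] = {}
    for residue, asa in observations:
        if asa > maxima.get(residue, float("-inf")):
            maxima[residue] = asa
    return maxima


def theoretical_max_asa(
    asa_of_conformation: Callable[[int, int], float],
    is_allowed: Callable[[int, int], bool],
    step_degrees: int = 10,
) -> Tuple[float, int, int]:
    """Theoretical strategy: scan a phi/psi grid over allowed conformations.

    Returns (maximum ASA, phi, psi) for the best allowed grid point.
    """
    best = None
    for phi in range(-180, 180, step_degrees):
        for psi in range(-180, 180, step_degrees):
            if not is_allowed(phi, psi):
                continue
            asa = asa_of_conformation(phi, psi)
            if best is None or asa > best[0]:
                best = (asa, phi, psi)
    if best is None:
        raise ValueError("no allowed conformation on the grid")
    return best


# Toy inputs (hypothetical values, not measurements).
observations = [("GLY", 80.0), ("GLY", 95.5), ("LYS", 210.0), ("LYS", 190.0)]


def toy_asa(phi: int, psi: int) -> float:
    # Stand-in for building a Gly-X-Gly tripeptide and measuring ASA.
    return 100.0 - abs(phi + 60) - abs(psi - 140)


def toy_allowed(phi: int, psi: int) -> bool:
    # Stand-in for a biophysical filter on backbone angles.
    return phi < 0


per_residue_maxima = empirical_max_asa(observations)
best_asa, best_phi, best_psi = theoretical_max_asa(toy_asa, toy_allowed)
```

Passing the ASA calculation and the conformational filter as callbacks keeps the enumeration logic independent of any particular structure-building or surface-area tool.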
Published ASA normalization values are too small
We initially assessed the accuracy of Rose's [2] and Miller's [3]
ASA normalization scales through an exhaustive survey of the
ASA values found in experimentally determined protein
structures. We obtained a list of 3197 high-quality PDB structures from
the PISCES server [24]. We then calculated ASA for each residue
in all 3197 (...truncated)