Maximum Allowed Solvent Accessibilites of Residues in Proteins

PLOS ONE, 2013

The relative solvent accessibility (RSA) of a residue in a protein measures the extent of burial or exposure of that residue in the 3D structure. RSA is frequently used to describe a protein's biophysical or evolutionary properties. To calculate RSA, a residue's solvent accessibility (ASA) needs to be normalized by a suitable reference value for the given amino acid; several normalization scales have previously been proposed. However, these scales do not provide tight upper bounds on ASA values frequently observed in empirical crystal structures. Instead, they underestimate the largest allowed ASA values, by up to 20%. As a result, many empirical crystal structures contain residues that seem to have RSA values in excess of one. Here, we derive a new normalization scale that does provide a tight upper bound on observed ASA values. We pursue two complementary strategies, one based on extensive analysis of empirical structures and one based on systematic enumeration of biophysically allowed tripeptides. Both approaches yield congruent results that consistently exceed published values. We conclude that previously published ASA normalization values were too small, primarily because the conformations that maximize ASA had not been correctly identified. As an application of our results, we show that empirically derived hydrophobicity scales are sensitive to accurate RSA calculation, and we derive new hydrophobicity scales that show increased correlation with experimentally measured scales.


Citation: Tien MZ, Meyer AG, Sydykova DK, Spielman SJ, Wilke CO. Maximum Allowed Solvent Accessibilites of Residues in Proteins.

Authors: Matthew Z. Tien, Austin G. Meyer, Dariya K. Sydykova, Stephanie J. Spielman, Claus O. Wilke

Affiliations: 1 Department of Biochemistry & Molecular Biology, The University of Chicago, Chicago, Illinois, United States of America; 2 School of Medicine, Texas Tech University Health Sciences Center, Lubbock, Texas, United States of America; 3 Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America; 4 Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America; 5 Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, United States of America

Editor: Alexey Porollo, Cincinnati Children's Hospital Medical Center, United States of America

Funding: This work was supported by National Institutes of Health (http://nih.gov/) grant R01 GM088344 to COW. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

Relative solvent accessibility (RSA) has emerged as a commonly used metric describing protein structure in computational molecular biology, with the particular application of identifying buried or exposed residues. It is defined as a residue's solvent accessibility (ASA) normalized by a suitable maximum value for that residue. RSA was first introduced in the context of hydrophobicity scales derived by computational means from protein crystal structures [1–5]. More recently, RSA has been shown to correlate with protein evolutionary rates and has been incorporated as a parameter into models that determine these rates [6–13]. Because RSA straightforwardly characterizes the local environment of residues in protein structures, many studies have developed computational methods to predict RSA from protein primary and/or secondary structure [14–20]. Further applications of RSA include identification of surface, interior, and interface regions in proteins [21], protein-domain prediction [22], and prediction of deleterious mutations [23].

To derive a residue's RSA from its surface area, an ASA normalization factor is needed for each amino acid.
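The normalization step can be made concrete with a minimal sketch. The function name and all numbers below are hypothetical and chosen only for illustration; they are not taken from any published scale. The sketch assumes RSA is simply the residue's ASA divided by the reference value for its amino acid, and shows how a reference value that is too small yields an RSA above one.

```python
def relative_solvent_accessibility(asa, max_asa):
    """Return RSA = ASA / reference (maximum) ASA for a single residue."""
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    return asa / max_asa


# Hypothetical values for illustration only (units: square angstroms).
observed_asa = 115.0      # ASA of one residue measured in a structure
small_reference = 100.0   # a normalization value that underestimates the maximum
tight_reference = 120.0   # a normalization value that bounds the observed ASA

rsa_small = relative_solvent_accessibility(observed_asa, small_reference)
rsa_tight = relative_solvent_accessibility(observed_asa, tight_reference)

print(f"RSA with too-small reference: {rsa_small:.2f}")
print(f"RSA with tight reference:     {rsa_tight:.2f}")
```

With a reference value that is a true upper bound, RSA stays within the interval from zero to one, which is the property a normalization scale is meant to guarantee.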
By convention, these normalization values have been derived by evaluating the surface area around a residue of interest X when placed between two glycines to form a Gly-X-Gly tripeptide. Most commonly, the normalization values used are those previously calculated by either Rose et al. [2] or Miller et al. [3]. The primary distinction between these two sets of normalization values lies in the different φ and ψ dihedral backbone angles chosen when evaluating Gly-X-Gly tripeptide conformations. Rose et al. [2] considered tripeptides with backbone angles representing an average of observed φ and ψ angles, whereas Miller et al. [3] considered tripeptides in the extended conformation (φ = −120°, ψ = 140°).

As the number of empirically determined 3D protein crystal structures has grown over the years, it has become apparent that neither the Rose [2] nor the Miller [3] scale accurately identifies the true upper bound for a residue's ASA. In fact, virtually all amino acids display, on occasion, ASA values in excess of the normalization ASA values provided by either scale. Some do so quite frequently (e.g., R, D, G, K, P), reaching RSA values of up to 1.2. This discrepancy, which leads to RSA values > 1, is generally known in the field though rarely acknowledged in print. One exception is a recent study that carried out an extensive empirical survey of ASA values in PDB structures [20]. That study found that the most accessible conformations are generally found in loops and turns, not in the extended conformation, and it suggested using conformation-dependent maximum ASA values for normalization [20].

Here, we derive a new set of ASA normalization values that provide a tight upper bound on ASA values observed in biophysically realistic tripeptide conformations. To calculate these normalization values, we pursue two complementary strategies, one empirical and one theoretical.
For the empirical approach, we mined thousands of 3D crystal structures and recorded the maximum ASA values we found for each amino acid across all structures. For the theoretical approach, we computationally built Gly-X-Gly tripeptides and systematically evaluated all biophysically allowed conformations to determine a maximum theoretical ASA value. These two strategies yield congruent results and ultimately produce comparable normalization scales that tightly bound ASA for all 20 amino acids. We then return to the historic motivation for RSA and investigate the implications of our results for hydrophobicity scales. We find that ASA normalization affects the performance of empirically derived hydrophobicity scales, and we propose new scales that show improved correlation with experimentally measured scales.

Published ASA normalization values are too small

We initially assessed the accuracy of Rose's [2] and Miller's [3] ASA normalization scales through an exhaustive survey of the ASA values found in experimentally determined protein structures. We obtained a list of 3197 high-quality PDB structures from the PISCES server [24]. We then calculated ASA for each residue in all 3197 (...truncated)
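The empirical and theoretical strategies outlined in the introduction can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pipeline: the function names, the toy residue records, the 30-degree grid, and the stand-in ASA function are all hypothetical. In a real analysis the ASA values would come from a surface-area program run on PDB structures or on computationally built Gly-X-Gly tripeptides.

```python
from collections import defaultdict
from itertools import product


def empirical_max_asa(residues):
    """Largest observed ASA per amino acid, given (amino_acid, asa) pairs."""
    best = defaultdict(float)
    for amino_acid, asa in residues:
        best[amino_acid] = max(best[amino_acid], asa)
    return dict(best)


def theoretical_max_asa(asa_of_conformation, is_allowed, step_degrees=30):
    """Grid search over (phi, psi) angles, keeping only allowed conformations.

    Both callables are supplied by the caller: asa_of_conformation(phi, psi)
    returns an ASA value, and is_allowed(phi, psi) returns True or False.
    """
    angles = range(-180, 180, step_degrees)
    return max(
        asa_of_conformation(phi, psi)
        for phi, psi in product(angles, angles)
        if is_allowed(phi, psi)
    )


# Toy residue records (hypothetical ASA values, square angstroms).
records = [("ALA", 95.0), ("ALA", 110.5), ("GLY", 80.2), ("GLY", 78.0)]
observed_max = empirical_max_asa(records)


# Stand-in ASA function (hypothetical): peaks at phi = -60, psi = 120.
def toy_asa(phi, psi):
    return 100.0 - 0.001 * ((phi + 60) ** 2 + (psi - 120) ** 2)


grid_max = theoretical_max_asa(toy_asa, is_allowed=lambda phi, psi: True)

print(observed_max)
print(grid_max)
```

Passing the ASA calculation and the allowed-conformation test in as callables keeps the search logic separate from any particular structure-building or surface-area tool.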


This is a preview of a remote PDF: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0080635&type=printable
Article home page: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0080635

Matthew Z. Tien, Austin G. Meyer, Dariya K. Sydykova, Stephanie J. Spielman, Claus O. Wilke. Maximum Allowed Solvent Accessibilites of Residues in Proteins, PLOS ONE, 2013, Volume 8, Issue 11, DOI: 10.1371/journal.pone.0080635