Cross-species comparison of site-specific evolutionary-rate variation in influenza haemagglutinin

Philosophical Transactions of the Royal Society B: Biological Sciences, Mar 2013

We investigate the causes of site-specific evolutionary-rate variation in influenza haemagglutinin (HA) between human and avian influenza, for subtypes H1, H3, and H5. By calculating the evolutionary-rate ratio, ω = dN/dS as a function of a residue's solvent accessibility in the three-dimensional protein structure, we show that solvent accessibility has a significant but relatively modest effect on site-specific rate variation. By comparing rates within HA subtypes among host species, we derive an upper limit to the amount of variation that can be explained by structural constraints of any kind. Protein structure explains only 20–40% of the variation in ω. Finally, by comparing ω at sites near the sialic-acid-binding region to ω at other sites, we show that ω near the sialic-acid-binding region is significantly elevated in both human and avian influenza, with the exception of avian H5. We conclude that protein structure, HA subtype, and host biology all impose distinct selection pressures on sites in influenza HA.
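As an illustration of what "explains only 20–40% of the variation" can mean in practice, the following minimal sketch computes the squared Pearson correlation between two per-site quantities. The excerpt does not say which statistic the authors used, so treating the squared correlation as the fraction of variation explained is an assumption here. The function names and the example values are hypothetical and are not data from the paper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def variance_explained(x, y):
    """Squared correlation: share of variance in y captured by a linear fit on x."""
    return pearson_r(x, y) ** 2


if __name__ == "__main__":
    # Hypothetical per-site values, for illustration only (not from the paper).
    rsa = [0.00, 0.10, 0.40, 0.70, 0.90]    # relative solvent accessibility
    omega = [0.05, 0.10, 0.30, 0.20, 0.60]  # dN/dS at the same sites
    print(f"r^2 (omega vs RSA) = {variance_explained(rsa, omega):.2f}")
```

Under the same assumption, passing ω values for homologous sites in two hosts (for example human and avian sequences of one subtype) would give a cross-host r², which is one way to read the "upper limit" mentioned in the abstract.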

Austin G. Meyer, Eric T. Dawson, Claus O. Wilke

Section of Integrative Biology, Institute for Cellular and Molecular Biology, and Center for Computational Biology and Bioinformatics, The University of Texas, Austin, Austin, TX 78731, USA

1. Introduction

Viral proteins are highly variable at the sequence level; they accumulate amino acid substitutions at a rapid pace [1,2]. Yet their structures tend to be fairly conserved. Highly variable surface regions notwithstanding, most viral proteins need to maintain a specific structure to carry out their function in the viral life cycle [3]. The generally accepted picture is that sites in the protein core maintain the overall protein structure and are, therefore, most conserved. Sites on the surface are less critical to the protein structure and hence more free to vary, for example in response to selection pressures imposed by immune response.
This view is based on the finding, replicated in widely differing organisms and using many different techniques, that, on average, sequence variability increases the closer a site is located to the surface of a protein [4–13]. More specifically, in influenza, exposed sites in haemagglutinin (HA) and neuraminidase have been found to evolve faster than buried sites in these proteins [14,15]. Thus, prior work has clearly established that protein structure influences site variability. What is less clear, however, is the magnitude of this effect. Is knowing that a site is buried sufficient to predict that the site will be evolutionarily conserved, or are other factors stronger driving forces for site-specific evolutionary rates? And similarly, will homologous sites in related but distinct viral strains evolve at similar rates, or do the nature of the viral strain and the infected host organism impose stronger influences on site-specific evolutionary rates than the location of a site in the protein structure?

Here, we address these questions for influenza HA. We compare per-site sequence evolution for two different host species (human and avian) and three HA subtypes (H1, H3 and H5), and ask the following questions: (i) To what extent is rate variation determined by the location of a site in the structure, as measured by the site's relative solvent accessibility (RSA)? (ii) To what extent is rate variation conserved within HA subtypes among viruses infecting different host species? (iii) Are ω = dN/dS ratios elevated near the active site (the sialic-acid-binding region, SABR) of HA? We find that protein structure, HA subtype and host biology affect rate variation in influenza HA.

2. Material and methods

(a) Sequence preparation

We obtained sequences for HA subtypes H1, H3 and H5 for human and avian hosts from the Influenza Research Database [16].
Using the built-in curating tools of the database, we carefully selected subsets of sequences that corresponded as much as possible to well-defined and distinct viral populations. Sequences were curated within each host species depending on its subtype. In particular, for each combination of HA subtype and host species, we considered only sequences that could be linked to a specific neuraminidase subtype. Human H1 sequences were obtained from H1N1 strains isolated between 1977 (after the Fort Dix outbreak) and 2008 (before the 2009 flu pandemic). H1N1 strains since 2009 are not direct descendants of H1N1 strains before 2009 and thus were excluded. We found 2057 distinct H1 sequences. Human H3 sequences were obtained from H3N2 strains isolated between 1968 and 2012. We found 8315 distinct sequences. Human H5 sequences were obtained from H5N1 sequences without date restriction. We found 297 distinct sequences. Avian sequences were curated by subtype with no restrictions placed on the date range; full datasets from FluDB of H1N1, H3N2 and H5N1 sequences were used. We found 106, 115 and 2684 distinct sequences, respectively. To align sequences and map them (...truncated)
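The selection logic described in (a), a date window per subtype followed by keeping only distinct sequences, can be sketched as below. This is only an illustration: the authors used the database's built-in curation tools, and the `Record` type, the `curate` function, the placeholder strain names and sequences, and the choice to treat both boundary years as inclusive are all assumptions made for this example.

```python
from typing import Iterable, List, NamedTuple, Optional


class Record(NamedTuple):
    """One HA sequence record (hypothetical layout, not the database's format)."""
    strain: str
    year: int
    sequence: str


def curate(records: Iterable[Record],
           first_year: Optional[int] = None,
           last_year: Optional[int] = None) -> List[Record]:
    """Keep records inside the year window, dropping repeated sequences.

    Both bounds are inclusive (an assumption); None means no restriction.
    The first record seen for each distinct sequence is the one kept.
    """
    seen = set()
    kept = []
    for rec in records:
        if first_year is not None and rec.year < first_year:
            continue
        if last_year is not None and rec.year > last_year:
            continue
        key = rec.sequence.upper()  # compare sequences case-insensitively
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


if __name__ == "__main__":
    # Placeholder records for illustration only.
    records = [
        Record("strain-A", 1976, "ATGAAA"),
        Record("strain-B", 1977, "ATGAAG"),
        Record("strain-C", 1990, "atgaag"),  # same sequence as strain-B
        Record("strain-D", 2008, "ATGAAC"),
        Record("strain-E", 2009, "ATGAAT"),
    ]
    # Human H1 window from the text: 1977 to 2008.
    for rec in curate(records, first_year=1977, last_year=2008):
        print(rec.strain, rec.year)
```

Calling `curate(records)` with no bounds corresponds to the cases in the text where no date restriction was applied.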


This is a preview of a remote PDF: https://rstb.royalsocietypublishing.org/content/368/1614/20120334.full.pdf

Austin G. Meyer, Eric T. Dawson, Claus O. Wilke. Cross-species comparison of site-specific evolutionary-rate variation in influenza haemagglutinin, Philosophical Transactions of the Royal Society B: Biological Sciences, 2013, 368/1614, DOI: 10.1098/rstb.2012.0334