Rapid divergence of codon usage patterns within the rice genome

Article PDF:

http://www.biomedcentral.com/content/pdf/1471-2148-7-S1-S6.pdf

BMC Evolutionary Biology

Huai-Chun Wang1 and Donal A Hickey*2

1 Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, B3H 2G1, Canada
2 Department of Biology, Concordia University, 7141 Sherbrooke West, Montreal, Quebec, H4B 1R6, Canada

Background: Synonymous codon usage varies widely between genomes, and also between genes within genomes. Although there is now a large body of data on variations in codon usage, it is still not clear if the observed patterns reflect the effects of positive Darwinian selection acting at the level of translational efficiency or whether these patterns are due simply to the effects of mutational bias. In this study, we have included both intra-genomic and inter-genomic comparisons of codon usage. This allows us to distinguish more efficiently between the effects of nucleotide bias and translational selection.

Results: We show that there is an extreme degree of heterogeneity in codon usage patterns within the rice genome, and that this heterogeneity is highly correlated with differences in nucleotide content (particularly GC content) between the genes. In contrast to the situation observed within the rice genome, Arabidopsis genes show relatively little variation in both codon usage and nucleotide content. By exploiting a combination of intra-genomic and inter-genomic comparisons, we provide evidence that the differences in codon usage among the rice genes reflect a relatively rapid evolutionary increase in the GC content of some rice genes. We also noted that the degree of codon bias was negatively correlated with gene length.

Conclusion: Our results show that mutational bias can cause a dramatic evolutionary divergence in codon usage patterns within a period of approximately two hundred million years.
The heterogeneity of codon usage patterns within the rice genome can be explained by a balance between genome-wide mutational biases and negative selection against these biased mutations. The strength of the negative selection is proportional to the length of the coding sequences. Our results indicate that the large variations in synonymous codon usage are not related to selection acting on the translational efficiency of synonymous codons.

First International Conference on Phylogenomics (Proceedings). Editors: Hervé Philippe; Mathieu Blanchette

Background

Synonymous codon usage patterns can vary significantly among genomes [1,2]. In addition, one can also observe differences in synonymous codon usage among different genes within a single genome (e.g., [3,4]). For prokaryotes and unicellular eukaryotes such as yeast, the variation in codon usage within a genome is thought to be due to natural selection acting to optimize protein production [5-7]. Specifically, the most highly expressed genes use codons that are complementary to the most abundant tRNA anticodons (e.g., [8,9]). For multicellular eukaryotes, such as Drosophila melanogaster and Caenorhabditis elegans, there is also some evidence that codon bias might be caused by selection for translational efficiency [10,11]. For the majority of multicellular organisms, however, it has been difficult to explain codon usage variation within a genome in terms of natural selection. Instead, the codon usage in mammalian genes appears to be correlated with the GC content of the chromosomal region that contains the genes [12]. This correlation has generally been interpreted as meaning that the codon usage of mammalian genes reflects mutational bias, but a recent report [13] suggests that high GC content increases mRNA levels in mammalian cells. This would mean that selection for high gene expression is the primary factor determining the codon usage bias in this case.
Thus, although the correlation between codon usage and nucleotide bias is well documented, the question of whether the nucleotide bias is a cause or a consequence of the biased codon usage remains a matter of debate.

In this study, we examined the patterns of synonymous codon usage that are seen in the genomes of angiosperm plants. It is already known that monocot plant genomes have a higher average GC content than dicot genomes, and that this difference is reflected in an average difference in codon usage between monocots and dicots [14,15]. Here, we focused on the heterogeneity in synonymous codon usage within the rice genome. In particular, we looked for intra-genomic correlations between codon usage and nucleotide bias, and we compared the results found for the rice genes with the results for their homologs in the Arabidopsis genome.

All of the previous studies of codon usage have focused on either: (i) the comparison of genes within a single genome (typically, a comparison of highly expressed genes and lowly expressed genes); or (ii) differences between genomes, such as differences in codon usage between prokaryotes and eukaryotes, or between thermophiles and mesophiles. Here, we have combined a study of contrasting patterns of codon usage within a genome (rice) with a comparison of homologous gene sequences between two genomes (rice and Arabidopsis). This "factorial" design allows for a number of unique controls in the interpretation of the data.

[...] bimodal distribution of GC content among the 14,005 rice genes, which is consistent with previous reports [15,17,23]. In contrast to this, the Arabidopsis genes are characterized by a unimodal distribution with a relatively low average value for GC content. In the Figure, the vertical line at 60% GC indicates the point at which we separated the rice genes into two classes: High GC genes and Low GC genes. The average GC content of these two classes, along with the average for all Arabidopsis genes, is shown in Table 1.
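The 60% split described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy sequences, the choice to apply the threshold to overall GC content, and the treatment of a gene at exactly 60% as High GC are all assumptions made here for illustration. The same counting restricted to third codon positions gives the third-position GC value, assuming each sequence starts in frame.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


def gc3_content(cds: str) -> float:
    """GC fraction at third codon positions; assumes the CDS starts in frame."""
    return gc_content(cds[2::3])


def gc_class(cds: str, threshold: float = 0.60) -> str:
    """Assign a gene to the High GC or Low GC class.

    Assumption: the threshold is applied to overall GC content, and a gene
    at exactly the threshold counts as High GC.
    """
    return "High GC" if gc_content(cds) >= threshold else "Low GC"


# Toy coding sequences (hypothetical, not taken from rice or Arabidopsis).
genes = {
    "toy_gc_rich": "ATGGCCGCGCTGTGA",
    "toy_gc_poor": "ATGAAATTTGATTAA",
}

for name, cds in genes.items():
    print(name, round(gc_content(cds), 3), round(gc3_content(cds), 3), gc_class(cds))
```

Averaging `gc_content` and `gc3_content` within each class would give per-class summaries of the kind a table of class means reports.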
From the Table, we can see that the GC content of the Arabidopsis genes (44.5%) is comparable to that of the Low GC rice genes (50.1%). Table 1 also presents the data for the third positions of codons only. In this case, we see the same trends as for all of the codon positions, but the differences are much greater. For instance, the GC content of the third codon positions of the High GC rice genes (80.4%) is almost twice the value for the Arabidopsis genes (42.8%). Given that variations in codon usage will affect the third codon position primarily, this result leads us to expect significant differences in codon usage between the two classes of rice genes. We investigated this using Correspondence Analysis (see below).

We also wished to investigate the possible clustering of GC-rich genes within the rice genome. To do this, we took a sample of two rice chromosomes and plotted the GC content at the third codon positions (GC3) against the position of the genes along the chromosome. For comparison, we did the same analysis for the GC3 content of Arabidopsis genes along the chromosome. The results (see [33]) show that gen (...truncated)