Rapid divergence of codon usage patterns within the rice genome
BMC Evolutionary Biology
Huai-Chun Wang1 and Donal A Hickey*2
0 Department of Biology, Concordia University , 7141 Sherbrooke West, Montreal, Quebec, H4B 1R6 , Canada
1 Department of Mathematics and Statistics, Dalhousie University , Halifax, Nova Scotia, B3H 2G1 , Canada
Background: Synonymous codon usage varies widely between genomes, and also between genes within genomes. Although there is now a large body of data on variations in codon usage, it is still not clear if the observed patterns reflect the effects of positive Darwinian selection acting at the level of translational efficiency or whether these patterns are due simply to the effects of mutational bias. In this study, we have included both intra-genomic and inter-genomic comparisons of codon usage. This allows us to distinguish more efficiently between the effects of nucleotide bias and translational selection.
Results: We show that there is an extreme degree of heterogeneity in codon usage patterns within the rice genome, and that this heterogeneity is highly correlated with differences in nucleotide content (particularly GC content) between the genes. In contrast to the situation observed within the rice genome, Arabidopsis genes show relatively little variation in both codon usage and nucleotide content. By exploiting a combination of intra-genomic and inter-genomic comparisons, we provide evidence that the differences in codon usage among the rice genes reflect a relatively rapid evolutionary increase in the GC content of some rice genes. We also noted that the degree of codon bias was negatively correlated with gene length.
Conclusion: Our results show that mutational bias can cause a dramatic evolutionary divergence in codon usage patterns within a period of approximately two hundred million years. The heterogeneity of codon usage patterns within the rice genome can be explained by a balance between genome-wide mutational biases and negative selection against these biased mutations. The strength of the negative selection is proportional to the length of the coding sequences. Our results indicate that the large variations in synonymous codon usage are not related to selection acting on the translational efficiency of synonymous codons.
<supplement><title><p>First International Conference on Phylogenomics</p></title><editor>Hervé Philippe; Mathieu Blanchette</editor><note>Proceedings</note></supplement>
Background
Synonymous codon usage patterns can vary significantly
among genomes [1,2]. In addition, one can also observe
differences in synonymous codon usage among different
genes within a single genome (e.g., [3,4]). For prokaryotes
and unicellular eukaryotes such as yeast, the variation in
codon usage within a genome is thought to be due to
natural selection acting to optimize protein production [5-7].
Specifically, the most highly expressed genes use codons
that are complementary to the most abundant tRNA
anticodons (e.g., [8,9]). For multicellular eukaryotes, such as
Drosophila melanogaster and Caenorhabditis elegans, there is
also some evidence that codon bias might be caused by
selection for translational efficiency [10,11]. For the
majority of multicellular organisms, however, it has been
difficult to explain codon usage variation within a
genome in terms of natural selection. Instead, the codon
usage in mammalian genes appears to be correlated with
the GC content of the chromosomal region that contains
the genes [12]. This correlation has generally been
interpreted as meaning that the codon usage of mammalian
genes reflects mutational bias, but a recent report [13]
suggests that high GC content increases mRNA levels in
mammalian cells. This would mean that selection for high
gene expression is the primary factor determining the
codon usage bias in this case. Thus, although the
correlation between codon usage and nucleotide bias is well
documented, the question of whether the nucleotide bias is a
cause or a consequence of the biased codon usage remains
a matter of debate.
In this study, we examined the patterns of synonymous
codon usage that are seen in the genomes of angiosperm
plants. It is already known that monocot plant genomes
have a higher average GC content than dicot genomes,
and that this difference is reflected in an average difference
in codon usage between monocots and dicots [14,15].
Here, we focused on the heterogeneity in synonymous
codon usage within the rice genome. In particular, we
looked for intra-genomic correlations between codon
usage and nucleotide bias, and we compared the results
found for the rice genes with the results for their
homologs in the Arabidopsis genome. All of the previous
studies of codon usage have focused on either: (i) the
comparison of genes within a single genome (typically, a
comparison of highly expressed genes and lowly
expressed genes); or (ii) differences between genomes,
such as differences in codon usage between prokaryotes
and eukaryotes, or between thermophiles and
mesophiles. Here, we have combined a study of contrasting
patterns of codon usage within a genome (rice) with a
comparison of homologous gene sequences between two
genomes (rice and Arabidopsis). This "factorial" design
allows for a number of unique controls in the
interpretation of the data.
... bimodal distribution of GC content among the 14,005 rice
genes, which is consistent with previous reports
[15,17,23]. In contrast to this, the Arabidopsis genes are
characterized by a unimodal distribution with a relatively
low average value for GC content. In the Figure, the
vertical line at 60% GC indicates the point at which we
separated the rice genes into two classes: High GC genes and
Low GC genes. The average GC content of these two
classes, along with the average for all Arabidopsis genes is
shown in Table 1. From the Table, we can see that the GC
content of the Arabidopsis genes (44.5%) is comparable to
that of the Low GC rice genes (50.1%).
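The separation of genes into High GC and Low GC classes at the 60% cutoff can be illustrated with a minimal Python sketch. The function names and the toy sequences are hypothetical and are not taken from the original analysis. Assigning a gene with exactly 60% GC to the High GC class is also an assumption made here for illustration.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the unambiguous bases of seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def gc_class(seq: str, cutoff: float = 60.0) -> str:
    """Label a gene as 'High GC' or 'Low GC' relative to the cutoff (in percent)."""
    # Assumption: a gene exactly at the cutoff is counted as High GC.
    return "High GC" if gc_content(seq) >= cutoff else "Low GC"


# Toy coding sequences, invented for illustration only.
print(gc_class("ATGGCCGCG"))  # 7 of 9 bases are G or C
print(gc_class("ATGAAATTT"))  # 1 of 9 bases is G or C
```

Ambiguous bases such as N are ignored in this sketch, which is one possible convention among several.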
Table 1 also presents the data for the third positions of
codons only. In this case, we see the same trends as for all
of the codon positions, but the differences are much
greater. For instance, the GC content of the third codon
positions of the High GC rice genes (80.4%) is almost
twice the value for the Arabidopsis genes (42.8%). Given
that variations in codon usage will affect the third codon
position primarily, this result leads us to expect significant
differences in codon usage between the two classes of rice
genes. We investigated this using Correspondence
Analysis (see below).
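As a minimal sketch of how GC content at the third codon position can be computed from a coding sequence, the following Python reads every third base of an in-frame sequence. The function name and the example sequence are hypothetical. The sketch assumes the sequence starts in frame, and it does not treat the stop codon specially, which may differ from the procedure used for Table 1.

```python
def gc3_content(cds: str) -> float:
    """Return the percentage of G and C at third codon positions of an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    third = [cds[i] for i in range(2, usable, 3) if cds[i] in "ACGT"]
    if not third:
        raise ValueError("no complete codons with an unambiguous third base")
    gc = sum(1 for base in third if base in "GC")
    return 100.0 * gc / len(third)


# Toy sequence (ATG GCC AAA): third positions are G, C and A.
print(round(gc3_content("ATGGCCAAA"), 1))
```

For the values quoted above, 80.4% divided by 42.8% gives a ratio of about 1.88, which is the sense in which the High GC rice value is almost twice the Arabidopsis value.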
We also wished to investigate the possible clustering of
GC-rich genes within the rice genome. To do this, we took
a sample of two rice chromosomes and plotted the GC
content at the third codon positions (GC3) against the
position of the genes along the chromosome. For
comparison, we did the same analysis for the GC3 content of
Arabidopsis genes along the chromosome. The results (see
[33]) show that gen (...truncated)