Dual Modes of Natural Selection on Upstream Open Reading Frames
Daniel E. Neafsey
James E. Galagan
Microbial Analysis Group, Broad Institute of MIT and Harvard, Cambridge, Massachusetts
Upstream open reading frames (uORFs) are common features of eukaryotic genes, occurring in 10%-25% of 5′ leader sequences. Upstream ORFs that have been subjected to experimental analysis have been generally found to decrease translational efficiency of the downstream coding sequence. Previous investigations of uORFs in mammals and yeast have detected uORFs conserved over long evolutionary distances, prompting speculation about the nature and cause of the natural selection underlying such conservation. We have analyzed uORFs in the basidiomycetous fungal pathogen Cryptococcus neoformans to discern the properties of this purifying selection. We find that uORFs in the Cryptococcus species complex are conserved at twice the expected rate, and we report 122 uORFs that are conserved among all four sequenced Cryptococcus strains. A significantly greater proportion of uORF losses occur via direct mutation to the uORF start codon than expected. This observation suggests that mutational disruption of a uORF that leaves the start codon intact may be selectively disadvantageous, perhaps because of the risk of premature translation initiation. Accounting for this constrained mode of loss and comparing the relative conservation of uORFs between the 5′ leader and control sequences enables us to calculate that at least a third of uORFs may be conserved for their effects on translational efficiency. The remaining fraction may be conserved either by chance or as a result of selective pressure to prevent premature translation initiation from the uORF start codon. We find that the majority of conserved uORFs do not exhibit codon usage bias or conservation at the amino acid level, and therefore they do not likely encode bioactive peptides. Our analysis suggests that uORFs are an important and underappreciated mechanism of post-transcriptional gene regulation in eukaryotes.
Introduction
Microarrays have given the biological community
abundant genome-wide data on rates of DNA transcription.
The relative ease with which microarray data can now be
acquired should not obscure the fact that transcription is not
synonymous with expression. Indeed, there is growing
evidence of significant variation in mRNA transcript half-life
(Wang et al. 2002) and translational efficiency among genes
(Serikawa et al. 2003; MacKay et al. 2004). To make the fullest
use of transcriptional data, then, it is imperative to
understand what factors may intervene at the translation stage to
decouple levels of transcription and expression.
Short open reading frames in the 5′ leader sequence of
genes, called upstream open reading frames (uORFs), are
known to affect the translational efficiency of many
eukaryotic genes (Morris and Geballe 2000; Meijer and Thomas
2002; Vilela and McCarthy 2003). Upstream ORFs are
common genomic features, with estimates of uORF
incidence ranging as high as 25% in mammalian genes (Crowe,
Wang, and Rothnagel 2006) and from 10% to 22% in fungal genes
(Galagan et al. 2005). Although some uORFs may augment
expression by obscuring other cis-acting inhibitory
elements (Geballe and Sachs 2000), most experimentally
tested eukaryotic uORFs are translational repressors.
Upstream ORFs have been shown to affect translational
efficiency negatively through a variety of means, including
ribosome-blocking by the encoded peptide, ribosome
stalling at the uORF termination codon, induction of the
nonsense-mediated decay (NMD) pathway, and failure of
the ribosome to re-initiate at the genic translation start site
after disengaging from the uORF (Gaba et al. 2001).
Upstream ORFs that have been experimentally tested through
cell-free translation assays or other means have been found
to decrease the rate of translation by up to 20-fold (Hinnebusch
2005), although some uORFs appear to have little impact,
or a variable impact, on translation rates (e.g., Wang and
Rothnagel 2004).
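
To make the notion of a uORF concrete, the following minimal Python sketch scans a 5′ leader sequence for upstream start codons that are followed by an in-frame stop codon within the leader. It is an illustration only, not the annotation procedure used in this study: the function name `find_uorfs`, the requirement that the uORF terminate inside the leader, and the use of the standard stop codons are all assumptions made for the example. For each uORF the sketch also reports whether the uAUG lies in the same reading frame as the genic start codon, which is assumed to begin immediately after the last base of the leader.

```python
# Standard stop codons, written in DNA letters (assumption: standard genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(leader):
    """Return a list of (start, end, in_frame_with_gene) tuples.

    `leader` is the 5' leader sequence (DNA or RNA letters, any case); the
    genic start codon is assumed to begin right after its last base.
    `start` is the 0-based index of the uAUG, `end` is the index just past
    the stop codon. A uAUG with no in-frame stop inside the leader is not
    reported here (hypothetical definition chosen for this sketch).
    """
    leader = leader.upper().replace("U", "T")
    uorfs = []
    for start in range(len(leader) - 2):
        if leader[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this upstream ATG.
        for pos in range(start + 3, len(leader) - 2, 3):
            if leader[pos:pos + 3] in STOP_CODONS:
                # The uAUG shares the genic frame when its distance to the
                # genic start codon is a multiple of three.
                in_frame = (len(leader) - start) % 3 == 0
                uorfs.append((start, pos + 3, in_frame))
                break
    return uorfs


if __name__ == "__main__":
    # Toy leader: one uAUG at index 2 followed by AAA and the stop codon TAG.
    print(find_uorfs("CCATGAAATAGCC"))
```

A uAUG that lacks an in-frame stop codon in the leader returns nothing in this sketch; such triplets correspond to the premature-initiation case, in which translation would read through into the genic sequence.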
In accordance with the scanning model of translation
initiation (Kozak 1994), it has been suggested that some
uORFs may be conserved to prevent deleterious premature
translation initiation from upstream AUG (uAUG) triplets
(Iacono, Mignone, and Pesole 2005; Lynch, Scofield, and
Hong 2005; Lynch 2006). Premature translation initiation
leading to genic read-through would, at best, add
extraneous peptides to the N-terminus of the encoded protein if the
uAUG were in the same reading frame as the genic ORF,
and, at worst, it would create a frameshift-induced nonsense
mutation and entirely eliminate translation of the genic
ORF. In this latter case, even if the uORF decreases the
translation rate of the adjacent genic sequence, the
phenotypic effect may be less severe than premature translation
initiation, which results in the ribosomes reading through
the genic translation start site. This hypothesis is supported
by the observation that uAUGs are significantly
underrepres (...truncated)