Extreme Reconfiguration of Plastid Genomes in the Angiosperm Family Geraniaceae: Rearrangements, Repeats, and Codon Usage
Mary M. Guisinger,*,1,2 Jennifer V. Kuehl,3 Jeffrey L. Boore,3,4,5 and Robert K. Jansen1
1 Section of Integrative Biology, University of Texas, Austin
2 Department of Plant Microbial Biology, University of California, Berkeley
3 DOE Joint Genome Institute and Lawrence Berkeley National Laboratory, Walnut Creek, California
4 Genome Project Solutions, Hercules, California
5 Department of Integrative Biology, University of California, Berkeley
*Corresponding author: E-mail: .
Associate editor: Charles Delwiche
Abstract
Geraniaceae plastid genomes (plastomes) have experienced a remarkable number of genomic changes. The plastomes of
Erodium texanum, Geranium palmatum, and Monsonia speciosa were sequenced and compared with other rosids and the
previously published Pelargonium hortorum plastome. Geraniaceae plastomes were found to be highly variable in size, gene
content and order, repetitive DNA, and codon usage. Several unique plastome rearrangements include the disruption of
two highly conserved operons (S10 and rps2-atpA), and the inverted repeat (IR) region in M. speciosa does not contain all
genes in the ribosomal RNA operon. The sequence of M. speciosa is unusually small (128,787 bp); among angiosperm
plastomes sequenced to date, only those of nonphotosynthetic species and those that have lost one IR copy are smaller. In
contrast, the plastome of P. hortorum is the largest, at 217,942 bp. These genomes have experienced numerous gene and
intron losses and partial and complete gene duplications. Some of the losses are shared throughout the family (e.g., trnT-GGU and the introns of rps16 and rpl16); however, other losses are homoplasious (e.g., trnG-UCC intron in G. palmatum
and M. speciosa). IR length is also highly variable. The IR in P. hortorum was previously shown to be greatly expanded to 76
kb, and the IR is lost in E. texanum and reduced in G. palmatum (11 kb) and M. speciosa (7 kb). Geraniaceae plastomes
contain a high frequency of large repeats (>100 bp) relative to other rosids. Within each plastome, repeats are often
located at rearrangement end points and many repeats shared among the four Geraniaceae flank rearrangement end
points. GC content is elevated in the genomes and also in coding regions relative to other rosids. Codon usage per amino
acid and GC content at third position sites are significantly different for Geraniaceae protein-coding sequences relative to
other rosids. Our findings suggest that relaxed selection and/or mutational biases lead to increased GC content, and this in
turn altered codon usage. We propose that increases in genomic rearrangements, repetitive DNA, nucleotide substitutions,
and GC content may be caused by relaxed selection resulting from improper DNA repair.
Key words: plastid genomics, molecular evolution, Geraniaceae, Erodium, Geranium, Monsonia.
Introduction
Comparisons among the approximately 130 land plant plastid genomes (plastomes) available on GenBank show that
genome size, gene content, gene order, and rates of sequence evolution are generally conserved. Most have
a quadripartite structure with two copies of a large inverted
repeat (IR) separating two unequally sized single-copy regions, termed the large and small single-copy regions. Land
plant plastomes generally range in size from 108 to 165 kb
and usually contain 110–130 distinct genes (reviewed in
Raubeson and Jansen 2005; Bock 2007). The majority of
these genes (about 80) code for proteins and are mostly
involved in photosynthesis or gene expression with the
remainder being transfer RNA (tRNA) (about 30) or
ribosomal RNA (rRNA) (4) genes.
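The quadripartite arithmetic described above can be sketched in a few lines: total plastome length is the sum of the two single-copy regions plus the IR counted once per copy. This is only an illustrative sketch; the function name `plastome_length` and the region lengths in the example are hypothetical and are not measurements from this study.

```python
def plastome_length(lsc_bp: int, ssc_bp: int, ir_bp: int, ir_copies: int = 2) -> int:
    """Return total plastome length in bp.

    lsc_bp    -- length of the large single-copy region
    ssc_bp    -- length of the small single-copy region
    ir_bp     -- length of ONE copy of the inverted repeat
    ir_copies -- number of IR copies (2 for a typical quadripartite
                 plastome, 0 for a genome that has lost its IR)
    """
    if min(lsc_bp, ssc_bp, ir_bp, ir_copies) < 0:
        raise ValueError("region lengths and copy number must be non-negative")
    return lsc_bp + ssc_bp + ir_copies * ir_bp


# Hypothetical region sizes chosen only to illustrate the calculation.
example_total = plastome_length(lsc_bp=85_000, ssc_bp=18_000, ir_bp=26_000)
```

Because the IR is counted once per copy, expanding or contracting the IR changes total genome size by twice the change in IR length, and losing the IR entirely removes the whole term.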
GC content is also highly conserved in the plastomes
of land plants and is typically in the range of 30–40%,
with GC content being lower in noncoding intergenic
regions than in coding regions (reviewed in Bock 2007).
The strong AT bias is reflected in codon usage, where
an A or T is preferred in the third position of synonymous
codons (Shimada and Sugiura 1991). Raubeson et al. (2007)
examined GC content and codon bias in two early
diverging angiosperms, Nuphar and Ranunculus, and found
that GC content could not explain codon usage patterns.
Raubeson et al. (2007) suggested that an error checking
bias of the plastid DNA polymerase and/or efficiency for
DNA denaturation during replication or transcription
likely affect GC content in plastomes. On the other hand,
strong evidence from nematode nuclear genomes shows
that GC content influences both codon usage and
amino acid composition and that GC content is probably
driven by directional mutation pressure (Mitreva et al.
2006).
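To make the quantities in this paragraph concrete, the following minimal sketch computes overall GC content, GC content at third codon positions (often called GC3), and raw codon counts for a single in-frame coding sequence. It is not the analysis pipeline used in this study; the function names are hypothetical, and the code assumes the input begins at the first codon position and ignores any trailing partial codon.

```python
from collections import Counter


def gc_fraction(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / acgt


def third_positions(cds: str) -> str:
    """Return the third base of every complete codon (assumes frame 0)."""
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    return cds[2:usable:3]


def gc3(cds: str) -> float:
    """GC fraction at third codon positions."""
    return gc_fraction(third_positions(cds))


def codon_counts(cds: str) -> Counter:
    """Count each complete codon in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))
```

Comparing `gc3` with `gc_fraction` for the same gene gives a quick view of whether third positions, where most synonymous changes occur, are more or less GC-rich than the sequence as a whole.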
© The Author 2010. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please
e-mail:
Mol. Biol. Evol. 28(1):583–600. 2011 doi:10.1093/molbev/msq229
Advance Access publication August 30, 2010
plastid-encoded NADH dehydrogenase (ndh) genes was
suggested for Erodium chrysanthum (Guisinger et al. 2008).
The unusual features exhibited by Geraniaceae organellar genomes make this an ideal family to study plastome
evolution. Aside from data gathered from restriction site
mapping studies (Palmer et al. 1987; Price et al. 1990)
and from the complete plastome sequence of P. hortorum
(Chumley et al. 2006), relatively little is known about the
extent of genomic change throughout the Geraniaceae.
The goals of the current study were to 1) characterize plastomes from the other major lineages in the family, 2) compare and contrast genome size and gene content in
Geraniaceae plastomes relative to each other and to other
representative rosids, 3) examine the extent of repetitive
DNA in Geraniaceae plastomes with an emphasis on the
role that repeats might play in genome rearrangement,
and 4) characterize codon and tRNA use in Geraniaceae
plastomes relative to other rosids. The last goal is particularly relevant given the loss of the tRNA gene trnT-GGU in
P. hortorum (Palmer et al. 1987; Chumley et al. 2006).
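As a simple illustration of what screening a plastome for repeats involves, the sketch below indexes every substring of a fixed length k and reports those that occur more than once, together with their start positions. This is an assumption-laden toy example and not the repeat-detection method used in this study: it finds only exact, same-strand repeats of exactly k bases, and it does not detect inverted (reverse-complement) or imperfect repeats. The function name `repeated_kmers` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def repeated_kmers(seq: str, k: int) -> Dict[str, List[int]]:
    """Map each k-mer that occurs more than once to its 0-based start positions.

    Only exact matches on the given strand are considered.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    positions: Dict[str, List[int]] = defaultdict(list)
    # Record the start position of every window of length k.
    for start in range(len(seq) - k + 1):
        positions[seq[start:start + k]].append(start)
    # Keep only the k-mers seen at two or more positions.
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}
```

For a genome-scale screen one would set k to the repeat-size threshold of interest and then check whether the reported positions lie near rearrangement end points.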
Materials and Methods
Taxon Sampling and Sample Preparation
Based on previous phylogenies of the family (Price and
Palmer 1993; Parkinson et al. 2005; Guisinger et al.
2008), taxa were chosen from each additional major lineage
in Geraniaceae. The family comprises approximately
800 species and 5 genera, namely, Erodium, the monotypic genus California (formerly in Erodium), Geranium,
Monsonia (circumscribed with Sarcocaulon; Albers 1996),
and Pelargonium. The sequence of P. hortorum was previously published (Chumley et al. 2006). Plant material from
Erodium texanum, Geranium palmatum, and Monsonia
speciosa was used, and protocols for plastid isolations
are previously described (Jansen (...truncated)