Identification, characterization and distribution of transposable elements in the flax (Linum usitatissimum L.) genome
BMC Genomics
Leonardo Galindo González 0 1
Michael K Deyholos 0 1
0 Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
1 Authors' information LGG: Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada T6G 2E9. Centennial Centre for Interdisciplinary Science (CCIS), 5-114. MKD: Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada T6G 2E9. Centennial Centre for Interdisciplinary Science (CCIS), 5-114.
Background: Flax (Linum usitatissimum L.) is an important crop for the production of bioproducts derived from its seed and stem fiber. Transposable elements (TEs) are widespread in plant genomes and are a key component of their evolution. The availability of a flax genome assembly affords new opportunities to explore the diversity of TEs and their relationship to genes and gene expression.
Results: Four de novo repeat identification algorithms (PILER, RepeatScout, LTR_finder and LTR_STRUC) were applied to the flax genome assembly. The resulting library of flax repeats was combined with the RepBase Viridiplantae division and used with RepeatMasker to estimate TE coverage of the genome. LTR retrotransposons were the most abundant TEs (17.2% genome coverage), followed by Long Interspersed Nuclear Element (LINE) retrotransposons (2.10%) and Mutator DNA transposons (1.99%). Comparison of putative flax TEs to flax transcript databases indicated that TEs are not highly expressed in flax. However, the presence of recent insertions, defined by 100% intra-element LTR similarity, provided evidence for recent TE activity. Spatial analysis showed TE-rich regions, gene-rich regions, and regions with similar gene and TE densities. Monte Carlo simulations for the 71 largest scaffolds (≥ 1 Mb each) did not show any regional differences in the frequency of TE overlap with gene coding sequences. However, TE superfamilies differed in their proximity to genes. Genes within TE-rich regions also appeared to have lower transcript expression, based on EST abundance. When LTR elements were compared, Copia elements showed more diversity, more recent insertions and more conserved domains than Gypsy elements, demonstrating their importance in genome evolution.
Conclusions: The calculated 23.06% TE coverage of the flax WGS assembly is at the low end of the range of TE coverages reported in other eudicots, although this estimate does not include TEs likely found in unassembled repetitive regions of the genome. Since enrichment for TEs in genomic regions was associated with reduced expression of neighbouring genes, and many members of the Copia LTR superfamily are inserted close to coding regions, we suggest Copia elements have a greater influence on recent flax genome evolution while Gypsy elements have become residual and highly mutated.
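The abstract does not describe how the Monte Carlo simulations were designed. As an illustration only, the following is a minimal sketch of one common way to test whether TEs overlap coding sequences more often than expected by chance: TE intervals are placed at random positions on a scaffold, keeping their lengths, and the observed overlap count is compared with the simulated counts. The function names, the half-open coordinate convention, the uniform placement model and the toy coordinates are all assumptions made for this sketch, not the authors' procedure.

```python
import random

def count_overlaps(tes, genes):
    """Count TE intervals overlapping at least one gene.

    Intervals are (start, end) tuples in half-open coordinates (assumed convention).
    """
    return sum(
        any(te_start < g_end and g_start < te_end for g_start, g_end in genes)
        for te_start, te_end in tes
    )

def monte_carlo_overlap(tes, genes, scaffold_len, n_iter=1000, seed=0):
    """Return (observed overlap count, empirical one-sided p-value).

    Hypothetical null model: each TE keeps its length but is placed
    uniformly at random on the scaffold, independently of the others.
    """
    rng = random.Random(seed)
    observed = count_overlaps(tes, genes)
    lengths = [end - start for start, end in tes]
    at_least_as_extreme = 0
    for _ in range(n_iter):
        shuffled = []
        for length in lengths:
            start = rng.randrange(0, scaffold_len - length + 1)
            shuffled.append((start, start + length))
        if count_overlaps(shuffled, genes) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the empirical p-value above zero.
    p_value = (at_least_as_extreme + 1) / (n_iter + 1)
    return observed, p_value

# Toy coordinates, invented for illustration (not flax data).
genes = [(100, 200), (500, 600)]
tes = [(150, 160), (550, 560), (900, 910)]
observed, p_value = monte_carlo_overlap(tes, genes, scaffold_len=1000)
```

A small p-value under this null model would suggest that TEs overlap genes more often than random placement predicts. Running the same test separately for each scaffold would allow regional comparisons of the kind mentioned in the Results.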
Transposable elements; Flax; Genome evolution; LTR elements; Gene expression
Background
Transposable elements (TEs) influence the evolution,
structure, amplification, gene creation, mutation and
transcriptional regulation of genes and genomes [1-6].
They are also useful as genetic markers in basic and
applied science [7,8]. TEs occupy a substantial fraction of
sequenced plant genomes [9], ranging from over 14% in
Arabidopsis [10] to more than 80% in maize [11].
Because of their nature and characteristic patterns of
insertion [12], TEs may influence large portions of the
genome. A study found that one-sixth of all rice genes
had some kind of association with TEs [13]. Some TE
insertions occur within or near genes, thereby disrupting
normal gene expression [12]. Such insertions may
influence phenotypic characteristics, as in petal color of
gentians [14], or disruption of vitamin E synthesis in
sunflower [15]. However, due to gene redundancy or to
insertion in regions of the genome that do not affect
gene expression, the majority of TE insertions do not
have detectable effects on morphology or physiology.
For example, neither the insertion of a Stowaway
element in an intron of the manganese superoxide dismutase
gene [16], nor the insertion of retrotransposon Vine-1 in
one member of the alcohol dehydrogenase multigene
family [17] affected plant growth and development.
Nevertheless, TEs can influence the evolution of plant
gene families, as exemplified by disease resistance genes
in several plants [18]. Insertions can also result in the
capture of gene fragments by TEs, or the adoption of
parts of TEs by genes. Some of the clearest examples of
gene capture by TEs involve Pack-MULEs. In rice, over
3000 of these gene-carrying transposon-derived elements
were found in 440 Mb of sequence [19], and the
acquisition of multiple gene fragments from multiple loci may
result in the creation of new genes [20]. Genes such as
FAR1 and FHY3 (involved in the phytochrome signalling
pathway), have a conserved transposase-derived region,
whose DNA binding and regulatory capacities have been
adopted for transcriptional control of downstream genes
[21,22]. As was first shown by McClintock in the early
experiments that uncovered the Ac/Ds TE system in
maize [23-26], some types of stress can activate TEs,
which can in turn modify gene expression. TE
expression triggered by stress has been reported for several
elements including: Tnt1 [27,28] and Tto1 [29,30] in
tobacco; Tos17 in rice [31,32]; and BARE-1 in barley
[33]. However, relatively few active TEs have been
identified and several expression studies indicate that
transcription and transposition are rare for most elements
[12]. While some studies have focused on the expression
of individual elements, more recent approaches have
compared genome-wide expression data of TEs. These
kinds of studies have been used to identify TE cassettes
in expressed genes in coffee species [34] and Arabidopsis
[35], and the activity of different TE families in maize
[36] and sugarcane [37].

Flax (L. usitatissimum) is one of
over 270 species within the family Linaceae, and is a
member of the order Malpighiales along with three
other species with published whole genome sequences:
poplar (Populus trichocarpa), cassava (Manihot
esculenta), and castor (Ricinus communis) [38]. Flax is a
predominantly self-pollinating annual crop grown in
temperate regions [39]. Distinct varieties of flax are
cultivated for either seed (i.e. linseed) or bast fibers. We
recently reported a whole genome shotgun (WGS)
assembly of a linseed variety, CDC Bethune [40]. The
assembly contains 302 Mb of the estimated 373 Mb nuclear
genome, in scaffolds with N50 = 694 kb. Flax is considered
a diploid (2n = 2x = 30), although our genome analysis
pointed to a recent whole genome duplication 5-9 MYA.
Flax appears to have originated from its wild relative, L.
bienne, with cultivation and domestication probably
starting in the Mesopotamian valleys between 8000 and
10000 (...truncated)