Identification, characterization and distribution of transposable elements in the flax (Linum usitatissimum L.) genome

BMC Genomics, Nov 2012

Background: Flax (Linum usitatissimum L.) is an important crop for the production of bioproducts derived from its seed and stem fiber. Transposable elements (TEs) are widespread in plant genomes and are a key component of their evolution. The availability of a genome assembly of flax affords new opportunities to explore the diversity of TEs and their relationship to genes and gene expression.

Results: Four de novo repeat identification algorithms (PILER, RepeatScout, LTR_finder and LTR_STRUC) were applied to the flax genome assembly. The resulting library of flax repeats was combined with the RepBase Viridiplantae division and used with RepeatMasker to determine TE coverage in the genome. LTR retrotransposons were the most abundant TEs (17.2% genome coverage), followed by Long Interspersed Nuclear Element (LINE) retrotransposons (2.10%) and Mutator DNA transposons (1.99%). Comparison of putative flax TEs to flax transcript databases indicated that TEs are not highly expressed in flax. However, the presence of recent insertions, defined by 100% intra-element LTR similarity, provided evidence for recent TE activity. Spatial analysis showed TE-rich regions, gene-rich regions, and regions with similar gene and TE densities. Monte Carlo simulations for the 71 largest scaffolds (≥ 1 Mb each) did not show any regional differences in the frequency of TE overlap with gene coding sequences. However, TE superfamilies differed in their proximity to genes. Genes within TE-rich regions also appeared to have lower transcript expression, based on EST abundance. When LTR elements were compared, Copia elements showed more diversity, recent insertions and conserved domains than Gypsy elements, demonstrating their importance in genome evolution.

Conclusions: The calculated 23.06% TE coverage of the flax WGS assembly is at the low end of the range of TE coverages reported in other eudicots, although this estimate does not include TEs likely found in unassembled repetitive regions of the genome. Since enrichment for TEs in genomic regions was associated with reduced expression of neighbouring genes, and many members of the Copia LTR superfamily are inserted close to coding regions, we suggest that Copia elements have a greater influence on recent flax genome evolution, while Gypsy elements have become residual and highly mutated.
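
The abstract does not spell out how the Monte Carlo simulations were run. The following is only a minimal sketch of one common way to test TE overlap with coding sequences on a single scaffold: TE intervals are repositioned uniformly at random (lengths preserved) and the observed overlap count is compared with the resulting null distribution. All function names, the half-open interval convention, and the uniform-placement null model are assumptions made for illustration, not the authors' procedure.

```python
import random


def overlaps(a, b):
    """True if half-open intervals a=[start, end) and b=[start, end) share a base."""
    return a[0] < b[1] and b[0] < a[1]


def count_te_cds_overlaps(tes, cds):
    """Number of TE intervals that overlap at least one coding interval."""
    return sum(any(overlaps(te, c) for c in cds) for te in tes)


def shuffle_tes(tes, scaffold_len, rng):
    """Give each TE a uniform random start on the scaffold, keeping its length.

    Assumed null model: placements are independent and may overlap each other.
    """
    placed = []
    for start, end in tes:
        length = end - start
        new_start = rng.randrange(0, scaffold_len - length + 1)
        placed.append((new_start, new_start + length))
    return placed


def monte_carlo_overlap(tes, cds, scaffold_len, n_iter=1000, seed=0):
    """Return (observed overlaps, mean null overlaps, empirical p for depletion)."""
    rng = random.Random(seed)
    observed = count_te_cds_overlaps(tes, cds)
    null = [
        count_te_cds_overlaps(shuffle_tes(tes, scaffold_len, rng), cds)
        for _ in range(n_iter)
    ]
    # One-sided empirical p-value: how often random placement gives
    # as few or fewer overlaps than observed (+1 correction avoids p = 0).
    p_depleted = (1 + sum(n <= observed for n in null)) / (n_iter + 1)
    return observed, sum(null) / n_iter, p_depleted


if __name__ == "__main__":
    # Toy scaffold of 1000 bp with two coding intervals and two TEs (made-up data).
    cds = [(100, 200), (500, 600)]
    tes = [(250, 300), (700, 750)]
    print(monte_carlo_overlap(tes, cds, scaffold_len=1000))
```

Running the same test per scaffold and comparing the resulting p-values would be one way to look for regional differences; a more careful null model would also forbid shuffled TEs from overlapping one another.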

Leonardo Galindo González, Michael K Deyholos

Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada

Authors' information
LGG: Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada T6G 2E9. Centennial Centre for Interdisciplinary Science (CCIS), 5-114.
MKD: Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada T6G 2E9. Centennial Centre for Interdisciplinary Science (CCIS), 5-114.

Keywords: Transposable elements; Flax; Genome evolution; LTR elements; Gene expression

Background

Transposable elements (TEs) influence the evolution, structure, amplification, gene creation, mutation and transcriptional regulation of genes and genomes [1-6]. They are also useful as genetic markers in basic and applied science [7,8]. TEs occupy a substantial fraction of sequenced plant genomes [9], ranging from over 14% in Arabidopsis [10] to more than 80% in maize [11]. Because of their nature and characteristic patterns of insertion [12], TEs may influence large portions of the genome. A study found that one-sixth of all rice genes had some kind of association with TEs [13]. Some TE insertions occur within or near genes, thereby disrupting normal gene expression [12].
Such insertions may influence phenotypic characteristics, as in petal color of gentians [14], or disruption of vitamin E synthesis in sunflower [15]. However, due to gene redundancy or to insertion in regions of the genome that do not affect gene expression, the majority of TE insertions do not have detectable effects on morphology or physiology. For example, neither the insertion of a Stowaway element in an intron of the manganese superoxide dismutase gene [16], nor the insertion of retrotransposon Vine-1 in one member of the alcohol dehydrogenase multigene family [17] affected plant growth and development. Nevertheless, TEs can influence the evolution of plant gene families, as exemplified by disease resistance genes in several plants [18].

Insertions can also result in the capture of gene fragments by TEs, or the adoption of parts of TEs by genes. Some of the clearest examples of gene capture by TEs involve Pack-MULEs. In rice, over 3000 of these gene-carrying transposon-derived elements were found in 440 Mb of sequence [19], and the acquisition of multiple gene fragments from multiple loci may result in the creation of new genes [20]. Genes such as FAR1 and FHY3 (involved in the phytochrome signalling pathway) have a conserved transposase-derived region, whose DNA binding and regulatory capacities have been adopted for transcriptional control of downstream genes [21,22].

As was first shown by McClintock in the early experiments that uncovered the Ac/Ds TE system in maize [23-26], some types of stress can activate TEs, which can in turn modify gene expression. TE expression triggered by stress has been reported for several elements, including Tnt1 [27,28] and Tto1 [29,30] in tobacco, Tos17 in rice [31,32], and BARE-1 in barley [33]. However, relatively few active TEs have been identified, and several expression studies indicate that transcription and transposition are rare for most elements [12].
While some studies have focused on the expression of individual elements, more recent approaches have compared genome-wide expression data of TEs. These kinds of studies have been used to identify TE cassettes in expressed genes in coffee species [34] and Arabidopsis [35], and the activity of different TE families in maize [36] and sugarcane [37].

Flax (L. usitatissimum) is one of over 270 species within the family Linaceae, and is a member of the order Malpighiales along with three other species with published whole genome sequences: poplar (Populus trichocarpa), cassava (Manihot esculenta), and castor (Ricinus communis) [38]. Flax is a predominantly self-pollinating annual crop grown in temperate regions [39]. Distinct varieties of flax are cultivated for either seed (i.e. linseed) or bast fibers. We recently reported a whole genome shotgun (WGS) assembly of a linseed variety, CDC Bethune [40]. The assembly contains 302 Mb of the estimated 373 Mb nuclear genome, in scaffolds with N50 = 694 kb. Flax is considered a diploid (2n = 2x = 30), although our genome analysis pointed to a recent whole genome duplication 5-9 MYA. Flax appears to have originated from its wild relative, L. bienne, with cultivation and domestication probably starting in the Mesopotamian valleys between 8000-10000 (...truncated)
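
The assembly figures quoted above (302 Mb assembled of an estimated 373 Mb, scaffold N50 = 694 kb) can be made concrete with a short sketch. It uses the standard definition of N50: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembled bases. The function names and the toy scaffold lengths are illustrative assumptions; the real flax scaffold lengths are not given in this text.

```python
def n50(lengths):
    """N50 of a list of scaffold lengths (standard definition)."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold that brings the running total to >= 50%.
        if 2 * running >= total:
            return length


def assembled_fraction(assembled_mb, estimated_genome_mb):
    """Fraction of the estimated genome size represented in the assembly."""
    return assembled_mb / estimated_genome_mb


if __name__ == "__main__":
    # Toy lengths (not flax data): total is 32, and 8 + 8 = 16 reaches half.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # prints 8
    # Figures from the text: 302 Mb assembled of an estimated 373 Mb.
    print(f"{assembled_fraction(302, 373):.1%}")
```

With the figures from the text, roughly four-fifths of the estimated genome is assembled, which is consistent with the abstract's caveat that TEs in unassembled repetitive regions are missing from the coverage estimate.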


This is a preview of a remote PDF: http://www.biomedcentral.com/content/pdf/1471-2164-13-644.pdf
Article home page: http://www.biomedcentral.com/1471-2164/13/644

Leonardo Galindo González, Michael K Deyholos. Identification, characterization and distribution of transposable elements in the flax (Linum usitatissimum L.) genome. BMC Genomics, 2012, 13:644. DOI: 10.1186/1471-2164-13-644