SNP Discovery by Illumina-Based Transcriptome Sequencing of the Olive and the Genetic Characterization of Turkish Olive Genotypes Revealed by AFLP, SSR and SNP Markers
PLoS ONE 8(9): e73674. doi:10.1371/journal.pone.0073674
Hilal Betul Kaya
Oznur Cetin
Hulya Kaya
Mustafa Sahin
Filiz Sefer
Abdullah Kahraman
Bahattin Tanyolac
Editor: Qiong Wu, Harbin Institute of Technology, China
1 Department of Bioengineering, Ege University, Izmir, Turkey; 2 Olive Research Station, Izmir, Turkey; 3 Department of Field Crops, Harran University, S. Urfa, Turkey
Background: The olive tree (Olea europaea L.) is a diploid (2n = 2x = 46) outcrossing species grown mainly in the Mediterranean area, where it is the most important oil-producing crop. Because of its economic, cultural and ecological importance, various DNA markers have been used in olive to characterize accessions and to elucidate homonyms, synonyms and unknown accessions. However, a comprehensive characterization and a full sequence of the olive transcriptome are not available, which makes efficient, large-scale discovery of single nucleotide polymorphisms (SNPs) in olive important. The objectives of this study were (1) to discover olive SNPs using next-generation sequencing and to identify SNP primers for cultivar identification and (2) to characterize 96 olive genotypes originating from different regions of Turkey.
Methodology/Principal Findings: cDNA from five distinct olive genotypes was sequenced on an Illumina Genome Analyzer IIx, producing 126,542,413 reads. After quality and size trimming, the high-quality reads were assembled into 22,052 contigs with an average length of 1,321 bases, plus 45 singletons. After SNP filtering, 2,987 high-quality putative SNP primers were identified. The assembled contigs and singletons were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) identifiers. To identify the 96 olive genotypes, these SNP primers were applied in combination with amplified fragment length polymorphism (AFLP) and simple sequence repeat (SSR) markers.
Conclusions/Significance: This study reports the largest number of SNP markers discovered to date from olive genotypes by transcriptome sequencing. The developed SNP markers will provide a useful resource for molecular genetic studies in olive, such as genetic diversity analysis and characterization, high-density quantitative trait locus (QTL) analysis, association mapping and map-based gene cloning. The high levels of genetic variation among Turkish olive genotypes revealed by SNPs, AFLPs and SSRs allowed us to characterize the Turkish olive genotypes.
Funding: This work was funded by the Turkish Technical and Research Council under project number 108G096. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
The olive tree (Olea europaea L. subsp. europaea var. europaea,
Oleaceae) is one of the most ancient and important Mediterranean
long-lived fruit species [1]. It is a diploid (2n = 2x = 46) outcrossing
species grown mainly in the Mediterranean basin, with a very wide
genetic patrimony [2] represented by more than 1200 cultivars [3].
Olive oil and table olives are very important components of the
Mediterranean diet [4], and several studies have emphasized the
beneficial effects of table olives [4] and olive oil on human health [5].
The leading olive-producing countries of the world are Spain, Italy,
Greece and Morocco. According to statistics provided by the Food and
Agriculture Organization (FAO), Turkey ranks as the fifth largest olive
producer in the world, with production of approximately 1.415 million
tons of fruit in 2010 [6].
The sequencing and analysis of transcriptomes is considered an
efficient approach for gene expression profiling, the study of
alternative splicing, SNP discovery, and the mapping and
quantification of transcriptomes in plants, especially in species
without a reference genome sequence [7,8]. Sanger sequencing of
expressed sequence tags (ESTs) used to be the most common approach
to SNP discovery, and over the past 10 years ESTs have been sequenced
with these traditional techniques in several important species [9].
However, Sanger sequencing is expensive and time-consuming because it
requires cDNA library construction and the cloning of DNA fragments
[10]. Transcriptome analysis based on next-generation sequencing (NGS)
is a more attractive way to generate transcriptome datasets for marker
development and gene discovery because of its lower cost per sequenced
base, shorter run time and the absence of a subcloning step [11].
Next-generation transcriptome sequencing has been used to build
transcriptome databases for various plants without a sequenced genome,
including chickpea [12], wheat [13], Eucalyptus pilularis [14], carrot [15],
mangroves [16], strawberry [17] and chestnut [18]. In addition, SNP
discovery with NGS technologies can identify thousands of markers from
entire genomes or from cDNA [19], and these markers can be used in
genetic diversity analyses [20], association mapping [21,22], linkage
mapping [23] and marker-assisted selection [24] studies.
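To make the SNP discovery step more concrete, the sketch below shows one simple way a candidate SNP can be called at a single contig position once transcriptome reads have been aligned to the assembly. It is a minimal illustration under stated assumptions, not the pipeline used in this study: the function name call_snp and the thresholds min_depth, min_minor_count and min_minor_freq are hypothetical, and the input is assumed to be the string of bases observed at that position across all aligned reads.

```python
from collections import Counter
from typing import Optional, Tuple


def call_snp(
    bases: str,
    min_depth: int = 10,          # hypothetical: minimum number of informative reads
    min_minor_count: int = 3,     # hypothetical: minimum reads supporting the minor allele
    min_minor_freq: float = 0.1,  # hypothetical: minimum minor-allele frequency
) -> Optional[Tuple[str, str]]:
    """Return (major, minor) alleles if this position looks like a SNP, else None.

    `bases` is assumed to hold the base seen at one contig position in every
    aligned read (e.g. "AAAGGA"); symbols other than A/C/G/T are ignored.
    """
    # Count only unambiguous nucleotides.
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    depth = sum(counts.values())

    # Too few reads, or no second allele observed: not a candidate SNP.
    if depth < min_depth or len(counts) < 2:
        return None

    # The two most frequent alleles are taken as major and minor.
    (major, _), (minor, minor_n) = counts.most_common(2)

    # Reject positions whose minor allele could plausibly be a sequencing error.
    if minor_n < min_minor_count or minor_n / depth < min_minor_freq:
        return None
    return major, minor


if __name__ == "__main__":
    print(call_snp("AAAAAAAAGGGG"))  # 8 A vs 4 G: passes all thresholds -> ('A', 'G')
    print(call_snp("AAAAAAAAAAAG"))  # single G read: treated as an error -> None
```

Requiring both a minimum read depth and a minimum minor-allele count is a common way to separate true polymorphisms from isolated sequencing errors; a real pipeline would also weigh base and mapping qualities.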
Various platforms utilizing NGS, such as the Roche 454
Genome Sequencer, the Illumina Genome Analyzer and the Life
Technologies SOLiD System, can produce massive sequence
outputs, making high-throughput DNA marker discovery feasible
and cost-effective [25,26]. These platforms have different advantages
and limitations, and they vary in sensitivity, accuracy, reproducibility
and throughput. Among them, Illumina sequencing technology, which
generates large numbers of short reads (75–150 bp) at low cost and with
very high sequencing coverage, has been especially useful for de novo
transcriptome studies [25–27].
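The sequencing coverage mentioned above is commonly estimated as the mean depth C = N * L / G, where N is the number of reads, L the read length and G the size of the sequenced target. The sketch below applies this to the figures reported in the abstract (126,542,413 reads; 22,052 contigs with an average length of 1,321 bases). The 75 bp read length is a hypothetical choice taken from the lower end of the range above, the assembly size is approximated as contig count times mean length, and every read is assumed to map, so the result is only a rough average under these assumptions, not a per-gene depth.

```python
def mean_coverage(n_reads: int, read_length_bp: int, target_size_bp: int) -> float:
    """Expected mean depth C = N * L / G, assuming every read maps to the target."""
    if target_size_bp <= 0:
        raise ValueError("target size must be positive")
    return n_reads * read_length_bp / target_size_bp


# Read count from the abstract.
N_READS = 126_542_413
# Assembly size approximated as contigs x mean contig length (~29.1 Mb);
# the 45 singletons are ignored.
ASSEMBLY_BP = 22_052 * 1_321
# Hypothetical read length: lower end of the 75-150 bp range.
READ_LENGTH_BP = 75

if __name__ == "__main__":
    depth = mean_coverage(N_READS, READ_LENGTH_BP, ASSEMBLY_BP)
    print(f"Approximate mean coverage: {depth:.0f}x")
```

Because transcript abundance varies by orders of magnitude, real per-contig depth is highly uneven; the single average mainly shows why short-read platforms can support SNP calling across many expressed genes at once.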
A large number of accessions are currently available in
olive-producing countries, which raises several problems for germplasm
management and preservation [28]. The evaluation and identification
of olive genetic resources, especially the estimation of genetic
variation in existing germplasm, is therefore crucial, particularly
given the high frequency of mislabeling, synonyms and homonyms in
olive.
Genetic identification is the first key step in breeding programs,
and molecular markers are valuable tools for identifying and
characterizing diverse genotypes [29]. With the large array of DNA
marker types now available, DNA markers provide useful information in
theoretical and applied research for olive breeding, such as the
determination of genetic
diversity, genetic relationships [30] and population structures
among cultivated species and their wild relatives [31, (...truncated)