Genome-wide BAC-end sequencing of Cucumis melo using two BAC libraries
BMC Genomics
Víctor M González 3
Luis Rodríguez-Moreno 0 2
Emilio Centeno 3
Andrej Benjak 1
Jordi Garcia-Mas 1
Pere Puigdomènech 3
Miguel A Aranda 0 2
0 Departamento de Biologia del Estres y Patologia Vegetal, Centro de Edafologia y Biologia Aplicada del Segura (CEBAS) - CSIC, Apdo. correos 164, 30100 Espinardo (Murcia) , Spain
1 IRTA, Center for Research in Agricultural Genomics CRAG (CSIC-IRTA-UAB) , Carretera de Cabrils Km 2, 08348 (Barcelona) , Spain
2 Departamento de Biologia del Estres y Patologia Vegetal, Centro de Edafologia y Biologia Aplicada del Segura (CEBAS) - CSIC, Apdo. correos 164, 30100 Espinardo (Murcia) , Spain
3 Molecular Genetics Department, Center for Research in Agricultural Genomics CRAG (CSIC-IRTA-UAB) , Jordi Girona, 18-26, 08034 Barcelona , Spain
Background: Although melon (Cucumis melo L.) is an economically important fruit crop, no genome-wide sequence information is openly available at the current time. We therefore sequenced BAC-ends representing a total of 33,024 clones, half of them from a previously described melon BAC library generated with restriction endonucleases and the remainder from a new random-shear BAC library.

Results: We generated a total of 47,140 high-quality BAC-end sequences (BES), 91.7% of which were paired-BES. Both libraries were assembled independently and then cross-assembled to obtain a final set of 33,372 nonredundant, high-quality sequences. These were grouped into 6,411 contigs (4.5 Mb) and 26,961 non-assembled BES (14.4 Mb), representing ~4.2% of the melon genome. The sequences were used to screen genomic databases, identifying 7,198 simple sequence repeats (corresponding to one microsatellite every 2.6 kb) and 2,484 additional repeats of which 95.9% represented transposable elements. The sequences were also used to screen expressed sequence tag (EST) databases, revealing 11,372 BES that were homologous to ESTs. This suggests that ~30% of the melon genome consists of coding DNA. We observed regions of microsynteny between melon paired-BES and six other dicotyledonous plant genomes.

Conclusion: The analysis of nearly 50,000 BES from two complementary genomic libraries covered ~4.2% of the melon genome, providing insight into properties such as microsatellite and transposable element distribution, and the percentage of coding DNA. The observed synteny between melon paired-BES and six other plant genomes showed that useful comparative genomic data can be derived through large scale BAC-end sequencing by anchoring a small proportion of the melon genome to other sequenced genomes.
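The coverage and microsatellite density quoted above can be reproduced from the other figures in the abstract. The following is a minimal sketch of that arithmetic, assuming the 454 Mb genome size estimate cited in the Background section; the constant and function names are illustrative and are not taken from the original analysis.

```python
# Figures quoted in the abstract.
CONTIG_MB = 4.5          # total length of the 6,411 contigs
SINGLETON_MB = 14.4      # total length of the 26,961 non-assembled BES
SSR_COUNT = 7198         # simple sequence repeats identified
TOTAL_BES = 47140        # high-quality BES
PAIRED_FRACTION = 0.917  # fraction of BES that were paired

# Assumption: estimated melon genome size cited in the Background [2].
GENOME_MB = 454.0


def sampled_mb():
    """Total nonredundant sequence sampled, in Mb."""
    return CONTIG_MB + SINGLETON_MB


def genome_fraction():
    """Fraction of the estimated genome covered by the sampled sequence."""
    return sampled_mb() / GENOME_MB


def kb_per_ssr():
    """Average spacing between microsatellites, in kb."""
    return sampled_mb() * 1000.0 / SSR_COUNT


def paired_bes():
    """Approximate number of paired BES implied by the rounded percentage."""
    return TOTAL_BES * PAIRED_FRACTION


if __name__ == "__main__":
    print(f"sampled sequence: {sampled_mb():.1f} Mb")
    print(f"genome fraction:  {genome_fraction() * 100:.1f} %")
    print(f"SSR spacing:      {kb_per_ssr():.1f} kb")
    print(f"paired BES:       ~{paired_bes():.0f}")
```

Because the percentage of paired BES is rounded, the last value only approximates the 43,224 paired BES reported in the Background.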
Background
Melon (Cucumis melo L.) is an important horticultural
crop grown in temperate, subtropical and tropical
regions worldwide. More than 25 million tonnes of fruit
were produced in 2007, 64.5% in Asia, 14.6% in Europe,
13.1% in America and 7.8% in Africa [1]. Melon belongs
to the Cucurbitaceae family, which comprises 90 genera
and ~750 species, including other fruit crops such as
watermelon (Citrullus lanatus (Thunb.) Matsum &
Nakai), cucumber (Cucumis sativus L.), squash and
pumpkin (Cucurbita spp.). Genetically, melon is a
diploid species (2x = 2n = 24) with an estimated
genome size of 454 Mb [2]. Transgenic melons, first
produced in 1990, can now be generated in a range of
recalcitrant cultivars [3,4]. Melon fruits are
morphologically and biochemically diverse, which makes them
particularly suitable for research into the flavor and texture
changes that occur during ripening [5].
Despite its economic importance, there are few genomic
resources for melon. As of January 2010, 126,940
high-quality expressed sequence tags (ESTs) and 23,762
unigenes were available in public databases [6,7], which is low
when compared to the 298,123 ESTs available for tomato
(Solanum lycopersicum L.) and the 1,249,110 ESTs
available for rice (Oryza sativa L.) [8]. More recent efforts to
increase the availability of genetic and genomic resources
for melon [9] have included the construction of bacterial
artificial chromosome (BAC) libraries [10,11], the
development of oligo-based microarrays [12,13], the
production of TILLING and EcoTILLING platforms [14,15]
and the development of a collection of near isogenic lines
(NILs) [16]. However, the integration of genetic and
physical maps is a necessary first step towards sequencing the
melon genome, identifying relevant genes and using these to
discover how economically important aspects of fruit
development are controlled [17,18].
Over the last 15 years, several melon genetic maps have
been constructed primarily using randomly amplified
polymorphic DNAs (RAPDs), restriction fragment length
polymorphisms (RFLPs), amplified fragment length
polymorphisms (AFLPs) and simple sequence repeats (SSRs)
[19-24]. These maps have helped to pinpoint the loci of
some important agronomic traits [25-27], but they are
sparsely populated and the different markers make them
difficult to compare. To address this issue, a genetic map
has recently been constructed by merging several of
those previous genetic maps [6]. In addition, a melon
physical map representing 0.9 melon genomic
equivalents has recently been constructed using both a BAC
library and a genetic map previously developed in our
laboratory [28]. The physical and genetic maps have been
integrated by anchoring 175 genetic markers to the
physical map, allowing contigs representing 12% of the
melon genome to be anchored to known genetic loci.
It is important to obtain an accurate, representative
sample of the genome ahead of full genome sequencing
and annotation, and the end-sequencing of large
numbers of BAC clones is an efficient strategy to achieve
this goal. BAC-end sequences (BES) generate accurate
but inexpensive genome samples that give a first
impression of properties such as GC content, the distribution
of microsatellites and transposable elements, and the
amount of coding DNA [29-31]. However, most BAC
libraries are constructed by digesting DNA with one or
more restriction endonucleases, which introduces a
partial bias in coverage because the target sites are
distributed in a non-random manner [32]. We therefore
sequenced BAC-ends representing 33,024 clones, half
from a previously described BAC library generated using
restriction endonucleases, and the remainder from a
freshly prepared random-shear BAC library to eliminate
the possibility of bias. We obtained 47,140 high-quality
BES, which were analyzed for GC content,
microsatellites, repeat elements and coding regions. A total of
43,224 paired BES were mapped independently onto six
sequenced genomes from other dicotyledonous plant
species to identify regions of microsynteny.
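Two of the properties mentioned above, GC content and microsatellite distribution, can be estimated directly from a set of end sequences. The following is a minimal sketch of such a scan, not the pipeline used in this study: the function names, the 2-6 bp motif range and the minimum of five repeats are illustrative assumptions.

```python
import re


def gc_content(seq):
    """Return the G+C fraction of a sequence, ignoring ambiguous bases such as N."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=5):
    """Find perfect tandem repeats (simple sequence repeats).

    Returns a list of (start, unit, repeat_count) tuples.
    The thresholds are hypothetical defaults, not those of the paper.
    """
    seq = seq.upper()
    # Lazy quantifier: try the shortest candidate unit first, then require
    # at least (min_repeats - 1) further tandem copies of that unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        # Skip units that are themselves repeats of a shorter motif
        # (e.g. "AA"), so homopolymer runs are not reported as SSRs here.
        if unit in (unit + unit)[1:-1]:
            continue
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


if __name__ == "__main__":
    example = "CCGG" + "AT" * 6 + "CCGG"  # toy sequence, not real melon data
    print(gc_content(example))
    print(find_ssrs(example))
```

Applied to every BES in a library, the number of hits divided by the total sequence length gives a microsatellite density comparable to the one-per-kb figures reported for such surveys.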
Results and discussion
BAC libraries
Two BAC libraries from the double-haploid melon line
PIT92 were used for BAC-end sequencing (Table 1).
Table 1 Genomic C. melo BAC libraries
1Non-empty BAC clones containing melon genomic DNA
2Based on an estimated haploid genom (...truncated)