The complexity of Rhipicephalus (Boophilus) microplus genome characterised through detailed analysis of two BAC clones
Moolhuijzen et al. BMC Research Notes 2011, 4:254
http://www.biomedcentral.com/1756-0500/4/254
RESEARCH ARTICLE
Open Access
Paula M Moolhuijzen (1,2), Ala E Lew-Tabor (1,2,3), Jess A T Morgan (2,3), Manuel Rodriguez Valle (2,3), Daniel G Peterson (5),
Scot E Dowd (6), Felix D Guerrero (4), Matthew I Bellgard (1,*) and Rudi Appels (1)
Abstract
Background: Rhipicephalus (Boophilus) microplus (Rmi), a major cattle ectoparasite and tick-borne disease vector,
affects animal welfare and industry productivity. Arthropod research lacks a complete
Chelicerate genome; the Chelicerata include ticks, mites, spiders and scorpions. Model arthropod genomes
such as Drosophila and Anopheles are too taxonomically distant to serve as references for tick genomic sequence analysis.
This study focuses on the de novo assembly of two R. microplus BAC sequences from the understudied R. microplus
genome. Using available R. microplus sequence resources and comparative analysis, our predictions of tick genomic structure
and function identify complex gene structures and genomic targets expressed during tick-cattle
interaction.
Results: Guided by the correct positioning of BAC end sequences and transcript sequences, we assembled
two challenging genomic regions. Comparison of Cot DNA fractions with the BAC sequences
confirmed that BAC BM-012-E08 is highly repetitive, whereas BAC BM-005-G14 has low repetitive content, is gene rich
and contains short interspersed elements (SINEs). Directly from these BAC and Cot data
comparisons, we estimated the genome-wide frequency of the SINE Ruka element. Using a conservative approach
to the assembly of the highly repetitive BM-012-E08, we deconvoluted the sequence into three repeat units, each
containing 18S, 5.8S and 28S ribosomal RNA (rRNA) encoding gene sequences (rDNA), the associated internal
transcribed spacers and a complex intergenic region.
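The idea of deconvoluting a tandemly repeated region into repeat units can be illustrated with a deliberately simplified sketch. It assumes (hypothetically) that an exact anchor motif marks the start of every unit; the function name, the motif and the toy sequence are illustrative and do not describe the authors' procedure. Real rDNA units carry sequence variants, so unit boundaries would more plausibly be located by similarity searches against known rRNA sequences than by exact matching.

```python
def split_repeat_units(seq: str, anchor: str) -> list[str]:
    """Split a tandemly repeated region into units that each start at `anchor`.

    Simplifying assumption (hypothetical): every unit begins with an exact
    copy of `anchor`. Any sequence before the first anchor is returned as a
    leading partial fragment; if no anchor is found, `seq` is returned whole.
    """
    if not anchor:
        raise ValueError("anchor must be a non-empty string")
    starts = []
    pos = seq.find(anchor)
    while pos != -1:
        starts.append(pos)
        pos = seq.find(anchor, pos + 1)  # pos + 1 also allows overlapping matches
    if not starts:
        return [seq]
    units = [seq[:starts[0]]] if starts[0] > 0 else []
    # Each unit runs from one anchor to the next (or to the end of the sequence).
    for start, end in zip(starts, starts[1:] + [len(seq)]):
        units.append(seq[start:end])
    return units


# Toy example: "ACGTT" is a hypothetical stand-in for a conserved motif
# at the start of each repeat unit.
toy = "GG" + "ACGTTAAAA" + "ACGTTCCCC" + "ACGTTGGGG"
print(split_repeat_units(toy, "ACGTT"))
```

On the toy input the function returns a leading "GG" fragment followed by three units, mirroring the three-unit structure described for BM-012-E08 without implying anything about its actual sequence.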
In the low-repetitive BM-005-G14, a novel gene complex involving two genes on the same strand was found.
A helicase gene was nested in the second intron of a large 9 kb papilin gene, and this helicase gene overlapped the papilin gene in two
exonic regions. Both genes were shown to be expressed in different tick life stages important in
ectoparasite interaction with the host. Tick-specific sequence differences relative to Bos taurus were also identified in the papilin gene
and in the protein binding sites of the 18S subunit.
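Once both gene models are expressed as exon coordinates, the same-strand exon overlap described above reduces to an interval-intersection test. The sketch below uses hypothetical coordinates, not the actual papilin or helicase gene models: a "nested" gene lies mostly within the second intron of a "host" gene but overlaps host exons at both of its ends. The `Gene` class and function name are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    strand: str                     # "+" or "-"
    exons: list[tuple[int, int]]    # (start, end), 1-based, inclusive


def same_strand_exon_overlaps(a: Gene, b: Gene) -> list[tuple[int, int]]:
    """Return the intervals where exons of `a` and `b` overlap on the same strand."""
    if a.strand != b.strand:
        return []
    hits = []
    for a_start, a_end in a.exons:
        for b_start, b_end in b.exons:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start <= end:  # closed intervals intersect
                hits.append((start, end))
    return sorted(hits)


# Hypothetical coordinates: host introns are 101-200, 301-600 and 701-900.
host = Gene("host", "+", [(1, 100), (201, 300), (601, 700), (901, 1000)])
nested = Gene("nested", "+", [(280, 320), (400, 450), (590, 610)])
print(same_strand_exon_overlaps(host, nested))  # [(280, 300), (601, 610)]
```

The nested gene's middle exon (400-450) sits entirely inside the host's second intron, while its first and last exons each share a short stretch with a host exon, giving two exonic overlaps on the same strand.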
Conclusion: In the absence of a sequenced reference genome, we have assembled two complex BAC sequences and
characterised a novel gene structure that was confirmed by gene expression and sequencing analyses. This is the
first report to provide evidence for two eukaryotic genes with exon regions that overlap on the same strand, the first
to describe Rhipicephalinae papilin, and the first to report the complete ribosomal DNA repeat unit sequence
structure for ticks. Because the Cot data allow genome-wide sequence frequencies to be estimated, this research will underpin
future efforts to sequence and assemble the R. microplus genome.
* Correspondence:
1 Centre for Comparative Genomics, Murdoch University, South St., Perth,
Western Australia, 6150, Australia
Full list of author information is available at the end of the article
© 2011 Bellgard et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.
Background
The cattle tick, Rhipicephalus (Boophilus) microplus
(Rmi), is one of the most economically important ticks
affecting the global cattle population [1]. Rmi infests cattle
and transmits associated pathogens, leading to severe agricultural losses in milk and beef
production and restricting the movement of livestock. The
most affected regions of the world are tropical and subtropical countries, including northern Australia, Mexico,
South America and South Africa, and US cattle populations are threatened along the southern border with Mexico [2].
The genome sizes of three species of ixodid ticks,
Amblyomma americanum [3], Rhipicephalus (Boophilus)
microplus and Ixodes scapularis (Isc) [4], have been estimated using Cot DNA reassociation kinetics, a procedure
also used to estimate the repetitive DNA content of genomes [4]. The
Rmi genome has an estimated size of 7.1 Gb, about three times
the size of the Isc genome (2.3 Gb) [4,5]. The Rmi genome is composed of foldback (FB), highly
repetitive (HR) and moderately repetitive (MR) elements and unique DNA,
in the proportions 0.82% FB, 31% HR, 38% MR
and 30% unique DNA, similar to those of Isc [4]. Short interspersed repetitive elements (SINEs), such as the Ruka element, contain RNA polymerase III promoters and are a major component
of eukaryotic genomes, being particularly abundant in
the heterochromatic compartment of vertebrates and
plants, as reviewed by Kidwell and Sunter [6,7]. SINEs are transposable elements that move to new genomic locations by reverse transcription of an RNA intermediate prior to genomic
integration. Most SINEs are derived from tRNA [8],
although some, such as the Alu family, which accounts
for approximately 10% of the human genome, are
thought to originate from 7SL RNA sequences [9]. In R. appendiculatus, secondary structure predictions indicate that Ruka could adopt a tRNA-like structure similar to that of a serine tRNA [6].
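The reported percentages can be converted into approximate amounts of sequence in each Cot class. The sketch below uses only figures given in the text (the 7.1 Gb and 2.3 Gb genome size estimates and the four class percentages); the constant and function names are illustrative.

```python
# Figures as reported in the text [4,5]
RMI_GENOME_GB = 7.1   # estimated Rmi genome size
ISC_GENOME_GB = 2.3   # estimated Isc genome size

# Cot-derived composition of the Rmi genome, in percent
RMI_COMPOSITION_PCT = {
    "foldback (FB)": 0.82,
    "highly repetitive (HR)": 31.0,
    "moderately repetitive (MR)": 38.0,
    "unique": 30.0,
}


def class_sizes_gb(genome_gb: float, composition_pct: dict[str, float]) -> dict[str, float]:
    """Convert per-class percentages into approximate sizes in Gb."""
    return {name: genome_gb * pct / 100 for name, pct in composition_pct.items()}


sizes = class_sizes_gb(RMI_GENOME_GB, RMI_COMPOSITION_PCT)
for name, gb in sizes.items():
    print(f"{name:28s} {gb:5.2f} Gb")
print(f"Rmi/Isc size ratio: {RMI_GENOME_GB / ISC_GENOME_GB:.2f}")
print(f"Total accounted for: {sum(RMI_COMPOSITION_PCT.values()):.2f}%")
```

This gives roughly 0.06 Gb FB, 2.20 Gb HR, 2.70 Gb MR and 2.13 Gb unique DNA, and an Rmi/Isc size ratio of about 3.09. The reported percentages sum to 99.82% rather than 100%, so the class sizes are approximate.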
The Isc Genome Project (IGP) [10,11] is the first tick
genome sequencing effort and currently a major resource
for tick comparative genomic analyses. This project has
contributed to the rapid rise in the number of tick DNA sequences
in NCBI [12]. The current Isc genome draft, comprising
369,492 supercontigs (1.7 Gb of linear
genomic sequence), was used in this analysis to identify
sequence conservation with available Rmi genomic DNA.
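For a sense of scale, the draft statistics above imply an average supercontig length and a fraction of the Cot-estimated Isc genome size. This is simple arithmetic on the reported, rounded figures; a mean length says nothing about the length distribution (for example, the N50), which would require the individual supercontig lengths. The constant names are illustrative.

```python
# Reported figures (the 1.7 Gb total is rounded in the text)
ISC_SUPERCONTIGS = 369_492
ISC_DRAFT_BP = 1.7e9             # linear genomic sequence in the draft
ISC_GENOME_ESTIMATE_BP = 2.3e9   # Cot-based Isc genome size estimate [4,5]

mean_len = ISC_DRAFT_BP / ISC_SUPERCONTIGS          # average supercontig length
coverage = ISC_DRAFT_BP / ISC_GENOME_ESTIMATE_BP    # draft size / estimated genome size

print(f"mean supercontig length ~ {mean_len:,.0f} bp")
print(f"draft / estimated genome size ~ {coverage:.0%}")
```

This gives a mean of roughly 4.6 kb per supercontig, with the draft representing about 74% of the estimated 2.3 Gb genome size.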
To provide insights into the complexity of the tick
genome and of specific BAC genes, the following
Rmi sequence resources were available for analysis. The
BmiGI Version 2 gene index [13] contains 13,643
non-redundant tentative consensus gene sequences. Rmi
genomic sequence derived from Cot reassociation kinetics fractions was also available; this approach has
been demonstrated to be a useful tool for exploring the gene
space of large-genome species [14]. A BAC end library was
created with a view to probing the Rmi genome for
BAC sequencing [15]. A suppression subtractive hybridization (SSH) study to identify transcripts associated with host
attachment and/or feeding identified both a large
increase in rRNA transcripts, thought to be associated with
increased protein production during tick feeding, and the
production of a number of enzymes including serine
protease inhibitors (Serpins) [16]. The results f (...truncated)