Genomic Resources for Sea Lice: Analysis of ESTs and Mitochondrial Genomes
Stuart G. Jantzen
Kristian R. von Schalburg
Simon R. M. Jones
Ben F. Koop
S. R. M. Jones, Pacific Biological Station, Fisheries and Oceans Canada, 3190 Hammond Bay Road, Nanaimo, BC V9T 6N7
F. Nilsen, Department of Biology, University of Bergen, 5020 Bergen, Norway
Department of Biology, University of Victoria, PO Box 3020 STN CSC, Victoria, BC V8W 3N5
Present Address: M. Yasuike, Aquatic Genomics Research Center, National Research Institute of Fisheries Science, Fisheries Research Agency, 2-12-4 Fukuura, Kanazawa, Yokohama, Kanagawa 236-8648
Sea lice are common parasites of both farmed and wild salmon. Salmon farming constitutes an important economic market in North America, South America, and Northern Europe. Infections with sea lice can result in significant production losses. A compilation of genomic information on different genera of sea lice is an important resource for understanding their biology as well as for the study of population genetics and control strategies. We report on over 150,000 expressed sequence tags (ESTs) from five different species (Pacific Lepeophtheirus salmonis (49,672 new ESTs in addition to 14,994 previously reported ESTs), Atlantic L. salmonis (57,349 ESTs), Caligus clemensi (14,821 ESTs), Caligus rogercresseyi (32,135 ESTs), and Lernaeocera branchialis (16,441 ESTs)). For each species, ESTs were assembled into complete or partial genes and annotated by comparisons to known proteins in public databases. In addition, whole mitochondrial (mt) genome sequences of C. clemensi (13,440 bp) and C. rogercresseyi (13,468 bp) were determined and compared to L. salmonis. Both nuclear and mtDNA genes show very high levels of sequence divergence between these ectoparasitic copepods, suggesting that the different species of sea lice have been in existence for 37–113 million years and that parasitic association with salmonids is also quite ancient. Our ESTs and mtDNA data provide a novel resource for the study of sea louse biology, population genetics, and control strategies. This genomic information provides the material basis for the development of a 38K sea louse microarray that can be used in conjunction with our existing 44K salmon microarray to study host-parasite interactions at the molecular level. This report represents the largest genomic resource for any copepod species to date.
Copepods (Copepoda) are a group of small crustaceans
found in various aquatic environments and they are
described as the most abundant metazoans on earth (Humes
1994). The subclass Copepoda consists of over 250
described families, 2,600 genera, and 21,000 described
species classified into ten orders (Walter and Boxshall
2008). Their life histories are diverse; planktonic and
benthic copepods are an important ecological link in the
aquatic food chain (Gee 1987; Ohman and Hirche 2001),
but approximately one third of marine copepod species live
as associates, commensals, or parasites on invertebrates and
fishes (Humes 1994).
Parasitic copepods are commonly found both on farmed
and wild marine finfish (Johnson and Fast 2004). They feed
on host mucus, epidermal cells, tissues, and blood, causing
physiological stress, immune dysfunction, impaired swimming
ability, and possibly death (Boxaspen 2006; Costello 2006;
Johnson and Fast 2004; Tully and Nolan 2002). Members of the family
Caligidae, especially the genera Caligus and
Lepeophtheirus, are commonly referred to as sea lice (Costello 2006;
Johnson et al. 2004; Pike and Wadsworth 1999). They are
the most economically important parasites of the world
salmon farming industry and may cause direct and indirect
economic losses of €300 million (US $480 million) annually
(Costello 2009). In addition, there is concern that salmon
farms elevate the risk of sea lice infections on wild salmon
beyond natural levels and thereby depress the abundance of
wild salmon stocks (Costello 2006; Heuch et al. 2005;
Krkošek et al. 2007a; Krkošek et al. 2007b; Todd et al. 2006).
In the North Atlantic Ocean, Lepeophtheirus salmonis and
Caligus elongatus account for the most serious infestations
of cultured and wild salmonids (Johnson et al. 2004; Pike
and Wadsworth 1999). In the eastern north Pacific Ocean, L.
salmonis and Caligus clemensi have been found on farmed
Atlantic salmon (Salmo salar) and wild Pacific salmon
(Oncorhynchus spp.; Beamish et al. 2009; Beamish et al.
2005; Saksida et al. 2007). Although L. salmonis is prevalent on
both the Atlantic and Pacific coasts, earlier studies suggested
that the Pacific and Atlantic populations of L. salmonis are
genetically distinct (Tjensvoll et al. 2006; Todd et al. 2004).
More recent genomic studies strongly suggest that distinct
species of L. salmonis exist in the Pacific and Atlantic
Oceans following a separation that occurred from 2.5 to
11 million years ago (Boulding et al. 2009; Yazawa et al.
2008). These parasites are referred to herein as the Pacific
and Atlantic forms of L. salmonis, respectively. In the
southern hemisphere, Caligus rogercresseyi is the dominant
species affecting salmonid aquaculture in Chile where the
parasites were found on farmed salmon in 99% of the
established cultured cages (Boxshall and Bravo 2000;
Carvajal et al. 1998).
Lepeophtheirus and Caligus species are distinguished
from each other based on morphological characters (Kabata
1979). The life cycle of L. salmonis has a total of ten
developmental stages, while the life cycles of C. elongatus
and C. rogercresseyi are similar but appear to lack pre-adult
stages (Piasecki and MacKinnon 1995; González and Carvajal
2003). The host range of L. salmonis mainly includes
salmonids but the parasite has also been reported from
nonsalmonid hosts, including sticklebacks, that co-occur with
salmon (Jones et al. 2006). In contrast, some Caligus
species have a broad host range of salmonids and
nonsalmonids (Costello 2006; Johnson et al. 2004). Among its
salmonid hosts, L. salmonis displays clear preferences, with
heaviest infestations and greatest impacts occurring on
Atlantic salmon (S. salar) and sea trout (Salmo trutta),
followed by rainbow trout (Oncorhynchus mykiss),
chinook (Oncorhynchus tshawytscha), and coho salmon
(Oncorhynchus kisutch; Dawson et al. 1997; Fast et al.
2002; Johnson and Albright 1992). In contrast, C.
rogercresseyi occurs in higher numbers on caged rainbow trout
than on Atlantic or coho salmon (González et al.
2000). Thus, while L. salmonis and Caligus species exhibit
similar parasitic life history strategies, they display
considerable differences in morphology, life cycle, and host range.
Another parasite, Lernaeocera branchialis, belongs to
the copepod family Pennellidae and is only distantly related
to the caligid copepods. This species is commonly found
on gadoids, particularly Atlantic cod (Gadus morhua) and
haddock (Melanogrammus aeglefinus), in the North Atlantic
Ocean and North Sea (Bricknell et al. 2006; Smith et al.
2007). This parasite has a negative impact on wild gadoids
and is a potentially serious pathogen of farmed Atlantic cod
(Smith et al. 2007). A compilation of genomic information
on parasitic copepods is an important tool for understanding
their biology as well as for the study of population genetics
and control strategies.
In this study, we report on over 150,000 expressed
sequence tags (ESTs) obtained from Pacific L. salmonis
(49,672 new ESTs in addition to 14,994 previously reported
ESTs), Atlantic L. salmonis (57,349 ESTs), C. clemensi
(14,821 ESTs), C. rogercresseyi (32,135 ESTs), and L.
branchialis (16,441 ESTs). These ESTs were assembled
into complete or partial genes and annotated by
comparisons to known proteins in public databases. In addition,
whole mitochondrial (mt) genome sequences of two
Caligus species, C. clemensi and C. rogercresseyi, were
determined and compared to each other and to L. salmonis.
These studies show high levels of sequence divergence in
nuclear and mtDNA genes. This report describes the
production and characteristics of the largest genomic
resource for copepods.
Materials and Methods
Specimens belonging to the Pacific (British Columbia (BC),
Canada) and Atlantic (Norway and New Brunswick, Canada)
forms of L. salmonis, C. clemensi (BC), C. rogercresseyi
(Chile), and L. branchialis (Scotland, UK) were collected
and stored at −80 °C or in RNAlater (Invitrogen) until RNA
extraction. Total RNA was extracted from whole bodies (from
various life stages and both sexes) using TRIzol reagent
(Invitrogen) and spin-column purified using RNeasy Mini kits
(Qiagen). The purified RNAs were then quantified by
spectrophotometer (NanoDrop Technologies) and quality
checked on agarose gels. Approximately 1.0–3.0 µg of total
RNA was converted into cDNA, normalized, and directionally
cloned into the pAL 17.3 vector (Evrogen Co.).
Clones from each library were robotically arrayed in
384-well microtiter plates as detailed previously (Koop et
al. 2008). Plasmid DNAs were extracted and sequenced
on an ABI 3730 DNA analyzer (Applied Biosystems) with
M13 forward and M13 reverse primers (L. salmonis and
C. rogercresseyi) or with M13 forward and SP6 primers
(C. clemensi and L. branchialis). The sequencing primers
are shown in Supplemental Table 1. The resulting ESTs
were assembled with CAP3 (Huang and Madan 1999)
using default parameters. The assembled sequences
(contigs + singletons) were annotated using RPS-BLAST
and BLASTX comparisons with the Conserved Domain
Database (CDD) and SwissProt (Bairoch and Apweiler
1996), respectively. The best BLAST match (E value
threshold of 1E−10) was used to identify contigs. Contigs
that did not meet this threshold were annotated as
Reference full-length cDNAs (FLcDNAs) were
identified as detailed previously (Leong et al. 2010). A single
clone containing an entire coding sequence (CDS) for a
gene product is considered a reference FLcDNA.
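The best-match annotation rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline used in this study: it assumes BLAST hits are available in the 12-column tab-separated layout (query, subject, ..., E value in column 11, bit score in column 12), and the function name and the example identifiers are hypothetical.

```python
from typing import Dict, Iterable, Tuple

E_VALUE_THRESHOLD = 1e-10  # threshold quoted in the text


def best_hits(rows: Iterable[str],
              threshold: float = E_VALUE_THRESHOLD) -> Dict[str, Tuple[str, float]]:
    """Return, for each query, the subject with the lowest E value below the threshold.

    Assumes 12-column tab-separated BLAST rows (an assumption of this sketch).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for row in rows:
        if not row.strip() or row.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = row.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue >= threshold:
            continue  # hit is not significant enough to annotate the contig
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical rows: two hits for contig1, one weak hit for contig2.
example_rows = [
    "contig1\tP00001\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-50\t200",
    "contig1\tP00002\t60.0\t100\t40\t0\t1\t300\t1\t100\t1e-20\t120",
    "contig2\tP00003\t35.0\t60\t39\t0\t1\t180\t1\t60\t1e-5\t45",
]
annotated = best_hits(example_rows)
```

In this example only contig1 receives an annotation; contig2 has no hit below the threshold and would remain unidentified.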
Complete Mitochondrial Genome Sequences of C. clemensi
and C. rogercresseyi
Total genomic DNA was extracted from an adult male of
C. clemensi and of C. rogercresseyi as previously described
(Yazawa et al. 2008). Each sample was placed in a 5% Chelex-100
resin (Sigma) solution (5% Chelex-100 resin, 0.2% SDS in
TE, with proteinase K (100 µg/ml)) and incubated for
30 min at 55 °C, and the proteinase K was then inactivated
for 10 min at 90 °C. The sequence of the
complete C. rogercresseyi mt genome was determined as
previously described (Yazawa et al. 2008). PCR primer
sets were designed for 15 fragments
(Supplemental Table 1) based on EST sequences
derived from mtDNA. PCR amplification was performed using
1.0 µl of extracted total genomic DNA of C. rogercresseyi
with an initial denaturation step of 2 min at 95 °C followed by
30 cycles of 30 s of denaturation at 95 °C, 30 s of
annealing at 55 °C, and 3 min of extension at 72 °C. PCR
products were cloned into the pCR2.1 vector (TA Cloning Kit,
Invitrogen) following the manufacturer's protocol, and each
positive PCR product was sequenced as described above.
Table 1 Sea lice EST project summary
Column headings: L. salmonis (P)a; L. salmonis (A)b
a L. salmonis Pacific form
b L. salmonis Atlantic (Canada, Norway) form
c Number of clones from which at least one sequence (5′ or 3′) was obtained
d Number of 5′ and 3′ EST sequences obtained
e A total of 28,032 clones and 49,672 sequences were obtained in this study, while 5,760 clones and 14,994 sequences were previously reported (Yazawa et al. 2008)
f Vector, low-quality, and contaminating bacterial sequences were trimmed
g A contig (contiguous sequence) contains two or more ESTs
h Number of transcripts that have an RPS-BLAST or BLASTX hit of less than 1E−10 to the Conserved Domain Database (CDD) or SwissProt
i 28K sequences were obtained from F. Nilsen (University of Bergen, Norway)
The entire mt genome of C. clemensi was amplified by
long PCR as three long fragments (5.4, 5.0, and
3.0 kb) and by PCR as described above for one short
fragment (0.8 kb). The three long fragments were amplified
using the PCR primer sets shown in Supplemental Table 1
and Long PCR Enzyme mix (Fermentas)
following the manufacturer's protocol. The long PCR
amplification was performed using 100 ng of extracted
total genomic DNA of C. clemensi with an initial
denaturation step of 2 min at 94 °C followed by a two-step
PCR procedure (40 cycles of 95 °C for 10 s and 68 °C for
7 min) and 10 min of final extension. The three long PCR
products were cloned into the pCR-XL-TOPO vector
(Invitrogen) following the manufacturer's protocol, and each positive
PCR product was sequenced by primer walking
(Supplemental Table 1). The short fragment was cloned into
the pCR2.1 vector and sequenced as described above.
Protein-coding and rRNA genes of C. clemensi and C.
rogercresseyi were identified by alignment with the Pacific
L. salmonis mt gene sequences (GenBank: EU288200). The
majority of the tRNA genes were identified using
tRNAscan-SE 1.21 (Lowe and Eddy 1997) with the same
parameters as described by Tjensvoll et al. (2005). The
remaining tRNA genes were identified based on
sequence homology with L. salmonis tRNA sequences.
Pairwise Kimura two-parameter (K2P) distances (Kimura
1980) of the 16S rRNA and cox1 genes for C. clemensi, C.
rogercresseyi, and Pacific L. salmonis were calculated in
MEGA5 (Tamura et al. 2007) with default settings.
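For readers unfamiliar with the K2P model, the distance can be written as d = −(1/2) ln[(1 − 2P − Q) √(1 − 2Q)], where P and Q are the proportions of sites that differ by a transition and by a transversion, respectively. The following Python sketch illustrates the formula for one pair of aligned sequences. It is not a reimplementation of MEGA: the function name is hypothetical, and skipping any site with a gap or ambiguity code is an assumption of this sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: ignore gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent (saturated).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example: one transition in ten sites.
toy_distance = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
```

For the toy pair, P = 0.1 and Q = 0, so the distance is −0.5 ln(0.8), slightly larger than the raw proportion of differing sites because the model corrects for multiple substitutions.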
Results and Discussion
EST Analysis and Comparison of the Nuclear Genes
Normalized cDNA libraries were constructed for Pacific L.
salmonis, Atlantic L. salmonis, C. clemensi, C.
rogercresseyi, and L. branchialis. The 114,967 clones obtained
from these cDNA libraries (28,032 Pacific L. salmonis,
51,607 Atlantic L. salmonis, 7,680 C. clemensi, 19,200 C.
rogercresseyi, and 8,448 L. branchialis) were sequenced
with M13 forward and M13 reverse (L. salmonis and C.
rogercresseyi) or with M13 forward and SP6 primers (C.
clemensi and L. branchialis). A summary of the EST
project is shown in Table 1. From these clones, 153,977
high-quality ESTs were obtained from Pacific L. salmonis
(49,672 ESTs), Atlantic L. salmonis (57,349 ESTs), C.
clemensi (14,821 ESTs), C. rogercresseyi (32,135 ESTs),
and L. branchialis (16,441 ESTs). The average trimmed
length of these ESTs was 734 bp. These EST sequences are
available in GenBank.
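As a quick consistency check, the per-library counts quoted in this paragraph can be tallied in Python. The numbers are copied from the text; the variable names are illustrative only. The clone counts reproduce the stated total of 114,967. The five listed EST counts sum to 170,418, whereas the four caligid libraries alone give 153,977, so the stated EST total may refer to the caligid libraries only; that reading is an assumption of this sketch rather than something the text states.

```python
# Counts copied from the paragraph above (clones and high-quality ESTs per library).
CLONES = {
    "Pacific L. salmonis": 28032,
    "Atlantic L. salmonis": 51607,
    "C. clemensi": 7680,
    "C. rogercresseyi": 19200,
    "L. branchialis": 8448,
}
ESTS = {
    "Pacific L. salmonis": 49672,
    "Atlantic L. salmonis": 57349,
    "C. clemensi": 14821,
    "C. rogercresseyi": 32135,
    "L. branchialis": 16441,
}

total_clones = sum(CLONES.values())   # 114967
total_ests = sum(ESTS.values())       # 170418
caligid_ests = sum(n for species, n in ESTS.items()
                   if species != "L. branchialis")  # 153977
```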
The 49,672 Pacific L. salmonis ESTs obtained in this
study along with 14,994 Pacific L. salmonis ESTs from our
previous study (Yazawa et al. 2008) were assembled into
11,922 contigs and 4,186 singletons (16,108 putative
transcripts). A total of 14,466 putative transcripts were
obtained for Atlantic L. salmonis, 6,054 for C. clemensi, 11,357 for
C. rogercresseyi, and 6,438 for L. branchialis. These
putative transcripts were annotated using RPS-BLAST and
BLASTX comparisons with the CDD and SwissProt
(Bairoch and Apweiler 1996), respectively. The best match
(E value threshold of 1E−10) was used to identify putative
transcripts. Of the 16,108 Pacific L. salmonis putative
transcripts, 7,157 (44.4%) matched at least one entry in the
databases while the others remain unidentified. Similarly,
6,726 (46.5%) Atlantic L. salmonis, 3,775 (62.4%) C.
clemensi, 5,830 (51.3%) C. rogercresseyi, and 3,951
(61.4%) L. branchialis putative transcripts have significant
BLAST hits (Table 1).
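The annotation rates quoted above follow directly from the transcript counts. A short Python sketch of the arithmetic, using only numbers given in the text (the dictionary layout and function name are illustrative):

```python
# (annotated transcripts, total putative transcripts) as quoted in the text
TRANSCRIPTS = {
    "Pacific L. salmonis": (7157, 16108),
    "Atlantic L. salmonis": (6726, 14466),
    "C. clemensi": (3775, 6054),
    "C. rogercresseyi": (5830, 11357),
    "L. branchialis": (3951, 6438),
}


def percent_annotated(annotated: int, total: int) -> float:
    """Percentage of putative transcripts with a significant BLAST hit."""
    return round(100 * annotated / total, 1)


rates = {species: percent_annotated(a, t) for species, (a, t) in TRANSCRIPTS.items()}

# Putative transcripts are contigs plus singletons (Pacific L. salmonis shown).
pacific_transcripts = 11922 + 4186
```

The computed rates agree with the percentages reported in the text, and the Pacific L. salmonis contigs and singletons add up to the 16,108 putative transcripts.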
A collection of reference FLcDNA clones is an
important resource for identifying genes, determining their
structural features and for experimental analysis of gene
functions. Possible reference FLcDNAs were defined as
having an entire open reading frame (ORF) corresponding
to a full-length protein and were identified as described
previously (Leong et al. 2010). Using an E value filter of
1E−5, the top ten SwissProt high-scoring segment pairs
(HSPs) from BLASTX for each putative transcript were
analyzed in succession to identify the correct ORF. Of the
16,108 Pacific L. salmonis putative transcripts, 1,435
transcripts were identified as possible FLcDNAs. There
are 1,086 Atlantic L. salmonis FLcDNAs, 1,223 C.
clemensi FLcDNAs, and 1,574 C. rogercresseyi FLcDNAs.
These reference FLcDNAs were submitted to NCBI's FLIC
A relational database with an intuitive web interface was
developed to process and display the large quantities of
EST data, their assemblies, and their associated annotation
information (Fig. 1). This interface provides the ability to
search using sequence data, identifiers, accession numbers,
and descriptive keywords. The BLAST search allows users
to perform homology searches with sequences of interest,
identify potential transcript names, and then visualize
these sequences and their EST alignments. The EST contigs
have predicted ORFs and BLASTX HSPs displayed in a
single view. This database contributes to the identification
and analysis of proteins and to the development of
microarrays for gene expression analyses.
Fig. 1 Screenshot of sea lice EST contig summary and search tools.
The top panel allows users to perform homology searches for
sequences of interest. The second panel provides the ability to search using
sequence data, identifiers, accession numbers, and descriptive
keywords. The third to seventh panels show a summary of the EST
clustering results for C. clemensi, C. rogercresseyi, Pacific L. salmonis,
Atlantic L. salmonis, and L. branchialis, respectively
Sequence similarities and putative transcripts were
compared among the nuclear genes of the five copepods
(Pacific L. salmonis, Atlantic L. salmonis, C. clemensi, C.
rogercresseyi, and L. branchialis) by BLASTN for
nucleotide (nt) sequences and tBLASTX for amino acid (aa)
sequences (Table 2). We previously reported that a total of
155 nuclear genes from Pacific and Atlantic L. salmonis
showed an average of 96.8% nt identity over an average of
756 bp (Yazawa et al. 2008). In this study, a total of 8,121
nucleotide and 8,827 translated aa sequences matched
between the Pacific and Atlantic L. salmonis putative
transcripts. These sequences showed an average of 96%
identity at the nt level over an average of 626 bp and 88%
at the aa level over an average of 187 aa (Table 2). Nuclear
gene sequences were quite different not only between the
genera Caligus and Lepeophtheirus (81–82% nt, 70–72%
aa identities), but also between the two Caligus species
(83% nt, 71% aa identities; Table 2). The range of nuclear
gene sequence divergence was quite similar among these
species (17–19% nt and 28–30% aa sequence divergences).
As expected, nucleotide sequences of L. branchialis, the
only species examined from the family Pennellidae, were
very different from the caligid sequences: only 4–6% of the
total queries (254–405 sequences) matched the nuclear
genes of the four other copepods. We speculate that the
matched genes are highly conserved among copepods, and
therefore we could not determine the overall divergence between nt
sequences of L. branchialis and the four caligid copepods.
However, 2,634–3,375 translated aa sequences of L.
branchialis (44–52% of query sequences) did show
significant matches with sequences of the four other
copepods. These translated aa sequences showed 59–62%
identities over averages of 121–132 aa (Table 2). Although
these comparisons provide only a very rough estimate of
overall sequence similarity, they clearly indicate a high
level of sequence divergence among these copepods' nuclear
genes.
Mitochondrial Genome Sequences of L. salmonis, C.
clemensi, and C. rogercresseyi
Metazoan mt genomes typically range between 15 and
20 kb in size, containing 37 genes: 13 protein-encoding
genes (PCGs), 22 transfer RNA (tRNA) genes, two
ribosomal RNA (rRNA) genes and a major non-coding
region (NCR; Boore 1999). In this study, whole mt genome
sequences of two Caligus species, C. clemensi and C.
rogercresseyi, were determined. The sizes of the entire mt
genomes were 13,440 bp for C. clemensi [GenBank:
HQ157566] and 13,468 bp for C. rogercresseyi [GenBank:
HQ157565]; these mt genomes are thus the shortest
among the 57 crustacean mt genomes (average length:
15,785 bp) reported so far (GenBank: November 2010).
There are two reasons for the small size of these mt
genomes. First, the major NCRs of the C. clemensi (104 bp)
and C. rogercresseyi (129 bp) mt genomes were much
shorter than that of L. salmonis (Pacific form, 1,441 bp;
Atlantic form, 2,146 bp) and that of other crustaceans
(average length, 875 bp), except for that of the amphipod
Metacrangonyx longipes (76 bp; Bauzà-Ribot et al. 2009).
Second, both Caligus mt genomes contained 12 PCGs,
21 tRNA genes, and two rRNA genes; relative to the typical
set found in other animal mt genomes, both lacked one PCG
(nad4L) and one tRNA gene (trnL2 (CUN)).
Interestingly, the C. clemensi mt genome is adenine and
thymine (A + T)-rich (PCG, 74.5%; whole genome, 75.6%)
compared to C. rogercresseyi and L. salmonis (PCG,
63.6–64.9%; whole genome, 65.2–66.5%; Supplemental Table 2).
In crustaceans, the mt genomic AT content values range
from 60.9% for Ligia oceanica (Isopoda; Kilpert and
Podsiadlowski 2006) to 77.8% for Argulus americanus
(Branchiura; Lavrov et al. 2004). The reason for the
variability in AT richness within the mitochondrial
genome among taxa is not clear.
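The A + T content discussed above is simply the fraction of A and T bases among the unambiguous bases of a sequence. A minimal Python sketch follows; the function name is hypothetical, and ignoring ambiguity codes such as N is an assumption of this sketch.

```python
def at_content(sequence: str) -> float:
    """Percentage of A and T among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = sequence.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100 * (counts["A"] + counts["T"]) / total


# Toy example: four of the six bases are A or T.
toy_at = at_content("AATTGC")
```

Applied to a complete mt genome sequence, the same function would give the whole-genome value; applied to the concatenated protein-coding genes, it would give the PCG value.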
Like the nuclear genes, the mtDNA gene sequences also
exhibited large divergence, not only between L. salmonis
and the two Caligus species (66.7–68.8% nt and
64.2–65.4% aa identities), but also between the two Caligus
species (68.8% nt and 63.6% aa identities). The range of
mtDNA sequence divergence was quite similar among the
three caligid copepods: the overall nt and aa identities
among the L. salmonis, C. clemensi, and C. rogercresseyi
sequences are 63.6–68.8% (Table 3). The cox1 gene is the
most conserved PCG among the three mt genomes
(79.1–82.6% nt and 91.2–94.1% aa identities), while nad2, nad4,
nad5, and nad6 exhibit large sequence divergence
(56.1–62.2% nt and 40.0–51.9% aa identities; Table 3).
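Percent identity and percent divergence are complementary: divergence is 100 minus identity over the compared positions. The Python sketch below shows one way to compute them for a pair of aligned sequences. It is an illustration only; the function names are hypothetical, and excluding gapped columns from the comparison is an assumption that may differ from the procedure used to produce Table 3.

```python
def percent_identity(aligned1: str, aligned2: str) -> float:
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned1) != len(aligned2):
        raise ValueError("sequences must be aligned to the same length")
    matches = compared = 0
    for a, b in zip(aligned1.upper(), aligned2.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100 * matches / compared


def percent_divergence(aligned1: str, aligned2: str) -> float:
    """Uncorrected divergence, defined here as 100 minus percent identity."""
    return 100 - percent_identity(aligned1, aligned2)


# Toy example: nine of ten aligned positions match.
toy_identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
toy_divergence = percent_divergence("ACGTACGTAC", "ACGTACGTAA")
```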
Hebert et al. (2003) reported that cox1 divergences
among the 13,320 species in the animal kingdom ranged
from a low of 0.0% to a high of 53.7%, with a mean
divergence of 11.3%. Within the Crustacea, the mean
species divergence of cox1 was 15.4% (Hebert et al. 2003).
Interestingly, our present study showed that the cox1
divergences among the three caligid copepods were higher
than the mean divergence for the Crustacea. The cox1
interspecific divergence is 20.2% between C. clemensi and
C. rogercresseyi and 26.0% between the genera Caligus and
Lepeophtheirus. Øines and Schram (2008) compared a cox1
fragment (a total of 504 aligned base pairs) from 18 caligid
copepods and a 16S rRNA fragment (a total of 438 aligned
base pairs) from 11 caligid copepods. They found an average
K2P distance of 0.218 for cox1 and 0.221 for 16S rRNA
(Øines and Schram 2008). In the present study, the K2P
distance of cox1 (a total of 1,539 aligned base pairs) among
L. salmonis, C. clemensi, and C. rogercresseyi is
0.202–0.270 (Supplemental Table 3), which is similar to the
average K2P distance found by Øines and Schram (2008).
However, the 16S rRNA gene showed a very high genetic
distance among the three copepods. The K2P distances of the
16S rRNA gene (a total of 1,085 aligned base pairs) were 0.333
between C. clemensi and C. rogercresseyi and 0.422
(Supplemental Table 3). These molecular distance values
support an ancient separation between C. clemensi and C.
rogercresseyi as well as between Lepeophtheirus and
Caligus.
Table 3 Comparison of the L. salmonis, C. clemensi, and C. rogercresseyi mtDNA genes
In nucleotide sequence (%)
In deduced amino acid sequence (%)
a Comparisons of amino acid sequences of atp8 genes were not conducted because these sequences are very short (31 aa)
b nad4L genes are absent in the two Caligus species
In our previous study, a molecular clock based on 16S
rRNA and calibrated by copepod data suggested that the
forms of L. salmonis existing in the Pacific and Atlantic
Oceans evolved from a common ancestor following a
separation that occurred 4.6–11 million years ago
(Yazawa et al. 2008). In this study, the
ages of divergence between L. salmonis (Pacific)
and the two Caligus species were estimated from the
16S rRNA gene using the same method
(Yazawa et al. 2008). The results suggest that the
separation between L. salmonis (Pacific) and the two
Caligus species occurred approximately 45–113 million
years ago (Table 4). In addition, the separation between the
two Caligus species was estimated to have occurred
37–87 million years ago (Table 4). Salmonids are believed to
have evolved from an ancestor in which a whole genome
duplication event occurred 25–100 million years ago (Ohno
1970). Thus, our present results suggest that L. salmonis
and C. clemensi have been in existence for 45–106 million
years and that parasitic association with salmonids is likely
also quite ancient (Table 4).
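The divergence ranges above follow from dividing a K2P distance (expressed as a percentage) by a calibrated rate in % K2P per million years. The Python sketch below applies the four 16S rRNA rates listed in the Table 4 legend to the distance of 0.333 reported for the two Caligus species. The function and variable names are illustrative, and the simple distance-over-rate formula is our reading of the method rather than a statement of the exact procedure.

```python
from typing import Dict, Tuple

# 16S rRNA rates in % K2P distance per million years, from the Table 4 legend.
RATES_PERCENT_PER_MYR: Dict[str, float] = {
    "anomurans": 0.38,
    "fiddler crabs": 0.90,
    "grapsid crabs (low)": 0.65,
    "grapsid crabs (high)": 0.88,
}


def divergence_times(k2p_distance: float) -> Dict[str, float]:
    """Divergence time (Myr) under each calibration: distance (%) divided by rate (%/Myr)."""
    return {name: (k2p_distance * 100) / rate
            for name, rate in RATES_PERCENT_PER_MYR.items()}


def divergence_range(k2p_distance: float) -> Tuple[float, float]:
    """Youngest and oldest divergence estimates across the calibrations."""
    times = divergence_times(k2p_distance).values()
    return min(times), max(times)


# 16S rRNA K2P distance between C. clemensi and C. rogercresseyi (from the text).
youngest, oldest = divergence_range(0.333)
```

For a distance of 0.333, the fastest rate gives about 37 Myr and the slowest about 87.6 Myr, in line with the 37–87 million year range reported for the two Caligus species.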
Table 4 Ranges of 16S rRNA gene divergence based on Kimura two-parameter distance and crustacean molecular clock calibrations
Divergence Range (Myr)
Pacific form L. salmonis vs. C. clemensi
Pacific form L. salmonis vs. C. rogercresseyi
C. clemensi vs. C. rogercresseyi
The values for Distance are the Kimura two-parameter (K2P) distance between the species. Rates of molecular evolution used for the 16S rRNA
gene include 0.38% K2P/million years (Myr) for anomurans (Ano; Cunningham et al. 1992), 0.90% K2P/Myr for fiddler crabs (Fid; Sturmbauer et
al. 1996), and 0.65% (low) to 0.88% (high) K2P/Myr obtained from grapsid crabs (Gra; Schubart et al. 1998)
The order of the genes in the two Caligus mt genomes is
identical despite extensive sequence divergence. In contrast,
the order of genes in the two Caligus mt genomes is quite
different from that in the L. salmonis mt genome. The gene
arrangement in the region between nad4 and trnL1 (UUR;
approximately 10 kb) is well conserved between L.
salmonis and the Caligus species. However, the gene
arrangements adjacent to their control regions (CRs) are
very distinct, and the Caligus mt genomes show a novel
gene arrangement (Fig. 2). The region around the CR is
more prone to gene rearrangement in both vertebrate
(Macey et al. 1997) and invertebrate (Dowton and Austin
1999) mt genomes. In the L. salmonis mt genomes, the
genes in the region between trnK2 and trnR (six tRNA genes
and atp6) are contiguous (Tjensvoll et al. 2005; Yazawa et al. 2008).
However, in the Caligus mt genomes, this region is
interrupted by rrnS-nad6-trnA-trnK1-trnQ-trnT-cytb-CR and
thereby divided into trnK2-trnN-trnG-trnV and atp6-trnY-trnR (trnY
has also changed position; Fig. 2). As mentioned above,
the nad4L and trnL2 (CUN) genes are absent from the Caligus
mt genomes. These two genes normally reside in this region
and have probably been lost due to rearrangement. It is
likely that this rearrangement event also led to the
trimming of the CRs in the two Caligus mt genomes.
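The interruption of the trnK2 to trnR gene block can be expressed as a simple check on a gene order list. The Python sketch below encodes the Caligus arrangement of this region as described in the text and reports which genes lie between the members of the block. The function name is hypothetical, the circular genome is treated as a linear segment, strand is ignored, and the internal order of the contiguous example is an assumption, since the text only states that the seven genes are adjacent in L. salmonis.

```python
from typing import List, Sequence, Set


def intervening_genes(order: Sequence[str], block: Set[str]) -> List[str]:
    """Genes that lie between the first and last member of `block` but are not in it."""
    positions = [i for i, gene in enumerate(order) if gene in block]
    if not block or len(positions) != len(block):
        raise ValueError("every block gene must occur exactly once in the order")
    segment = order[positions[0]:positions[-1] + 1]
    return [gene for gene in segment if gene not in block]


# Six tRNA genes plus atp6, contiguous in L. salmonis according to the text.
BLOCK = {"trnK2", "trnN", "trnG", "trnV", "atp6", "trnY", "trnR"}

# Caligus arrangement of this region, as described in the text.
CALIGUS_REGION = [
    "trnK2", "trnN", "trnG", "trnV",
    "rrnS", "nad6", "trnA", "trnK1", "trnQ", "trnT", "cytb", "CR",
    "atp6", "trnY", "trnR",
]

# Hypothetical contiguous arrangement (internal order is an assumption).
CONTIGUOUS_EXAMPLE = ["trnK2", "trnN", "trnG", "trnV", "trnY", "atp6", "trnR"]

caligus_interruptions = intervening_genes(CALIGUS_REGION, BLOCK)
contiguous_interruptions = intervening_genes(CONTIGUOUS_EXAMPLE, BLOCK)
```

With the Caligus order, the function returns the eight elements from rrnS to CR that separate the two halves of the block; with a contiguous arrangement it returns an empty list.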
In the mt genomes of most animals, nad4L and atp8 are
located together with nad4 and atp6, respectively
(nad4L-nad4 and atp8-atp6), and each pair is
translated from a single mRNA (Amalric et al. 1978;
Berthier et al. 1986). In contrast, several genes separate
nad4 and nad4L in the mt genomes of L. salmonis and
all other copepods characterized so far:
Tigriopus japonicus (Machida et al. 2002), Tigriopus
californicus (Burton et al. 2007), Paracyclopina nana (Ki
et al. 2009), and the partially sequenced mt genomes of
Eucalanus bungii and Neocalanus cristatus (Machida et al.
2004). The atp6 and atp8 genes are also separated in the two
Caligus species and in L. salmonis (Fig. 2). In addition, it
has been reported that atp8 is absent from the mt genome of P.
nana (Ki et al. 2009). Thus, it is most likely that these
separations of nad4-nad4L and atp6-atp8 occurred during
copepod evolution and led to the loss of nad4L in the two
Caligus species and to the loss of atp8 in P. nana.
In summary, the mtDNA genes of the two Caligus
species showed high levels of sequence divergence
(Table 3). The A+T content is also quite different between
the two Caligus mt genomes (Supplemental Table 2). In
addition, the gene orders of the two Caligus mt
genomes are identical to each other but different from the
order in the L. salmonis mt genome (Fig. 2).
Fig. 2 Genomic organization of the C. clemensi (13,440 bp) and the
C. rogercresseyi (13,468 bp) mt genomes. The complete mt genomes of
the Atlantic (15,445 bp) and Pacific (16,148 bp) L. salmonis were
previously reported, and these mt genomes are identical in gene
organization (Tjensvoll et al. 2005; Yazawa et al. 2008). Boxes
represent mtDNA genes. tRNA genes are denoted by the single letter
amino acid code, and an underline indicates tRNA genes located on
the negative strand. rrnL and rrnS refer to 16S and 12S rRNA; cox1,
cox2, and cox3 refer to cytochrome oxidase subunits I, II, and III; cob
refers to cytochrome b; nad1–6 and nad4L refer to NADH
dehydrogenase subunits 1–6 and 4L; atp6 and atp8 refer to ATP
synthase subunits 6 and 8, respectively; and CR refers to control
region. Transcription directions for the protein-coding and rRNA
genes are shown by arrowheads
Sea Lice as Ectoparasite Model System
Since parasites by definition depend on a live host for
growth and survival, in vitro culture systems are typically
very difficult to establish. Although procedures for
experimental infections are established for some parasitic
species, manipulation of the parasites may still be very
difficult, since removal from the host is generally lethal for
the parasite. Sea lice have life cycle features that
make them promising as a model system. Their life cycle
includes both free-living larval developmental
stages and pre-adult and adult stages that can move
unrestricted on the host surface, which enables manipulation of these
parasites. For L. salmonis, recent advances in larval
production systems and infection procedures (see Hamre
et al. 2009) have been crucial for the establishment of
defined laboratory strains of the salmon louse with different
properties (e.g., drug-resistant strains, inbred strains). Stable
and predictable production conditions further enable
specific breeding to create various types of hybrids (e.g.,
susceptible and drug-resistant family groups). The
improvement of rearing facilities has been a crucial facilitator for
the establishment of RNAi in L. salmonis (Dalvin et al. 2009).
Systemic RNAi is easily achieved in pre-adult or adult lice
by injection of dsRNA into the animal, and soaking
free-living larval stages (e.g., copepodids) in dsRNA
enables RNAi in these stages (Campbell et al. 2009). In
addition, the genomes of both the Pacific and Atlantic
variants of L. salmonis are currently being sequenced and,
together with the present cDNA resources, this will open up
a new avenue in sea lice research. There is a wide
diversity of arthropod parasites, yet good experimental
parasite model systems are scarce. We anticipate that
experimental studies on the salmon louse and other sea lice
species will increase our knowledge about
ectoparasites in general, particularly when more parasite
genomes become available.
We sequenced over 150,000 ESTs from Pacific L. salmonis
(49,672 new ESTs in addition to 14,994 previously reported
ESTs), Atlantic L. salmonis (57,349 ESTs), C. clemensi
(14,821 ESTs), C. rogercresseyi (32,135 ESTs), and L.
branchialis (16,441 ESTs; Table 1). A relational database
with an intuitive web interface was developed to process
and display the large quantities of EST data, their
assemblies and associated annotation information, as well
as possible full-length gene information (Fig. 1). This
database provides a novel resource for the study of sea
louse biology, population genetics, and control strategies.
This genomic resource represents the largest compilation for
any copepod species and provides the material basis for the
development of a 38K microarray that can be used in
conjunction with our existing salmon 44K microarray to
study host–parasite interactions at the molecular level.
The nuclear genes showed a high level of sequence
divergence among the parasitic copepods examined: L.
salmonis, C. clemensi, C. rogercresseyi, and L. branchialis
(Table 2). In addition, whole mt genome sequences of two
Caligus species, C. clemensi (13,440 bp) and C.
rogercresseyi (13,468 bp), were determined and compared. The
L. salmonis, C. clemensi, and C. rogercresseyi mtDNA
genes also exhibited extensive sequence divergence, with
identities among these species ranging from 66.7% to 68.8% nt and from
63.6% to 65.4% aa (Table 3). Both nuclear and
mtDNA genes showed very high levels of sequence
divergence between these ectoparasitic copepods, which
suggests that they have been in existence for
37–113 million years and that parasitic association with marine
organisms is likely also quite ancient. However, while the
order of the genes in the two Caligus mt genomes is the
same, it differs from that in L. salmonis (Fig. 2). The large
sequence divergence observed among these copepods may
help to explain the extensive variety of morphology, life
history, and host association in copepods.
Acknowledgments This project (GiLS, Genomics in Lice and
Salmon) was supported by Genome BC, Microtek Intl., Marine
Harvest, Mainstream Canada, Greig Seafoods, and the University of
Victoria. We would like to thank Rob Holt (Head of Sequencing,
Genome Sciences Centre, Vancouver, BC, Canada), Richard Moore
(Sequencing Group Leader, Genome Sciences Centre), Sarah Munro,
Mike Mayo, and Susan Wagner (Genome Sciences Centre) for plating
and sequencing. We also would like to thank John Burka (University
of P.E.I., Canada), Frank Nilsen, and Heidi Kongshaug (University of
Bergen, Norway) for Atlantic forms of L. salmonis; the Salmones
Maullin Company (Chile) for C. rogercresseyi; Brendan Conners
(Salmon Coast Field Station, Simoom Sound, BC, Canada) for C.
clemensi; and James Bron and Sarah Barker (University of Stirling,
Scotland, UK) for L. branchialis.
Open Access This article is distributed under the terms of the
Creative Commons Attribution Noncommercial License which permits
any noncommercial use, distribution, and reproduction in any medium,
provided the original author(s) and source are credited.