Genomic Resources for Sea Lice: Analysis of ESTs and Mitochondrial Genomes

Marine Biotechnology, Apr 2012

Sea lice are common parasites of both farmed and wild salmon. Salmon farming constitutes an important economic market in North America, South America, and Northern Europe, and infections with sea lice can result in significant production losses. A compilation of genomic information on different genera of sea lice is an important resource for understanding their biology as well as for the study of population genetics and control strategies. We report on over 150,000 expressed sequence tags (ESTs) from five different species: Pacific Lepeophtheirus salmonis (49,672 new ESTs in addition to 14,994 previously reported ESTs), Atlantic L. salmonis (57,349 ESTs), Caligus clemensi (14,821 ESTs), Caligus rogercresseyi (32,135 ESTs), and Lernaeocera branchialis (16,441 ESTs). For each species, ESTs were assembled into complete or partial genes and annotated by comparisons to known proteins in public databases. In addition, whole mitochondrial (mt) genome sequences of C. clemensi (13,440 bp) and C. rogercresseyi (13,468 bp) were determined and compared to L. salmonis. Both nuclear and mtDNA genes show very high levels of sequence divergence between these ectoparasitic copepods, suggesting that the different species of sea lice have been in existence for 37–113 million years and that parasitic association with salmonids is also quite ancient. Our ESTs and mtDNA data provide a novel resource for the study of sea louse biology, population genetics, and control strategies. This genomic information provides the material basis for the development of a 38K sea louse microarray that can be used in conjunction with our existing 44K salmon microarray to study host–parasite interactions at the molecular level. This report represents the largest genomic resource for any copepod species to date.

https://link.springer.com/content/pdf/10.1007%2Fs10126-011-9398-z.pdf

Motoshige Yasuike, Jong Leong, Stuart G. Jantzen, Kristian R. von Schalburg, Frank Nilsen, Simon R. M. Jones, Ben F. Koop

S. R. M. Jones: Pacific Biological Station, Fisheries and Oceans Canada, 3190 Hammond Bay Road, Nanaimo, BC V9T 6N7, Canada
F. Nilsen: Department of Biology, University of Bergen, 5020 Bergen, Norway
Department of Biology, University of Victoria, PO Box 3020 STN CSC, Victoria, BC V8W 3N5, Canada
Present address, M. Yasuike: Aquatic Genomics Research Center, National Research Institute of Fisheries Science, Fisheries Research Agency, 2-12-4 Fukuura, Kanazawa, Yokohama, Kanagawa 236-8648, Japan

Copepods (Copepoda) are a group of small crustaceans found in various aquatic environments, and they are described as the most abundant metazoans on earth (Humes 1994). The subclass Copepoda consists of over 250 described families, 2,600 genera, and 21,000 described species classified into ten orders (Walter and Boxshall 2008). Their life histories are diverse; planktonic and benthic copepods are an important ecological link in the aquatic food chain (Gee 1987; Ohman and Hirche 2001), but approximately one third of marine copepod species live as associates, commensals, or parasites on invertebrates and fishes (Humes 1994). Parasitic copepods are commonly found on both farmed and wild marine finfish (Johnson and Fast 2004). They feed on host mucus, epidermal cells, tissues, and blood, which causes physiological stress, immune dysfunction, impairment of swimming ability, and possibly death (Boxaspen 2006; Costello 2006; Johnson and Fast 2004; Tully and Nolan 2002). Members of the family Caligidae, especially the genera Caligus and Lepeophtheirus, are commonly referred to as sea lice (Costello 2006; Johnson et al. 2004; Pike and Wadsworth 1999).
They are the most economically important parasites of the world salmon farming industry and may cause direct and indirect economic losses in the industry of €300 million (US $480 million) annually (Costello 2009). In addition, there is concern that salmon farms elevate the risk of sea lice infections on wild salmon beyond that which naturally occurs and lead to a depression in the abundance of wild salmon stocks (Costello 2006; Heuch et al. 2005; Krkošek et al. 2007a; Krkošek et al. 2007b; Todd et al. 2006).

In the North Atlantic Ocean, Lepeophtheirus salmonis and Caligus elongatus account for the most serious infestations of cultured and wild salmonids (Johnson et al. 2004; Pike and Wadsworth 1999). In the eastern North Pacific Ocean, L. salmonis and Caligus clemensi have been found on farmed Atlantic salmon (Salmo salar) and wild Pacific salmon (Oncorhynchus spp.; Beamish et al. 2009; Beamish et al. 2005; Saksida et al. 2007). While L. salmonis is prevalent on both the Atlantic and Pacific coasts, earlier studies suggested that the Pacific and Atlantic populations of L. salmonis are genetically distinct (Tjensvoll et al. 2006; Todd et al. 2004). More recent genomic studies strongly suggest that distinct species of L. salmonis exist in the Pacific and Atlantic Oceans following a separation that occurred from 2.5 to 11 million years ago (Boulding et al. 2009; Yazawa et al. 2008). These parasites are referred to herein as the Pacific and Atlantic forms of L. salmonis, respectively. In the southern hemisphere, Caligus rogercresseyi is the dominant species affecting salmonid aquaculture in Chile, where the parasites were found on farmed salmon in 99% of the established culture cages (Boxshall and Bravo 2000; Carvajal et al. 1998).

Lepeophtheirus and Caligus species are distinguished from each other based on morphological characters (Kabata 1979). The life cycle of L. salmonis has a total of ten developmental stages, while C. elongatus and C.
rogercresseyi are similar but appear to lack pre-adult stages (Piasecki and MacKinnon 1995; González and Carvajal 2003). The host range of L. salmonis mainly includes salmonids, but the parasite has also been reported from nonsalmonid hosts, including sticklebacks, that co-occur with salmon (Jones et al. 2006). In contrast, some Caligus species have a broad host range of salmonids and nonsalmonids (Costello 2006; Johnson et al. 2004). Among its salmonid hosts, L. salmonis displays clear preferences, with the heaviest infestations and greatest impacts occurring on Atlantic salmon (S. salar) and sea trout (Salmo trutta), followed by rainbow trout (Oncorhynchus mykiss), chinook (Oncorhynchus tshawytscha), and coho salmon (Oncorhynchus kisutch; Dawson et al. 1997; Fast et al. 2002; Johnson and Albright 1992). In contrast, C. rogercresseyi occurs in higher numbers on caged rainbow trout compared with Atlantic or coho salmon (González et al. 2000). Thus, while L. salmonis and Caligus species exhibit similar parasitic life history strategies, they display considerable differences in morphology, life cycle, and host range.

Another parasite, Lernaeocera branchialis, belongs to the copepod family Pennellidae and is distantly related to the caligid copepods. This species is commonly found on gadoids, particularly Atlantic cod (Gadus morhua) and haddock (Melanogrammus aeglefinus), in the North Atlantic Ocean and North Sea (Bricknell et al. 2006; Smith et al. 2007). This parasite has a negative impact on wild gadoids and is a potentially serious pathogen of farmed Atlantic cod (Smith et al. 2007).

A compilation of genomic information on parasitic copepods is an important tool for understanding their biology as well as for the study of population genetics and control strategies. In this study, we report on over 150,000 expressed sequence tags (ESTs) obtained from Pacific L. salmonis (49,672 new ESTs in addition to 14,994 previously reported ESTs), Atlantic L.
salmonis (57,349 ESTs), C. clemensi (14,821 ESTs), C. rogercresseyi (32,135 ESTs), and L. branchialis (16,441 ESTs). These ESTs were assembled into complete or partial genes and annotated by comparisons to known proteins in public databases. In addition, whole mitochondrial (mt) genome sequences of two Caligus species, C. clemensi and C. rogercresseyi, were determined and compared to each other and to L. salmonis. These studies show high levels of sequence divergence in nuclear and mtDNA genes. This report describes the production and characteristics of the largest genomic resource for copepods.

Materials and Methods

Specimens belonging to the Pacific (British Columbia, Canada (BC)) and Atlantic forms of L. salmonis (Norway and New Brunswick, Canada), C. clemensi (BC), C. rogercresseyi (Chile), and L. branchialis (Scotland, UK) were collected and stored at −80°C or in RNAlater (Invitrogen) until RNA extraction. Total RNA was extracted from whole bodies (from various life stages and both sexes) using TRIzol reagent (Invitrogen) and spin-column purified using RNeasy Mini kits (Qiagen). The purified RNAs were then quantified and quality checked by spectrophotometer (NanoDrop Technologies) and agarose gel, respectively. Approximately 1.0–3.0 μg of total RNA was converted into cDNA, normalized, and directionally cloned into the pAL 17.3 vector (Evrogen Co.). Clones from each library were robotically arrayed in 384-well microtiter plates as detailed previously (Koop et al. 2008). Plasmid DNAs were extracted and sequenced on an ABI 3730 DNA analyzer (Applied Biosystems) with M13 forward and M13 reverse primers (L. salmonis and C. rogercresseyi) or with M13 forward and SP6 primers (C. clemensi and L. branchialis). These sequencing primers are shown in Supplemental Table 1. The resulting ESTs were assembled with CAP3 (Huang and Madan 1999) with default parameters.
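The bookkeeping behind such an assembly can be illustrated with a minimal Python sketch. It assumes, following the Table 1 footnote, that a contig contains two or more ESTs and that a singleton is an EST left unassembled; the function name, the input mapping, and the toy identifiers are hypothetical and are not CAP3 output formats.

```python
from collections import Counter


def summarize_assembly(est_to_cluster):
    """Count contigs (>= 2 ESTs), singletons (1 EST), and putative transcripts.

    est_to_cluster is a hypothetical mapping of EST identifier -> cluster
    identifier; it is not a CAP3 file format.
    """
    cluster_sizes = Counter(est_to_cluster.values())
    contigs = sum(1 for size in cluster_sizes.values() if size >= 2)
    singletons = sum(1 for size in cluster_sizes.values() if size == 1)
    return {
        "contigs": contigs,
        "singletons": singletons,
        # Putative transcripts are counted as contigs plus singletons.
        "putative_transcripts": contigs + singletons,
    }


# Toy example with made-up identifiers: clusters c1 (2 ESTs), c2 (1 EST), c3 (3 ESTs).
toy = {"est1": "c1", "est2": "c1", "est3": "c2",
       "est4": "c3", "est5": "c3", "est6": "c3"}
summary = summarize_assembly(toy)
print(summary)
```

Under the same definition, the Pacific L. salmonis assembly reported in the Results (11,922 contigs and 4,186 singletons) corresponds to 16,108 putative transcripts.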
The assembled total contigs (clusters + singletons) were annotated using RPS-BLAST and BLASTX comparisons with the Conserved Domain Database (CDD) and SwissProt (Bairoch and Apweiler 1996), respectively. The best BLAST match (E value threshold of 1E−10) was used to identify contigs. Contigs that did not meet this threshold were annotated as unknown. Reference full-length cDNAs (FLcDNAs) were identified as detailed previously (Leong et al. 2010). A single clone containing an entire coding sequence (CDS) for a gene product is considered a reference FLcDNA.

Complete Mitochondrial Genome Sequences of C. clemensi and C. rogercresseyi

Total genomic DNA was extracted from an adult male C. clemensi and an adult male C. rogercresseyi as previously described (Yazawa et al. 2008). A sample placed in 5% Chelex-100 resin (Sigma) solution (5% Chelex-100 resin, 0.2% SDS in TE, with proteinase K (100 μg/ml)) was incubated for 30 min at 55°C, and the proteinase K was then inactivated for 10 min at 90°C.

The sequence of the complete C. rogercresseyi mt genome was determined as previously described (Yazawa et al. 2008). The PCR primer sets were designed for 15 fragments (Supplemental Table 1) based on the EST sequences encoding mtDNA. PCR amplification was performed using 1.0 μl of extracted total genomic DNA of C. rogercresseyi with an initial denaturation step of 2 min at 95°C and then 30 cycles as follows: 30 s of denaturation at 95°C, 30 s of annealing at 55°C, and 3 min of extension at 72°C. PCR products were cloned into the pCR2.1 vector (TA Cloning Kit, Invitrogen) following the manufacturer's protocol, and each positive PCR product was sequenced as described above.

Table 1 Sea lice EST project summary
Column headings: L. salmonis (P)a, L. salmonis (A)b
a L. salmonis Pacific form
b L. salmonis Atlantic (Canada, Norway) form
c Number of clones from which at least one sequence (5′ or 3′) was obtained
d Number of 5′ and 3′ EST sequences obtained
e 28,032 clones and 49,672 sequences were obtained from this study, while 5,760 clones and 14,994 sequences were previously reported (Yazawa et al. 2008)
f Vector, low quality, and contaminating bacterial sequences are trimmed
g A contig (contiguous sequence) contains two or more ESTs
h Number of transcripts that have a RPS-BLAST or BLASTX hit of less than 1E−10 to the Conserved Domain Database (CDD) or SwissProt databases
i 28K sequences were obtained from F. Nilsen (University of Bergen, Norway)

The entire mt genome of C. clemensi was amplified by a long PCR method for three long fragments (5.4, 5.0, and 3.0 kb) and by PCR as described above for one short fragment (0.8 kb). The three long fragments were amplified using the PCR primer sets shown in Supplemental Table 1 and Long PCR Enzyme mix (Fermentas) following the manufacturer's protocol. The long PCR amplification was performed using 100 ng of extracted total genomic DNA of C. clemensi with an initial denaturation step of 2 min at 94°C, then a two-step PCR procedure (40 cycles of 95°C for 10 s and 68°C for 7 min), and 10 min of final extension. The three long PCR products were cloned into the pCR-XL-TOPO vector (Invitrogen) following the manufacturer's protocol, and each positive PCR product was sequenced by primer walking (Supplemental Table 1). The one short fragment was cloned into the pCR2.1 vector and sequenced as described above.

Protein-coding and rRNA genes of C. clemensi and C. rogercresseyi were identified by alignment with the Pacific L. salmonis mt gene sequences (GenBank: EU288200). The majority of the tRNA genes were identified using tRNAscan-SE 1.21 (Lowe and Eddy 1997), using the same parameters as described by Tjensvoll et al. (2005).
The remaining tRNA genes were identified based on sequence homology with L. salmonis tRNA sequences. Pair-wise Kimura two-parameter (K2P) distances (Kimura 1980) of the 16S rRNA and cox1 genes for C. clemensi, C. rogercresseyi, and Pacific L. salmonis were calculated in MEGA5 (Tamura et al. 2007) with default settings.

Results and Discussion

EST Analysis and Comparison of the Nuclear Genes

Normalized cDNA libraries were constructed for Pacific L. salmonis, Atlantic L. salmonis, C. clemensi, C. rogercresseyi, and L. branchialis. The 114,967 clones obtained from these cDNA libraries (28,032 Pacific L. salmonis, 51,607 Atlantic L. salmonis, 7,680 C. clemensi, 19,200 C. rogercresseyi, and 8,448 L. branchialis) were sequenced with M13 forward and M13 reverse primers (L. salmonis and C. rogercresseyi) or with M13 forward and SP6 primers (C. clemensi and L. branchialis). A summary of the EST project is shown in Table 1. From these clones, 153,977 high-quality ESTs were obtained from Pacific L. salmonis (49,672 ESTs), Atlantic L. salmonis (57,349 ESTs), C. clemensi (14,821 ESTs), C. rogercresseyi (32,135 ESTs), and L. branchialis (16,441 ESTs). The average trimmed length of these ESTs was 734 bp. These EST sequences are available in GenBank.

The 49,672 Pacific L. salmonis ESTs obtained in this study, along with 14,994 Pacific L. salmonis ESTs from our previous study (Yazawa et al. 2008), were assembled into 11,922 contigs and 4,186 singletons (16,108 putative transcripts). There is a total of 14,466 putative transcripts for Atlantic L. salmonis, 6,054 for C. clemensi, 11,357 for C. rogercresseyi, and 6,438 for L. branchialis. These putative transcripts were annotated using RPS-BLAST and BLASTX comparisons with the CDD and SwissProt (Bairoch and Apweiler 1996), respectively. The best match (E value threshold of 1E−10) was used to identify putative transcripts. Of the 16,108 Pacific L.
salmonis putative transcripts, 7,157 (44.4%) matched at least one entry in the databases, while the others remain unidentified. Similarly, 6,726 (46.5%) Atlantic L. salmonis, 3,775 (62.4%) C. clemensi, 5,830 (51.3%) C. rogercresseyi, and 3,951 (61.4%) L. branchialis putative transcripts have significant BLAST hits (Table 1).

A collection of reference FLcDNA clones is an important resource for identifying genes, determining their structural features, and for experimental analysis of gene functions. Possible reference FLcDNAs were defined as having an entire open reading frame (ORF) corresponding to a full-length protein and were identified as described previously (Leong et al. 2010). Using an E value filter of 1E−5, the top ten SwissProt high-scoring segment pairs (HSPs) from BLASTX for each putative transcript were analyzed in succession to identify the correct ORF. Of the 16,108 Pacific L. salmonis putative transcripts, 1,435 were identified as possible FLcDNAs. There are 1,086 Atlantic L. salmonis FLcDNAs, 1,223 C. clemensi FLcDNAs, and 1,574 C. rogercresseyi FLcDNAs. These reference FLcDNAs were submitted to NCBI's FLIC database.

A relational database with an intuitive web interface was developed to process and display the large quantities of EST data, their assemblies, and their associated annotation information (Fig. 1). This interface provides the ability to search using sequence data, identifiers, accession numbers, and descriptive keywords. The BLAST search allows users to perform homology searches with sequences of interest, identify potential transcript names, and then visualize these sequences and EST alignments. The EST contigs have predicted ORFs and BLASTX HSPs displayed in a single view. This database contributes to the identification and analysis of proteins and to the development of microarrays for gene expression analyses.

Fig. 1 Screenshot of sea lice EST contig summary and search tools. The top panel allows users to perform homology searches for sequences of interest. The second provides the ability to search using sequence data, identifiers, accession numbers, and descriptive keywords. The third to seventh panels show a summary of the EST clustering results of C. clemensi, C. rogercresseyi, Pacific L. salmonis, Atlantic L. salmonis, and L. branchialis, respectively

Sequence similarities of putative transcripts were compared among the nuclear genes of the five copepods (Pacific L. salmonis, Atlantic L. salmonis, C. clemensi, C. rogercresseyi, and L. branchialis) by BLASTN for nucleotide (nt) sequences and tBLASTX for amino acid (aa) sequences (Table 2). We previously reported that a total of 155 nuclear genes from Pacific and Atlantic L. salmonis showed an average of 96.8% nt identity over an average of 756 bp (Yazawa et al. 2008). In this study, a total of 8,121 nucleotide and 8,827 translated aa sequences matched between the Pacific and Atlantic L. salmonis putative transcripts. These sequences showed an average of 96% identity at the nt level over an average of 626 bp and 88% at the aa level over an average of 187 aa (Table 2). Nuclear gene sequences were quite different not only between the genera Caligus and Lepeophtheirus (81–82% nt, 70–72% aa identities), but also between the two Caligus species (83% nt, 71% aa identities; Table 2). The range of nuclear gene sequence divergence was quite similar among these species (17–19% nt and 28–30% aa sequence divergences). As expected, nucleotide sequences of L. branchialis, the only species examined from the family Pennellidae, were very different from the caligid sequences: only 4–6% of the total queries (254–405 sequences) matched the nuclear genes of the four other copepods. We speculate that the matched genes are conserved among copepods, and therefore we could not determine the divergence between nt sequences of L.
branchialis and the four caligid copepods. However, the 2,634–3,375 translated aa sequences of L. branchialis (44–52% of query sequences) did show significant matches with sequences of the four other copepods. These translated aa sequences showed 59–62% identities over averages of 121–132 aa (Table 2). Although these comparisons provide only a very rough estimate of overall sequence similarity, they clearly indicate a high level of sequence divergence among these copepods' nuclear genes.

Mitochondrial Genome Sequences of L. salmonis, C. clemensi, and C. rogercresseyi

Metazoan mt genomes typically range between 15 and 20 kb in size and contain 37 genes: 13 protein-encoding genes (PCGs), 22 transfer RNA (tRNA) genes, and two ribosomal RNA (rRNA) genes, together with a major non-coding region (NCR; Boore 1999). In this study, whole mt genome sequences of two Caligus species, C. clemensi and C. rogercresseyi, were determined. The sizes of the entire mt genomes were 13,440 bp for C. clemensi [GenBank: HQ157566] and 13,468 bp for C. rogercresseyi [GenBank: HQ157565], and thus, these mt genomes are the shortest among the 57 crustacean mt genomes (average length: 15,785 bp) reported so far (GenBank: November 2010). There are two reasons for the small size of these mt genomes. First, the major NCRs of the C. clemensi (104 bp) and C. rogercresseyi (129 bp) mt genomes were much shorter than that of L. salmonis (Pacific form, 1,441 bp; Atlantic form, 2,146 bp) and that of other crustaceans (average length, 875 bp), except for that of the amphipod Metacrangonyx longipes (76 bp; Bauzà-Ribot et al. 2009). Second, both Caligus mt genomes contained 12 protein-encoding, 21 tRNA, and two rRNA genes; compared with the typical set found in other animal mt genomes, both lacked the PCG nad4L and a tRNA gene, trnL2 (CUN). Interestingly, the C. clemensi mt genome is adenine and thymine (A + T)-rich (PCG, 74.5%; whole genome, 75.6%) compared to C. rogercresseyi and L.
salmonis (PCG, 63.6–64.9%; whole genome, 65.2–66.5%; Supplemental Table 2). In crustaceans, the mt genomic AT content values range from 60.9% for Ligia oceanica (Isopoda; Kilpert and Podsiadlowski 2006) to 77.8% for Argulus americanus (Branchiura; Lavrov et al. 2004). The reason for the variability in AT richness within the mitochondrial genome among taxa is not clear.

Like the nuclear genes, the mtDNA gene sequences also exhibited large divergence, not only between L. salmonis and the two Caligus species (66.7–68.8% nt and 64.2–65.4% aa identities), but also between the two Caligus species (68.8% nt and 63.6% aa identities). The range of mtDNA sequence divergence was quite similar among the three caligid copepods. The percent nt and aa identities among the L. salmonis, C. clemensi, and C. rogercresseyi sequences are 63.6–68.8% (Table 3). The cox1 gene is the most conserved PCG among the three mt genomes (79.1–82.6% nt and 91.2–94.1% aa identities), while nad2, nad4, nad5, and nad6 exhibit a large sequence divergence (56.1–62.2% nt and 40.0–51.9% aa identities; Table 3).

Hebert et al. (2003) reported that cox1 divergences among 13,320 species in the animal kingdom ranged from a low of 0.0% to a high of 53.7%, with a mean divergence of 11.3%. The cox1 divergences in the Crustacea showed a mean species divergence of 15.4% (Hebert et al. 2003). Interestingly, our present study showed that the cox1 divergences among the three caligid copepods were higher than the mean divergence value for Crustacea. The cox1 interspecific divergence between C. clemensi and C. rogercresseyi is 20.2%, and that between the genera Caligus and Lepeophtheirus is 26.0%. Øines and Schram (2008) compared a cox1 fragment (a total of 504 aligned base pairs) of 18 caligid copepods and a 16S rRNA fragment (a total of 438 aligned base pairs) of 11 caligid copepods. They found an average K2P distance of 0.218 for cox1 and 0.221 for 16S rRNA (Øines and Schram 2008).
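For readers unfamiliar with the K2P measure used in these comparisons, the following Python sketch computes the Kimura (1980) two-parameter distance, d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of sites showing transitions and transversions. It is an illustration only, not the MEGA5 implementation used in this study; the function name and the choice to skip gapped or ambiguous sites are assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites containing gaps or ambiguous bases in either sequence are skipped
    (an assumption of this sketch, not necessarily the MEGA5 default).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError if the sequences are saturated (argument <= 0).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# One transition in ten sites: P = 0.1, Q = 0, so d = -0.5 * ln(0.8).
d = k2p_distance("AAAAAAAAAA", "AAAAAAAAAG")
print(round(d, 4))
```

Because the correction grows faster than the raw proportion of differing sites, K2P distances such as those quoted here are somewhat larger than the corresponding uncorrected percent divergences.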
In the present study, the K2P distance of cox1 (a total of 1,539 aligned base pairs) among L. salmonis, C. clemensi, and C. rogercresseyi is 0.202–0.270 (Supplemental Table 3), which is similar to the average K2P distance found by Øines and Schram (2008). However, the 16S rRNA among the three copepods showed a very high genetic distance. The K2P distance of the 16S rRNA (a total of 1,085 aligned base pairs) was 0.333 between C. clemensi and C. rogercresseyi and 0.422 (Supplemental Table 3). These molecular distance values support an ancient separation between C. clemensi and C. rogercresseyi as well as between Lepeophtheirus and Caligus.

Table 3 Comparison of the L. salmonis, C. clemensi, and C. rogercresseyi mtDNA genes
Column groups: In nucleic sequence (%); In deduced amino acid sequence (%)
a Comparisons of amino acid sequences of atp8 genes were not conducted because these sequences are very short (31 aa)
b nad4L genes are absent in the two Caligus species

In our previous study, a molecular clock based on 16S rRNA and calibrated by copepod data suggested that the forms of L. salmonis existing in the Pacific and Atlantic Oceans evolved from a common ancestor following a separation that occurred 4.6–11 million years ago (Yazawa et al. 2008). In this study, the molecular estimates of the age of divergence between L. salmonis (Pacific) and the two Caligus species were calculated based on the 16S rRNA gene using the same method as previously reported (Yazawa et al. 2008). The results suggest that the separation between L. salmonis (Pacific) and the two Caligus species occurred approximately 45–113 million years ago (Table 4).
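The arithmetic behind such a range can be sketched in Python under one stated assumption: that divergence time is the pairwise K2P distance divided by the calibration rate. The rates are those listed with Table 4 (0.38, 0.90, and 0.65 to 0.88% K2P/Myr); the function names and dictionary labels are hypothetical. With the 16S rRNA distance of 0.333 between the two Caligus species, this reading gives about 37 to 88 Myr, consistent with the 37–87 Myr range reported for that pair.

```python
# Calibration rates for the 16S rRNA gene, in % K2P per million years,
# as listed with Table 4 (labels are illustrative).
RATES_PERCENT_PER_MYR = {
    "anomurans": 0.38,
    "fiddler crabs": 0.90,
    "grapsid crabs (low)": 0.65,
    "grapsid crabs (high)": 0.88,
}


def divergence_times(k2p_distance, rates=RATES_PERCENT_PER_MYR):
    """Divergence time (Myr) per calibration, assuming time = distance / rate."""
    return {name: (k2p_distance * 100.0) / rate for name, rate in rates.items()}


def divergence_range(k2p_distance, rates=RATES_PERCENT_PER_MYR):
    """Youngest and oldest estimate (Myr) across all calibrations."""
    times = divergence_times(k2p_distance, rates).values()
    return min(times), max(times)


# 16S rRNA K2P distance between C. clemensi and C. rogercresseyi.
low, high = divergence_range(0.333)
print(f"{low:.1f} to {high:.1f} Myr")
```

The fastest calibration (fiddler crabs) sets the lower bound of the range and the slowest (anomurans) sets the upper bound, so the width of each reported range reflects uncertainty in the clock rate rather than in the measured distance.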
In addition, the separation between the two Caligus species was estimated to have occurred 37–87 million years ago (Table 4). Salmonids are believed to have evolved from an ancestor in which a whole genome duplication event occurred 25–100 million years ago (Ohno 1970). Thus, our present results suggest that L. salmonis and C. clemensi have been in existence for 45–106 million years and that parasitic association with salmonids is likely also quite ancient (Table 4).

Table 4 Ranges of 16S rRNA gene divergence based on Kimura two-parameter distance and crustacean molecular clock calibrations
Columns: Divergence; Range (Myr)
Rows: Pacific form L. salmonis vs. C. clemensi; Pacific form L. salmonis vs. C. rogercresseyi; C. clemensi vs. C. rogercresseyi
The values for Distance are the Kimura two-parameter (K2P) distance between the species. Rates of molecular evolution used for the 16S rRNA gene include 0.38% K2P/million years (Myr) for anomurans (Ano; Cunningham et al. 1992), 0.90% K2P/Myr for fiddler crabs (Fid; Sturmbauer et al. 1996), and 0.65% (low) to 0.88% (high) K2P/Myr obtained from grapsid crabs (Gra; Schubart et al. 1998)

The order of the genes in the two Caligus mt genomes is identical despite extensive sequence divergence. In contrast, the order of genes in the two Caligus mt genomes is quite different from that in the L. salmonis mt genome. The gene arrangement in the region between nad4 and trnL1 (UUR; approximately 10 kb) is well conserved between L. salmonis and the Caligus species. However, the gene arrangements adjacent to their control regions (CRs) are very distinct, and the Caligus mt genomes show a novel gene arrangement (Fig. 2). The region around the CR is more prone to gene rearrangement in both vertebrate (Macey et al. 1997) and invertebrate (Dowton and Austin 1999) mt genomes. In the L. salmonis mt genomes, the region between trnK2 and trnR (six tRNA genes and atp6) is contiguous (Tjensvoll et al. 2005; Yazawa et al. 2008). However, in the Caligus mt genomes, this region is interrupted by rrnS-nad6-trnA-trnK1-trnQ-trnT-cytb-CR and divided into trnK2-trnN-trnG-trnV and atp6-trnY-trnR (trnY also had a position change; Fig. 2). As mentioned above, the nad4L and trnL2 (CUN) genes are absent in the Caligus mt genomes. These two genes normally reside in this region and have probably been lost due to rearrangement. It is likely that this rearrangement event has also led to the trimming of the CRs in the two Caligus mt genomes.

In the mt genomes of most animals, nad4L and atp8 are located together with nad4 and atp6, respectively (nad4L-nad4 and atp8-atp6), and nad4L-nad4 and atp8-atp6 are each translated from a single mRNA (Amalric et al. 1978; Berthier et al. 1986). In contrast, several genes separate nad4 and nad4L in the mt genomes of L. salmonis and in the mt genomes of all copepods characterized so far: Tigriopus japonicus (Machida et al. 2002), Tigriopus californicus (Burton et al. 2007), Paracyclopina nana (Ki et al. 2009), and the partially sequenced mt genomes of Eucalanus bungii and Neocalanus cristatus (Machida et al. 2004). The atp6 and atp8 genes are also separated in the two Caligus species and in L. salmonis (Fig. 2). In addition, it has been reported that atp8 is absent in the mt genome of P. nana (Ki et al. 2009). Thus, it is most likely that these separations of nad4-nad4L and atp6-atp8 occurred during copepod evolution and led to the loss of nad4L in the two Caligus species and to the loss of atp8 in P. nana.

Fig. 2 Genomic organization of the C. clemensi (13,440 bp) and the C. rogercresseyi (13,468 bp) mt genomes. The complete mt genomes of the Atlantic (15,445 bp) and Pacific (16,148 bp) L. salmonis were previously reported, and these mt genomes are identical in gene organization (Tjensvoll et al. 2005; Yazawa et al. 2008). Boxes represent mtDNA genes. tRNA genes are denoted by the single-letter amino acid code, and an underline indicates tRNA genes located on the negative strand. rrnL and rrnS refer to 16S and 12S rRNA; cox1, cox2, and cox3 refer to cytochrome oxidase subunits I, II, and III; cob refers to cytochrome b; nad1–6 and nad4L refer to NADH dehydrogenase subunits 1–6 and 4L; atp6 and atp8 refer to ATP synthase subunits 6 and 8, respectively; and CR refers to control region. Transcription directions for the protein-coding and rRNA genes are shown by arrowheads

In summary, the mtDNA genes of the two Caligus species showed high levels of sequence divergence (Table 3). The A+T content is also quite different between the two Caligus mt genomes (Supplemental Table 2). In addition, the orders of the genes in the two Caligus mt genomes are identical to each other, but different from the order in the L. salmonis mt genome (Fig. 2).

Sea Lice as Ectoparasite Model System

Since parasites by definition depend on a live host for growth and survival, an in vitro culture system is typically very difficult to establish. Although procedures for experimental infections are established for some parasitic species, manipulation of the parasites may still be very difficult, since removing them from the host is generally lethal for the parasite. Sea lice have life cycle features that make them promising as a model system. These features, consisting of free-living larval developmental stages together with pre-adult and adult stages that can move unrestricted on the host surface, enable manipulation of these parasites. For L. salmonis, recent advances in larval production systems and infection procedures (see Hamre et al. 2009) have been crucial for the establishment of defined laboratory strains of the salmon louse with different properties (e.g., drug-resistant strains, inbred strains).
Stable and predictable production conditions further enable specific breeding to create various types of hybrids (e.g., susceptible and drug-resistant family groups). The improvement of rearing facilities has been a crucial facilitator for the establishment of RNAi in L. salmonis (Dalvin et al. 2009). Systemic RNAi is easily achieved in pre-adult or adult lice by injection of dsRNA into the animal. In addition, soaking free-living larval stages (e.g., copepodids) in dsRNA enables RNAi in copepodids (Campbell et al. 2009). Furthermore, the genomes of both the Pacific and Atlantic variants of L. salmonis are currently being sequenced, and together with the present cDNA resources this will open up new avenues in sea lice research. There is a wide diversity of arthropod parasites, but good experimental parasite model systems are scarce. We anticipate that experimental studies on the salmon louse and other sea lice species will increase our knowledge about ectoparasites in general, particularly when more parasite genomes become available.

We sequenced over 150,000 ESTs from Pacific L. salmonis (49,672 new ESTs in addition to 14,994 previously reported ESTs), Atlantic L. salmonis (57,349 ESTs), C. clemensi (14,821 ESTs), C. rogercresseyi (32,135 ESTs), and L. branchialis (16,441 ESTs; Table 1). A relational database with an intuitive web interface was developed to process and display the large quantities of EST data, their assemblies and associated annotation information, as well as possible full-length gene information (Fig. 1). This database provides a novel resource for the study of sea louse biology, population genetics, and control strategies. This genomic resource represents the largest compilation for any copepod species and provides the material basis for the development of a 38K microarray that can be used in conjunction with our existing salmon 44K microarray to study host–parasite interactions at the molecular level.
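The 16S rRNA rates quoted in the table note imply a simple conversion from a K2P distance to an approximate divergence time. The Python sketch below illustrates that arithmetic only and is not the authors' pipeline. The function names, the toy sequences, and the choice to treat each rate as a pairwise divergence rate (so that time = distance / rate) are assumptions made for illustration; only the four rate values are taken from the text.

```python
import math

# Rates from the table note, in % K2P distance per million years (Myr).
RATES_PERCENT_PER_MYR = {
    "Ano": 0.38,       # anomurans (Cunningham et al. 1992)
    "Fid": 0.90,       # fiddler crabs (Sturmbauer et al. 1996)
    "Gra_low": 0.65,   # grapsid crabs, low estimate (Schubart et al. 1998)
    "Gra_high": 0.88,  # grapsid crabs, high estimate (Schubart et al. 1998)
}

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    # d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q))
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def divergence_time_myr(distance, rate_percent_per_myr):
    """Approximate divergence time, assuming time = distance / rate.

    `distance` is a fraction (0.30 means 30%); the rate is in % per Myr.
    """
    return distance * 100.0 / rate_percent_per_myr


if __name__ == "__main__":
    # Hypothetical toy alignment, not data from the paper.
    toy_a = "A" * 10 + "C" * 10
    toy_b = "GG" + "A" * 8 + "C" * 9 + "A"
    d = k2p_distance(toy_a, toy_b)
    for label, rate in RATES_PERCENT_PER_MYR.items():
        print(label, round(divergence_time_myr(d, rate), 1), "Myr")
```

Because the slowest and fastest calibrations differ by more than twofold, the same distance yields a wide range of dates, which is why divergence estimates of this kind are reported as an interval rather than a single value.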
The nuclear genes showed a high level of sequence divergence among the caligid copepods examined: L. salmonis, C. clemensi, C. rogercresseyi, and L. branchialis (Table 2). In addition, whole mt genome sequences of two Caligus species, C. clemensi (13,440 bp) and C. rogercresseyi (13,468 bp), were determined and compared. The L. salmonis, C. clemensi, and C. rogercresseyi mtDNA genes also exhibited extensive sequence divergence, with identities among these species ranging from 66.7% to 68.8% (nt) and from 63.6% to 65.4% (aa) (Table 3). Both nuclear and mtDNA genes showed very high levels of sequence divergence between these ectoparasitic copepods, which suggests that they have been in existence for 37–113 million years and that parasitic association with marine organisms is likely also quite ancient. However, while the order of the genes in the two Caligus mt genomes is the same, it differs from that in L. salmonis (Fig. 2). The large sequence divergence observed among these copepods may help to explain the extensive variety of morphology, life history, and host association in copepods.

Acknowledgments

This project (GiLS: Genomics in Lice and Salmon) was supported by Genome BC, Microtek Intl., Marine Harvest, Mainstream Canada, Greig Seafoods, and the University of Victoria. We would like to thank Rob Holt (Head of Sequencing, Genome Sciences Centre, Vancouver, BC, Canada), Richard Moore (Sequencing Group Leader, Genome Sciences Centre), Sarah Munro, Mike Mayo, and Susan Wagner (Genome Sciences Centre) for plating and sequencing. We would also like to thank John Burka (University of P.E.I., Canada), Frank Nilsen, and Heidi Kongshaug (University of Bergen, Norway) for Atlantic forms of L. salmonis; the Salmones Maullin Company (Chile) for C. rogercresseyi; Brendan Conners (Salmon Coast Field Station, Simoom Sound, BC, Canada) for C. clemensi; and James Bron and Sarah Barker (University of Stirling, Scotland, UK) for L. branchialis.
Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.



Motoshige Yasuike, Jong Leong, Stuart G. Jantzen, Kristian R. von Schalburg, Frank Nilsen, Simon R. M. Jones, Ben F. Koop. Genomic Resources for Sea Lice: Analysis of ESTs and Mitochondrial Genomes, Marine Biotechnology, 2012, 155-166, DOI: 10.1007/s10126-011-9398-z