Helminth.net: expansions to Nematode.net and an introduction to Trematode.net

Nucleic Acids Research, Jan 2015

Helminth.net (http://www.helminth.net) is the new moniker for a collection of databases: Nematode.net and Trematode.net. Within this collection we provide services and resources for parasitic roundworms (nematodes) and flatworms (trematodes), collectively known as helminths. For over a decade we have provided resources for studying nematodes via our veteran site Nematode.net (http://nematode.net). In this article, (i) we provide an update on the expansions of Nematode.net that hosts omics data from 84 species and provides advanced search tools to the broad scientific community so that data can be mined in a useful and user-friendly manner and (ii) we introduce Trematode.net, a site dedicated to the dissemination of data from flukes, flatworm parasites of the class Trematoda, phylum Platyhelminthes. Trematode.net is an independent component of Helminth.net and currently hosts data from 16 species, with information ranging from genomic, functional genomic data, enzymatic pathway utilization to microbiome changes associated with helminth infections. The databases’ interface, with a sophisticated query engine as a backbone, is intended to allow users to search for multi-factorial combinations of species’ omics properties. This report describes updates to Nematode.net since its last description in NAR, 2012, and also introduces and presents its new sibling site, Trematode.net.

John Martin [2], Bruce A. Rosa [2], Philip Ozersky [2], Kymberlie Hallsworth-Pepin [2], Xu Zhang [2], Veena Bhonagiri-Palsikar [2], Rahul Tyagi [2], Qi Wang [2], Young-Jun Choi [2], Xin Gao [2], Samantha N. McNulty [2], Paul J. Brindley [1], Makedonka Mitreva [0, 2]

[0] Department of Internal Medicine and Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA
[1] Department of Microbiology, Immunology & Tropical Medicine, and Research Center for Neglected Diseases of Poverty, School of Medicine & Health Sciences, The George Washington University, Washington, DC 20037, USA
[2] The Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA

Parasitic helminth infections are considered the great neglected tropical diseases (NTDs) (1). They account for 8 of the 17 most important NTDs, and their collective burden rivals that of major high-mortality conditions such as HIV/AIDS or malaria (according to the WHO factsheet on NTDs; http://www.who.int/neglected_diseases/2010report/en/). The symptoms of diseases caused by helminth parasites range from the dramatic sequelae of elephantiasis, blindness, seizures from neurocysticercosis, and bladder and liver cancers from urogenital schistosomiasis and opisthorchiasis, respectively, to the more subtle but widespread effects on child development, pregnancy, productivity, the maintenance of poverty and predisposition toward other diseases (1-3).

Helminth.net (www.helminth.net) is the new name for an evolving collection of databases hosting resources for helminths, which include roundworms (Nematoda; Nematode.net, which has had significant updates since 2012 (4)) and flatworms (Platyhelminthes; Trematode.net, a new addition to the website, and Cestode.net, planned in future updates). Genomes of the major parasitic helminths of medical (hookworm, whipworm, ascaris, filarial species), agricultural (e.g. root-knot and cyst nematodes) and veterinary (e.g. gastrointestinal parasites of small ruminants) significance are now the subject of genome sequencing, annotation and other omics approaches (e.g. (5-10)). Helminth.net complements and expands the functionality of related databases, such as WormBase (11) and its sister site WormBase-Parasite, which provide high quality reference genomes and curated gene models for many of these species.
Helminth.net, in addition, provides comprehensive functional gene/protein annotation, stage- and tissue-specific expression information, population-based variant annotation, ChEMBL drug target association and interactive tools for performing complex multi-factor searches and analyses in a user-friendly manner.

With Trematode.net, we will provide the research community with these data and tools for schistosomes and foodborne trematodes (FBTs), as we already do for Nematoda. Genome sequences of the three major species of human-parasitic schistosomes have been reported over the past 5 years (12). The FBTs represent a major group of NTDs, infecting more than 50 million people and putting 750 million others worldwide (>10% of the world's population) at risk (1,13). Over 100 species of FBTs are known to infect humans, 10 or so of which are responsible for much of the disease burden caused by infection with FBTs (14). Because of their importance, the National Institutes of Health (NIH) is supporting the sequencing of 14 FBT genomes (www.trematode.net/FBT_proposal.html), which will be hosted, along with comprehensive annotations and analysis tools, on Trematode.net as they become available.

IMPROVEMENT AND EXPANSION OF Nematode.net

The amount of data hosted on Nematode.net has grown dramatically over the last few years (Table 1). NemaGene now hosts annotation for almost 1.1 million genes and transcripts spanning 67 nematode species, including 998 226 from the genomes of 54 species, 62 385 Roche/454 cDNA isotigs (49 908 transcripts) from 2 species and 44 475 Sanger EST contigs (40 917 transcripts) from 11 species. These species (plus an additional 17 in other data portals) include 16 human parasites, 36 animal parasites, 20 plant parasites, 2 insect parasites and 10 non-parasitic species.
We have also added 14 billion nematode Illumina RNAseq reads, spanning numerous stages and tissues across 16 nematodes (Figure 1), which support accurate genome annotation and provide normalized expression profiles per gene. The NemaBrowse portal has been updated to feature tracks displaying SnpEff-annotated variants (15) from isolates with different phenotypes, which are viewable through GBrowse. This portal will be populated with data from more species as genome-wide single nucleotide polymorphism data based on high-throughput sequencing become available, providing an accessible way to explore variants in relation to the acquisition of drug resistance or other phenotypes in helminths.

Alternative splicing (AS) of mRNA is a vital mechanism for enhancing evolutionary complexity, enabling single genes to have diverse molecular and biological functions across organs, tissues, developmental stages and environmental conditions. Predictions of 349 565 isoforms across 10 parasitic nematode species (16) are now hosted on Nematode.net, facilitating deeper investigation of AS and its implications, and AS information based on RNAseq data will be hosted soon.

Many nematodes and trematodes reside in the gastrointestinal tract, where they modulate the immune system directly and also influence the immune response indirectly through their effect on the microbiome of the host's alimentary tract. We have built a Microbiome Interaction section of the database, where we host research summaries, highlights of important results and available data sets from publications examining microbiome structure and its changes as a result of helminth infections. At present we host microbial communities profiled using targeted 16S rRNA gene sequencing during hookworm infection (17), whipworm infection (18) and polyparasitism (19). In addition, we host currently unpublished metagenome shotgun sequencing data examining microbial communities during nematode infections.
Our Data Download section now hosts additional resources and supplemental data related to the publications (published and in progress) on several dozen nematode pathogens, including RNAseq gene expression data and mass spectrometry proteomic data for available species.

Expansion of analysis and data-mining portals

A number of new tools for exploring data and performing analyses have been introduced to Nematode.net (Figure 2). The NemaGene interface was redesigned to be more user-friendly, and now allows users to define queries using multiple species of interest, InterPro IDs (20,21), Gene Ontology (GO) terms (22), KEGG Orthology (KO) IDs (23) and transcript presence in a given stage and/or tissue according to Sanger EST contigs or 454/Roche cDNA isotigs (where available). NemaGene search results now provide protein or nucleotide sequence FASTA files for all results, and links to individual gene/transcript home pages, which provide: (i) available functional annotations for InterPro (21), GO (22) and KO (23), with links to parent annotation repositories; (ii) a link to view the gene model within NemaBrowse (if available); (iii) sequences, and links to forward sequences directly to NemaBlast; (iv) links from KEGG annotations to our own NemaPath resource (24), allowing users to further explore gene functionality; (v) where available, stage- and/or tissue-specific normalized expression data (FPKM) for the genes (Table 1, Figure 1), with new expression values being added as they are produced (Supplementary Information); (vi) where applicable, indication of stage-specific transcript detection according to Sanger-based EST or 454/Roche-based sequences; (vii) links to ChEMBL (25) drug target annotations; (viii) annotations of putative chokepoint enzymes.
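Chokepoint annotation (item viii) rests on a simple graph property: a chokepoint reaction is one that produces a unique compound or consumes a unique substrate (28). The following Python sketch illustrates that definition on a toy network. The function name `find_chokepoints`, the input layout and the reactions R1 to R4 are hypothetical and chosen only for illustration; this is not the published annotation pipeline (26).

```python
from collections import defaultdict


def find_chokepoints(reactions):
    """Return IDs of reactions that are the sole consumer or sole producer of a compound.

    `reactions` maps a reaction ID to a (substrates, products) pair of sets.
    This input layout is an assumption made for the example.
    """
    consumers = defaultdict(set)  # compound -> reactions that consume it
    producers = defaultdict(set)  # compound -> reactions that produce it
    for reaction_id, (substrates, products) in reactions.items():
        for compound in substrates:
            consumers[compound].add(reaction_id)
        for compound in products:
            producers[compound].add(reaction_id)

    chokepoints = set()
    for index in (consumers, producers):
        for reaction_ids in index.values():
            if len(reaction_ids) == 1:  # only one reaction touches this compound
                chokepoints |= reaction_ids
    return chokepoints


# Hypothetical toy network: R4 shares its substrate and its product with
# other reactions, so it is the only non-chokepoint.
TOY_NETWORK = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"A"}, {"C"}),
    "R3": ({"B", "C"}, {"D"}),
    "R4": ({"C"}, {"D"}),
}

print(sorted(find_chokepoints(TOY_NETWORK)))
```

In a real setting the reaction table would come from the KEGG pathway annotations, and the enzymes catalyzing the flagged reactions would be the chokepoint candidates.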
The ChEMBL drug target annotations for our hosted genes contribute to our goal of becoming a central chemogenomic resource for helminths and of facilitating the systematic identification of anthelmintic drug targets and the compounds targeting them, an approach that has already produced promising results for nematode proteins (26,27). We screened all NemaGene protein products against the ChEMBL database (based on sequence similarity and functional annotations) for possible homology to drug targets, annotating the targets and the compounds targeting them. The ChEMBL database contains detailed information on the bioactivity, chemistry and structures of more than one million small molecules, providing abundant resources for pursuing nematode proteins as drug target candidates.

The NemaPath tool (24) was expanded to host the gene sets of 53 nematodes (and transcript pathway annotation for 9 other species) and updated to release 68 of the KEGG genes database. Chokepoint enzymes, which catalyze chokepoint reactions (defined as reactions that produce a unique compound or consume a unique substrate (28)), were also annotated using a previously published approach (26), since they are potential drug targets owing to the lethality that results from the accumulation of a unique substrate or from the organism being starved of a unique product (26,29).

[Table 1. Row labels, in order: ESTs and 454/Roche cDNA sequences; Illumina RNAseq sequences; No. species in NemaGene; NemaGene entries; No. splice isoforms; Codon Usage table codon counts; No. of species with proteomics data; No. microbiome samples; No. species represented; No. species in TremaGene; TremaGene entries; Illumina RNAseq sequences; No. of species with proteomics data; No. microbiome samples. Values not recoverable.]

The NemaBLAST service has been updated to include the nucleotide sequence (transcript and/or coding DNA sequence (CDS)) for the gene sets of 45 nematode species published since the last update.
The WU-BLAST-based search engine has also been migrated to a powerful compute cluster to better support queries from concurrent users. Finally, the NemaBrowse viewer now hosts gene annotations for nine genomes (Ancylostoma caninum, Ancylostoma ceylanicum, Ancylostoma duodenale, Dictyocaulus viviparus, Necator americanus, Oesophagostomum dentatum, Teladorsagia circumcincta, Trichuris suis and Trichinella spiralis), and will soon be expanded with the upcoming genomes.

Data integration

We have made an ongoing effort to link all annotations we provide to their repositories of origin. NemaGene functional annotations and ChEMBL (25) annotations link back to the parent database entries for every reported ID. HelmCoP (30) output provides links into the Protein Data Bank (PDB) (31) and DrugBank (32), and our species hub pages provide links to the Sanger Pathogens unit (http://www.sanger.ac.uk), NemBase4 (33) and the appropriate NCBI BioProject ID summary page (34), where available. The hubs also provide links to the species-specific pages available in WormBase (11) and WormBase-Parasite (parasite.wormbase.org) for organisms hosted in those complementary resources.

Education

Nematode.net's Education section now features the Introduction to Nematodes teaching package presentation, a comprehensive introduction to the field of nematology created by E.C. McGawley, C. Overstreet, M.J. Pontif and A.M. Skantar (Society of Nematologists; http://www.nematologists.org). Our team also participated in an annual course of the NIH-funded filarial resource FR3 (35), organized, among other aims, to train parasitologists in the use of parasitic nematode websites and databases. The Education section features the tutorial outlining the use of Nematode.net, and we have also detailed the use of each portal of our new site expansion (Trematode.net) as Supplementary Information (Supplementary Information SI1).
Site navigation

URL redirection has been provided for jumping directly to species pages as well as to the major analytical tools. Species pages can be accessed directly using the URL nematode.net/<Species_name>.html (e.g. nematode.net/Necator_americanus.html), and the various analysis portals can be accessed similarly (e.g. nematode.net/nemagene.html). This feature is also available for Trematode.net pages.

INTRODUCTION TO Trematode.net

Trematode.net was recently developed to disseminate omics data arising from an initiative to study the genomes of the etiological agents of foodborne trematode infections, with the primary goal of making genome-wide trematode (fluke) gene and protein annotations available online via GBrowse (36). However, Trematode.net now also provides numerous additional services and tools to serve the trematode research community (Supplementary Information S1, Figure 2), and currently houses information for 16 trematode species. Our overall design priority was to make Trematode.net mirror Nematode.net as much as possible (in both layout and functionality) to create a seamless user experience across Helminth.net. The navigation menu interfaces for Nematode.net (Figure 3A) and Trematode.net (Figure 3B) link to each other and are organized in a similar fashion, providing one-click access to the interactive tools used to mine hosted data (Figure 3C).

TremaGene (Supplementary Figure S1) is the central repository of trematode data hosted within Trematode.net, and currently houses 221 003 annotated genes from 12 trematode species (Table 2). Genes are annotated with InterPro IDs and GO terms (using InterProScan) (20-22) and KO IDs (KEGG version 68.0, using WU-BLAST 2.0) (23), and we also host stage- and/or tissue-specific expression data for Fasciola hepatica and Schistosoma mansoni, which are displayed on the gene details pages.
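Gene records annotated in this way lend themselves to the multi-factor searches offered by NemaGene and TremaGene, which amount to combining independent filters over species and annotation IDs. A minimal Python sketch of that idea follows. The record layout, the `search` function, the gene identifiers and the rule that filters are combined with AND (with any-of matching inside each filter) are all assumptions made for illustration, not a description of the actual query engine.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    """Hypothetical in-memory stand-in for one annotated gene."""
    gene_id: str
    species: str
    interpro: set = field(default_factory=set)
    go: set = field(default_factory=set)
    ko: set = field(default_factory=set)


def search(records, species=None, interpro=None, go=None, ko=None):
    """Return records passing every supplied filter; omitted filters are ignored."""
    def matches(values, wanted):
        # A filter passes if it was not supplied or shares at least one ID.
        return wanted is None or bool(values & set(wanted))

    return [
        record for record in records
        if (species is None or record.species in species)
        and matches(record.interpro, interpro)
        and matches(record.go, go)
        and matches(record.ko, ko)
    ]


# Illustrative records; gene IDs and annotation assignments are made up.
RECORDS = [
    GeneRecord("gene_a", "Fasciola hepatica", {"IPR000668"}, {"GO:0008234"}, {"K01365"}),
    GeneRecord("gene_b", "Schistosoma mansoni", {"IPR000668"}, {"GO:0008234"}),
    GeneRecord("gene_c", "Fasciola hepatica"),
]

hits = search(RECORDS, species={"Fasciola hepatica"}, interpro={"IPR000668"})
print([record.gene_id for record in hits])
```

Treating each omitted filter as "match everything" keeps the function usable for single-factor and multi-factor queries alike.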
The TremaGene search interface operates similarly to that of NemaGene: users can search any combination of species, with filters based on combinations of InterPro, GO and KO IDs, or for specific genes (Supplementary Figure S1). Compared to NemaGene, TremaGene lacks only the stage-based expression filter (which was based on identifying stage-specific Sanger EST sequences or 454/Roche cDNA sequences in nematodes), because our TremaGene data are based entirely upon analysis of draft genome assemblies. Search results can be downloaded in their entirety, or each gene can be accessed for a view of the detailed annotation, with links to TremaPath from annotated KOs, to TremaBrowse to view gene models (if available) and to TremaBlast to search for putative orthologs (Supplementary Figures S2 and S3).

TremaBlast allows users to search custom sequence(s) directly against deduced protein sets from TremaGene (Supplementary Figure S4). Our currently available database covers 12 trematodes (Table 2), which can be selected in any combination and used as the subject for mapping using WU-BLAST 2.0 (run in either BLASTx or BLASTp mode). SEG (ftp://ftp.ncbi.nih.gov/pub/seg/seg/) and RepeatMasker (http://www.repeatmasker.org) filters are available if the user wishes to screen out low-complexity sequence or mask repeats in their query. Jobs are submitted to a backend compute farm and results are emailed directly to the user (Supplementary Figure S5).

TremaBrowse provides a window into gene annotations of finished and/or draft genomic assemblies using the GBrowse viewer (36). Currently, we host the latest draft build of Fasciola hepatica (Supplementary Figure S6) as our first annotated FBT, with the aim of providing at least five more novel genomes within the next 6 months.
Displayed information can include Maker (37) gene predictions, RNA genes predicted by RNAmmer (38), tRNAs predicted by tRNAscan (39) and single nucleotide polymorphism (SNP) loci annotated using SnpEff (15) (Supplementary Figure S7). One goal of the TremaBrowse resource is to provide the research community with a view of in-progress trematode genomes, representing our current best draft, in advance of final genome submissions.

TremaPath provides a visualization of pathway usage for trematodes, based on KO annotations (23) for all genes, which are painted onto predefined KEGG pathway maps. Users are shown a graphical distribution of the number of KO hits at varying e-value confidence levels for their chosen species, and then set the desired threshold stringency for assigning KOs (Supplementary Figure S8). Users are then presented with a menu of pathways supported by TremaPath (Supplementary Figure S9). Currently, we support four broad KEGG categories: Metabolism, Genetic Information Processing, Environmental Information Processing and Cellular Processes. After pathway selection, a graphic displays the compounds and reactions of that pathway for the chosen species, with identified enzymes colored green and darker shading indicating that multiple genes are annotated (Supplementary Figure S10). The user can then optionally choose a second species for comparison, mapping its genes onto the same pathway and highlighting differences in pathway usage between the species. TremaPath is currently populated with 204 647 proteins from 11 trematodes (with Opisthorchis viverrini coming soon).

Microbiome interaction

Like its sister site, Trematode.net hosts microbial community structure information for trematode-infected subjects, including research summaries, highlights of important results and available data sets related to the interaction of trematodes and their host environment.
Currently, we host data from a recent study on infection with Opisthorchis viverrini (40) (Supplementary Figure S11). We will continue to expand this section as research findings emerge.

Table 2. Columns: Status; Species; Annotated gene count or project status.

Status: Published or annotated (annotated gene counts not recoverable)
Species: Clonorchis sinensis, Echinostoma caproni, Fasciola hepatica, Opisthorchis viverrini, Schistosoma curassoni, Schistosoma haematobium, Schistosoma japonicum, Schistosoma mansoni, Schistosoma margrebowiei, Schistosoma mattheei, Schistosoma rodhaini, Trichobilharzia regenti

Status: Genome sequencing project in progress
Species (project status):
Fasciola gigantica (assembly)
Fasciola buski (material acquisition)
Haplorchis taichui (material acquisition)
Opisthorchis felineus (material acquisition)
Opisthorchis viverrini (annotation)
Paragonimus kellicotti (data production)
Paragonimus miyazaki (assembly)
Paragonimus westermani (annotation)
Paragonimus spp. (3x) (material acquisition)

CONCLUSION AND FUTURE PLANS

The primary goal of these databases is to provide the helminth research community with access to integrated data and tools for helminths undergoing targeted active research studies, as well as those available in the public domain. The focus of this release was on: (i) the dramatic increase in the number of gene sets and RNAseq data sets providing functional genomics information on these species; (ii) the major improvements made to the NemaGene (and now also TremaGene) interface, enabling a much more user-friendly experience; (iii) the provision of chemogenomic information, i.e. annotation of helminth genes as putative drug targets and of the compounds putatively targeting them and (iv) the introduction of Trematode.net, which provides assistance and value to the community similar to that of Nematode.net. We also described the expansion of a number of veteran Nematode.net tools and novel data types, including NemaGene, NemaPath, NemaBlast, NemaBrowse and our Microbiome Interaction data collection.
Future expansions and improvements

With over 15.1 billion RNAseq reads currently in hand, and many more coming soon, one of our major priorities is to disseminate useful analyses of these data through Helminth.net by implementing several new data analyses and visualizations. For example, we will implement a dynamic gene expression plot viewer, allowing users to select single or multiple species of interest, life cycle stages of interest and/or genes of interest (from a custom list, or imported from other Helminth.net tools). We also plan to implement a fuzzy c-means clustering tool for gene expression data, to group sets of genes of interest according to expression patterns across stages of development and/or longitudinal sections of tissues of interest. This will include statistical cutoffs for clustering, color-coded visualization of clusters and annotation information for the gene members of each cluster. Genes within a cluster can also be fed directly to our planned expansion of enrichment testing tools (described below), allowing for a custom de novo analysis of stage- and tissue-specific functions with just a few clicks.

NemaBrowse and TremaBrowse represent our central repository for the display of genomic information, and we intend to continue their use for new helminth genomes and to expand their functionality. As we receive sequence data from clinical/field isolates of the same species, we will annotate isolate-specific variant loci in coding regions by mapping to the latest reference genome assemblies, and we will annotate population-specific effects of each SNP (15). These annotated SNPs, and the underlying sequence alignments to the reference, will be available as separate tracks within NemaBrowse/TremaBrowse and for download as isolate-specific Variant Call Format (VCF) files.
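Because the isolate-specific variants are planned for release as VCF files, the following is a minimal sketch of reading such a file with the Python standard library. It relies only on the eight fixed, tab-separated VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO); the function name `parse_vcf`, the contig name and the example records are hypothetical, and the sketch ignores per-sample genotype columns.

```python
def parse_vcf(lines):
    """Yield one dict per variant record from an iterable of VCF text lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta-information and the column header
        chrom, pos, variant_id, ref, alt, qual, filt, info = line.split("\t")[:8]
        info_map = {}
        if info != ".":
            for item in info.split(";"):
                key, _, value = item.partition("=")
                info_map[key] = value if value else True  # bare keys are flags
        yield {
            "chrom": chrom,
            "pos": int(pos),  # 1-based position on the reference
            "id": variant_id,
            "ref": ref,
            "alt": alt.split(","),  # one entry per alternate allele
            "qual": None if qual == "." else float(qual),
            "filter": filt,
            "info": info_map,
        }


# Hypothetical example content, for illustration only.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "contig_1\t1042\t.\tA\tG\t50.0\tPASS\tDP=31;DB",
    "contig_1\t2210\t.\tC\tT,G\t.\tq10\t.",
]

VARIANTS = list(parse_vcf(EXAMPLE_VCF))
for variant in VARIANTS:
    print(variant["chrom"], variant["pos"], variant["ref"], variant["alt"])
```

For production use a dedicated VCF library would be preferable, since it also handles header-declared INFO types and genotype fields.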
The data currently hosted in our NemaSNP database will also be merged into NemaBrowse to simplify access, and NemaSNP will be decommissioned. Eventually, we plan to provide a comparative view among user-defined sets of orthologous genes within NemaBrowse/TremaBrowse. Initially, we plan to use GBrowse with views of groups of SNP-annotated genes in individual tabs, scaled equivalently for easy comparison, but later iterations may provide more elegant solutions for viewing them in a single window against a common reference.

The NemaGene/TremaGene resource will be further expanded to allow users to download gene annotations directly to a tab-delimited text file after searching using custom filters (as described above). We will also calculate and annotate gene expression values (in units of FPKM) per available stage and/or tissue for all genes/species, and display these data in search results, with links to view the mapping information in GBrowse. We will provide links to the complete RNAseq read data set(s), either as accession IDs within NCBI's SRA (http://www.ncbi.nlm.nih.gov/sra) or as direct links if the data are pending official release. Additional annotation for all genes, including transmembrane domains and detected signal peptides (41) or predicted non-classical secretion (42), as well as degradome information for peptidases and inhibitors (43), will also be added. We also plan to track and annotate isoforms within NemaGene/TremaGene, initially using the gene and/or transcript as the central database entity, but eventually annotating individual isoforms with the same comprehensive annotations we provide for simple genes and transcripts. Isoform information will also be viewable in GBrowse.

NemaPath/TremaPath metabolic pathway reconstruction will be expanded by improving both enzyme predictions and pathway mapping.
This will be accomplished through alternative, independent methods for functional annotation, including (i) analyzing enzyme class sequence diversity to refine the likelihood estimation in protein annotation (44); (ii) performing Functionally Discriminating Residue recognition (45); (iii) discriminating between the module characteristics of discrete enzyme activities (46) and (iv) comparing pathways across diverse taxa to detect similar topologies (47-49) and translating pathway information into adjacency matrices amenable to topological alignments (50).

Other planned updates include: (i) NemaFUNC/TremaFUNC, using the FUNC tool (51) to allow users to statistically analyze GO functional enrichment of a custom set of genes against a custom background set of genes; (ii) NemaIPR/TremaIPR, to perform a similar enrichment analysis on InterPro domains using internally developed tools; (iii) a tool for exploring pathway enrichment, utilizing KO ID annotations; (iv) a drug-target prioritization approach based on numerical weights assigned to the annotation criteria used for querying; (v) NemaGroup/TremaGroup, to view a gene of interest in the context of the global orthologous group collection, with filters to restrict the view to specific phylogenetic levels (e.g. clade-specific analyses (52)); (vi) a database hosting microbial community structure (bacterial taxa and their abundance) on a per-sample basis, and related results including alpha and beta diversity and/or the metabolic capability of the community (for shotgun metagenomic data), in which users will be able to perform advanced parsing and compare microbiomes within infected or non-infected groups as well as between infected and non-infected individuals and (vii) broader integration with other community resources, particularly WormBase (11) and WormBase-Parasite, owing to the high quality of the reference genomes and curated gene models that they provide.
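The enrichment analyses planned in items (i) to (iii) all ask whether an annotation occurs in a gene set more often than expected from a background set. One common way to pose that question is a one-sided hypergeometric test, sketched below in Python. Whether the planned tools would use exactly this test is not stated here, and the function name `enrichment_p_value` and its parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(hits_in_set, set_size, hits_in_background, background_size):
    """One-sided hypergeometric P(X >= hits_in_set).

    X is the number of annotated genes seen when `set_size` genes are drawn
    without replacement from a background of `background_size` genes, of
    which `hits_in_background` carry the annotation.
    """
    total = comb(background_size, set_size)
    upper = min(set_size, hits_in_background)
    tail = sum(
        comb(hits_in_background, i)
        * comb(background_size - hits_in_background, set_size - i)
        for i in range(hits_in_set, upper + 1)
    )
    return tail / total


# Toy numbers: 3 of 10 background genes carry a term, and all 3 genes in
# the custom set carry it as well.
print(enrichment_p_value(3, 3, 3, 10))
```

When many terms are tested at once, the resulting p-values would also need a multiple-testing correction before interpretation.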
By adding information such as comprehensive functional annotation, stage- and tissue-specific expression, genome-wide variant detection and annotation, ChEMBL drug target association and more, Helminth.net is an excellent complement to WormBase. Overall, these planned expansions will give users easier access to more data, and to more types of emerging data, disseminating information to the community intuitively and with useful analysis tools.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

We sincerely thank the numerous collaborators in the helminth community (nematode.net/collaborators.html and trematode.net/collaborators.html) for providing invaluable worm material and being involved in data generation/analysis activities, and the dedicated members of the production group at The Genome Institute (http://genome.wustl.edu/) for the library construction and sequencing.

National Institutes of Health (NIH) [AI081803 and GM097435 to M.M.]; NIFA [2013-01109 to M.M.]; OPP [GH 1083853]; NIH-NHGRI [U54HG003079]; NIH [AI098639, CA164719 and CA155297 to P.J.B.]. Funding for open access charge: NIH [AI081803].

Conflict of interest statement. None declared.


John Martin, Bruce A. Rosa, Philip Ozersky, Kymberlie Hallsworth-Pepin, Xu Zhang, Veena Bhonagiri-Palsikar, Rahul Tyagi, Qi Wang, Young-Jun Choi, Xin Gao, Samantha N. McNulty, Paul J. Brindley, Makedonka Mitreva. Helminth.net: expansions to Nematode.net and an introduction to Trematode.net, Nucleic Acids Research, 2015, D698-D706, DOI: 10.1093/nar/gku1128