MycoBASE: expanding the functional annotation coverage of mycobacterial genomes

Dec 2015

Background: Central to most omic-scale experiments is the interpretation and examination of the resulting gene lists corresponding to differentially expressed, regulated, or observed gene or protein sets. Complicating interpretation is the lack of functional annotation for a large percentage of many microbial genomes. This is particularly noticeable in mycobacterial genomes, which are significantly divergent from many of the microbial model species used for gene and protein functional characterization, but which are extremely important clinically. Mycobacterial species, ranging from M. tuberculosis to M. abscessus, are responsible for deadly infectious diseases that kill over 1.5 million people each year worldwide. A better understanding of the coding capacity of mycobacterial genomes is therefore necessary to shed light on putative mechanisms of virulence, pathogenesis, and functional adaptation.

Description: Here we describe the improved functional annotation coverage of 11 important mycobacterial genomes, many of them involved in human diseases including tuberculosis, leprosy, and nontuberculous mycobacterial (NTM) infections. Across the 11 mycobacterial genomes, we provide 9899 new functional annotations, relative to NCBI and TBDB annotations, for genes previously characterized as genes of unknown function, hypothetical proteins, or conserved hypothetical proteins. Functional annotations are available at our newly developed web resource MycoBASE (Mycobacterial Annotation Server) at strong.ucdenver.edu/mycobase.

Conclusion: Improved annotations allow for better understanding and interpretation of genomic and transcriptomic experiments, including analyzing the functional implications of insertions, deletions, and mutations, inferring the function of understudied genes, and determining functional changes resulting from differential expression studies. MycoBASE provides a valuable resource for mycobacterial researchers through improved, searchable functional annotations and functional enrichment strategies. MycoBASE will be continually supported and updated to include new genomes, making it a powerful resource in the quest to better understand these important pathogenic and environmental species.
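Functional enrichment asks whether an annotation term appears in a gene list (for example, differentially expressed genes) more often than chance would predict. As a minimal sketch, assuming a one-sided hypergeometric over-representation test (a common choice, not necessarily the statistic MycoBASE implements), the p-value for seeing at least k term-annotated genes in a list of n genes, drawn from a genome of N genes of which K carry the term, can be computed with the Python standard library alone. The function name and the counts in the usage note are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the genome (background)
    K: background genes annotated with the term
    n: genes in the list of interest (e.g., differentially expressed)
    k: genes in the list annotated with the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    total = comb(N, n)
    # Sum the probabilities of drawing exactly i term-annotated genes,
    # for every i from the observed k up to the largest possible value.
    # math.comb returns 0 for impossible draws, so no extra guards are needed.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total
```

With hypothetical counts such as `hypergeom_enrichment_p(N=4000, K=50, n=100, k=8)`, the test should return a very small p-value, because only about 1.25 term-annotated genes would be expected by chance in a 100-gene list. When many terms are tested at once, the resulting p-values would also need a multiple-testing correction such as Benjamini-Hochberg.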



Garcia et al. BMC Genomics (2015) 16:1102
DOI 10.1186/s12864-015-2311-9
DATABASE (Open Access)

Benjamin J. Garcia (1,2)*, Gargi Datta (1,2), Rebecca M. Davidson (2) and Michael Strong (1,2)*

* Corresponding authors. (1) Computational Bioscience Program, University of Colorado Denver, Anschutz Medical Campus, Aurora, CO, USA. Full list of author information is available at the end of the article.

© 2015 Garcia et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Keywords: Mycobacteria, Annotation, Database

Background

Mycobacterium species include both environmental and pathogenic organisms that fall into two major groups: the M. tuberculosis complex (MTBC), such as M. tuberculosis and M. bovis, and nontuberculous mycobacteria (NTM), such as M. avium complex, M. abscessus and M. smegmatis. It is estimated that across the world 9.6 million people are infected with tuberculosis every year, 3.6 million of these people are not given proper treatment, and 1.5 million people die from infection [1]. NTM infections have become a growing concern as more people with lung infections have positive cultures for NTM species [2], with cystic fibrosis patients accounting for a disproportionate share of detected infections [3]. The prevalence of NTM disease, while relatively rare at 86,244 cases in 2010 in the United States [4], is increasing throughout the world [5, 6], with the incidence of NTM disease exceeding that of tuberculosis in the United States [6]. Treatment of NTM disease is also problematic because of the chronic nature of the disease, antibiotic regimens lasting up to 18 months, and treatment costs higher than those for multidrug-resistant tuberculosis [4]. A better understanding of gene function in these species allows for better interpretation of clinical experiments, leading to an increased understanding of gene roles and potential drug targets.

Predictive functional annotation methods are standard practice in analyzing genome sequencing data [7]. Current gene and protein functional annotations are the result of both manual curation and prediction with machine-learning tools such as GeneMarkS [8] and RAST [9], as well as various homology-based methods such as FASTA [10]. Over the past few years, methods have been developed that take into account orthology, protein-protein interactions, and text mining, such as eggNOG [11, 12], a tool that has been used to better annotate the M. tuberculosis genome. Homology-based methods have also improved, allowing both greater accuracy and the assignment of GO terms to genes [13].
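In its simplest form, homology-based annotation transfers the description (and any GO terms) of the best-scoring characterized hit to a query protein, provided the hit passes significance and coverage cutoffs. The sketch below illustrates that generic best-hit rule on already-parsed search results. The `Hit` record, its field names, the thresholds (E-value at most 1e-10, identity at least 40 %, query coverage at least 70 %), and the decision to treat "hypothetical" or "unknown function" descriptions as uninformative are all illustrative assumptions; this is not the MycoBASE pipeline or the method of [13].

```python
from dataclasses import dataclass, field

# Descriptions containing these phrases carry no functional information
# (assumed list, for illustration only).
UNINFORMATIVE = ("hypothetical", "unknown function", "uncharacterized")


@dataclass
class Hit:
    """One parsed similarity-search hit (hypothetical record layout)."""
    query: str
    subject: str
    evalue: float
    pident: float       # percent identity of the alignment
    qcov: float         # percent of the query covered by the alignment
    description: str    # functional description of the subject protein
    go_terms: list = field(default_factory=list)


def is_informative(description: str) -> bool:
    """Return False for descriptions such as 'hypothetical protein'."""
    d = description.lower()
    return not any(phrase in d for phrase in UNINFORMATIVE)


def transfer_annotations(hits, max_evalue=1e-10, min_pident=40.0, min_qcov=70.0):
    """Map each query to the description and GO terms of its best passing hit."""
    best = {}
    for h in hits:
        # Discard weak, short, or uninformative hits before ranking.
        if h.evalue > max_evalue or h.pident < min_pident or h.qcov < min_qcov:
            continue
        if not is_informative(h.description):
            continue
        current = best.get(h.query)
        # Prefer the lowest E-value; break ties with the higher percent identity.
        if current is None or (h.evalue, -h.pident) < (current.evalue, -current.pident):
            best[h.query] = h
    return {q: (h.description, list(h.go_terms)) for q, h in best.items()}
```

Filtering uninformative descriptions before ranking means a strong hit to another "hypothetical protein" does not mask a slightly weaker hit to a characterized protein, which is the situation the annotation gap described above tends to produce.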
Improvements in common annotation-prediction methodology have allowed both a better understanding of genomic content and improved analyses of genomic and transcriptomic data. While there are a few well-curated databases for M. tuberculosis data, through TBDB [14, 15] and TubercuList [16], and a database devoted to M. abscessus, MabsBASE [17], there remains a lack of well-curated databases for mycobacterial genomes as a whole. One early attempt to fill this gap was GenoMycDB [18], a collection of six mycobacterial genomes; however, this database has not been updated to include more genomes. TubercuList was later extended into MycoBrowser [19], which contains a comprehensive genomic and proteomic database for three additional mycobacterial species, although it still lacks commonly studied NTM such as M. avium complex and M. abscessus. While TBDB [14, 15] has grown to include NTM species, annotations for these species remain limited. PATRIC [20] contains a wide array of annotated genomes, including mycobacteria; however, its functional annotations do not perform well for genomes with large numbers of pseudogenes such as M. leprae, where 3607 extra genes were annotated despite these having been validated as pseudogenes [21]. The MycoBASE database was created to extend the functional annotation knowledge of mycobacteria in general, allowing for a better genomic understanding of both a highly prevalent group of infectious agents, tuberculosi (...truncated)


Article PDF: http://www.biomedcentral.com/content/pdf/s12864-015-2311-9.pdf
Article home page: http://www.biomedcentral.com/1471-2164/16/1102

Garcia BJ, Datta G, Davidson RM, Strong M. MycoBASE: expanding the functional annotation coverage of mycobacterial genomes. BMC Genomics. 2015;16:1102. DOI: 10.1186/s12864-015-2311-9