MycoBASE: expanding the functional annotation coverage of mycobacterial genomes
Garcia et al. BMC Genomics (2015) 16:1102
DOI 10.1186/s12864-015-2311-9
DATABASE
Open Access
Benjamin J. Garcia1,2*, Gargi Datta1,2, Rebecca M. Davidson2 and Michael Strong1,2*
Abstract
Background: Central to most omic-scale experiments is the interpretation and examination of resulting gene lists
corresponding to differentially expressed, regulated, or observed gene or protein sets. Interpretation is complicated by
the lack of functional annotation for a large percentage of genes in many microbial genomes. This is particularly
noticeable in mycobacterial genomes, which are significantly divergent from many of the microbial model species
used for gene and protein functional characterization, but which are extremely important clinically. Mycobacterial
species, ranging from M. tuberculosis to M. abscessus, are responsible for deadly infectious diseases that kill over 1.5
million people each year across the world. A better understanding of the coding capacity of mycobacterial
genomes is therefore necessary to shed light on putative mechanisms of virulence, pathogenesis, and
functional adaptations.
Description: Here we describe the improved functional annotation coverage of 11 important mycobacterial
genomes, many involved in human diseases including tuberculosis, leprosy, and nontuberculous mycobacterial
(NTM) infections. Across these 11 genomes, we provide 9899 new functional annotations, relative to
NCBI and TBDB annotations, for genes previously characterized as genes of unknown function, hypothetical proteins, or
hypothetical conserved proteins. Functional annotations are available at our newly developed web resource
MycoBASE (Mycobacterial Annotation Server) at strong.ucdenver.edu/mycobase.
Conclusion: Improved annotations allow for better understanding and interpretation of genomic and transcriptomic
experiments, including analyzing the functional implications of insertions, deletions, and mutations, inferring the
function of understudied genes, and determining functional changes resulting from differential expression studies.
MycoBASE provides a valuable resource for mycobacterial researchers through improved and searchable functional
annotations and functional enrichment strategies. MycoBASE will be continually supported and updated to include
new genomes, providing a powerful resource for better understanding these important pathogenic and
environmental species.
Keywords: Mycobacteria, Annotation, Database
* Corresponding authors
1 Computational Bioscience Program, University of Colorado Denver,
Anschutz Medical Campus, Aurora, CO, USA
Full list of author information is available at the end of the article
Background
Mycobacterium species include both environmental
and pathogenic organisms and fall into two major
groups: the M. tuberculosis complex (MTBC), such as M. tuberculosis
and M. bovis, and nontuberculous mycobacteria
(NTM), such as M. avium complex, M. abscessus, and M.
smegmatis. It is estimated that across the world 9.6 million
people are infected with tuberculosis every year, 3.6 million of these people are not given proper treatment, and
1.5 million people die from infection [1]. NTM infections
have become a growing concern as more people with lung
infections have positive cultures for NTM species [2], with
cystic fibrosis patients representing a disproportionate
share of detected infections [3]. NTM disease,
while relatively rare at 86,244 cases in 2010
in the United States [4], is increasing in prevalence throughout the
world [5, 6], with the incidence of NTM exceeding that of
tuberculosis in the United States [6]. Treatment of NTM
disease is also challenging because of the chronic nature
of the disease, antibiotic treatments lasting up to 18 months,
and a cost of treatment higher than that of multidrug-resistant tuberculosis [4]. A better understanding of gene
function in these species allows for better interpretation of
clinical experiments, leading to an increased understanding
of gene roles and potential drug targets.
© 2015 Garcia et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Predictive functional annotation is standard
practice in analyzing genome sequencing data [7]. Current
gene and protein functional annotations are the
result of both manual curation and prediction based on
machine-learning tools such as GeneMarkS [8] and RAST [9],
as well as various homology-based methods such as FASTA [10].
Over the past few years, methods have been developed
that take into account orthology, protein-protein
interactions, and text mining, such as eggNOG [11, 12], a
tool used to better annotate the M. tuberculosis genome.
There have also been improvements to homology-based
methods, allowing for both improved accuracy and the
assignment of GO terms to genes [13]. Improvements in common methodology for annotation prediction have allowed for
both a better understanding of genomic content and improved
analyses of genomic and transcriptomic data.
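To make the idea of homology-based annotation transfer concrete, the following is a minimal Python sketch and not the pipeline used for MycoBASE. It assumes that alignment hits have already been computed by some homology search tool and parsed into records; the `Hit` record, the list of uninformative labels, and the identity and E-value thresholds are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Assumed labels that mark a description as carrying no functional information.
UNINFORMATIVE = ("hypothetical", "unknown function", "uncharacterized")


@dataclass
class Hit:
    """One alignment hit for a query gene (hypothetical record layout)."""
    query: str        # identifier of the gene being annotated
    description: str  # product description of the matched protein
    identity: float   # percent identity of the alignment
    evalue: float     # expectation value of the alignment


def is_uninformative(description: str) -> bool:
    """True if the description is empty or contains an uninformative label."""
    text = description.strip().lower()
    return not text or any(label in text for label in UNINFORMATIVE)


def transfer_annotations(current, hits, min_identity=40.0, max_evalue=1e-5):
    """Propose descriptions for genes whose current annotation is uninformative.

    For each such gene, the informative hit with the lowest E-value that
    passes both thresholds is kept (ties broken by higher identity).
    """
    best = {}
    for hit in hits:
        if not is_uninformative(current.get(hit.query, "")):
            continue  # gene already has a functional annotation
        if is_uninformative(hit.description):
            continue  # transferring "hypothetical protein" adds nothing
        if hit.identity < min_identity or hit.evalue > max_evalue:
            continue  # hit too weak under the assumed thresholds
        key = (hit.evalue, -hit.identity)
        if hit.query not in best or key < best[hit.query][0]:
            best[hit.query] = (key, hit.description)
    return {gene: description for gene, (_, description) in best.items()}


if __name__ == "__main__":
    # Invented toy data; gene names and descriptions are illustrative only.
    current = {"geneA": "hypothetical protein", "geneB": "DNA gyrase subunit A"}
    hits = [
        Hit("geneA", "enoyl-CoA hydratase", 62.0, 1e-40),
        Hit("geneB", "DNA topoisomerase", 99.0, 1e-120),
    ]
    print(transfer_annotations(current, hits))
```

Ranking by E-value with thresholds on identity is only one simple policy; the orthology-based and GO-assignment methods cited above use richer evidence than a single best hit.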
While there are well-curated databases for
M. tuberculosis, namely TBDB [14, 15] and TubercuList [16], and a database devoted to M. abscessus,
MabsBASE [17], there remains a lack of well-curated databases for mycobacterial genomes as a whole. One
early attempt to fill this gap was GenoMycDB
[18], a collection of six mycobacterial genomes; however,
this database has not been updated to include more genomes. TubercuList was later extended into MycoBrowser
[19], a comprehensive genomic and
proteomic database that covers three additional mycobacterial species but still lacks commonly studied NTM such as
M. avium complex and M. abscessus. While TBDB [14, 15]
has grown to include NTM species, annotations for
these species remain limited. PATRIC [20] contains a wide
array of annotated genomes, including mycobacteria; however, its functional annotations do not perform well for
genomes with large numbers of pseudogenes such as M.
leprae, leading to 3607 extra genes being annotated despite
validation of these as pseudogenes [21]. The MycoBASE
database was created to extend the functional annotation
knowledge of mycobacteria in general, allowing for a better
genomic understanding of both a highly prevalent group of
infectious agents, tuberculosi (...truncated)