Experimental annotation of the human pathogen Histoplasma capsulatum transcribed regions using high-resolution tiling arrays
BMC Microbiology
Mark Voorhies 0 1
Catherine K Foo 0 3
Anita Sil 0 1 2
1 Department of Microbiology & Immunology, University of California San Francisco, San Francisco, California, 94143, USA
2 Howard Hughes Medical Institute, University of California San Francisco, San Francisco, California, 94143, USA
3 Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, California, 94143, USA
Background: The fungal pathogen Histoplasma capsulatum is thought to be the most common cause of fungal respiratory infections in immunocompetent humans, yet little is known about its biology. Here we provide the first genome-wide studies to experimentally validate its genome annotation. A functional interrogation of the Histoplasma genome provides critical support for continued investigation into the biology and pathogenesis of H. capsulatum and related fungi.
Results: We employed a three-pronged approach to provide a functional annotation for the H. capsulatum G217B strain. First, we probed high-density tiling arrays with labeled cDNAs from cells grown under diverse conditions. These data defined 6,172 transcriptionally active regions (TARs), providing validation of 6,008 gene predictions. Interestingly, 22% of these predictions showed evidence of anti-sense transcription. Additionally, we detected transcription of 264 novel genes not present in the original gene predictions. To further enrich our analysis, we incorporated expression data from whole-genome oligonucleotide microarrays. These expression data included profiling under growth conditions that were not represented in the tiling experiment, and validated an additional 2,249 gene predictions. Finally, we compared the G217B gene predictions to other available fungal genomes, and observed that an additional 254 gene predictions had an ortholog in a different fungal species, suggesting that they represent genuine coding sequences.
Conclusions: These analyses yielded a high confidence set of validated gene predictions for H. capsulatum. The transcript sets resulting from this study are a valuable resource for further experimental characterization of this ubiquitous fungal pathogen. The data are available for interactive exploration at http://histo.ucsf.edu.
Background
Histoplasma capsulatum is a dimorphic fungal pathogen
that is thought to infect up to 500,000 individuals per
year in the U.S.[1]. Notably, H. capsulatum is a primary
pathogen that causes significant morbidity in
immunocompetent hosts[2]. Normally found in a filamentous
mycelial form in the soil of endemic regions, H.
capsulatum converts to the pathogenic yeast form in the lungs
of the host after inhalation of infectious particles (Figure
1). In the laboratory, temperature is a sufficient signal to
specify growth in either the mycelial form (at room
temperature) or growth in the yeast form, which can be
achieved by incubating cells at 37°C. Once introduced
into the host, H. capsulatum colonizes host immune
cells. Understanding both how H. capsulatum switches its
growth program in response to temperature and how this
pathogen subverts the innate immune system are major
areas of inquiry. The elucidation of H. capsulatum
pathogenesis and biology has been greatly aided by the
genome sequencing of H. capsulatum strains G217B and
G186AR at the Genome Sequencing Center (GSC) at
Washington University in St. Louis and strains G186AR,
WU24, H88, and H143 at the BROAD Institute. These
sequenced genomes open up a wealth of possibilities for
the H. capsulatum community, enabling or abetting tools
such as expression arrays, insertional mutagenesis, and
bioinformatic analysis. However, these approaches
are limited by the gene annotations associated with the
genome assemblies. This limitation is pronounced in H.
capsulatum given this eukaryote's sparse gene structure
and a limited set of known transcripts with which to
train gene prediction algorithms. Accordingly, although
the GSC used a variety of tools to generate a set of
predicted genes for G217B and G186AR
(http://genome.wustl.edu/genomes/view/histoplasma_capsulatum/), these
predictions are based on limited experimental data.
Figure 1 Histoplasma capsulatum is a dimorphic fungal pathogen. Histoplasma capsulatum grows as a saprophytic mold in the soil (left)
but, upon inhalation by a mammalian host, converts to a pathogenic yeast form (center) capable of intracellular growth within host
macrophages (right). Both small and large vegetative spores (micro and macroconidia, respectively) are depicted in the mold form. Within the
macrophage, yeast cells are shown within a membrane-bound phagosome, and the macrophage nucleus is also depicted.
In other systems where the gene finding problem has
presented itself, whole genome tiling has proven a
reliable technique for direct observation of the
transcriptome[3-6]. To this end, we generated a set of tiling
microarrays spanning the non-repetitive regions of the
G217B genome and hybridized these arrays with a pool
of cDNA derived from yeast-form Histoplasma growing
under a diverse set of conditions. The resultant data
give an unbiased measure of expression level as a
function of genome position, and thus identify the locations
and boundaries of expressed genes. The results of this
study are available, along with tools for interactive
exploration of the data, at http://histo.ucsf.edu.
Results and Discussion
Whole-genome tiling array expression profiling
To survey the transcriptome of G217B, we designed a set
of 93 unique tiling microarrays (Figure 2). The G217B
genome contains a large number of repeat regions,
including the MAGGY retrotransposon[7], which were
excluded from the tiling microarray probes. Both strands
of the remaining sequence were tiled with 50-mer probes
at an average frequency of one probe every 60 base pairs
(Figure 2). These arrays were hybridized with a pool of
fluorescently labeled cDNA generated from cells grown
under a variety of conditions. Because technical
limitations did not allow us to isolate sufficient
polyadenylated RNA from filamentous cells (which represent
the soil form of this organism and must be grown under
biosafety level three conditions due to the production of
aerosolizable infectious spores), we focused on the
pathogenic yeast form. G217B yeast cells were subjected to
numerous growth conditions (see Materials and
Methods) which had previously been observed to elicit potent
transcriptional responses[8,9]. Tiles that passed an
empirically determined detection threshold were merged
into TARs, as described in the Materials and Methods.
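The tiling layout and the tile-merging step described above can be illustrated with a minimal Python sketch. It is not the procedure from the Materials and Methods: only the 50-mer probe length and the 60 base pair average spacing come from the text. The function names (`tile_starts`, `merge_tars`), the detection `threshold`, the `max_gap` rule for joining neighbouring tiles, and the signal values are all hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple

PROBE_LEN = 50  # probe length stated in the text (50-mer)
STEP = 60       # average probe spacing stated in the text (one probe per 60 bp)


def tile_starts(region_start: int, region_end: int,
                probe_len: int = PROBE_LEN, step: int = STEP) -> List[int]:
    """Start coordinates of probes tiled across the half-open region
    [region_start, region_end) on one strand, at a fixed spacing."""
    return list(range(region_start, region_end - probe_len + 1, step))


def merge_tars(tiles: List[Tuple[int, int, float]],
               threshold: float, max_gap: int) -> List[Tuple[int, int]]:
    """Merge detected tiles into transcriptionally active regions (TARs).

    tiles: (start, end, signal) tuples for one strand, sorted by start.
    threshold: hypothetical detection cutoff on the signal.
    max_gap: hypothetical largest gap (bp) bridged between detected tiles.
    """
    tars: List[Tuple[int, int]] = []
    for start, end, signal in tiles:
        if signal < threshold:
            continue  # tile not detected; it cannot extend a TAR
        if tars and start - tars[-1][1] <= max_gap:
            # Close enough to the previous TAR: extend it.
            tars[-1] = (tars[-1][0], max(tars[-1][1], end))
        else:
            tars.append((start, end))
    return tars


# Toy example: five probes across a 300 bp region with made-up signals.
starts = tile_starts(0, 300)
signals = [5.0, 6.0, 0.5, 7.0, 8.0]
tiles = [(s, s + PROBE_LEN, sig) for s, sig in zip(starts, signals)]
print(merge_tars(tiles, threshold=2.0, max_gap=STEP - PROBE_LEN))
```

With `max_gap` set to the 10 bp space between consecutive probes, only directly neighbouring detected tiles are joined, so the undetected third tile splits the toy region into two TARs, (0, 110) and (180, 290). A larger `max_gap` would bridge single undetected tiles; which choice is appropriate depends on the empirical threshold used in the actual analysis.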
Detection of predicted genes
The GSC predicted that the G217B genome contains
11,221 genes, but 1,611 of these gene predictions
contain repeat sequence, including the MAGGY transposon,
and were excluded from (...truncated)