Experimental annotation of the human pathogen Histoplasma capsulatum transcribed regions using high-resolution tiling arrays

BMC Microbiology, Sep 2011

Background: The fungal pathogen Histoplasma capsulatum is thought to be the most common cause of fungal respiratory infections in immunocompetent humans, yet little is known about its biology. Here we provide the first genome-wide studies to experimentally validate its genome annotation. A functional interrogation of the Histoplasma genome provides critical support for continued investigation into the biology and pathogenesis of H. capsulatum and related fungi.

Results: We employed a three-pronged approach to provide a functional annotation for the H. capsulatum G217B strain. First, we probed high-density tiling arrays with labeled cDNAs from cells grown under diverse conditions. These data defined 6,172 transcriptionally active regions (TARs), providing validation of 6,008 gene predictions. Interestingly, 22% of these predictions showed evidence of anti-sense transcription. Additionally, we detected transcription of 264 novel genes not present in the original gene predictions. To further enrich our analysis, we incorporated expression data from whole-genome oligonucleotide microarrays. These expression data included profiling under growth conditions that were not represented in the tiling experiment, and validated an additional 2,249 gene predictions. Finally, we compared the G217B gene predictions to other available fungal genomes, and observed that an additional 254 gene predictions had an ortholog in a different fungal species, suggesting that they represent genuine coding sequences.

Conclusions: These analyses yielded a high-confidence set of validated gene predictions for H. capsulatum. The transcript sets resulting from this study are a valuable resource for further experimental characterization of this ubiquitous fungal pathogen. The data is available for interactive exploration at http://histo.ucsf.edu.

Full-text PDF: http://www.biomedcentral.com/content/pdf/1471-2180-11-216.pdf

Mark Voorhies (1), Catherine K Foo (3), Anita Sil (1, 2)

(1) Department of Microbiology & Immunology, University of California San Francisco, San Francisco, California, 94143, USA
(2) Howard Hughes Medical Institute, University of California San Francisco, San Francisco, California, 94143, USA
(3) Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, California, 94143, USA

Background

Histoplasma capsulatum is a dimorphic fungal pathogen that is thought to infect up to 500,000 individuals per year in the U.S. [1]. Notably, H. capsulatum is a primary pathogen that causes significant morbidity in immunocompetent hosts [2]. Normally found in a filamentous mycelial form in the soil of endemic regions, H. capsulatum converts to the pathogenic yeast form in the lungs of the host after inhalation of infectious particles (Figure 1). In the laboratory, temperature is a sufficient signal to specify growth in either the mycelial form (at room temperature) or the yeast form (at 37°C). Once introduced into the host, H. capsulatum colonizes host immune cells. Two major areas of inquiry are how H. capsulatum switches its growth program in response to temperature and how this pathogen subverts the innate immune system.

Figure 1. Histoplasma capsulatum is a dimorphic fungal pathogen. Histoplasma capsulatum grows as a saprophytic mold in the soil (left) but, upon inhalation by a mammalian host, converts to a pathogenic yeast form (center) capable of intracellular growth within host macrophages (right). Both small and large vegetative spores (micro- and macroconidia, respectively) are depicted in the mold form. Within the macrophage, yeast cells are shown within a membrane-bound phagosome, and the macrophage nucleus is also depicted.

Genome sequencing has greatly aided the study of H. capsulatum biology and pathogenesis: strains G217B and G186AR were sequenced at the Genome Sequencing Center (GSC) at Washington University in St. Louis, and strains G186AR, WU24, H88, and H143 at the Broad Institute. These genomes open up a wealth of possibilities for the H. capsulatum community, enabling or facilitating approaches such as expression arrays, insertional mutagenesis, and bioinformatic analysis. However, these approaches are limited by the quality of the gene annotations associated with the genome assemblies. This limitation is pronounced in H. capsulatum, given this eukaryote's sparse gene structure and the small set of known transcripts available to train gene prediction algorithms. Accordingly, although the GSC used a variety of tools to generate a set of predicted genes for G217B and G186AR (http://genome.wustl.edu/genomes/view/histoplasma_capsulatum/), these predictions are based on limited experimental data.

In other systems where gene finding has posed a similar problem, whole-genome tiling has proven a reliable technique for direct observation of the transcriptome [3-6]. To this end, we generated a set of tiling microarrays spanning the non-repetitive regions of the G217B genome and hybridized these arrays with a pool of cDNA derived from yeast-form Histoplasma grown under a diverse set of conditions. The resulting data give an unbiased measure of expression level as a function of genome position, and thus identify the locations and boundaries of expressed genes. The results of this study are available, along with tools for interactive exploration of the data, at http://histo.ucsf.edu.

Results and Discussion

Whole-genome tiling array expression profiling

To survey the transcriptome of G217B, we designed a set of 93 unique tiling microarrays (Figure 2). The G217B genome contains a large number of repeat regions, including the MAGGY retrotransposon [7], which were excluded from the tiling microarray probes.
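The design step above (excluding repeats such as MAGGY, then tiling what remains) can be sketched as interval subtraction followed by fixed-step probe placement. This is a minimal illustration, not the authors' design pipeline: repeat annotations are assumed to be available as half-open (start, end) intervals, and the 50-mer length and 60 bp step are the averages reported for these arrays, applied here as fixed values. The function names subtract_repeats and tile_probes are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]    # half-open [start, end), 0-based
Probe = Tuple[int, int, str]  # (start, end, strand)


def subtract_repeats(length: int, repeats: List[Interval]) -> List[Interval]:
    """Return the parts of [0, length) not covered by any repeat interval."""
    kept: List[Interval] = []
    cursor = 0  # first position not yet assigned to a kept region or a repeat
    for start, end in sorted(repeats):
        start, end = max(0, start), min(length, end)  # clip to the sequence
        if start >= end:
            continue
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)  # overlapping repeats extend the masked span
    if cursor < length:
        kept.append((cursor, length))
    return kept


def tile_probes(regions: List[Interval], probe_len: int = 50, step: int = 60) -> List[Probe]:
    """Place probe_len-mers every `step` bp inside each region, on both strands.

    The '-' entry stands for the reverse-complement probe at the same
    footprint; extracting the actual probe sequences is omitted.
    """
    probes: List[Probe] = []
    for start, end in regions:
        # the last start position keeps the whole probe inside the region
        for pos in range(start, end - probe_len + 1, step):
            for strand in ("+", "-"):
                probes.append((pos, pos + probe_len, strand))
    return probes


regions = subtract_repeats(300, [(100, 150)])
print(regions)                    # [(0, 100), (150, 300)]
print(len(tile_probes(regions)))  # 6
```

Because a 50-mer placed every 60 bp leaves a 10 bp gap between neighboring probes on the same strand, any later step that merges the signals of adjacent probes has to tolerate gaps of that size.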
Both strands of the remaining sequence were tiled with 50-mer probes at an average frequency of one probe every 60 base pairs (Figure 2). These arrays were hybridized with a pool of fluorescently labeled cDNA generated from cells grown under a variety of conditions. Because technical limitations did not allow us to isolate sufficient polyadenylated RNA from filamentous cells (which represent the soil form of this organism and must be grown under biosafety level 3 conditions due to the production of aerosolizable infectious spores), we focused on the pathogenic yeast form. G217B yeast cells were subjected to numerous growth conditions (see Materials and Methods) that had previously been observed to elicit potent transcriptional responses [8,9]. Tiles that passed an empirically determined detection threshold were merged into TARs, as described in Materials and Methods.

Detection of predicted genes

The GSC predicted that the G217B genome contains 11,221 genes, but 1,611 of these gene predictions contain repeat sequence, including the MAGGY transposon, and were excluded from (...truncated)
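The TAR-calling step described under whole-genome tiling array expression profiling (tiles above a detection threshold merged into TARs), together with the overlap test implied by the abstract's counts of validated predictions and predictions with anti-sense transcription, can be sketched as below. Everything here is an assumption for illustration: the Tile record, the threshold value, the max_gap merge rule, and the strand convention (the strand of the detected transcript) are hypothetical, and the names call_tars and classify_prediction are not from the article. The authors' actual criteria are given in the article's Materials and Methods.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

TAR = Tuple[int, int, str]  # (start, end, strand), half-open, 0-based


@dataclass
class Tile:
    start: int     # 0-based start of the probe footprint (half-open interval)
    end: int
    strand: str    # "+" or "-": strand of the transcript this tile detects (assumed convention)
    signal: float  # normalized intensity; scale and normalization are hypothetical


def call_tars(tiles: List[Tile], threshold: float, max_gap: int = 10) -> List[TAR]:
    """Merge same-strand tiles with signal >= threshold into TARs.

    A detected tile is joined to the growing TAR if the gap to it is <= max_gap bp;
    10 bp is the gap between 50-mers placed every 60 bp.
    """
    tars: List[TAR] = []
    for strand in ("+", "-"):
        hits = sorted(
            (t for t in tiles if t.strand == strand and t.signal >= threshold),
            key=lambda t: t.start,
        )
        current: Optional[List[int]] = None  # [start, end] of the TAR being grown
        for t in hits:
            if current is not None and t.start - current[1] <= max_gap:
                current[1] = max(current[1], t.end)  # extend the open TAR
            else:
                if current is not None:
                    tars.append((current[0], current[1], strand))
                current = [t.start, t.end]  # start a new TAR
        if current is not None:
            tars.append((current[0], current[1], strand))
    return tars


def classify_prediction(start: int, end: int, strand: str, tars: List[TAR]) -> Tuple[bool, bool]:
    """Return (sense_support, antisense_support) for one gene prediction."""
    sense = antisense = False
    for t_start, t_end, t_strand in tars:
        if t_start < end and start < t_end:  # half-open intervals overlap
            if t_strand == strand:
                sense = True
            else:
                antisense = True
    return sense, antisense


tiles = [Tile(0, 50, "+", 5.0), Tile(60, 110, "+", 4.0), Tile(120, 170, "+", 0.5),
         Tile(180, 230, "+", 6.0), Tile(60, 110, "-", 3.0)]
tars = call_tars(tiles, threshold=1.0)
print(tars)                                      # [(0, 110, '+'), (180, 230, '+'), (60, 110, '-')]
print(classify_prediction(20, 100, "+", tars))   # (True, True)
```

In this toy input the tile at 120 falls below the threshold, so the two detected "+" runs stay separate TARs; the "-" TAR overlapping the first prediction is what this sketch counts as anti-sense evidence.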


Article home page: http://www.biomedcentral.com/1471-2180/11/216

Mark Voorhies, Catherine K Foo, Anita Sil. Experimental annotation of the human pathogen Histoplasma capsulatum transcribed regions using high-resolution tiling arrays. BMC Microbiology 2011, 11:216. DOI: 10.1186/1471-2180-11-216