Visualization and analysis of microarray and gene ontology data with treemaps

BMC Bioinformatics, Jun 2004

The increasing complexity of genomic data presents several challenges for biologists. The limited view of complex data that fits on a computer monitor, and the dynamic nature of data in the midst of discovery, make it harder to integrate experimental results with information resources. The use of Gene Ontology enables researchers to summarize results of quantitative analyses in this framework, but the limitations of typical browser presentation restrict data access. Here we describe extensions to the treemap design to visualize and query genome data. Treemaps are a space-filling visualization technique for hierarchical structures that shows attributes of leaf nodes by size and color-coding. Treemaps enable users to rapidly compare sizes of nodes and sub-trees, and we use Gene Ontology categories, levels of RNA, and other quantitative attributes of DNA microarray experiments as examples. Our implementation of treemaps, Treemap 4.0, allows user-defined filtering to focus on the data of greatest interest, and the filtered results can be exported for secondary analyses. Links to model system web pages from Treemap 4.0 give users access to details about specific genes without leaving the query platform. Treemaps allow users to view and query the data from an experiment on a single computer monitor screen. Treemap 4.0 can be used to visualize various genome data, and is particularly useful for revealing patterns and details within complex data sets.

BMC Bioinformatics (Software)

Authors: Eric H Baehrecke (Center for Biosystems Research, University of Maryland Biotechnology Institute, College Park, Maryland 20742, USA); Niem Dang, Ketan Babaria, and Ben Shneiderman (Department of Computer Science and Human-Computer Interaction Laboratory, University of Maryland, College Park, Maryland 20742, USA)

Background

Genome sequencing has presented biologists with new challenges in data analysis. This advance, coupled with the advent of methods to empirically analyze whole genome changes in RNA levels, protein levels, and protein activities [1-5], presents difficulties in visualizing summaries of data while obtaining meaningful details. The use of colored mosaics and hierarchical clustering to query relative RNA levels revolutionized DNA microarray analyses [6], and has been the prominent mechanism for assessing these data. While continued development of this data analysis platform is useful, these methods limit the ability to visualize multiple data attributes simultaneously, for example qualitative information about gene families or biological function together with quantitative information such as RNA level and p-value.

The Gene Ontology (GO) consortium has established a vocabulary that provides a hierarchical structure for the analysis of genome data [7,8]. GO provides a classification of gene products into molecular functions, biological processes, and cellular components. GO classification is therefore particularly useful for getting overviews of data, such as the percentage of genes transcribed within each category or node, and also provides a rapid mechanism for researchers to classify genes that are often given nondescript numerical names during genome annotation. The dynamic nature of GO data, which are updated weekly for active genome projects, however, challenges researchers to be vigilant in the analysis and re-analysis of data. Ideally, researchers would be able to obtain information about both qualitative attributes such as GO category and quantitative attributes such as RNA level for an entire experiment, and query these data sets for details without losing the overview of the entire data structure.
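
The kind of overview described above, such as the percentage of genes detected within each GO category, can be sketched as a roll-up over a parent-child hierarchy. The following is a minimal illustration and not the authors' code: the category names, gene identifiers, and the helper `fraction_detected` are hypothetical, and a simple tree is assumed even though the real GO is a directed acyclic graph in which a term may have several parents.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: child category -> parent category.
PARENT = {
    "apoptosis": "cell death",
    "autophagy": "cell death",
    "cell death": "biological process",
    "transport": "biological process",
}

# Hypothetical annotations: gene -> its most specific category.
GENE_TO_CATEGORY = {
    "geneA": "apoptosis",
    "geneB": "apoptosis",
    "geneC": "autophagy",
    "geneD": "transport",
}


def ancestors_and_self(category):
    """Yield a category followed by each of its ancestors up to the root."""
    while category is not None:
        yield category
        category = PARENT.get(category)


def fraction_detected(detected_genes):
    """Return {category: fraction of its annotated genes that were detected}.

    A gene counts toward its own category and toward every ancestor,
    so each node summarizes its whole sub-tree.
    """
    total = defaultdict(int)
    hits = defaultdict(int)
    for gene, category in GENE_TO_CATEGORY.items():
        for node in ancestors_and_self(category):
            total[node] += 1
            if gene in detected_genes:
                hits[node] += 1
    return {node: hits[node] / total[node] for node in total}


summary = fraction_detected({"geneA", "geneC"})
for node, fraction in sorted(summary.items()):
    print(f"{node}: {fraction:.0%} detected")
```

Because GO assignments change over time, a roll-up like this would need to be recomputed whenever the annotations are refreshed.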

Several computational approaches have been developed to visualize and query microarray data including Spotfire [9] and Genespring [10]. While both of these platforms are capable of analyzing both qualitative and quantitative data, neither provides an ideal platform to visualize multiple attributes simultaneously while allowing dynamic queries of data in the context of the GO classification. Further, limited mechanisms exist for merging quantitative attributes such as RNA level with GO categories. Several programs have been developed to edit, browse, and facilitate studies of GO [8]. Among these applications, FatiGO [11], GoMiner [12], MAPPFinder [13], and GoSurfer [14] provide useful platforms for the analysis of microarray data in the context of the GO hierarchy, but their use of typical windows-style browsers and tree diagrams lacking quantitative data limits the ability to rapidly see patterns and obtain details on demand.

Treemaps were developed to facilitate visualization of both hierarchical and quantitative information [15,16]. This technique has been used to visualize and query several forms of data including the stock market [17] and electronic product catalogs [18]. While treemaps have been recognized as a strategy that can be used to visualize clusters and GO annotation data [19], previous use of this strategy has been limited to static approaches of preselected data and has failed to take advantage of several important strengths of Treemap 4.0.

Here we extend Treemap 4.0 to visualize and query microarray data in the GO framework, and use studies of programmed cell death during development [20] as an example. We present the advantages of the use of size and color to represent attributes of the genome. By using code that merges up-to-date GO assignments with various user-defined attributes such as RNA level, p-value, and others, we utilize treemaps to get overviews of data, and to query for details.
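
The two ideas in this passage, merging GO assignments with a quantitative attribute and then drawing the hierarchy as nested rectangles, can be sketched as follows. This is an illustration under stated assumptions and not the Treemap 4.0 implementation: the gene names, RNA levels, and the helpers `merge_attributes`, `node_size`, and `slice_and_dice` are hypothetical, and the slice-and-dice layout is used only because it is one of the simplest treemap layouts. Only size is mapped here; a second attribute such as p-value would be mapped to color in the same way.

```python
def merge_attributes(go_assignments, measurements):
    """Join gene -> GO category with gene -> value; genes without a value are skipped."""
    merged = {}
    for gene, category in go_assignments.items():
        if gene in measurements:
            merged.setdefault(category, []).append((gene, measurements[gene]))
    return merged


def node_size(node):
    """A leaf has its own size; a branch is the sum of its children."""
    if "children" in node:
        return sum(node_size(child) for child in node["children"])
    return node["size"]


def slice_and_dice(node, x, y, width, height, depth=0):
    """Return [(leaf_name, x, y, width, height)] with area proportional to size.

    The rectangle is cut side by side at even depths and stacked at odd depths.
    Sizes are assumed to be positive.
    """
    if "children" not in node:
        return [(node["name"], x, y, width, height)]
    rects = []
    total = node_size(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in node["children"]:
        share = node_size(child) / total
        if depth % 2 == 0:
            rects += slice_and_dice(child, x + offset * width, y,
                                    share * width, height, depth + 1)
        else:
            rects += slice_and_dice(child, x, y + offset * height,
                                    width, share * height, depth + 1)
        offset += share
    return rects


# Hypothetical inputs: GO assignments and RNA levels keyed by gene.
go_assignments = {"geneA": "apoptosis", "geneB": "apoptosis",
                  "geneC": "transport", "geneD": "transport"}
rna_level = {"geneA": 300.0, "geneB": 100.0, "geneC": 400.0}  # geneD not measured

merged = merge_attributes(go_assignments, rna_level)
tree = {"name": "root", "children": [
    {"name": category,
     "children": [{"name": gene, "size": level} for gene, level in genes]}
    for category, genes in merged.items()
]}
rects = slice_and_dice(tree, 0.0, 0.0, 100.0, 100.0)
for name, rx, ry, rw, rh in rects:
    print(f"{name}: x={rx:.1f} y={ry:.1f} w={rw:.1f} h={rh:.1f}")
```

Keeping the merge separate from the layout means that refreshed GO assignments only require rebuilding the tree, while the drawing step stays unchanged.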

Treemap 4.0 provides users with rapid responses to queries (usually a fraction of a second), and the results can be saved and exported so that they can be analyzed using other software. A link between Treemap 4.0 and organism-specific web sites enables users to get details about specific genes as needed. Thus, Treemap 4.0 fills a critical void for genome researchers who want to integrate and query GO information with various quantitative data.

Results and Discussion

Treemaps enable visual overviews of complex genome data with details on demand

We have used treemaps to visualize DNA microarray data that examine changes in RNA levels during steroid-triggered programmed cell death in Drosophila [20]. RNA was extracted from salivary glands dissected from animals staged 6 and 12 hours after puparium formation, that is, before and after the rise in steroid that triggers cell death. Three independent salivary gland RNA samples were collected from each stage and used to hybridize Drosophila Affymetrix oligonucleotide Genechips. These Genechips contain 13,197 unique gene transcripts, and 2,876 gene transcripts were consistently detected in all 3 samples of either 6- or 12-hour salivary glands. Treemap 4.0 was used to analyze the 2,876 (...truncated)


Article home page: http://www.biomedcentral.com/1471-2105/5/84

Eric H Baehrecke, Niem Dang, Ketan Babaria, Ben Shneiderman. Visualization and analysis of microarray and gene ontology data with treemaps, BMC Bioinformatics, 2004, pp. 84, Volume 5, Issue 1, DOI: 10.1186/1471-2105-5-84