Visualization and analysis of microarray and gene ontology data with treemaps
BMC Bioinformatics
Eric H Baehrecke 1
Niem Dang 0
Ketan Babaria 0
Ben Shneiderman 0
0 Department of Computer Science and Human-Computer Interaction Laboratory, University of Maryland , College Park, Maryland 20742 , USA
1 Center for Biosystems Research, University of Maryland Biotechnology Institute , College Park, Maryland 20742 , USA
Background: The increasing complexity of genomic data presents several challenges for biologists. Limited computer monitor views of data complexity and the dynamic nature of data in the midst of discovery increase the challenge of integrating experimental results with information resources. The use of Gene Ontology enables researchers to summarize results of quantitative analyses in this framework, but the limitations of typical browser presentation restrict data access.
Results: Here we describe extensions to the treemap design to visualize and query genome data. Treemaps are a space-filling visualization technique for hierarchical structures that show attributes of leaf nodes by size and color-coding. Treemaps enable users to rapidly compare sizes of nodes and sub-trees, and we use Gene Ontology categories, levels of RNA, and other quantitative attributes of DNA microarray experiments as examples. Our implementation of treemaps, Treemap 4.0, allows user-defined filtering to focus on the data of greatest interest, and the filtered data can be exported for secondary analyses. Links to model system web pages from Treemap 4.0 give users access to details about specific genes without leaving the query platform.
Conclusions: Treemaps allow users to view and query the data from an experiment on a single computer monitor screen. Treemap 4.0 can be used to visualize various genome data, and is particularly useful for revealing patterns and details within complex data sets.
Background
Genome sequencing has presented biologists with new
challenges in data analysis. This advance coupled with the
advent of methods to empirically analyze whole genome
changes in RNA levels, protein levels, and protein
activities [1-5], presents difficulties in visualizing summaries of
data while obtaining meaningful details. The use of
colored mosaics and hierarchical clustering to query
relative RNA levels revolutionized DNA microarray analyses
[6], and has been the prominent mechanism for assessing
these data. While continued development of this data
analysis platform is useful, these methods limit the ability to
visualize multiple data attributes simultaneously, for
example qualitative information about gene families or
biological function together with quantitative
information such as RNA level and p-value.
The Gene Ontology (GO) consortium has established a
vocabulary that provides a hierarchical structure for the
analysis of genome data [7,8]. GO provides a
classification of gene products into molecular functions, biological
processes, and cellular components. Therefore, GO
classification is particularly useful for getting overviews of data
such as the percentage of genes transcribed within each
category or node, and also provides a rapid mechanism
for researchers to classify genes that are often given
nondescript numerical names during genome annotation. The
dynamic nature of GO data, which is updated weekly for
active genome projects, however, challenges researchers to
be vigilant in the analysis and re-analysis of data. Ideally,
researchers would be able to obtain information about
both qualitative attributes such as GO category, and
quantitative attributes such as RNA level for an entire
experiment, and query these data sets for details without losing
the overview of the entire data structure.
Several computational approaches have been developed
to visualize and query microarray data including Spotfire
[9] and Genespring [10]. While both of these platforms
are capable of analyzing both qualitative and quantitative
data, neither provides an ideal platform to visualize
multiple attributes simultaneously while allowing dynamic
queries of data in the context of the GO classification.
Further, limited mechanisms exist for merging quantitative
attributes such as RNA level with GO categories. Several
programs have been developed to edit, browse, and
facilitate studies of GO [8]. Among these applications, FatiGO
[11], GoMiner [12], MAPPFinder [13], and GoSurfer [14]
provide useful platforms for the analysis of microarray
data in the context of the GO hierarchy, but their use of
typical windows-style browsers and tree diagrams lacking
quantitative data limits the ability to rapidly see patterns
and obtain details on demand.
Treemaps were developed to facilitate visualization of
both hierarchical and quantitative information [15,16].
This technique has been used to visualize and query
several forms of data including the stock market [17] and
electronic product catalogs [18]. While treemaps have
been recognized as a strategy that can be used to visualize
clusters and GO annotation data [19], previous use of this
strategy has been limited to static views of preselected
data and has not taken advantage of several important
strengths of Treemap 4.0.
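To make the space-filling idea concrete, the following minimal sketch implements the classic slice-and-dice treemap layout, in which each node's rectangle is divided among its children in proportion to their sizes and the split direction alternates with depth. It is an illustration only and is not the layout code of Treemap 4.0; the `Node` class, the gene and category names, and the sizes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a hierarchy; only leaves carry a size (hypothetical structure)."""
    name: str
    size: float = 0.0  # leaf weight, e.g. an RNA level (illustrative values only)
    children: list = field(default_factory=list)


def total(node):
    """Return the sum of the leaf sizes beneath a node."""
    if not node.children:
        return node.size
    return sum(total(child) for child in node.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Map each leaf name to a rectangle (x, y, w, h) with area proportional to size."""
    if out is None:
        out = {}
    if not node.children:
        out[node.name] = (x, y, w, h)
        return out
    whole = total(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in node.children:
        share = total(child) / whole if whole else 0.0
        if depth % 2 == 0:
            # Even depth: divide the width among the children.
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            # Odd depth: divide the height among the children.
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Hypothetical two-level hierarchy: categories containing genes.
tree = Node("root", children=[
    Node("process_A", children=[Node("gene1", 3.0), Node("gene2", 1.0)]),
    Node("process_B", children=[Node("gene3", 4.0)]),
])
rects = slice_and_dice(tree, 0.0, 0.0, 100.0, 50.0)
for name, rect in rects.items():
    print(name, rect)
```

In this sketch the leaf rectangles tile the whole display area, so a gene with a larger size value always occupies a proportionally larger region. Color would be assigned separately from a second attribute.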
Here we extend Treemap 4.0 to visualize and query
microarray data in the GO framework, and use studies of
programmed cell death during development [20] as an
example. We present the advantages of the use of size and
color to represent attributes of the genome. By using code
that merges up-to-date GO assignments with various
user-defined attributes such as RNA level and p-value,
we use treemaps to get overviews of data and to query
for details. Treemap 4.0 provides users with rapid
responses to queries (usually a fraction of a second), and
the results can be saved and exported such that they can
be analyzed using other software. A link between Treemap
4.0 and organism-specific web sites enables users to get
details about specific genes as needed. Thus, Treemap 4.0
fills a critical void for genome researchers who want to
integrate and query GO information with various
quantitative data.
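The merge, query, and export steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the text does not specify the input format of Treemap 4.0, so the dictionaries, field names, gene names, and values here are hypothetical. The sketch also assumes that a gene assigned to several GO categories appears once per category, and that genes without a GO assignment are grouped under an "unclassified" label.

```python
import csv
import io

# Hypothetical GO assignments: gene -> list of GO categories.
go_assignments = {
    "geneA": ["apoptosis", "proteolysis"],
    "geneB": ["apoptosis"],
    "geneC": [],
}

# Hypothetical user-defined quantitative attributes per gene.
attributes = {
    "geneA": {"rna_level": 1520.0, "p_value": 0.004},
    "geneB": {"rna_level": 310.0, "p_value": 0.20},
    "geneD": {"rna_level": 95.0, "p_value": 0.03},
}


def merge(go, attrs, unclassified="unclassified"):
    """Join attributes with GO categories, one row per (category, gene) pair."""
    rows = []
    for gene, values in sorted(attrs.items()):
        # Genes with no (or an empty) GO assignment get the placeholder label.
        categories = go.get(gene) or [unclassified]
        for category in categories:
            rows.append({"go_category": category, "gene": gene, **values})
    return rows


def filter_rows(rows, max_p):
    """Keep rows whose p-value does not exceed the threshold."""
    return [row for row in rows if row["p_value"] <= max_p]


def to_tsv(rows):
    """Serialize rows as tab-delimited text for use in other software."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["go_category", "gene", "rna_level", "p_value"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


merged = merge(go_assignments, attributes)
significant = filter_rows(merged, max_p=0.05)
exported = to_tsv(significant)
print(exported)
```

Because GO assignments change over time, rerunning a merge of this kind against a fresh annotation download is one way to keep the hierarchy current without altering the quantitative data.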
Results and Discussion
Treemaps enable visual overviews of complex genome
data with details on demand
We have used treemaps to visualize DNA microarray data
that examine changes in RNA levels during
steroid-triggered programmed cell death in Drosophila [20]. RNA was
extracted from salivary glands dissected from animals that
were staged 6 and 12 hours following puparium formation,
that is, before and after the rise in steroid that triggers cell
death. Three independent salivary gland RNA samples
were collected from each stage and used to hybridize
Drosophila Affymetrix oligonucleotide Genechips. These
Genechips contain 13,197 unique gene transcripts, and
2,876 gene transcripts were consistently detected in all 3
samples of either 6- or 12-hour salivary glands.
Treemap 4.0 was used to analyze the 2,876 (...truncated)