Visual annotation display (VLAD): a tool for finding functional themes in lists of genes
Mamm Genome (2015) 26:567–573
DOI 10.1007/s00335-015-9570-2
Joel E. Richardson1 • Carol J. Bult1
Received: 6 March 2015 / Accepted: 19 May 2015 / Published online: 6 June 2015
© The Author(s) 2015. This article is published with open access at Springerlink.com
Abstract Experiments that employ genome scale technology platforms frequently result in lists of tens to thousands of genes with potential significance to a specific
biological process or disease. Searching for biologically
relevant connections among the genes or gene products in
these lists is a common data analysis task. We have
implemented a software application for uncovering functional themes in sets of genes based on their annotations to
bio-ontologies, such as the gene ontology and the mammalian phenotype ontology. The application, called VisuaL
Annotation Display (VLAD), performs a statistical analysis
to test for the enrichment of ontology terms in a set of
genes submitted by a researcher. The results of each
analysis using VLAD include a table of ontology terms,
sorted in decreasing order of significance. Each row contains the term, statistics such as the number of annotated
terms, the p value, etc., and the symbols of annotated
genes. An accompanying graphical display shows portions
of the ontology hierarchy, where node sizes are scaled
based on p values. Although numerous ontology term
enrichment programs already exist, VLAD is unique in that
it allows users to upload their own annotation files and
ontologies for customized term enrichment analyses,
supports the analysis of multiple gene sets at once, provides
interfaces to customize graphical output, and is tightly
integrated with functional and biological details about
mouse genes in the Mouse Genome Informatics (MGI)
database. VLAD is available as a web-based application
from the MGI web site (http://proto.informatics.jax.org/prototypes/vlad/).
Electronic supplementary material: The online version of this
article (doi:10.1007/s00335-015-9570-2) contains supplementary
material, which is available to authorized users.
Carol J. Bult
Joel E. Richardson
1 Mouse Genome Informatics (MGI) Database Consortium,
The Jackson Laboratory, 600 Main Street, Bar Harbor,
ME 04609, USA
Introduction
One of the challenges facing biologists in the era of genome-scale science is to glean biological meaning from
large experimental datasets such as those generated by
microarray, RNA-Seq, ChIP-Seq (chromatin immunoprecipitation sequencing), genome-wide copy number variation (CNV)
analysis, and exome sequencing. The development of
biomedical ontologies such as the Gene Ontology (GO)
(Ashburner et al. 2000; Gene Ontology 2015) and annotated gene sets (Subramanian et al. 2005) has been
essential for mining functional properties of genes from
large-scale datasets.
Numerous software tools that use curated annotations
and ontologies for extracting functional information from
gene sets have been developed over the years including
GO::TermFinder (Boyle et al. 2004), DAVID (da Huang
et al. 2009), BiNGO (Maere et al. 2005), AmiGO (Carbon
et al. 2009), GoMiner (Zeeberg et al. 2003), and WebGestalt (Wang et al. 2013). In general, these programs are
designed to analyze gene sets that show statistically significant patterns of gene expression, variation, etc. Other
gene set analysis methods such as gene set enrichment
analysis (GSEA) (Subramanian et al. 2005), parametric
analysis of gene set enrichment (PAGE) (Kim and Volsky
2005), and generally applicable gene set enrichment
(GAGE) (Luo et al. 2009) allow for the analysis of all
genes in global transcriptomics studies. These methods
were developed to address the issue that not all meaningful
gene expression changes rise to the level of statistical
significance. Both the “cutoff-based” and “cutoff-free”
methods (Luo et al. 2009) rely on comparisons of experimental gene sets to annotated gene sets and ontologies to
facilitate data interpretation.
We describe here a web-based application called VLAD
(VisuaL Annotation Display) for finding functional themes
in sets of genes based on their ontology term annotations.
VLAD uses the hypergeometric test for determining significance and is appropriate for the analysis of gene sets
that are generated by “cutoff-based” statistical analysis
methods. VLAD is highly configurable; there are many
parameters that can be set by users that control input, data
processing, and output. A unique feature of the software
relative to existing term enrichment tools is that it is not
limited to the two native ontologies in the system: the Gene
Ontology (Ashburner et al. 2000) and the Mammalian
Phenotype Ontology (Smith and Eppig 2012); rather,
VLAD can compare lists of genes to any structured
vocabulary that is in the standard open biological and
biomedical ontologies (OBO) format (http://www.obofoundry.org) (Smith et al. 2007) and for which there is a
file of gene-to-annotation-term associations in the GO
Annotation Format (GAF; http://geneontology.org/page/go-annotation-file-gaf-format-10). VLAD also provides
users with a level of control over the graphical display of
results that is not available in other similar analysis sites.
VLAD is available as a web-based application from the
Mouse Genome Informatics (MGI) web site (http://proto.informatics.jax.org/prototypes/vlad/).
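The hypergeometric test named above can be illustrated with a minimal sketch. It is not VLAD's implementation: the function names (`enrichment_p_value`, `rank_terms`), the choice of a one-sided upper-tail probability, the use of an explicit background gene universe, and the toy data are all assumptions made for illustration. For a term annotated to K of N background genes, the p value is the probability that a random list of n genes contains at least the k annotated genes actually observed.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the background (universe)
    K: background genes annotated to the term
    n: genes in the submitted list
    k: submitted genes annotated to the term
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def rank_terms(query_genes, term_to_genes, universe):
    """Return (term, p_value, hit_symbols) rows, most significant first."""
    universe = set(universe)
    query = set(query_genes) & universe
    rows = []
    for term, genes in term_to_genes.items():
        annotated = set(genes) & universe
        hits = query & annotated
        if hits:  # skip terms with no annotated genes in the submitted list
            p = enrichment_p_value(len(hits), len(query), len(annotated), len(universe))
            rows.append((term, p, sorted(hits)))
    return sorted(rows, key=lambda row: row[1])


# Toy data (invented for illustration): 10 background genes, 2 terms.
universe = [f"g{i}" for i in range(10)]
term_to_genes = {
    "TERM:A": ["g0", "g1", "g2", "g3"],
    "TERM:B": ["g4", "g5", "g6", "g7", "g8", "g9"],
}
for term, p, hits in rank_terms(["g0", "g1", "g9"], term_to_genes, universe):
    print(f"{term}\t{p:.4f}\t{','.join(hits)}")
```

In this toy case TERM:A ranks first: two of the three submitted genes are annotated to it, giving p = 40/120 = 1/3. The rows mirror the kind of table described in the abstract: a term, its statistics, and the symbols of the annotated genes, sorted by significance.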
Materials and methods
Data sources
To illustrate the functionality of VLAD, we analyzed genes
from a previously published study that described the genome-wide gene expression patterns across key developmental stages of the normal mouse diaphragm (Russell et al.
2012). In this study, the investigators used time-series
analysis (Ernst and Bar-Joseph 2006) of microarray-based
expression data to identify over 650 genes whose expression levels increased significantly between embryonic day
11.5 and embryonic day 16.5 and over 360 genes whose
expression levels decreased significantly over this same
time period.
To demonstrate the extensibility of VLAD to user-provided ontologies, an OBO ontology of mouse biochemical
pathways (mousecyc_obo.txt) and a corresponding set of
annotations in GAF format (mousecyc_gaf.txt) from the
curated MouseCyc database (Evsikov et al. 2009) were
downloaded from the MouseCyc project ftp site (ftp://informatics.jax.org/pub/curatorwork/MouseCycDB/) and
chosen as the basis for a custom term enrichment analysis
using the Annotation Data Set options on the VLAD
homepage. The mouse diaphragm gene lists, OBO ontology, and GAF files are available as supplemental data and
from the following ftp site: ftp://informatics.jax.org/pub/supplemental/MammGenome2015.
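To make the role of the GAF file concrete, the following sketch shows one way such a file could be read into a term-to-genes mapping. It relies only on the general GAF layout (tab-separated columns, header and comment lines starting with "!", gene symbol in column 3, qualifier in column 4, ontology term ID in column 5). The function name `parse_gaf`, the decision to skip annotations qualified with NOT, and the sample rows and identifiers are assumptions for illustration; they are not taken from VLAD or from the MouseCyc files.

```python
from collections import defaultdict


def parse_gaf(lines):
    """Map each ontology term ID to the set of gene symbols annotated to it."""
    term_to_genes = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # blank lines and "!" header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # malformed row: too few columns
        symbol, qualifier, term_id = cols[2], cols[3], cols[4]
        if "NOT" in qualifier.split("|"):
            continue  # assumption: negated annotations are excluded
        term_to_genes[term_id].add(symbol)
    return dict(term_to_genes)


# Invented example rows; only the first five GAF columns are filled in.
sample = [
    "!gaf-version: 1.0\n",
    "MGI\tMGI:0000001\tGeneA\t\tPWY:0001\n",
    "MGI\tMGI:0000002\tGeneB\t\tPWY:0001\n",
    "MGI\tMGI:0000003\tGeneC\tNOT\tPWY:0002\n",
]
print(parse_gaf(sample))
```

Because the function accepts any iterable of lines, an open file handle for a downloaded file such as mousecyc_gaf.txt could be passed in directly.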
Running VLAD
VLAD is preconfigured to work with gene-function
annotations from MGI that use the GO, gene-phenotype annotations from MGI that use the Mammalian Phenotype (MP) Ontology, or both. The (...truncated)