GeneclusterViz: a tool for conserved gene cluster visualization, exploration and analysis
BIOINFORMATICS APPLICATIONS NOTE
Genome analysis
Vol. 28 no. 11 2012, pages 1527–1529
doi:10.1093/bioinformatics/bts177
Advance Access publication April 11, 2012
Vikas R. Pejaver1, Jaehyun An2, SungMin Rhee2, Ankita Bhan3, Jeong-Hyeon Choi4,
Boshu Liu5, Heewook Lee1, Pamela J. Brown6, David Kysela6, Yves V. Brun6,∗ and
Sun Kim2,∗
1 School of Informatics and Computing, Indiana University Bloomington, IN 47404, USA, 2 School of Computer
Science and Engineering, Bioinformatics Institute, Seoul National University, Seoul, Korea, 3 Abbott Laboratories,
Chicago, IL, 4 Cancer Center and Biostatistics, Georgia Health Sciences University, Augusta, GA 30912, 5 Center for
Genomics and Bioinformatics, Indiana University Bloomington, IN 47404, 6 Department of Biology, Indiana University
Bloomington, IN 47405-3700, USA
ABSTRACT
Motivation: Gene clusters are arrangements of functionally related
genes on a chromosome. In bacteria, it is expected that evolutionary
pressures would conserve these arrangements due to the functional
advantages they provide. Visualization of conserved gene clusters
across multiple genomes provides key insights into their evolutionary
histories. Therefore, a software tool that enables visualization and
functional analyses of gene clusters would be a great asset to the
biological research community.
Results: We have developed GeneclusterViz, a Java-based tool that
allows for the visualization, exploration and downstream analyses of
conserved gene clusters across multiple genomes. GeneclusterViz
combines an easy-to-use exploration interface for gene clusters
with a host of other analysis features such as multiple sequence
alignments, phylogenetic analyses and integration with the KEGG
pathway database.
Availability: http://biohealth.snu.ac.kr/GeneclusterViz/;
http://microbial.informatics.indiana.edu/GeneclusterViz/
Contact: ;
Supplementary information: Supplementary data are available at
Bioinformatics online.
Received on November 2, 2011; revised on February 27, 2012;
accepted on April 3, 2012
1 INTRODUCTION
Recent advances in sequencing technology and ortholog detection
methods have given rise to several methods for the detection of gene
clusters conserved across two or more genomes (Calabrese et al.,
2003; Fujibuchi et al., 2000; Haas et al., 2004; Kim et al., 2007;
Yang and Sze, 2008; Zheng et al., 2005). However, the gene cluster
information produced by these programs is difficult for bench
scientists to analyze because intuitive analysis tools are lacking.
The visualization of gene clusters in multiple genomes, with respect
to their spatial and functional features, is a fundamental problem
in this area.
∗ To whom correspondence should be addressed.
There are currently several visualization tools that have been
developed for a variety of applications in comparative genomics
[reviewed in Nielsen et al. (2010)]. A subset of these have been
designed for the visualization of gene clusters. However, these tools
are limited in different ways. In some cases, only paired-genome
visualization is supported. Even in multi-genome visualization,
scalability is a challenge and, thus, limits the utility of web-based
tools. Moreover, web-based tools rely on server-hosted data and
cannot utilize data provided by users. This is true even in the
case of Absynte, a tool that was recently developed specifically for
the task of visualizing bacterial and archaeal clusters (Despalins
et al., 2011). Standalone applications overcome these drawbacks
but involve cumbersome installation procedures or database setups
that demand programming skills beyond those of most end-users.
Finally, existing gene cluster visualization tools are limited in their
support for further downstream analyses. We have implemented
GeneclusterViz, a robust and dynamic standalone tool that provides
a global and local view of gene clusters. Moreover, with a host
of flexible sequence and function analysis features, we believe
GeneclusterViz can be a very useful tool in comparative genomics.
2 IMPLEMENTATION
GeneclusterViz has been implemented in Java (JDK 1.6). For the
construction and depiction of phylogenetic trees, the Phylogenetic
Analysis library was used (Drummond and Strimmer, 2001). To
establish a server connection and to communicate with the back-end
CGI program, the Jakarta Commons HTTPClient Java library from
Apache Commons has been used. The server-side CGI programs and
wrappers that run CLUSTAL W (Thompson et al., 1994), HMMER
v2.3.2 (Eddy, 1998) and KEGG pathway searches have been written
in Perl.
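The client-to-CGI exchange described above can be sketched as follows. This is a minimal illustration and not the GeneclusterViz code: it uses the JDK's built-in java.net.http client (Java 11 or later) in place of the Jakarta Commons HTTPClient library named in the text, and the endpoint URL, the parameter names (tool, gi) and the GI values are hypothetical placeholders.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Sketch only: builds (but does not send) a form-encoded POST request
// of the kind a desktop client might submit to a server-side CGI wrapper.
final class CgiRequestSketch {

    // Encode key/value pairs as application/x-www-form-urlencoded,
    // preserving the iteration order of the supplied map.
    static String encodeForm(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Build a POST request for a CGI endpoint. The URL and the parameter
    // names passed in are assumptions made for illustration.
    static HttpRequest buildRequest(String cgiUrl, Map<String, String> params) {
        return HttpRequest.newBuilder(URI.create(cgiUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(params)))
                .build();
    }
}
```

A caller would pass the resulting request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and read the text returned by the server-side wrapper (for example, an alignment). Keeping the request construction separate from the network call makes it possible to check the encoding without a live server.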
3 FEATURES
The features of GeneclusterViz can be summarized into four broad
categories: input, visualization, exploration and analysis features.
Input: GeneclusterViz accepts output files from the EGGS (Kim
et al., 2007) and PhyloEGGS (part of ISGA; Hemmerich et al.,
2010) algorithms. These are mainly tab-delimited plain-text files that use NC accession numbers and NCBI GI numbers as
identifiers for genomes and individual gene products, respectively.
© The Author 2012. Published by Oxford University Press. All rights reserved. For Permissions, please email:
Associate Editor: John Quackenbush
Fig. 1. Screenshots of the main and detailed views of GeneclusterViz for a dataset of eight alphaproteobacteria genomes. In the main view, clusters 13 (NADH
dehydrogenase complex cluster) and 21 (translation-related cluster) are close together in the Caulobacterales genomes but are separated in the Rhizobiales
genomes. They are even farther apart in the outgroup (Escherichia coli). This shows how separations between clusters can correlate with phylogeny—(A) cluster
list with position information, (B) viewing area, (C) viewing and exploration options and (D) connected genes within the clusters. The detailed view shows
the NADH dehydrogenase cluster and some analysis features—(E) cluster table, (F) viewing area, (G) viewing and export options, (H) phylogenetic tree for
selected gene, (I) browser connection to KEGG pathway with the selected gene highlighted, (J) strand ‘flip’ button and (K) gene selection (red)
GeneclusterViz also accepts gene family files in an in-house file
format (GFAM) that was developed to represent genes from common
COG families.
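To make the idea of a tab-delimited cluster file concrete, the sketch below groups gene identifiers by cluster and by genome. The three-column layout (cluster ID, genome accession, gene GI number), the '#' comment convention and the sample identifiers are assumptions made for illustration; the actual EGGS, PhyloEGGS and GFAM layouts are not specified in this text and may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: reads a HYPOTHETICAL three-column, tab-delimited layout
// (cluster ID <TAB> genome accession <TAB> gene GI number).
final class ClusterTableSketch {

    // Returns cluster ID -> genome accession -> list of GI numbers,
    // preserving the order in which entries appear in the input.
    static Map<String, Map<String, List<String>>> parse(List<String> lines) {
        Map<String, Map<String, List<String>>> clusters = new LinkedHashMap<>();
        for (String line : lines) {
            // Skip blank lines and comment lines (assumed to start with '#').
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] fields = line.split("\t", -1);
            if (fields.length < 3) {
                throw new IllegalArgumentException(
                        "Expected 3 tab-separated fields: " + line);
            }
            clusters.computeIfAbsent(fields[0], k -> new LinkedHashMap<>())
                    .computeIfAbsent(fields[1], k -> new ArrayList<>())
                    .add(fields[2]);
        }
        return clusters;
    }
}
```

Grouping by cluster first and genome second mirrors how a viewer would draw one cluster across several genomes; lines could be supplied from a file with java.nio.file.Files.readAllLines.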
in the newly input genome, it is displayed in the detailed view, along
with the cluster in the other genomes for manual investigation.
Visualization: Users can zoom in and out of cluster
visualizations along both the X- and Y-axes. The
‘Connect Clusters’ feature displays all the connections between
clusters across multiple genomes. Individual clusters can be
accessed by clicking on a cluster in the table-pane on the left of the
GeneclusterViz window. This highlights the particular cluster on
the genome. The ‘Connect Genes’ feature displays how individual
genes are connec (...truncated)