GeneclusterViz: a tool for conserved gene cluster visualization, exploration and analysis

Jun 2012

Motivation: Gene clusters are arrangements of functionally related genes on a chromosome. In bacteria, it is expected that evolutionary pressures would conserve these arrangements due to the functional advantages they provide. Visualization of conserved gene clusters across multiple genomes provides key insights into their evolutionary histories. Therefore, a software tool that enables visualization and functional analyses of gene clusters would be a great asset to the biological research community.

Results: We have developed GeneclusterViz, a Java-based tool that allows for the visualization, exploration and downstream analyses of conserved gene clusters across multiple genomes. GeneclusterViz combines an easy-to-use exploration interface for gene clusters with a host of other analysis features such as multiple sequence alignments, phylogenetic analyses and integration with the KEGG pathway database.

Availability: http://biohealth.snu.ac.kr/GeneclusterViz/; http://microbial.informatics.indiana.edu/GeneclusterViz/

Contact: sunkim.bioinfo{at}snu.ac.kr; ybrun{at}indiana.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

BIOINFORMATICS APPLICATIONS NOTE
Genome analysis
Vol. 28 no. 11 2012, pages 1527–1529
doi:10.1093/bioinformatics/bts177
Advance Access publication April 11, 2012

Vikas R. Pejaver1, Jaehyun An2, SungMin Rhee2, Ankita Bhan3, Jeong-Hyeon Choi4, Boshu Liu5, Heewook Lee1, Pamela J. Brown6, David Kysela6, Yves V. Brun6,∗ and Sun Kim2,∗

1 School of Informatics and Computing, Indiana University Bloomington, IN 47404, USA
2 School of Computer Science and Engineering, Bioinformatics Institute, Seoul National University, Seoul, Korea
3 Abbott Laboratories, Chicago, IL
4 Cancer Center and Biostatistics, Georgia Health Sciences University, Augusta, GA 30912
5 Center for Genomics and Bioinformatics, Indiana University Bloomington, IN 47404
6 Department of Biology, Indiana University Bloomington, IN 47405-3700, USA
Received on November 2, 2011; revised on February 27, 2012; accepted on April 3, 2012

∗To whom correspondence should be addressed.

1 INTRODUCTION

Recent advances in sequencing technology and ortholog detection methods have given rise to several methods for the detection of gene clusters conserved across two or more genomes (Calabrese et al., 2003; Fujibuchi et al., 2000; Haas et al., 2004; Kim et al., 2007; Yang and Sze, 2008; Zheng et al., 2005). However, gene cluster information obtained from these programs has been very difficult for bench scientists to analyze because an intuitive analysis tool has been lacking. The visualization of gene clusters in multiple genomes, with respect to their spatial and functional features, is a fundamental problem in this area.

Several visualization tools have been developed for a variety of applications in comparative genomics [reviewed in Nielsen et al. (2010)]. A subset of these has been designed for the visualization of gene clusters. However, these tools are limited in different ways. In some cases, only paired-genome visualization is supported. Even in multi-genome visualization, scalability is a challenge and thus limits the utility of web-based tools. Moreover, web-based tools rely on server-hosted data and cannot utilize data provided by users. This is true even of Absynte, a tool that was recently developed specifically for visualizing bacterial and archaeal clusters (Despalins et al., 2011). Standalone applications overcome these drawbacks but involve cumbersome installation procedures or database setups that demand programming skills beyond those of most end-users.
Finally, existing gene cluster visualization tools are limited in their support for further downstream analyses. We have implemented GeneclusterViz, a robust and dynamic standalone tool that provides a global and local view of gene clusters. Moreover, with a host of flexible sequence and function analysis features, we believe GeneclusterViz can be a very useful tool in comparative genomics.

2 IMPLEMENTATION

GeneclusterViz has been implemented in Java (JDK 1.6). For the construction and depiction of phylogenetic trees, the Phylogenetic Analysis Library (PAL) was used (Drummond and Strimmer, 2001). To establish a server connection and to communicate with the back-end CGI program, the Jakarta Commons HTTPClient Java library from Apache Commons has been used. The server-side CGI programs and wrappers that run CLUSTAL W (Thompson et al., 1994), HMMER v2.3.2 (Eddy, 1998) and KEGG pathway searches have been written in Perl.

3 FEATURES

The features of GeneclusterViz fall into four broad categories: input, visualization, exploration and analysis features.

Input: GeneclusterViz accepts output files from the EGGS (Kim et al., 2007) and PhyloEGGS [part of ISGA; Hemmerich et al. (2010)] algorithms. These files are mainly tab-delimited plain-text formats that contain NC numbers and NCBI GI numbers as identifiers for genomes and individual gene products, respectively.

© The Author 2012. Published by Oxford University Press. All rights reserved.
Associate Editor: John Quackenbush

Fig. 1. Screenshots of the main and detailed views of GeneclusterViz for a dataset of eight alphaproteobacteria genomes. In the main view, clusters 13 (NADH dehydrogenase complex cluster) and 21 (translation-related cluster) are close together in the Caulobacterales genomes but are separated in the Rhizobiales genomes. They are even farther apart in the outgroup (Escherichia coli).
This shows how separations between clusters can correlate with phylogeny: (A) cluster list with position information, (B) viewing area, (C) viewing and exploration options and (D) connected genes within the clusters. The detailed view shows the NADH dehydrogenase cluster and some analysis features: (E) cluster table, (F) viewing area, (G) viewing and export options, (H) phylogenetic tree for selected gene, (I) browser connection to KEGG pathway with the selected gene highlighted, (J) strand ‘flip’ button and (K) gene selection (red).

GeneclusterViz also accepts gene family files in an in-house file format (GFAM) that was developed to represent genes from common COG families.

[...] in the newly input genome, it is displayed in the detailed view, along with the cluster in the other genomes for manual investigation.

Visualization: Users can easily zoom in and out of cluster visualizations along both the X and Y axes. The ‘Connect Clusters’ feature displays all the connections between clusters across multiple genomes. Individual clusters can be accessed by clicking on a cluster in the table pane on the left of the GeneclusterViz window. This highlights the particular cluster on the genome. The ‘Connect Genes’ feature displays how individual genes are connec (...truncated)
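
As an illustration of the tab-delimited input described under Input, the following minimal Python sketch groups gene identifiers by cluster and genome. The text does not give the actual EGGS or PhyloEGGS column layout, so the three-column format (cluster ID, genome NC accession, gene GI number), the function name `read_clusters` and all sample values are assumptions made for this example only; it is not the parser used by GeneclusterViz.

```python
import csv
import io
from collections import defaultdict

# Hypothetical three-column layout: cluster ID, genome accession, gene GI.
# The accessions and GI numbers below are made-up placeholders, and the
# real EGGS/PhyloEGGS column order is not specified in the text.
SAMPLE = (
    "13\tNC_999991\t1001\n"
    "13\tNC_999991\t1002\n"
    "13\tNC_999992\t2001\n"
    "21\tNC_999992\t2050\n"
)


def read_clusters(handle):
    """Group gene GI numbers by cluster ID, then by genome accession."""
    clusters = defaultdict(lambda: defaultdict(list))
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        cluster_id, genome, gi = row[0], row[1], int(row[2])
        clusters[cluster_id][genome].append(gi)
    return clusters


clusters = read_clusters(io.StringIO(SAMPLE))
for cluster_id, genomes in clusters.items():
    for genome, gis in genomes.items():
        print(cluster_id, genome, gis)
```

Nesting the mapping by cluster and then by genome mirrors how the tool presents a cluster across several genomes, but any real file would need its columns checked against the EGGS or PhyloEGGS documentation first.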


This is a preview of a remote PDF: https://bioinformatics.oxfordjournals.org/content/28/11/1527.full.pdf
Article home page: http://bioinformatics.oxfordjournals.org/content/28/11/1527.abstract

Vikas R. Pejaver, Jaehyun An, SungMin Rhee, Ankita Bhan, Jeong-Hyeon Choi, Boshu Liu, Heewook Lee, Pamela J. Brown, David Kysela, Yves V. Brun, Sun Kim. GeneclusterViz: a tool for conserved gene cluster visualization, exploration and analysis. Bioinformatics, 2012, 28(11), pp. 1527–1529. DOI: 10.1093/bioinformatics/bts177