GeneViTo: Visualizing gene-product functional and structural features in genomic datasets
BMC Bioinformatics
BioMed Central
Software
Open Access
Georgios S Vernikos†, Christos G Gkogkas†, Vasilis J Promponas and
Stavros J Hamodrakas*
Address: Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, 15701, Athens, Greece
* Corresponding author †Equal contributors
Published: 31 October 2003
BMC Bioinformatics 2003, 4:53
Received: 22 July 2003
Accepted: 31 October 2003
This article is available from: http://www.biomedcentral.com/1471-2105/4/53
© 2003 Vernikos et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all
media for any purpose, provided this notice is preserved along with the article's original URL.
Abstract
Background: The availability of increasing amounts of sequence data from completely sequenced
genomes boosts the development of new computational methods for automated genome
annotation and comparative genomics. Therefore, there is a need for tools that facilitate the
visualization of raw data and results produced by bioinformatics analysis, providing new means for
interactive genome exploration. Visual inspection can be used as a basis to assess the quality of
various analysis algorithms and to aid in-depth genomic studies.
Results: GeneViTo is a JAVA-based computer application that serves as a workbench for genome-wide analysis through visual interaction. The application deals with various experimental
information concerning both DNA and protein sequences (derived from public sequence databases
or proprietary data sources) and meta-data obtained by various prediction algorithms, classification
schemes or user-defined features. Interaction with a Graphical User Interface (GUI) allows easy
extraction of genomic and proteomic data referring to the sequence itself, sequence features, or
general structural and functional features. Emphasis is placed on comparing annotation
with prediction data, either to supplement the available information,
especially in cases of "poor" annotation, or to evaluate the available predictions. Moreover, selected
information can be exported as high-quality JPEG image files for further processing and scientific use.
A compilation of properly formatted GeneViTo input data for demonstration is available to
interested readers for two completely sequenced prokaryotes, Chlamydia trachomatis and
Methanococcus jannaschii.
Conclusions: GeneViTo offers a visual overview of genomic functional elements, combining
data from both database annotation and analysis tools to support the overall analysis of sequenced
genomes. The application is compatible with Linux or Windows ME/2000/XP operating systems,
provided that the appropriate Java Runtime Environment is already installed on the system.
Background
The impressive progress in Molecular Biology, enhanced
by the development of rapid genome sequencing technologies,
has led to an exponential growth in the number of
available DNA/protein sequences deposited in public
databases. Between the early 1990s, when the Human
Genome Project began, and 1996, the complete genome
sequences of 5 unicellular organisms were determined. At the time of writing (September 2003), 160
genomes (including the Human Genome) have been
completely sequenced, while 643 genome projects are still
in progress [1,2]. At the same time, the intensive
research activity in the field of Bioinformatics generates a
large amount of heterogeneous meta-data which, examined on a large scale, demand further analysis in order to
extract valuable biological information.
DNA or protein sequence retrieval from specialized
curated databases (GenBank [3], SWISS-PROT [4]) is
quite effective by means of well-established tools
such as SRS [5] or Entrez [6]. Cross-references between
entries in dispersed biological databases are abundant and facilitate navigation over the World Wide
Web, but they cannot offer an overview of the way
sequence features are distributed in ordered sequence sets,
such as complete genomes.
Moreover, several bioinformatics analysis and prediction
tools are available, either as web services or as standalone
applications, aiming to give further insight into existing
sequence information. These tools produce different output depending on the type of analysis, and results are mainly presented on a per-functional-element
basis. These analyses complement experimental data and
guide further research activities.
Once information concerning a genome is obtained
(sequence, annotation and meta-data), an integration
step is required in order to reach advanced biological
conclusions. Such a task is time-consuming and painstaking, since data for hundreds or thousands of sequences
are densely packed in structured text files. Furthermore, the
monotonous machine-readable file format does not
reveal, at a glance and in an intuitive way, the features
contained in a set of sequences. It becomes quite clear, especially in cases
of completely sequenced genomes, that organizing data in
text files constitutes only a primitive level of presentation. Thus, a more sophisticated approach is required for easier, more efficient,
more productive and less chaotic representation.
Data visualization, using specialized Computer Graphics
Software, acts as an intermediate link between raw data
and the user, enabling more effective and elaborate manipulation of numerous genomes. Such computational workbenches become even more useful when they incorporate,
apart from the already deposited data, additional tools
that make large-scale in silico experiments easier.
Several powerful genome visualization tools are already
available, mainly focused on features related to nucleotide
sequences: gff2ps [7], Artemis [8], SeqVista [9], NCBI Map
Viewer [10], TIGR Genome browser [11], ENSEMBL
project viewer [12], ERGO™ [13]. Each of these tools
follows a different philosophy regarding the type of input data
(e.g. sequences, maps, nucleotide sequence features), the
accepted formats and the way that features are visualized.
Our approach is mainly focused on presenting features
related to gene products and their distribution along
genomic regions.
We have developed GeneViTo, a JAVA-based computer
application that incorporates, in a single depiction, sequence
features found in annotation records from nucleotide
and protein sequence databases (GenBank, SWISS-PROT)
and the output of prediction methods (e.g. PRED-CLASS [14],
PRED-TMR2 [15], orienTM [16], SIGNALP [17]).
GeneViTo provides interfaces to additional analysis tools,
as well as several search utilities, to easily manipulate a (...truncated)