GeneViTo: Visualizing gene-product functional and structural features in genomic datasets

BMC Bioinformatics (BioMed Central). Software. Open Access.

Georgios S Vernikos†, Christos G Gkogkas†, Vasilis J Promponas and Stavros J Hamodrakas*

Address: Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, 15701, Athens, Greece

* Corresponding author †Equal contributors

Published: 31 October 2003. BMC Bioinformatics 2003, 4:53. Received: 22 July 2003. Accepted: 31 October 2003.

This article is available from: http://www.biomedcentral.com/1471-2105/4/53

© 2003 Vernikos et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.

Abstract

Background: The availability of increasing amounts of sequence data from completely sequenced genomes boosts the development of new computational methods for automated genome annotation and comparative genomics. Therefore, there is a need for tools that facilitate the visualization of raw data and results produced by bioinformatics analysis, providing new means for interactive genome exploration. Visual inspection can be used as a basis to assess the quality of various analysis algorithms and to aid in-depth genomic studies.

Results: GeneViTo is a Java-based computer application that serves as a workbench for genome-wide analysis through visual interaction. The application deals with a variety of experimental information concerning both DNA and protein sequences (derived from public sequence databases or proprietary data sources) and with meta-data obtained by various prediction algorithms, classification schemes or user-defined features.
Interaction with a Graphical User Interface (GUI) allows easy extraction of genomic and proteomic data referring to the sequence itself, sequence features, or general structural and functional features. Emphasis is laid on the potential comparison between annotation and prediction data, in order to supplement the provided information, especially in cases of "poor" annotation, or to evaluate available predictions. Moreover, desired information can be output as high-quality JPEG image files for further elaboration and scientific use. A compilation of properly formatted GeneViTo input data for demonstration is available to interested readers for two completely sequenced prokaryotes, Chlamydia trachomatis and Methanococcus jannaschii.

Conclusions: GeneViTo offers an inspectional view of genomic functional elements, combining data stemming from both database annotation and analysis tools, for an overall analysis of existing genomes. The application is compatible with Linux or Windows ME-2000-XP operating systems, provided that the appropriate Java Runtime Environment is already installed on the system.

Background

The impressive progress in Molecular Biology, enhanced by the development of rapid genome sequencing technologies, has led to an exponential growth in the number of available DNA/protein sequences deposited in public databases. Between the early 1990s, when the Human Genome Project began, and 1996, the complete genome sequences of 5 unicellular organisms had been determined. By the time of this writing (September 2003), 160 genomes (including the Human Genome) have been completely sequenced, while 643 genome projects are still in progress [1,2].
On the other hand, the intensive research activity in the field of Bioinformatics generates a large amount of heterogeneous meta-data which, examined on a large scale, demand further analysis in order to extract valuable biological information. DNA or protein sequence retrieval from specialized curated databases (GenBank [3], SWISS-PROT [4]) is quite effective by means of well-established tools, such as SRS [5] or Entrez [6]. Cross-references between entries from disseminated biological databases are abundant and make navigation over the World Wide Web easy, but they cannot offer an overview of the way sequence features are distributed in ordered sequence sets, such as complete genomes.

Moreover, several bioinformatics analysis and prediction tools are available, either as web services or as standalone applications, attempting to give further insight into existing sequence information. These tools produce different output, according to the analysis type, and results are mainly represented on a per functional element basis. These analyses complement experimental data and guide further research activities.

Once information concerning a genome is obtained (sequence, annotation and meta-data), an integration step is required in order to reach advanced biological conclusions. Such a task is time-consuming and painstaking when data for hundreds or thousands of sequences are "thick-set" in structured text files. Furthermore, the monotonous machine-readable file format does not reveal the features contained in a set of sequences at once or in an intuitive way. It becomes quite clear, especially in cases of completely sequenced genomes, that organizing data in text files constitutes only a primordial level of presentation. Thus, a more sophisticated approach for easier, more efficient, more productive and less chaotic representation is required.
Data visualization, using specialized Computer Graphics Software, acts as an intermediate link between raw data and the user for more effective and elaborate manipulation of numerous genomes. Such computational workbenches become even more useful when they incorporate, apart from the already deposited data, additional tools, making large-scale in silico experiments easier.

Several powerful genome visualization tools are already available, mainly focused on features related to nucleotide sequences: gff2ps [7], Artemis [8], SeqVista [9], NCBI Map Viewer [10], TIGR Genome browser [11], ENSEMBL project viewer [12], ERGO™ [13]. Each of these methods follows a different philosophy in the type of input data (e.g. sequences, maps, nucleotide sequence features), the accepted formats and the way that features are visualized. Our approach is mainly focused on presenting features related to gene products and their distribution along genomic regions.

We have developed GeneViTo, a Java-based computer application that incorporates in a single depiction the sequence features existing in annotation records from nucleotide and protein sequence databases (GenBank, SWISS-PROT) and in the output of prediction methods (e.g. PRED-CLASS [14], PRED-TMR2 [15], orienTM [16], SIGNALP [17]). GeneViTo provides interfaces to additional analysis tools, as well as several search utilities, to easily manipulate a (...truncated)