GeneCards Version 3: the human gene integrator

Jan 2010

GeneCards (www.genecards.org) is a comprehensive, authoritative compendium of annotative information about human genes, widely used for nearly 15 years. Its gene-centric content is automatically mined and integrated from over 80 digital sources, resulting in a web-based deep-linked card for each of >73 000 human gene entries, encompassing the following categories: protein coding, pseudogene, RNA gene, genetic locus, cluster and uncategorized. We now introduce GeneCards Version 3, featuring a speedy and sophisticated search engine and a revamped, technologically enabling infrastructure, catering to the expanding needs of biomedical researchers. A key focus is on gene-set analyses, which leverage GeneCards’ unique wealth of combinatorial annotations. These include the GeneALaCart batch query facility, which tabulates user-selected annotations for multiple genes, and GeneDecks, which identifies similar genes with shared annotations and finds set-shared annotations by descriptor enrichment analysis. Such set-centric features address a host of applications, including microarray data analysis, cross-database annotation mapping and gene-disorder associations for drug targeting. We highlight the new Version 3 database architecture, its multi-faceted search engine, and its semi-automated quality assurance system. Data enhancements include an expanded visualization of gene expression patterns in normal and cancer tissues, an integrated alternative splicing pattern display, and augmented multi-source SNPs and pathways sections. GeneCards now provides direct links to gene-related research reagents such as antibodies, recombinant proteins, DNA clones and inhibitory RNAs, and features gene-related drugs and compounds lists. We also portray the GeneCards Inferred Functionality Score annotation landscape tool for scoring a gene’s functional information status. Finally, we delineate examples of applications and collaborations that have benefited from the GeneCards suite.
Database URL: www.genecards.org

Marilyn Safran [1,2], Irina Dalah [2], Justin Alexander [2], Naomi Rosen [2], Tsippi Iny Stein [2], Michael Shmoish [0,2], Noam Nativ [2], Iris Bahir [2], Tirza Doniger [2], Hagit Krug [2], Alexandra Sirota-Madi [2,4], Tsviya Olender [2], Yaron Golan [3], Gil Stelzer [2], Arye Harel [2], Doron Lancet [2]

[0] Bioinformatics Knowledge Unit, Lorry I. Lokey Interdisciplinary Center for Life Sciences and Engineering, Technion - Israel Institute of Technology, Haifa, Israel
[1] Department of Biological Services, Weizmann Institute of Science, Rehovot, Israel
[2] Department of Molecular Genetics
[3] Xennex Inc, Cambridge, MA, USA
[4] The Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel

Introduction

With the recent accumulation of data from worldwide genome projects, the individual scientist faces the time-consuming and laborious task of sifting through the expanding labyrinth of gene information. This can be partly alleviated by the use of sophisticated integrated and searchable databases. For many years, GeneCards (www.genecards.org) (1–3) has provided such a remedy, with carefully selected, comprehensive information about human genes, mined and integrated from over 80 data sources. By bringing together gene information from large public sources such as HGNC (4), NCBI (5), ENSEMBL (6) and UniProtKB (7), as well as many other smaller resources (8), GeneCards has provided concise genome, proteome, transcriptome, disease and function data on all known and predicted human genes. It has successfully overcome barriers of data format heterogeneity using standard nomenclature, especially gene symbols approved by the HUGO Gene Nomenclature Committee (4). The information is organized in a card format for each gene, divided into distinct functional sections, and includes a variety of features such as textual summaries and links to other genome-wide and specialized databases.
GeneCards has evolved significantly since initially described (1,9,10), and its progress is documented in a number of past publications (2,3,11–15). In this article, we introduce the new GeneCards Version 3 (V3) and describe its features in detail. We place special emphasis on the novel set-centric capabilities (beyond and in conjunction with the new GeneCards search engine), which address a variety of applications, including microarray data analysis, cross-database annotation mapping and gene-disorder associations for drug targeting. Readers who are new to GeneCards might want to read the Applications section below first, familiarize themselves with previous articles (1–3), and then read the rest of this article, possibly skipping the Methods section.

GeneCards version 3

The new home page

The new GeneCards V3 home page, shown in Figure 1, hosts the new search facility and provides links, via labeled oval buttons, to a sample gene and to the various sections of its card. It also enables one to view a variety of differently categorized and annotated genes, both from pre-defined links and by interacting with a random-gene generator that is customizable by category and/or GeneCards Inferred Functionality Score (GIFtS). The GIFtS algorithm (11) uses the wealth of GeneCards annotations to produce annotation scores aimed at predicting the degree of a gene's functionality. Since the degree of known functionality is correlated with the amount of research done on a particular gene or its product, these annotation scores are presented as inferred functionality measures. The extended GIFtS tool, linked to from the home page, facilitates browsing the human genome by searching for the annotation level of a specified gene, retrieving a list of genes within a specified range of GIFtS values, obtaining random genes with a specific GIFtS value, and experimenting with the GIFtS weighting algorithm for a variety of annotation categories.
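To make the idea of an annotation score concrete, the following Python fragment is a minimal sketch only. It assumes a score defined as the weighted share of annotation categories for which a gene has any data, scaled to 0-100; this formula, the category names, the weights and the gene symbols are all hypothetical illustrations and are not the published GIFtS algorithm (11).

```python
from typing import Dict, List, Set

# Hypothetical annotation categories and weights (assumptions, not GeneCards data).
DEFAULT_WEIGHTS: Dict[str, float] = {
    "function": 1.0,
    "expression": 1.0,
    "pathways": 1.0,
    "disorders": 1.0,
}


def annotation_score(present: Set[str],
                     weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted share (0-100) of categories in which a gene has any annotation."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    covered = sum(w for category, w in weights.items() if category in present)
    return 100.0 * covered / total


def genes_in_range(annotations: Dict[str, Set[str]], low: float, high: float,
                   weights: Dict[str, float] = DEFAULT_WEIGHTS) -> List[str]:
    """Genes whose score lies within [low, high], sorted by symbol."""
    return sorted(gene for gene, categories in annotations.items()
                  if low <= annotation_score(categories, weights) <= high)


# Toy data with placeholder gene symbols.
toy_annotations: Dict[str, Set[str]] = {
    "GENE_A": {"function", "expression", "pathways", "disorders"},
    "GENE_B": {"function"},
    "GENE_C": set(),
}

well_annotated = genes_in_range(toy_annotations, 20, 100)
```

Passing a different `weights` dictionary to `annotation_score` mimics, in a very rough way, experimenting with category weights as the extended tool allows; the real tool's scoring details should be taken from reference (11).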
The left-hand side of the home page retains the logos and links to the GeneCards suite's sites: GeneDecks, GeneALaCart, GeneLoc, GeneNote, GeneAnnot and GeneTide.

The new search engine

The new version 3 search engine is extremely fast, and is capable of matching complex field-specific queries of the entire database in milliseconds. For example, a search for a very common keyword like 'cancer' returns 8000 results in 3 ms. In contrast, V2 could not handle such a query, or even a more focused one such as 'melanoma' (too many results to be efficiently displayed); a considerably more restricted search in V2 such as 'schizophrenia' yielded 1100 results and took 80 s. Efficient V3 performance is achieved by breaking the search process into distinct phases, and also by returning results in limited pages of data. The two primary stages of each search are: (i) to first quickly identify the list of genes that have information matching the search term, and (ii) upon demand, discover the detailed relevant context and annotation details of those hits, and highlight them in minicards (Figure 2). The Methods section details the design of the new search engine.

The upgraded GeneCards webcard

The card presented for (...truncated)
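The two-stage, paged search described under 'The new search engine' can be illustrated with a minimal Python sketch. It assumes a simple in-memory inverted index over single-word terms; the class name, method names, page size and toy cards are hypothetical, and the actual V3 design is the one given in the paper's Methods section, which may differ substantially.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


class TwoStageSearch:
    """Toy search: stage (i) finds matching genes, stage (ii) builds minicards."""

    def __init__(self, cards: Dict[str, Dict[str, str]], page_size: int = 2):
        self.cards = cards          # gene symbol -> {section name -> text}
        self.page_size = page_size
        self.index: Dict[str, Set[str]] = defaultdict(set)
        # Build the inverted index once: token -> set of gene symbols.
        for gene, sections in cards.items():
            for text in sections.values():
                for token in re.findall(r"[a-z0-9]+", text.lower()):
                    self.index[token].add(gene)

    def find_genes(self, term: str, page: int = 0) -> List[str]:
        """Stage (i): index lookup only, returning one page of gene symbols."""
        hits = sorted(self.index.get(term.lower(), set()))
        start = page * self.page_size
        return hits[start:start + self.page_size]

    def minicard(self, gene: str, term: str) -> Dict[str, str]:
        """Stage (ii): on demand, keep matching sections and bracket the term."""
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        return {
            section: pattern.sub(lambda m: f"[{m.group(0)}]", text)
            for section, text in self.cards[gene].items()
            if pattern.search(text)
        }


# Toy cards with placeholder gene symbols and invented text.
cards = {
    "GENE_A": {"summary": "Linked to melanoma in toy data.", "disorders": "Melanoma"},
    "GENE_B": {"summary": "A toy kinase.", "disorders": "melanoma, schizophrenia"},
    "GENE_C": {"summary": "No disease text here."},
}
engine = TwoStageSearch(cards, page_size=1)
first_page = engine.find_genes("melanoma", page=0)
detail = engine.minicard(first_page[0], "melanoma")
```

The point of the split is that the first stage touches only the index and returns a bounded page, while the costlier scan and highlighting of card text runs only for the genes a user actually opens.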


This is a preview of a remote PDF: https://database.oxfordjournals.org/content/2010/baq020.full.pdf
Article home page: http://database.oxfordjournals.org/content/2010/baq020.abstract

Marilyn Safran, Irina Dalah, Justin Alexander, Naomi Rosen, Tsippi Iny Stein, Michael Shmoish, Noam Nativ, Iris Bahir, Tirza Doniger, Hagit Krug, Alexandra Sirota-Madi, Tsviya Olender, Yaron Golan, Gil Stelzer, Arye Harel, Doron Lancet. GeneCards Version 3: the human gene integrator, 2010, 2010, DOI: 10.1093/database/baq020