Organelle DB: an updated resource of eukaryotic protein localization and function

Nucleic Acids Research, Jan 2007

Organelle DB (http://organelledb.lsi.umich.edu) is a web-accessible relational database presenting a supplemented catalog of organelle-localized proteins and major protein complexes. Since its release in 2004, Organelle DB has grown by 20% to encompass over 30 000 proteins from 138 eukaryotic organisms. Each protein in Organelle DB is presented with its subcellular localization, primary sequence and a detailed description of its function, as available. All records in Organelle DB have been annotated using controlled vocabulary from the Gene Ontology consortium. Protein localization data are inherently visual, and Organelle DB is a significant repository of biological images, housing 1500 micrographs of yeast cells carrying stained proteins. Furthermore, we report here the development of Organelle View, an extension of Organelle DB for the interactive visualization of organelles and subcellular structures in the budding yeast Saccharomyces cerevisiae. Organelle View offers a three-dimensional representation of a yeast cell; users can search Organelle View for proteins of interest, and the organelles housing these proteins will be highlighted in the cell image. Among other applications, Organelle View may serve as an educational aid engaging introductory biology students through a visually ‘fun’ interface. Organelle View can be accessed from the Organelle DB home page or directly at http://organelleview.lsi.umich.edu.

Nuwee Wiwatwattana, Christopher M. Landau, G. Jamie Cope, Gabriel A. Harp, Anuj Kumar

Department of Molecular, Cellular, and Developmental Biology and Life Sciences Institute, University of Michigan, Ann Arbor, MI 48109-2216, USA

Since its inception in 2004, Organelle DB has provided a freely accessible information resource cataloging eukaryotic proteins that are known components of an organelle or major protein complex (1). Organelle DB presents a list of proteins organized essentially by subcellular localization and/or by organism.
Each protein record housed within Organelle DB presents systematic and common gene/protein names, gene descriptions, phenotypic information (as available), biological terms from the Gene Ontology (GO) consortium, amino acid sequence and, in some cases, micrograph images (Figure 1A). To facilitate data interoperability, we have taken care to describe all protein localizations using the controlled vocabulary established by the GO consortium. In total, Organelle DB encompasses 60 GO localization terms; these terms have been described previously (1). Organelle DB has been populated in two ways. First, we have extracted protein localization data from each major model organism database [i.e. the Saccharomyces Genome Database SGD (2), the Drosophila melanogaster database FlyBase (3), the Caenorhabditis elegans database WormBase (4), the Mouse Genome Database MGD (5) and the Arabidopsis Information Resource TAIR (6)]. Localization data for human proteins and for other proteins outside of the standard model organisms have been extracted from SWISS-PROT (7) and from GO (8). Second, we have manually compiled protein localization data from large-scale and systematic studies in the budding yeast Saccharomyces cerevisiae (9–11) in supplement to localization data deposited in SGD. Since localization data have been drawn from several databases and studies, the particular source of a given protein localization record is indicated within each protein data report in the Data Source field. By maintaining updated localization data from these sources, we have grown Organelle DB to encompass over 31 000 proteins spanning 138 organisms across the eukaryotic kingdom. Numerical listings of protein localizations for major organisms of study are presented in Table 1. Note that we now present these data tallies by specific organism rather than by broad organismal groupings (e.g. Arabidopsis thaliana rather than plants).
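The per-organism tallies count localization records rather than proteins, since a protein found in more than one organelle contributes more than one record. The following is a minimal sketch of that distinction, not the code used to build Table 1; the record layout, the function name `tally_by_organism` and all identifiers in the sample data are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical localization records: (protein_id, organism, GO localization term).
# The identifiers below are illustrative placeholders, not Organelle DB data.
records = [
    ("P1", "Saccharomyces cerevisiae", "nucleus"),
    ("P1", "Saccharomyces cerevisiae", "mitochondrion"),
    ("P2", "Saccharomyces cerevisiae", "nucleus"),
    ("P3", "Arabidopsis thaliana", "mitochondrion"),
]


def tally_by_organism(rows):
    """Return {organism: (record_count, distinct_protein_count)}."""
    record_counts = defaultdict(int)
    proteins = defaultdict(set)
    for protein_id, organism, _term in rows:
        record_counts[organism] += 1          # every row is one record
        proteins[organism].add(protein_id)    # a protein is counted once
    return {org: (record_counts[org], len(proteins[org])) for org in record_counts}


tallies = tally_by_organism(records)
for organism, (n_records, n_proteins) in sorted(tallies.items()):
    print(f"{organism}: {n_records} records, {n_proteins} proteins")
```

In this toy input, protein P1 appears in two organelles, so the yeast record count exceeds the yeast protein count by one.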
[Table 1: Subcellular localizations (Nucleus, Mitochondria). Note that records do not correspond exactly with proteins; one protein may have more than one record if it has been found within more than one organelle.]

VISUALIZING PROTEIN LOCALIZATION DATA

Although localization data are inherently visual, text-based representations of protein localization can be difficult to understand in some cases. For example, in S.cerevisiae, a number of polarity proteins have been found localized to the bud tip; however, this descriptive term is likely unfamiliar and difficult to visualize for any researcher working with an organism other than yeast. From a simple micrograph of yeast cells, the bud tip localization becomes clearly understood as the extreme tip of the growing bud at the opposite end of the larger, so-called mother cell. Moreover, the subtleties of a given protein localization can be lost in a simple text-based classification scheme: cytoplasmic staining can be patchy (possibly cytoskeletal) or diffuse (from a soluble protein), with very different implications regarding protein function in each instance (9). Thus, often, the subcellular distribution of a protein is best considered by viewing an image of a cell in which the protein of interest is visualized either as a fusion to a fluorescent protein (10) or by indirect immunofluorescence staining (9). We have included this type of primary localization data in Organelle DB whenever possible; specifically, Organelle DB presents 1500 fluorescent micrographs of yeast cells visualized with antibodies directed against epitope-tagged proteins (indirect immunofluorescence) from our own studies of protein localization in S.cerevisiae (9,11). In addition, we welcome submissions from the scientific community of any such images for any protein reported in Organelle DB. To further facilitate the visualization of protein localization data, we have developed an extension of Organelle DB called Organelle View.
Organelle View is a scientific visualization application allowing users to dynamically generate a visual interpretation of data from Organelle DB. Organelle View presents a searchable interface with a three-dimensional representation of an archetypical cell (Figure 1B). Rather than representing organelles and subcellular structures by text, Organelle View offers an artist's rendering of a cell and its major organelles. At present, we have chosen a budding yeast cell (S.cerevisiae) as the model for Organelle View, largely because protein localization has been studied quite extensively in yeast (9,10); future versions of Organelle View will incorporate additional cell types from other organisms. Users can search Organelle View for any yeast protein, and the organelle to which that protein localizes will be highlighted in the cell image. An additional text-based summary of gene function is also presented for each searched protein. Organelle View, therefore, offers an alternative mode of presentation for the information housed in Organelle DB; it also stands as a useful educational tool, providing an easily accessible and engaging platform from which introductory biology students can explore the basics of cell biology.

DESIGN AND IMPLEMENTATION

Organelle DB was developed using the PHP server-side scripting language version 4.3.9 on a Linux server running the MySQL database version 5.0.18. We populated the most recent protein localization data from the GO database and major model organism databases [the databases described above plus the Rat Genome Database RGD (12), the Dictyostelium Database dictyBase (13) and the Zebrafish Information Network ZFIN (14)]. The scripts we implemented were configured to automatically add new genes, delete obsolete genes and update the gene information obtained from each of the source databases. We also developed a facility to add/delete/edit a particular gene per curator request. The size of our current database is 324 MB.
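The add/delete/update behaviour of the population scripts can be illustrated with a short sketch. This is written in Python rather than the PHP used for Organelle DB, and it operates on in-memory dictionaries rather than MySQL tables; the function name `sync_genes` and the gene identifiers are hypothetical, chosen only to show the three cases.

```python
def sync_genes(local, source):
    """Bring `local` in line with `source`; both map gene ID -> description.

    Returns the (added, deleted, updated) gene IDs, each as a sorted list.
    """
    added = sorted(source.keys() - local.keys())      # new in the source
    deleted = sorted(local.keys() - source.keys())    # obsolete in the source
    updated = sorted(
        gene_id
        for gene_id in source.keys() & local.keys()
        if source[gene_id] != local[gene_id]          # information changed
    )
    for gene_id in deleted:
        del local[gene_id]
    for gene_id in added + updated:
        local[gene_id] = source[gene_id]
    return added, deleted, updated


# Hypothetical placeholder data, not real gene records.
local = {"GENE_A": "old description", "GENE_B": "unchanged", "GENE_OBSOLETE": "withdrawn"}
source = {"GENE_A": "new description", "GENE_B": "unchanged", "GENE_NEW": "newly annotated"}

added, deleted, updated = sync_genes(local, source)
print("added:", added, "deleted:", deleted, "updated:", updated)
```

After the call, `local` holds exactly the contents of `source`, and the three returned lists give a simple change log that a curator could review.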
The Organelle View application is a web-based Java applet. This applet interfaces with the existing database Organelle DB and renders a three-dimensional model of a cell with accompanying text and dynamic functionality. The rendering code was provided by the program WireFusion (Demicron). All functionality code is written in Java and JavaScript and is provided by Nformation Design (Philadelphia, PA). All buttons and text areas outside of the applet were created using the PHP and HTML languages. Models for Organelle View were created using the open-source three-dimensional modeling program Blender. The Organelle View applet requires Java 1.4.1 to function correctly.

USING ORGANELLE DB

Organelle DB is fully searchable and presents users with a variety of options for convenient data access and retrieval. From the Organelle DB home page, users may specifically search for proteins localized to a given organelle, subcellular structure or protein complex. Additional options are provided in the Quick Search form such that users may alternatively browse records related to a single organism or gene/protein. The Quick Search form on the Organelle DB home page provides six broad protein localization groupings as follows: endoplasmic reticulum (ER), nucleus, membrane protein, mitochondrion, protein complex and others. Detailed subcategories of organelles, protein complexes and organisms may be directly accessed from our Advanced Search forms (on the Search page at Organelle DB). These Advanced Search options offer a full list of organelles and organisms contained within Organelle DB; for example, through our Advanced Search, users may select an organelle (e.g. endoplasmic reticulum) and further select a subcategory of that organelle (e.g. integral to endoplasmic reticulum membrane). In addition, users may specify an organelle and organism, thereby limiting output to only those organelle-localized proteins from the indicated organism.
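The way an Advanced Search narrows results by organelle, subcategory and organism can be sketched as a parameterized query. The example below is an illustration only: it uses Python's built-in `sqlite3` module with an in-memory database in place of the MySQL server behind Organelle DB, and the table `localization`, its columns, the function `advanced_search` and the protein names are all assumptions rather than the actual schema.

```python
import sqlite3

# In-memory stand-in for the real database; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE localization ("
    " protein TEXT, organism TEXT, organelle TEXT, subcategory TEXT)"
)
ER = "endoplasmic reticulum"
ER_MEMBRANE = "integral to endoplasmic reticulum membrane"
conn.executemany(
    "INSERT INTO localization VALUES (?, ?, ?, ?)",
    [
        ("PROT_1", "Saccharomyces cerevisiae", ER, ER_MEMBRANE),
        ("PROT_2", "Saccharomyces cerevisiae", ER, None),
        ("PROT_3", "Arabidopsis thaliana", ER, ER_MEMBRANE),
        ("PROT_4", "Saccharomyces cerevisiae", "nucleus", None),
    ],
)


def advanced_search(conn, organelle, subcategory=None, organism=None):
    """Return protein names for an organelle, optionally narrowed further."""
    sql = "SELECT protein FROM localization WHERE organelle = ?"
    params = [organelle]
    if subcategory is not None:
        sql += " AND subcategory = ?"
        params.append(subcategory)
    if organism is not None:
        sql += " AND organism = ?"
        params.append(organism)
    sql += " ORDER BY protein"
    return [row[0] for row in conn.execute(sql, params)]


print(advanced_search(conn, ER))
print(advanced_search(conn, ER, organism="Saccharomyces cerevisiae"))
print(advanced_search(conn, ER, ER_MEMBRANE, "Saccharomyces cerevisiae"))
```

Each optional filter only adds a condition, so omitting both reproduces the broad organelle search and supplying both gives the narrowest result.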
Search results are presented as a list, with protein names and a brief description of each protein indicated. By clicking on a protein name, users are taken to a full protein report (Figure 1A) containing the gene's systematic name and standard/common name, gene description including phenotypic information as available, GO classifications, amino acid sequence and any captured images supporting the reported protein localization (available for some yeast proteins in Organelle DB). We have taken particular care to maintain proper nomenclature for a given organism in presenting gene names. In cases where multiple isoforms of a given protein are reported, the amino acid sequence of each isoform is presented in Organelle DB.

As an alternative to individual search queries, users may download datasets from Organelle DB in bulk. Specifically, all data in Organelle DB may be downloaded as tab-delimited text files. In total, we offer three such files. Protein localization records from Organelle DB may be downloaded in a single file. GO annotations for each protein presented in Organelle DB are provided in a separate file. A third file provides amino acid sequences in the FASTA format for all protein entries. Multiple sequences are available for certain proteins in Organelle DB; these protein sequences can be correlated to a single protein entry in the tab-delimited text file described above through the Accession ID field.

USING ORGANELLE VIEW

Like Organelle DB, Organelle View is fully searchable. Users can enter up to four proteins in the Quick Search form to the left of our home page (Figure 1B). The proteins with localizations will be displayed in text to the right of the search form boxes, color-coded as indicated (the first protein name and localization printed in red, etc.). The corresponding organelle for each protein will also be colored accordingly in the cell image and can be highlighted by rolling over the protein name and its localization.
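The color-coding of up to four queried proteins can be sketched as a simple pairing of search slots with colors. This is an illustrative Python sketch, not the applet's Java code: the text specifies red for the first protein and orange for the second, while the third and fourth colors, the lookup table, the function name `color_code` and the rule that a protein without a localization still occupies its slot are assumptions.

```python
MAX_QUERIES = 4

# Red and orange follow the text; green and blue are assumed for illustration.
PALETTE = ["red", "orange", "green", "blue"]

# Hypothetical placeholder lookup, standing in for a query against Organelle DB.
LOCALIZATIONS = {"PROT_1": "nucleus", "PROT_2": "bud tip", "PROT_3": "nucleus"}


def color_code(queries):
    """Pair each queried protein with its localization and slot color.

    Proteins without a known localization are omitted from the result,
    but (by assumption) still use up the color of their search slot.
    """
    if len(queries) > MAX_QUERIES:
        raise ValueError(f"at most {MAX_QUERIES} proteins can be searched at once")
    results = []
    for color, name in zip(PALETTE, queries):
        localization = LOCALIZATIONS.get(name)
        if localization is not None:
            results.append((name, localization, color))
    return results


for name, localization, color in color_code(["PROT_1", "UNKNOWN", "PROT_2"]):
    print(f"{name}: {localization} ({color})")
```

Keeping the color tied to the search slot rather than to the position in the result list means a protein keeps the same color whether or not its neighbors have localization data.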
The cell image provided in Organelle View is an artist's rendering of a budding yeast cell; descriptive terms and organelle names related to this image are presented in the Localization Key at the bottom left of our home page. A brief description of each cellular landmark and/or organelle is provided here; the text may be viewed by scrolling over the desired Localization Key image. The cell image can be manipulated by the cursor; e.g. the cell image can be rotated by clicking/dragging the image. Also, by clicking with the right mouse button, users can zoom in and out of the cell. Organelle View also provides much of the protein function information presented in Organelle DB. Users can view summary information regarding the function of any selected protein by clicking on the protein's corresponding number in the Information box. The resulting text display presents systematic and standard gene/protein names, a brief functional description of the protein and any comments related to the protein's function or localization. The color scheme described above can also be animated if multiple proteins sharing a common localization are entered into Organelle View. By this feature, the organelle common to both proteins will shift in color in the cell image, transitioning, e.g. from red (for Protein 1) to orange (for Protein 2). This automatic color shift can be toggled on/off by clicking the Animate button to the lower right of the cell image. A complete tutorial describing the use of Organelle View may be accessed on-line by clicking the About/How To button from the Organelle View home page.

APPLICATIONS AND SIGNIFICANCE

Organelle DB is a cross-species information resource for researchers utilizing nearly any eukaryotic organism of study. Data from 138 organisms are encompassed in Organelle DB. In particular, we are taking significant care to ensure that Organelle DB is fully integrated with major model organism sites and relevant external databases.
Each protein report in Organelle DB is linked to the appropriate external database (i.e. the model organism sites SGD, TAIR, MGD, FlyBase, WormBase or the protein database SWISS-PROT). Thus, users can quickly drill deeper into specific proteins of interest. Furthermore, we have maintained a controlled vocabulary as much as possible in annotating Organelle DB in order to ensure significant data interoperability. We have carefully utilized proper gene/protein names for each record in Organelle DB; whenever possible, we have drawn protein names from the appropriate model organism sites in order to comply with all naming conventions for each respective organism. Users can then easily navigate between our information and relevant information in other external sites. Collectively, the data in Organelle DB may be used by researchers in a broad swath of disciplines ranging from evolutionary biology to molecular/cellular biology and genomics/bioinformatics. Through the datasets in Organelle DB, evolutionary biologists will be able to consider organelle evolution through the eukaryota, particularly as we generate more complete datasets of eukaryotic protein localization. By integrating localization data with protein–protein interaction data, molecular, cellular and developmental biologists can ferret out higher-confidence subsets of protein interactions for further study. Researchers applying genomics and bioinformatics can use the data in Organelle DB to investigate the organelle as a functional unit, profiling and cataloging the dynamics of the organelle (i.e. its known constituent proteins) in response to cell growth and cell stress. Thus, Organelle DB is a cross-species and cross-discipline resource of general interest to the greater scientific community. In its present form, Organelle View is a first step toward the development of non-text-based resources for the presentation of protein localization data.
In addition, we view it as a particularly useful resource in the instruction of younger students (e.g. high school biology students and college undergraduates), introducing them to complicated concepts in cellular and molecular biology through an interface that is visually arresting and fun.

FUTURE DIRECTIONS

As part of our ongoing maintenance of Organelle DB, we intend to update protein localization records, placing a particular emphasis upon proteins differentially localized during cell development and/or during cell stress responses. We also plan to modify Organelle View. We envision Organelle View as a true complement to Organelle DB: an interface for the graphical visualization of proteins within organelles and protein complexes both in a static and dynamic model of the cell. Thus, we are currently working to generate a dynamic representation of the yeast cell cycle, such that differentially localized proteins can be represented during cell cycle progression. The current prototype for Organelle View presents a budding yeast cell, but once this platform is well established, we expect to develop similar three-dimensional models for other organisms as well, providing both a broader range of cell types and a more detailed cell view with finer subcellular resolution.

DATABASE ACCESS

Organelle DB may be accessed freely at http://organelledb.lsi.umich.edu through the Life Sciences Institute at the University of Michigan. User support may be obtained from Organelle DB by contacting . Please direct all technical questions and concerns to this address as well. When referencing Organelle DB and/or Organelle View, please cite this article.

ACKNOWLEDGEMENTS

We thank Hosagrahar Jagadish for his assistance in establishing this project and Harry Caul for expert data monitoring and recording.
This work was supported by NSF grant DBI 0543017, American Cancer Society Research Scholar grant RSG-06-179-01-MBC and March of Dimes Basil O'Connor Starter Scholar Research Award 5-FY05-1224 (to A.K.). Funding to pay the Open Access publication charges for this article was provided by grant DBI 0543017 from the National Science Foundation.

Conflict of interest statement. None declared.


Nuwee Wiwatwattana, Christopher M. Landau, G. Jamie Cope, Gabriel A. Harp, Anuj Kumar. Organelle DB: an updated resource of eukaryotic protein localization and function, Nucleic Acids Research, 2007, D810-D814, DOI: 10.1093/nar/gkl1000