CEBS object model for systems biology data, SysBio-OM



BIOINFORMATICS Vol. 20 no. 13 2004, pages 2004–2015
doi:10.1093/bioinformatics/bth189

CEBS object model for systems biology data, SysBio-OM

Sandhya Xirasagar1,†,∗, Scott Gustafson1,†, B. Alex Merrick2, Kenneth B. Tomer2, Stanley Stasiewicz2, Denny D. Chan1, Kenneth J. Yost III1, John R. Yates III3, Susan Sumner4, Nianqing Xiao1 and Michael D. Waters2

1 Science Applications International Corporation, 20201 Century Building, 3rd Floor, Germantown, MD 20874, USA, 2 NIEHS, National Center for Toxicogenomics, P.O. Box 12233, Research Triangle Park, NC 27709, USA, 3 Department of Cell Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA and 4 Paradigm Genetics, Inc., 108 TW Alexander Drive, P.O. Box 14528, Research Triangle Park, NC 27709, USA

Received on September 24, 2003; revised on January 14, 2004; accepted on February 25, 2004
Advance Access publication March 25, 2004

ABSTRACT

Motivation: To promote a systems biology approach to understanding the biological effects of environmental stressors, the Chemical Effects in Biological Systems (CEBS) knowledge base is being developed to house data from multiple complex data streams in a systems-friendly manner that will accommodate extensive querying from users. Unified data representation via a single object model will greatly aid in integrating data storage and management, and will facilitate reuse of software to analyze and display data resulting from diverse differential expression or differential profile technologies. Data streams include, but are not limited to, gene expression analysis (transcriptomics), protein expression and protein–protein interaction analysis (proteomics) and changes in low molecular weight metabolite levels (metabolomics).

Results: To enable the integration of microarray gene expression, proteomics and metabolomics data in the CEBS system, we designed an object model, the Systems Biology Object Model (SysBio-OM).
The model is comprehensive and leverages other open source efforts, namely the MicroArray Gene Expression Object Model (MAGE-OM) and the Proteomics Experiment Data Repository (PEDRo) object model. SysBio-OM is designed by extending MAGE-OM to represent protein expression data elements (including those from PEDRo), protein–protein interaction and metabolomics data. SysBio-OM promotes the standardization of data representation and data quality by facilitating the capture of the minimum annotation required for an experiment. Such standardization refines the accuracy of data mining and interpretation. The open source SysBio-OM model, which can be implemented on varied computing platforms, is presented here.

Availability: A Unified Modeling Language (UML) depiction of the entire SysBio-OM is available at http://cebs.niehs.nih.gov/SysBioOM/. The Rational Rose object model package is distributed under an open source license that permits unrestricted academic and commercial use and is available at http://cebs.niehs.nih.gov/cebsdownloads. The database and interface are being built to implement the model and will be available for public use at http://cebs.niehs.nih.gov.

Contact:

∗ To whom correspondence should be addressed.
† The authors wish it to be known that, in their opinion, these two authors should be regarded as joint First Authors.

INTRODUCTION

Current research trends emphasize the need to integrate data from studies monitoring changes in expression of genes, proteins and metabolites as a consequence of perturbing biological systems (Ideker et al., 2001; Waters et al., 2003). Comparisons of gene, protein and metabolite data will be invaluable in promoting a global understanding of how biological systems function and respond to environmental stressors (Amin et al., 2002; Witzmann and Grant, 2003; Lindon et al., 2003).
The Chemical Effects in Biological Systems (CEBS) toxicogenomics knowledge base is designed to integrate data resulting from these three disciplines in addition to conventional toxicology data (Waters et al., 2003). A standard representation of data types within each discipline is an important prerequisite for efficient and accurate storage, access, analysis, comparison and data exchange. Furthermore, valid conclusions are possible only if the data is sufficiently well annotated with contextual information regarding its origin. These observations have led to the development of a draft Minimum Information About a Microarray Experiment (MIAME)/Tox guideline (http://www.mged.org/

Bioinformatics 20(13) © Oxford University Press 2004; all rights reserved.

Transcriptomics

High-throughput platforms such as microarrays (DeRisi et al., 1996, 1997; Chu et al., 1998; Hughes et al., 2000) and Serial Analysis of Gene Expression (SAGE) (Velculescu et al., 1995) have evolved to monitor gene expression patterns in biological samples. The research objective is often focused on quantitative comparisons of gene expression between control states and states induced by specific chemicals, diseases, environments or therapeutic treatments. The general workflow of a typical DNA microarray experiment consists of hybridizing fluorescently labeled mRNA from control and experimental samples to DNA microarray chips on which thousands of gene-specific reporter sequences are arrayed. Images of the array are then acquired and analyzed to obtain the ratio of control and experimental signal intensities, which enables the inference of relative fold changes in gene expression exerted by the experimental treatment or disease state.
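The ratio step in the microarray workflow above can be illustrated with a minimal sketch. The reporter names, intensity values, the `floor` parameter and the use of a log2 transform are all assumptions made for illustration; the text itself only states that a ratio of control and experimental signal intensities is used to infer relative fold change.

```python
import math


def log2_fold_change(experimental, control, floor=1.0):
    """Return log2(experimental / control) for a single array spot.

    `floor` is an assumed lower bound applied to both intensities so that
    zero or background-level signals do not cause a division by zero.
    """
    exp_signal = max(experimental, floor)
    ctl_signal = max(control, floor)
    return math.log2(exp_signal / ctl_signal)


# Hypothetical background-corrected intensities: reporter -> (control, experimental)
spots = {
    "reporter_A": (500.0, 2000.0),
    "reporter_B": (1200.0, 300.0),
    "reporter_C": (800.0, 800.0),
}

# Positive values suggest induction, negative values repression,
# and values near zero suggest no change relative to control.
fold_changes = {
    name: log2_fold_change(experimental, control)
    for name, (control, experimental) in spots.items()
}

for name, value in fold_changes.items():
    print(f"{name}: log2 ratio = {value:+.2f}")
```

Working on a log scale is a common convention rather than something the workflow requires: it makes a fourfold increase and a fourfold decrease symmetric around zero (+2 and -2 in this sketch).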
The basic concept of SAGE rests on two principles: first, a short sequence of nucleotides from the transcript, called a ‘tag’, can effectively identify the original transcript from which it came, and second, these tags can be linked, enabling rapid sequence analysis of multiple transcripts. The number of times that a specific transcript is identified in a given sample is used as an indicator of the level of gene expression corresponding to the transcript.

Proteomics

Proteomics research involves measurement of changing protein expression profiles as affected by chemical toxicity, disease state, environmental insult or therapeutic treatment. The plethora of proteomics platforms reflects the choice of measuring specific attributes of proteins, including protein identity (amino acid sequence), mass, charge, post-translational modifications, protein–protein interactions, subcellular location and biological activity. Currently, most proteomics laboratories employ combinations of proteomics platforms since no single technique can measure all protein characteristics in a comprehensive manner (Figeys, 2003). Consequently, several scenarios for proteomics experiments can be envisioned, as illustrated below, reflecting the complex workflow in these experiments. Depending on the cellular compartment of interest, procedures for organelle separation (Rappsilber et al., 2002; Galeva and Altermann, 2002) may be used prior to p (...truncated)