CEBS—Chemical Effects in Biological Systems: a public data repository integrating study design and toxicity data with microarray and proteomics data
Michael Waters², Stanley Stasiewicz², B. Alex Merrick², Kenneth Tomer², Pierre Bushel²,
Richard Paules², Nancy Stegman², Gerald Nehls², Kenneth J. Yost¹, C. Harris Johnson¹,
Scott F. Gustafson¹, Sandhya Xirasagar¹, Nianqing Xiao¹, Cheng-Cheng Huang¹, Paul Boyer¹,
Denny D. Chan¹, Qinyan Pan¹, Hui Gong¹, John Taylor⁰, Danielle Choi⁴,⁵, Asif Rashid⁵,
Ayazaddin Ahmed³, Reese Howle³, James Selkirk², Raymond Tennant², Jennifer Fostel⁵
⁰ Large Scale Biology Corporation, 3333 Vaca Valley Parkway, Vacaville, CA 95688
¹ Science Applications International Corporation, 1710 SAIC Drive, McLean, VA 22101
² NIEHS, National Center for Toxicogenomics, PO Box 12233, Research Triangle Park, NC 27709
³ Alpha Gamma Technologies, Inc., 4700 Falls of Neuse Road, Suite 350, Raleigh, NC 27609, USA
⁴ Research Triangle Institute, PO Box 12194, Research Triangle Park, NC 27709
⁵ Lockheed Martin Information Technologies, PO Box 12233, Research Triangle Park, North Carolina 27709
CEBS (Chemical Effects in Biological Systems) is an
integrated public repository for toxicogenomics
data, including the study design and timeline,
clinical chemistry and histopathology findings and
microarray and proteomics data. CEBS contains
data derived from studies of chemicals and of
genetic alterations, and is compatible with clinical
and environmental studies. CEBS is designed to
permit the user to query the data using the study
conditions and the subject responses and then, having
identified an appropriate set of subjects, to move to
the microarray module of CEBS to carry out gene
signature and pathway analysis. Scope of CEBS:
CEBS currently holds 22 studies of rats, four studies
of mice and one study of Caenorhabditis elegans.
CEBS can also accommodate data from studies of
human subjects. Toxicogenomics studies currently
in CEBS comprise over 4000 microarray
hybridizations, and 75 2D gel images annotated with protein
identification performed by MALDI and MS/MS.
CEBS contains raw microarray data collected in
accordance with MIAME guidelines and provides
tools for data selection, pre-processing and analysis
resulting in annotated lists of genes of interest.
Additionally, clinical chemistry and histopathology
findings from over 1500 animals are included in
CEBS. CEBS/BID: The BID (Biomedical Investigation
Database) is another component of the CEBS
system. BID is a relational database used to load
and curate study data prior to export to CEBS, in
addition to capturing and displaying novel data
types such as PCR data, or additional fields of
interest, including those defined by the HESI
Toxicogenomics Committee (in preparation). BID
has been shared with Health Canada and the US
Environmental Protection Agency. CEBS is available
at http://cebs.niehs.nih.gov. BID can be accessed
via the user interface at https://dir-apps.niehs.nih.gov/arc/.
Requests for a copy of BID and for
depositing data into CEBS or BID can be made at
http://www.niehs.nih.gov/cebs-df/.
CEBS (Chemical Effects in Biological Systems) is a public
repository for toxicogenomics data developed by the
National Center for Toxicogenomics (NCT) within the
National Institute of Environmental Health Sciences
(NIEHS). Development of CEBS began in 2002 (1) and
focused first on capture of microarray and proteomics
data. The CEBS SysBio Object Model (2), based on
MIAME (3) and MIAPE Standard (4), was used for this
portion of the development of CEBS. CEBS1 was released
in August 2003, followed by the start of development of
CEBS2. The aim of the second stage of CEBS
development was to integrate study design and toxicological assay
data with the 'omics data captured in CEBS. Thus, the
CEBS SysTox Object Model (5) and the CEBS Data
Dictionary (CEBS-DD) (6) were developed to permit
accurate management of study data. CEBS2 was released
in November 2006.
As of July 2007, there are 27 toxicogenomics studies in
CEBS. A study refers to an observational or
perturbational experiment carried out over a defined timeline to
understand a biological system, address a scientific
question and/or to generate hypotheses. Of the 27 studies,
22 are of rat, 4 are of mouse and 1 is of Caenorhabditis
elegans. Twenty-six of the studies have associated
microarray data, and one has proteomics data. Companies that
have published data in CEBS include Iconix Biosciences
(http://www.iconixpharm.com/), Johnson & Johnson
(7,8), Pfizer Inc. (9) and Sankyo Co., Ltd (10). Other
data in CEBS have been submitted by researchers at the
National Cancer Institute (11,12), the University of
Tennessee (13–15), the University College of London
(16,17), the HESI Toxicogenomics Committee (18,19) and
the Toxicogenomics Research Consortium (submitted).
Additional data have been deposited in CEBS from
in-house studies carried out at the NIEHS (20) and at the
National Toxicology Program (NTP) (21,22).
CEBS can store data from studies of laboratory
animals, cultured cells or humans. Most studies in CEBS
contain observations or measurements made of the study
subjects and of specimens such as blood or tissue sections
derived from these subjects. The objective of CEBS is to
permit the user to integrate various data types and studies.
The CEBS user can select groups of subjects drawn from
different studies, based on subject responses or study
conditions. Once the subjects are selected, any associated
microarray data can be analyzed to produce lists of
annotated genes that can shed light on the biological and
toxicological processes occurring in the subjects.
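
The selection workflow described above can be illustrated with a short Python sketch. It is not the CEBS implementation: the record layout (`Subject`, `Hybridization`), the field names, the thresholds and the data are all hypothetical and chosen only to show the idea of filtering subjects across studies by a study condition and a subject response, then gathering their associated microarray hybridizations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    # Hypothetical record layout; the actual CEBS schema is not described here.
    subject_id: str
    study_id: str
    dose_mg_per_kg: float   # a study condition
    alt_u_per_l: float      # a subject response (clinical chemistry value)


@dataclass(frozen=True)
class Hybridization:
    hyb_id: str
    subject_id: str


def select_subjects(subjects, min_dose, min_alt):
    """Return subjects, from any study, that pass both the condition and response filters."""
    return [s for s in subjects
            if s.dose_mg_per_kg >= min_dose and s.alt_u_per_l >= min_alt]


def hybridizations_for(selected, hybridizations):
    """Collect the hybridizations associated with the selected subjects."""
    wanted = {s.subject_id for s in selected}
    return [h for h in hybridizations if h.subject_id in wanted]


# Made-up example data spanning two studies.
SUBJECTS = [
    Subject("r1", "studyA", 0.0, 40.0),
    Subject("r2", "studyA", 150.0, 310.0),
    Subject("r3", "studyB", 150.0, 55.0),
    Subject("r4", "studyB", 500.0, 420.0),
]
HYBS = [
    Hybridization("h1", "r1"),
    Hybridization("h2", "r2"),
    Hybridization("h3", "r4"),
    Hybridization("h4", "r4"),
]

selected = select_subjects(SUBJECTS, min_dose=100.0, min_alt=300.0)
selected_hybs = hybridizations_for(selected, HYBS)
```

In this toy example the selected group draws one subject from each study, and `selected_hybs` is the set of arrays that would be passed on to gene-level analysis.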
MATERIALS AND METHODS
Scope and Utility of CEBS
CEBS is the first public repository designed to integrate
toxicological, histopathological and other biological
measures with 'omics data. A number of other databases,
for instance the Gene Expression Omnibus (GEO) (23,24),
capture microarray data and information about the
sample treatment. The ArrayExpress database (25)
captures observations and measures taken on the study
subject concurrently with preparation of the tissue for
microarray analysis (26). A distinguishing feature of
CEBS is that the data captured from toxicogenomics
studies include observations made of the subject
throughout the study timeline, potentially both before and after
a specimen was taken for toxicological, histopathological
or other biological analysis. Since the descriptions of the
protocols used in the study and associated analyses are
captured using controlled vocabularies rather than in free
text form, these data are available for effective filtering
and query. These protocols, measures and temporal events
are useful in anchoring the transcriptomics or proteomics
profile displayed by the specimen within the time- and
dose-dependent biological responses seen in the study.
Thus, CEBS supports phenotypic anchoring (27–31),
defined as the linking of microarray or proteomics data
with a pathophysiological phenotype.
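
As a rough illustration of what linking expression data to a phenotype can look like in practice, the Python sketch below groups specimens by a histopathology grade and compares mean expression between grades. This is a deliberately simple stand-in, not the method used in CEBS or in the cited studies; the field names (`necrosis_grade`, `expr`), gene names and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, made-up records: each specimen carries a histopathology
# severity grade (the phenotype) and log2 expression ratios for two genes.
SPECIMENS = [
    {"id": "s1", "necrosis_grade": 0, "expr": {"geneA": 0.1, "geneB": -0.2}},
    {"id": "s2", "necrosis_grade": 0, "expr": {"geneA": 0.3, "geneB": 0.0}},
    {"id": "s3", "necrosis_grade": 2, "expr": {"geneA": 1.9, "geneB": -0.1}},
    {"id": "s4", "necrosis_grade": 2, "expr": {"geneA": 2.1, "geneB": 0.1}},
]


def anchor_by_phenotype(specimens, phenotype_key):
    """Return the mean expression per gene for each phenotype value."""
    grouped = defaultdict(lambda: defaultdict(list))
    for spec in specimens:
        for gene, value in spec["expr"].items():
            grouped[spec[phenotype_key]][gene].append(value)
    return {phenotype: {gene: mean(values) for gene, values in genes.items()}
            for phenotype, genes in grouped.items()}


profiles = anchor_by_phenotype(SPECIMENS, "necrosis_grade")

# Difference in mean log2 ratio between grade 2 and grade 0 specimens.
shift = {gene: profiles[2][gene] - profiles[0][gene] for gene in profiles[0]}
```

Genes with a large `shift` are candidates for association with the phenotype; a real analysis would add replication, normalization and statistical testing.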
CEBS includes both microarray and proteomics data.
Microarray data in CEBS include 965 hybridizations to
Affymetr (...truncated)