Chemical Effects in Biological Systems—Data Dictionary (CEBS-DD): A Compendium of Terms for the Capture and Integration of Biological Study Design Description, Conventional Phenotypes, and ‘Omics Data
Jennifer Fostel (3, 4)
Danielle Choi (3, 4)
Craig Zwickl (0, 3)
Norman Morrison (2, 3)
Asif Rashid (3, 4, 6)
Atif Hasan (3, 6)
Wenjun Bao (3, 5)
Ann Richard (3, 5)
Weida Tong (3)
Pierre R. Bushel (3)
Roger Brown (3)
Maribel Bruno (3)
Michael L. Cunningham (1, 3)
David Dix (3, 5)
William Eastin (1, 3)
Carlos Frade (3, 4)
Alex Garcia (0, 3)
Alexandra Heinloth (3)
Rick Irwin (1, 3)
Jennifer Madenspacher (3)
B. Alex Merrick (3)
Thomas Papoian (3)
Richard Paules (3)
Philippe Rocca-Serra (0, 3)
Assunta-Susanna Sansone (0, 3)
James Stevens (0, 3)
Kenneth Tomer (3)
Chihae Yang (3)
Michael Waters (3)
0 Lilly Research Laboratory, Greenfield, Indiana 46140
1 U.S. National Toxicology Program, Research Triangle Park, North Carolina 27709
2 National Environmental Research Council (NERC), University of Manchester, Manchester M13 9PL, U.K.
3 The information in this document has been funded in part by the National Institute of Environmental Health Sciences, the National Center for Toxicogenomics, and the U.S. Environmental Protection Agency. It has been reviewed by the National Health and Environmental Effects Research Laboratory and approved for publication. Approval does not signify that the contents reflect the views of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.
Toxicogenomics, PO Box 12233 Mail Drop F1-05, 111 Alexander Drive, Research Triangle Park NC 27709-2233. Fax: (919) 541-1460
4 Lockheed Martin Information Technology (LMIT), Research Triangle Park, North Carolina 27709
5 U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711
6 Alpha-Gamma Technologies, Inc., Raleigh, North Carolina 27609
A critical component in the design of the Chemical Effects in
Biological Systems (CEBS) Knowledgebase is a strategy to
capture toxicogenomics study protocols and the toxicity endpoint
data (clinical pathology and histopathology). A Study is generally
an experiment carried out during a period of time for the purpose
of obtaining data, and the Study Design Description captures the
methods, timing, and organization of the Study. The CEBS Data
Dictionary (CEBS-DD) has been designed to define and organize
terms in an attempt to standardize nomenclature needed to
describe a toxicogenomics Study in a structured yet intuitive format
and provide a flexible means to describe a Study as conceptualized
by the investigator. The CEBS-DD will organize and annotate
information from a variety of sources, thereby facilitating the
capture and display of toxicogenomics data in biological context
in CEBS, i.e., associating molecular events detected in highly
parallel data with the toxicology/pathology phenotype as observed
in the individual Study Subjects and linked to the experimental
treatments. The CEBS-DD has been developed with a focus on
acute toxicity studies, but with a design that will permit it to be
extended to other areas of toxicology and biology with the addition
of domain-specific terms. To illustrate the utility of the CEBS-DD,
we present an example of integrating data from two proteomics and
transcriptomics studies of the response to acute acetaminophen
toxicity (A. N. Heinloth et al., 2004, Toxicol. Sci. 80, 193-202).
The ability to archive, retrieve, and exchange high content
data sets, including transcript profiles, among laboratories,
industries, and government agencies is a crucial step in
exploiting the power of high content technologies to describe
the response of an organism to the environment. A key step in
achieving this end is to develop a publicly accessible database
and associated standards for the exchange of data with
associated metadata that provide experimental context so that
the data can be mined efficiently and intuitively. Currently no
national or international standard provides the necessary
nucleus of metadata standards around which such a database
can be organized to enable the electronic exchange of
information among interested stakeholders. Therefore, a
consortium of stakeholders from the private and public sectors
has contributed to the development of a data dictionary
containing terms, definitions, relationships, and controlled
vocabularies for the Chemical Effects in Biological Systems
(CEBS) Knowledgebase.
The CEBS Knowledgebase is being developed at the
National Center for Toxicogenomics (NCT). Although still
early in the development process, CEBS will become a public
toxicogenomics resource integrating traditional toxicology and
pathology phenotype data with data from highly parallel
technologies, such as microarrays or proteomics, in
biological context using the Study Design Description (Waters,
2004; Waters et al., 2003). To accomplish this, CEBS captures
the relevant characteristics of Study Subjects and methods
(Protocols), and the Study Timeline on which Events such as
treatment, animal care, and exit (euthanasia) occur. These
characteristics are collectively termed the Study Design
Description. The CEBS Data Dictionary (CEBS-DD) includes the
terms, definitions, and relationships to support the accurate
capture of elements of the Study Design Description by CEBS.
Once the Study Design Description has been captured, it can be
used to organize and annotate the data derived from Study
Subjects and to display the data in a meaningful biological
context within CEBS.
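As a rough illustration of how the elements named above (Study Subjects, Protocols, and Events on a Study Timeline) might be held together and then used to annotate data, consider the following minimal Python sketch. It is not the CEBS schema: every class name, field name, and value here is hypothetical and chosen only to mirror the terms used in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    """A point on the Study Timeline (e.g. treatment, animal care, exit)."""
    kind: str          # hypothetical label such as "treatment" or "exit"
    time_hours: float  # time since study start; the unit is an assumption
    protocol: str      # name of the Protocol describing the method used


@dataclass
class StudySubject:
    subject_id: str
    characteristics: Dict[str, str] = field(default_factory=dict)
    timeline: List[Event] = field(default_factory=list)


@dataclass
class StudyDesignDescription:
    subjects: List[StudySubject] = field(default_factory=list)

    def annotate(self, subject_id: str, measurement: Dict[str, float]) -> Dict[str, object]:
        """Attach a subject's characteristics and time-ordered events to a data record."""
        subject = next(s for s in self.subjects if s.subject_id == subject_id)
        events = sorted(subject.timeline, key=lambda e: e.time_hours)
        return {
            "subject_id": subject_id,
            "characteristics": subject.characteristics,
            "events": [(e.time_hours, e.kind, e.protocol) for e in events],
            "measurement": measurement,
        }


def example_design() -> StudyDesignDescription:
    """Build a one-subject design with invented, purely illustrative values."""
    subject = StudySubject(
        subject_id="rat-01",
        characteristics={"species": "rat"},
        timeline=[
            Event("exit", 24.0, "euthanasia-protocol"),
            Event("treatment", 0.0, "dosing-protocol"),
        ],
    )
    return StudyDesignDescription(subjects=[subject])
```

Calling `example_design().annotate("rat-01", {"ALT": 1.0})` returns a record in which the measurement travels together with the subject's characteristics and its time-ordered events, which is the sense in which the Study Design Description supplies biological context for the data.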
The challenges inherent in building the CEBS-DD are
twofold. First, the minimal information needed to interpret
a toxicogenomics Study must be identified to ensure that data
deposited in CEBS meet a common minimum standard. This
need is satisfied by CEBS-DD, which extends the original
Minimal Information about a Microarray
Experiment/Toxicology (MIAME/Tox) standard developed by the NCT, the
European Bioinformatics Institute (EBI) and the International
Life Sciences Institute Health and Environmental Sciences
Institute (ILSI-HESI)
(www.mged.org/MIAME1.1-DenverDraft.DOC). The minimal information requirement is highly
dependent upon the biological conduct of the experiment, and
has been extended in the CEBS-DD primarily within the
framework of an acute toxicity Study. CEBS will offer
a graphical user interface (GUI) to capture minimal Study
information (see Figure 1).
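The idea of a common minimum standard can be sketched as a simple completeness check applied to a deposit before it is accepted. The field names below are assumptions made for illustration; the actual minimal set is defined by the CEBS-DD and MIAME/Tox, not by this list.

```python
from typing import Dict, List, Optional

# Hypothetical required fields, for illustration only; the real minimal
# information set is defined by the CEBS-DD and MIAME/Tox.
REQUIRED_FIELDS = ("study_title", "species", "test_agent", "dose", "exposure_duration")


def missing_minimal_fields(record: Dict[str, Optional[str]]) -> List[str]:
    """Return the required fields that are absent or blank in a deposit record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing
```

A deposit would meet the (hypothetical) minimum only when `missing_minimal_fields(record)` returns an empty list; otherwise the returned names indicate what the depositor still needs to supply.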
The second challenge, also met by the CEBS-DD, is to
define the maximum information that can be provided by
a data depositor. CEBS must be able to accurately capture any
and all relevant pieces of information about the Study and then
interpret and present the data in a way that permits querying by
the CEBS user. In most cases well-annotated sources are
already in an electronic format; thus, it is anticipated that
transfer to CEBS of data from richly annotated studies will
occur electronically rather than through a manual input web
interface, and that the CEBS-DD will facilitate the writing of
parsers by supplying annotation and synonyms for different
data formats. This electronic parsing would occur apart from
the CEBS web interfac (...truncated)