CEBS: a comprehensive annotated database of toxicological data
D964–D971 Nucleic Acids Research, 2017, Vol. 45, Database issue
doi: 10.1093/nar/gkw1077
Published online 29 November 2016
Isabel A. Lea1,*, Hui Gong1, Anand Paleja1, Asif Rashid1 and Jennifer Fostel2,*
1 ASRC Federal Vistronix, 430 Davis Dr, Suite 260, Morrisville, NC 27569, USA and 2 Division of the National
Toxicology Program, National Institute of Environmental Health Sciences, PO Box 12233, Research Triangle Park,
NC 27709, USA
Received August 08, 2016; Revised October 12, 2016; Editorial Decision October 24, 2016; Accepted November 01, 2016
* To whom correspondence should be addressed. Tel: +1 919 972 7985; Email:
Correspondence may also be addressed to Jennifer Fostel. Email:
Published by Oxford University Press on behalf of Nucleic Acids Research 2016.
This work is written by (a) US Government employee(s) and is in the public domain in the US.
ABSTRACT
The Chemical Effects in Biological Systems database
(CEBS) is a comprehensive and unique toxicology
resource that compiles individual and summary animal data from the National Toxicology Program (NTP)
testing program and other depositors into a single electronic repository. CEBS has undergone significant updates in recent years and currently contains over 11 000 test articles (exposure agents) and
over 8000 studies including all available NTP carcinogenicity, short-term toxicity and genetic toxicity
studies. Study data provided to CEBS are manually
curated, accessioned and subject to quality assurance review prior to release to ensure high quality.
The CEBS database has two main components: data
collection and data delivery. To accommodate the
breadth of data produced by NTP, the CEBS data collection component is an integrated relational design
that allows the flexibility to capture any type of electronic data (to date). The data delivery component of
the database comprises a series of dedicated user
interface tables containing pre-processed data that
support each component of the user interface. The
user interface has been updated to include a series
of nine Guided Search tools that allow access to NTP
summary and conclusion data and larger non-NTP
datasets. The CEBS database can be accessed online at http://www.niehs.nih.gov/research/resources/databases/cebs/.
INTRODUCTION
The National Toxicology Program (NTP) was established
by the US Department of Health and Human Services in
1978 in response to concerns about potential human health
effects of environmental chemicals. The NTP provides scientific data to regulatory agencies and other health-related
research groups. Chemicals studied at the NTP can be endocrine disruptors, occupational exposure mixtures, pesticides, pharmaceuticals, metals, food additives and herbal
supplements; anything with the potential to impact health.
The NTP conducts comprehensive testing of each substance
or test article (exposure agent) in an effort to provide data
for a strong scientific basis to make credible decisions that
will protect public health. Testing can include evaluations
of toxicity and carcinogenicity, prenatal developmental and
reproductive toxicology, neurobehavioral effects, immunological effects, genetic toxicity, toxicogenomic responses, as
well as chemical disposition and toxicokinetic analysis. Results and conclusions from the NTP testing program are released into the public domain as published reports or journal articles.
A great deal of toxicity information has been generated
by the NTP since its inception in the 1970s. Until recently
these data were made available to the public only as web-based PDF reports on an individual study basis. This made
it a challenge to compare and contrast results for multiple
test articles or different data endpoints for individual animals. To address this issue, the NTP designated the Chemical Effects in Biological Systems (CEBS) database as the
primary repository for its data and has invested significant
effort into making the data available for searching, downloading and data mining.
CEBS was developed as a public repository for toxicogenomics data by the National Center for Toxicogenomics (NCT) within the National Institute of Environmental Health Sciences (NIEHS). Our most recent publication in 2008 described development of CEBS to capture
microarray (gene expression) and proteomics (protein expression) data (1,2) and illustrated the integration of study
design parameters with toxicological assay data. The CEBS
SysTox Object Model (3) and the CEBS Data Dictionary (4)
were developed to promote this database model. This first
version of the database permitted the CEBS user to select
groups of subjects drawn from different studies, and analyze the associated microarray data. It also provided a good
platform on which to build the current NTP data repository.
DATABASE DESCRIPTION
As technology and techniques continue to evolve, the generation and analysis of data have become increasingly complex. Databases designed to capture a ‘standard’ data structure are unlikely to realize their full potential as a data
repository. However, such databases are well suited to more narrowly
focused scientific inquiries (e.g. the Gene Expression
Omnibus (GEO)). CEBS, on the other hand, has been designed to capture a wide range of endpoints including various study design details, in-life observation data and qualitative and quantitative assay data for individual test subjects from in vivo and in vitro exposures (1). CEBS is a
freely available online toxicology resource that is a curated
repository of empirical toxicology data (http://tools.niehs.nih.gov/cebs3/ui/).
The CEBS database has two main components: data collection and data delivery (CEBS database schema: ftp://anonftp.niehs.nih.gov/ntp-cebs/tools/Database/). The data
collection component has a flexible design capable of collecting any data (to date) using the terms provided by the
depositor. The data delivery component integrates data and
utilizes curated synonyms, conversion rules for data units
and standard normalization methods to faithfully and accurately collapse disparate depositor assay names and units
into a CEBS ‘standard’ (defined in the CEBS Data Dictionary). The data delivery component is optimized for consistent and rapid presentation of the data to the user; the data
collection component is optimized to accommodate data as
it is deposited.
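The collapsing step described above can be illustrated with a minimal sketch. It assumes that depositor terms are stored exactly as submitted and that standardization is applied afterwards through lookup tables. The table names (ASSAY_SYNONYMS, UNIT_RULES), the Measurement type and the standardize function are hypothetical and are not taken from the CEBS schema or the CEBS Data Dictionary. The entries shown are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical curated synonym table: depositor assay name -> illustrative standard term.
ASSAY_SYNONYMS = {
    "body wt": "Body weight",
    "body weight": "Body weight",
    "alt": "Alanine aminotransferase",
    "sgpt": "Alanine aminotransferase",
}

# Hypothetical unit conversion rules:
# (standard assay, depositor unit) -> (standard unit, multiplicative factor).
UNIT_RULES = {
    ("Body weight", "kg"): ("g", 1000.0),
    ("Body weight", "g"): ("g", 1.0),
    ("Alanine aminotransferase", "U/L"): ("U/L", 1.0),
}


@dataclass(frozen=True)
class Measurement:
    assay: str    # assay name (as deposited, or standardized)
    value: float
    unit: str


def standardize(m: Measurement) -> Measurement:
    """Map a deposited measurement onto the illustrative standard name and unit."""
    key = m.assay.strip().lower()
    if key not in ASSAY_SYNONYMS:
        # Unknown terms are flagged for curation rather than guessed at.
        raise KeyError(f"no curated synonym for assay {m.assay!r}")
    standard_assay = ASSAY_SYNONYMS[key]

    rule = UNIT_RULES.get((standard_assay, m.unit.strip()))
    if rule is None:
        raise KeyError(f"no conversion rule for {standard_assay!r} in {m.unit!r}")
    standard_unit, factor = rule
    return Measurement(standard_assay, m.value * factor, standard_unit)


deposited = Measurement("Body Wt", 0.25, "kg")
print(standardize(deposited))
```

Keeping the deposited record unchanged and deriving the standardized record from it mirrors the split between the two components: collection accepts whatever the depositor provides, while delivery presents a consistent vocabulary and consistent units.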
The CEBS database is able to capture metadata for any
study design. An Investigation in CEBS is defined as a self-contained scientific enquiry. Each NTP test article is assigned to an Investigation, and each Investigation contains
one or more studies. These studies encompass the comprehensive testing that the NTP conducts for each test article
and may include genetic toxicity, carcinogenicity, general
toxicity, toxicogenomics, and others. NTP studies are designed with multiple treatment groups, subjects, proto (...truncated)