The mouse Gene Expression Database (GXD): 2007 update
Constance M. Smith, Jacqueline H. Finger, Terry F. Hayamizu, Ingeborg J. McCright, Janan T. Eppig, James A. Kadin, Joel E. Richardson, Martin Ringwald
The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
The Gene Expression Database (GXD) provides the scientific community with an extensive and easily searchable database of gene expression information about the mouse. Its primary emphasis is on developmental studies. By integrating different types of expression data, GXD aims to provide comprehensive information about expression patterns of transcripts and proteins in wild-type and mutant mice. Integration with the other Mouse Genome Informatics (MGI) databases places the gene expression information in the context of genetic, sequence, functional and phenotypic information, enabling valuable insights into the molecular biology that underlies developmental and disease processes. In recent years the utility of GXD has been greatly enhanced by a large increase in data content, obtained from the literature and provided by researchers doing large-scale in situ and cDNA screens. In addition, we have continued to refine our query and display features to make it easier for users to interrogate the data. GXD is available through the MGI web site at http://www.informatics.jax.org/ or directly at http://www.informatics.jax.org/menus/expression_menu.shtml.
The laboratory mouse serves as a premier animal model in
studying the complex molecular networks that underlie the
processes of human development, differentiation and disease.
To gain insights into these networks, it is essential to know
where, when and in what amounts transcripts and proteins
are expressed, and how their expression varies in different
mouse strains and mutants. The Gene Expression Database
(GXD) addresses this objective in a uniquely comprehensive
way. GXD is the only resource that acquires mouse
expression data from the literature in a systematic manner, as well
as acquiring data directly from conventional and large-scale
providers via electronic data submission and bulk data
downloads. GXD integrates various types of mRNA and protein
expression information, collects data from all tissues and
developmental stages and includes data from many different
mouse strains and mutants. Annotations in GXD make
extensive use of controlled vocabularies and ontologies to
provide the standardization of data that enables complex
queries. In addition, GXD is fully integrated with the other
databases of the Mouse Genome Informatics (MGI) resource,
including the Mouse Genome Database (MGD) (1,2) and the
MGI part of the Gene Ontology Project (GO) (3). MGI also
maintains comprehensive links to external resources such as
sequence databases, Entrez Gene, UniProt, InterPro, Online
Mendelian Inheritance in Man (OMIM), PubMed and other
mammalian databases (4–15). This robust integration puts
the expression data annotated in GXD into a much larger
biological and analytical context. Thus, users are able to query
using extensive genetic, sequence, functional, expression
and phenotypic information.
Other public and laboratory databases have been developed
in recent years to store mouse expression data (16–26). They
store data from one or two specific assay types and/or focus
on specific tissues/developmental stages; they are often
dedicated to specific data generation projects. These
databases are complementary to the GXD effort. However, because
of its broad scope, its thorough approach and its data integration
and querying capabilities, GXD provides a unique
resource to the biomedical research community. New data
are entered and made publicly available on a daily basis.
GXD and its query interfaces have been described previously
(27–30). Here we focus on recent progress in terms of data
acquisition and querying capabilities.
The Gene Expression Literature Index
GXD curators survey journals to find all published papers that
describe endogenous gene expression and knock-in reporter
studies done in the embryonic mouse. In a first annotation
step, the curators record the genes and ages analyzed and
the expression assay types used in these publications. GXD
combines these data with information obtained from PubMed
and makes them available for searching via the Gene
Expression Literature Index. Therefore, users can query for specific
types of expression information in combination with
bibliographic information (author, journal, year) or specific words
in the title or abstract of publications. The Literature Index
is comprehensive and up-to-date; it contains all pertinent
journal articles from 1993 to the present and articles from
major developmental journals from 1990 to the present.
Currently, the index contains >56 500 entries covering nearly
12 300 references analyzing nearly 8700 genes. Thus, it
provides a powerful tool to quickly locate expression
information in the literature.
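The kind of combined query described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory record type (`IndexEntry`) and a hypothetical `search_index` function; neither is part of GXD, and the example entries, author names and journal names are invented for illustration only. Each entry pairs the first-pass annotations (gene, age, assay type) with bibliographic fields of the kind obtained from PubMed, and every supplied criterion must match.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical Literature Index entry (not the GXD schema)."""
    gene: str
    age: str                 # e.g. "E10.5"; the age format is an assumption
    assay_type: str
    authors: Tuple[str, ...]
    journal: str
    year: int
    title: str
    abstract: str = ""


def search_index(
    entries: Iterable[IndexEntry],
    *,
    gene: Optional[str] = None,
    age: Optional[str] = None,
    assay_type: Optional[str] = None,
    author: Optional[str] = None,
    journal: Optional[str] = None,
    year: Optional[int] = None,
    text: Optional[str] = None,
) -> List[IndexEntry]:
    """Return the entries that satisfy every criterion that was supplied."""
    results = []
    for e in entries:
        if gene is not None and e.gene.lower() != gene.lower():
            continue
        if age is not None and e.age != age:
            continue
        if assay_type is not None and e.assay_type.lower() != assay_type.lower():
            continue
        # Author matching is a case-insensitive substring test on each name.
        if author is not None and not any(author.lower() in a.lower() for a in e.authors):
            continue
        if journal is not None and e.journal.lower() != journal.lower():
            continue
        if year is not None and e.year != year:
            continue
        # Free-text search looks in both the title and the abstract.
        if text is not None and text.lower() not in f"{e.title} {e.abstract}".lower():
            continue
        results.append(e)
    return results


# Invented example entries; these are not real GXD records or publications.
EXAMPLE_ENTRIES = [
    IndexEntry("Pax6", "E10.5", "RNA in situ", ("Doe J", "Roe R"),
               "Example Journal A", 1998, "Hypothetical study of eye development"),
    IndexEntry("Pax6", "E12.5", "Immunohistochemistry", ("Roe R",),
               "Example Journal B", 2003, "Hypothetical study of forebrain patterning"),
    IndexEntry("Shh", "E10.5", "RNA in situ", ("Doe J",),
               "Example Journal A", 2003, "Hypothetical study of limb bud signalling"),
]

if __name__ == "__main__":
    for hit in search_index(EXAMPLE_ENTRIES, gene="Pax6", text="eye"):
        print(hit.gene, hit.age, hit.assay_type, hit.year)
```

In this sketch, criteria are combined with a logical AND, so adding bibliographic filters to an expression filter narrows the result set.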
Gene expression data
GXD currently collects detailed expression data from the
following assay types: RNA in situ hybridization,
immunohistochemistry, in situ reporter (knock-in), northern
blot, RT-PCR, western blot, RNase protection and nuclease
S1 protection studies. Work is underway to incorporate
microarray data as well. As illustrated in Figure 1, expression
records in GXD are detailed. Each entry contains a
description of the assay type and the molecular probe used in the
assay, the genetic origin of the sample and the experimental
conditions used. The time and tissue of expression, the
authors' description of pattern and strength of expression,
the number and sizes of detected bands and sequence
information are also recorded. Expression patterns are
described using an extensive dictionary of standardized
anatomical terms that lists the anatomical structures for each
developmental stage in a hierarchical fashion, thus enabling
the recording of expression results from assays with different
spatial resolution in a consistent manner. The embryonic part
of the anatomical dictionary was developed by our
collaborators from the Edinburgh Mouse Atlas and Gene Expression
Database (EMAGE) project (31); the adult part was
developed by the GXD project (32). As well as enabling complex
querying capabilities, these detailed annotations make it
easier to interpret and compare expression data.
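To illustrate why a hierarchical dictionary lets results annotated at different spatial resolutions be queried consistently, the following minimal sketch may help. It assumes a small, simplified hierarchy for a single developmental stage, stored as child-to-parent links; the terms, the `PARENT` mapping, the `descendants` and `results_in` helpers, and the gene names are all hypothetical and do not reproduce the actual anatomical dictionary or any GXD data. A query for a structure returns results annotated to that structure or to any of its substructures.

```python
from collections import defaultdict

# Simplified, hypothetical hierarchy for one developmental stage (child -> parent).
PARENT = {
    "central nervous system": "embryo",
    "brain": "central nervous system",
    "forebrain": "brain",
    "telencephalon": "forebrain",
    "hindbrain": "brain",
}


def descendants(term, parent=PARENT):
    """Return every structure that lies below `term` in the hierarchy."""
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def results_in(structure, results, parent=PARENT):
    """Select results annotated to `structure` or any of its substructures."""
    wanted = descendants(structure, parent) | {structure}
    return [r for r in results if r[1] in wanted]


# Invented results: (gene, annotated structure, expression detected).
RESULTS = [
    ("GeneA", "telencephalon", True),   # high-resolution annotation
    ("GeneB", "brain", True),           # low-resolution annotation
    ("GeneC", "hindbrain", False),
    ("GeneD", "embryo", True),
]

if __name__ == "__main__":
    for gene, structure, detected in results_in("brain", RESULTS):
        print(gene, structure, detected)
```

Here a query for "brain" retrieves the results recorded at the level of the telencephalon and hindbrain as well as those recorded for the brain as a whole, whereas the result annotated only to the whole embryo is not returned.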
GXD's data content has increased significantly in recent
years (Figure 2). Currently, GXD contains data from
>24 600 assays that provide >260 000 detailed expression
results for nearly 7700 genes, including expression data from
almost 1000 different mouse mutants. Two-thirds of these
data are linked to images of the primary expression data;
GXD currently contains >43 000 images of expression data.
This rapid growth in data content was made possible by the
daily annotation of expression data from the literature and
through the incorporation of large sets of expression data
from large-scale RNA in situ hybridization and RT-PCR
screens. Recently acquired large dat (...truncated)