The mouse Gene Expression Database (GXD): 2014 update
Constance M. Smith, Jacqueline H. Finger, Terry F. Hayamizu,
Ingeborg J. McCright, Jingxia Xu, Joanne Berghout, Jeff Campbell,
Lori E. Corbani, Kim L. Forthofer, Pete J. Frost, Dave Miers,
David R. Shaw, Kevin R. Stone, Janan T. Eppig, James A. Kadin,
Joel E. Richardson, Martin Ringwald
The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
The Gene Expression Database (GXD; http://www.informatics.jax.org/expression.shtml) is an extensive and well-curated community resource of mouse developmental expression information. GXD collects different types of expression data from studies of wild-type and mutant mice, covering all developmental stages and including data from RNA in situ hybridization, immunohistochemistry, RT-PCR, northern blot and western blot experiments. The data are acquired from the scientific literature and from researchers, including groups doing large-scale expression studies. Integration with the other data in Mouse Genome Informatics (MGI) and interconnections with other databases place GXD's gene expression information in the larger biological and biomedical context. Since the last report, the utility of GXD has been greatly enhanced by the addition of new data and by the implementation of more powerful and versatile search and display features. Web interface enhancements include the capability to search for expression data for genes associated with specific phenotypes and/or human diseases; new, more interactive data summaries; easy downloading of data; direct searches of expression images via associated metadata; and new displays that combine image data and their associated annotations. At present, GXD includes >1.4 million expression results and 250 000 images that are accessible to our search tools.
INTRODUCTION
The laboratory mouse serves as a premier animal model in
studying the complex molecular mechanisms that underlie
the processes of human development, differentiation
and disease. Tissues from all stages of mouse development
and from many different mouse strains and mutants
are being subjected to detailed expression analysis. The
Gene Expression Database (GXD) collects these data
from disparate sources, integrates them and makes them
readily accessible to many types of biologically and
biomedically relevant database searches.
By capturing multiple types of mRNA and protein
expression information, including data from RNA in situ
hybridization, immunohistochemistry, in situ reporter
(knock-in), reverse transcriptase-polymerase chain
reaction (RT-PCR), northern blot and western blot
experiments, GXD aims to provide increasingly
complete information about where, when and in what
amounts transcripts and proteins are expressed during
development, as well as how their expression varies in
different mouse strains and mutants. Data are acquired
from the literature and from researchers, in particular
from groups doing large-scale expression studies. All
these data are annotated by GXD curators, making
extensive use of controlled vocabularies and ontologies to
provide the standardization of data that enables data
integration and thereby complex queries. GXD forms an
integral component of the larger Mouse Genome
Informatics (MGI) resource. Through this association,
the expression data can be combined with extensive
genetic, functional, phenotypic and disease-orientated
data (1). This robust integration, as well as
interconnections with other resources (2–16), puts the expression data
in GXD into a much larger analytical context.
Owing to its broad scope, thorough approach, data
integration and querying capabilities, GXD provides an
important and unique resource to the research
community. GXD and its user interfaces have been described
previously (17–20). Here we focus on recent progress in terms
of data acquisition and improvements to the querying
capabilities and web displays.
DATA CONTENT AND PROGRESS IN DATA
ACQUISITION
Detailed expression data
GXD provides detailed records of expression results.
The core entry is an assay details record (Figure 1).
Each assay details record includes information about the
gene studied, the probes and experimental conditions
used, the specimen(s) analyzed, the expression results
obtained for each specimen, as well as images of the
data when available. These data are annotated using
standard nomenclature and ontologies and serve as
integration points within the GXD and MGI database.
Expression patterns are described using an extensive,
hierarchically structured anatomical ontology. As well as
allowing for the integration of expression results from
assays with differing spatial resolution, the hierarchical
nature of the ontology allows expression searches by
anatomical term to include all substructures for the
term. The developmental portion of the anatomical
ontology was begun by our collaborators from the
eMouseAtlas project (21) and is being extended and
refined jointly with GXD; the postnatal part was
developed by the GXD project (22).
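The effect of the hierarchical ontology on searches can be illustrated with a minimal sketch. It is not GXD's implementation: the tiny hierarchy, the `ExpressionResult` record, the placeholder gene names and both function names are assumptions made only for illustration. A query for an anatomical term is expanded to the term plus all of its substructures before results are matched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionResult:
    """Hypothetical, heavily simplified expression result."""
    gene: str
    structure: str
    assay_type: str


# Hypothetical fragment of an anatomical hierarchy: parent -> direct substructures.
CHILDREN = {
    "nervous system": ["brain", "spinal cord"],
    "brain": ["forebrain", "hindbrain"],
}


def with_substructures(term, children):
    """Return the term together with all of its direct and indirect substructures."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(children.get(current, []))
    return seen


def search_by_structure(results, term, children):
    """Keep results annotated to the term or to any of its substructures."""
    wanted = with_substructures(term, children)
    return [r for r in results if r.structure in wanted]


# Placeholder records; gene names are illustrative, not real annotations.
RESULTS = [
    ExpressionResult("GeneA", "forebrain", "RNA in situ"),
    ExpressionResult("GeneB", "spinal cord", "RT-PCR"),
    ExpressionResult("GeneC", "heart", "immunohistochemistry"),
]

hits = search_by_structure(RESULTS, "nervous system", CHILDREN)
```

In this sketch, a search for "nervous system" returns the results annotated to "forebrain" and "spinal cord" even though neither record names the queried term directly. Results recorded at different spatial resolutions can therefore be retrieved by a single query.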
Table 1 (row labels only; counts not preserved): Genes studied in expression references; Genes with expression assay results; Expression assays; Expression assay results; Expression images; Mouse mutants with expression data.
Since our last report, the amount of expression data in
GXD has increased significantly, both through the GXD
curators' annotation of data from the literature and
through the incorporation of data obtained from
large-scale expression projects. In all cases, the curators review
the data and standardize them. When necessary, they work
with laboratories to resolve issues pertinent to
nomenclature and data inconsistencies. The most recently acquired
large data sets include RNA in situ studies of gene
expression in the day 14.5 embryo [GenePaint; (23)]; RNA in situ
and immunohistochemistry studies of gene expression
in the genitourinary tract [GUDMAP; (24)]; and RNA
in situ studies of gene expression in the embryonic and
adult mouse nervous system [BGEM; (25)]. The
integration of these data into GXD greatly expands the research
community's ability to query these data, increasing their
utility.
As shown in Table 1, GXD currently contains detailed
expression data for nearly 13 800 genes. There are 1.4
million expression result annotations; 82% are from
RNA in situ hybridization studies and 10% from
RT-PCR studies. These results include data from >1850
mouse mutants, as well as numerous strains of wild-type
mice. In addition, the database contains >250 000 images
of primary expression data.
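As a rough check on these proportions, the percentages can be converted into approximate counts. The sketch below assumes a round total of 1.4 million result annotations, which is the only total the text gives, so the outputs are estimates rather than database figures. The function name and the "other assay types" label are illustrative.

```python
# Rounded total quoted in the text; the exact database count is not given here.
TOTAL_RESULTS = 1_400_000

# Percentage shares quoted in the text.
SHARE_PERCENT = {"RNA in situ hybridization": 82, "RT-PCR": 10}


def approximate_counts(total, share_percent):
    """Convert percentage shares into approximate counts, plus the remainder."""
    counts = {name: total * pct // 100 for name, pct in share_percent.items()}
    counts["other assay types"] = total - sum(counts.values())
    return counts


estimates = approximate_counts(TOTAL_RESULTS, SHARE_PERCENT)
```

Under this assumption, roughly 1.15 million results come from RNA in situ hybridization and about 140 000 from RT-PCR. The remaining 8% or so is spread across the other assay types.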
Comprehensive literature survey
GXD maintains a comprehensive and up-to-date index
of the embryonic gene expression literature that can
be searched using the Gene Expression Literature query
(http://www.informatics.jax.org/gxdlit). The curators
survey journals to find all published articles that
describe endogenous gene expression and knock-in
reporter studies done in the embryonic mouse. They
then review the entire publication, including any
supplemental material, and record the genes and ages ana (...truncated)