The Mouse Genome Database: enhancements and updates
D586–D592 Nucleic Acids Research, 2010, Vol. 38, Database issue
doi:10.1093/nar/gkp880
Published online 27 October 2009
Carol J. Bult*, James A. Kadin, Joel E. Richardson, Judith A. Blake and Janan T. Eppig
and the Mouse Genome Database Group†
The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609 USA
Received September 15, 2009; Accepted October 1, 2009
ABSTRACT
The Mouse Genome Database (MGD) is a major
component of the Mouse Genome Informatics
(MGI, http://www.informatics.jax.org/) database
resource and serves as the primary community
model organism database for the laboratory
mouse. MGD is the authoritative source for mouse
gene, allele and strain nomenclature and for
phenotype and functional annotations of mouse
genes. MGD contains comprehensive data and
information related to mouse genes and their
functions, standardized descriptions of mouse
phenotypes, extensive integration of DNA and
protein sequence data, and normalized representation
of genome and genome variant information,
including comparative data on mammalian genes.
Data for MGD are obtained from diverse sources,
including manual curation of the biomedical literature and direct contributions from individual investigators' laboratories and major informatics
resource centers, such as Ensembl, UniProt and
NCBI. MGD collaborates with the bioinformatics
community on the development and use of biomedical ontologies such as the Gene Ontology and
the Mammalian Phenotype Ontology. Recent
improvements in MGD described here include
integration of mouse gene trap allele and
sequence data, integration of gene targeting information from the International Knockout Mouse
Consortium, deployment of an MGI Biomart, and
enhancements to our batch query capability for
customized data access and retrieval.
INTRODUCTION
The Mouse Genome Database (MGD) is an integrated
database of genetic, genomic and phenotypic data for
the laboratory mouse (1–3). MGD is a central component of the Mouse Genome Informatics (MGI) database
resource (http://www.informatics.jax.org), the community model organism database for the laboratory
mouse. Other MGI data resources integrated with
MGD include the Gene Expression Database (GXD)
(4), the Mouse Tumor Biology Database (MTB) (5),
the Gene Ontology (GO) project (6) and the MouseCyc
database of biochemical pathways (7). Data in MGD are
updated daily. There are typically four to six major
software releases per year to support access and display
of new data types.
The primary data types maintained in MGD include
mouse genes and other genome features along with
their function and phenotype annotations, associations
of genome features with nucleotide and protein sequences,
genetic and physical maps, gene families, mutant
phenotypes, SNPs and other polymorphisms, animal
models of human disease, and mammalian homology. A
recent summary of MGD content is shown in Table 1.
MGD is the authoritative source for mouse gene, allele
and strain nomenclature, Gene Ontology annotations for
mouse gene function, and Mammalian Phenotype (MP)
Ontology (8) annotations for phenotype associations.
MGD is the most comprehensive source of mouse
phenotype information and of associations between human
diseases and mouse models. MGI curatorial staff acquire
data through direct loads from other databases,
direct submissions from researchers and the published
literature. To facilitate data integration, MGI employs
recognized standards for genetic nomenclature and functional annotation to describe mouse sequence data, genes,
*To whom correspondence should be addressed. Tel: +1 207 288 6248; Fax: +1 207 288 6132; Email:
†The Mouse Genome Database Group: M. T. Airey, A. Anagnostopoulos, R. Babiuk, R. M. Baldarelli, M. Baya, J. S. Beal, S. M. Bello, D. W.
Bradt, D. L. Burkart, N. E. Butler, J. Campbell, L. E. Corbani, S. L. Cousins, D. J. Dahmen, H. Dene, A. D. Diehl, M. E. Dolan, K. L. Forthofer,
K. S. Frazer, P. Frost, D. E. Geel, M. Hall, M. Knowlton, J. R. Lewis, L. J. Maltais, M. McAndrews-Hill, S. McClatchy, M. J. McCrossin,
J. Mason, T. F. Meehan, D. B. Miers, L. A. Miller, L. Ni, H. Onda, J. E. Ormsby, D. J. Reed, B. Richards-Smith, D. R. Shaw, R. Sinclair,
D. Sitnikov, C. L. Smith, P. Szauter, M. Tomczuk, L. L. Washburn, I. T. Witham, Y. Zhu.
© The Author(s) 2009. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/),
which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
NEW IN 2009
Completing the representation of Mouse Gene Traps

Table 1. Summary of MGD data content (10 September 2009)

MGD data statistics                                 10 September 2009
Genes with nucleotide sequence data                 28 891
Genes with protein sequence data                    26 255
Genes (including uncloned mutations)                36 323
Genes with GO annotations                           18 167
Mouse/human orthologs                               17 787
Mouse/rat orthologs                                 16 768
Genes with one or more mutant alleles (a)           17 227
Genes with one or more phenotypic alleles (b)       8363
Total mutant alleles (a)                            524 527
Phenotypic alleles (b)                              22 666
Targeted alleles                                    13 721
Gene trapped alleles                                501 232
Human diseases with one or more mouse models        964
QTLs                                                4248
Number of references                                146 597
Mouse RefSNPs                                       10 089 692

(a) Mutant alleles include those occurring in mice and/or in ES cell lines.
(b) Phenotypic alleles include only those mutant alleles present in mice.

strains, expression data, alleles and phenotypes. All data
associations in MGD are supported with evidence and
citations.
Researchers can query MGD using keyword searches,
vocabulary browsers and advanced web-based query
forms. Keyword searches support the use of the wildcard
character (i.e. *) for broad searches and the use of
quotation marks for exact-phrase searches. MGD also
provides vocabulary browsers for GO annotations,
MP annotations and Human Disease Term annotations
to support browsing of the database content. The web-based query forms in MGD allow users to construct
queries of differing degrees of specificity. For example,
using the Genes and Markers Query form in MGD, a
researcher can query broadly for all genes on mouse
Chromosome 3 or specifically for genes on Chromosome
3 that are associated with specific phenotypes and/or
functions (e.g. show me all genes on mouse
Chromosome 3 that are associated with respiratory
distress and that have been annotated functionally as
being enzymes). The MGI MouseBLAST server allows
users to interrogate the MGI database using nucleotide
and/or protein sequences. Access to data in MGD is also
facilitated by summary data files that are updated
nightly and available for download via FTP, and
through direct SQL (Structured Query Language) access (a user
account is required).
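
The difference between a broad and a specific query, and between wildcard and quoted-phrase keyword matching, can be illustrated with a minimal Python sketch. It runs over a small in-memory list and does not call any MGD interface. The Gene record, the keyword_match and query_genes functions, and the four sample genes are all hypothetical and invented for illustration; they do not reflect the MGD schema, the behavior of the MGI query forms, or real annotations. Wildcards are handled here with Python's fnmatch module, which also treats ? and [ ] as special characters.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


# Hypothetical in-memory record; this is NOT the MGD schema.
@dataclass(frozen=True)
class Gene:
    symbol: str
    chromosome: str
    phenotypes: frozenset = field(default_factory=frozenset)
    functions: frozenset = field(default_factory=frozenset)


def keyword_match(query: str, text: str) -> bool:
    """Match a keyword query against text, ignoring case.

    A query wrapped in double quotes must occur as an exact phrase
    (substring). Otherwise every whitespace-separated term must match
    at least one word of the text, with '*' acting as a wildcard.
    """
    q, t = query.strip().lower(), text.lower()
    if len(q) >= 2 and q.startswith('"') and q.endswith('"'):
        return q[1:-1] in t
    words = t.split()
    return all(any(fnmatchcase(w, term) for w in words) for term in q.split())


def query_genes(genes, chromosome=None, phenotype=None, function=None):
    """Return the genes that satisfy every criterion that is not None."""
    hits = []
    for g in genes:
        if chromosome is not None and g.chromosome != chromosome:
            continue
        if phenotype is not None and not any(
            keyword_match(phenotype, p) for p in g.phenotypes
        ):
            continue
        if function is not None and not any(
            keyword_match(function, f) for f in g.functions
        ):
            continue
        hits.append(g)
    return hits


# Made-up records for illustration only; not real MGD annotations.
GENES = [
    Gene("GeneA", "3", frozenset({"respiratory distress"}), frozenset({"kinase activity"})),
    Gene("GeneB", "3", frozenset({"abnormal coat color"}), frozenset({"kinase activity"})),
    Gene("GeneC", "7", frozenset({"respiratory distress"}), frozenset({"hydrolase activity"})),
    Gene("GeneD", "3", frozenset(), frozenset({"DNA binding"})),
]

# Broad query: every gene on Chromosome 3.
broad = query_genes(GENES, chromosome="3")

# Specific query: Chromosome 3, exact phenotype phrase, wildcard on function.
specific = query_genes(
    GENES,
    chromosome="3",
    phenotype='"respiratory distress"',
    function="*ase",
)

print([g.symbol for g in broad])
print([g.symbol for g in specific])
```

Each additional criterion narrows the result set, which mirrors the way a query form moves from a chromosome-wide search to one restricted by phenotype and function.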
The staff of MGD collaborates with members of other
large genome inform (...truncated)