The Mouse Genome Database (MGD): the model organism database for the laboratory mouse
Judith A. Blake, Joel E. Richardson, Carol J. Bult, Jim A. Kadin, Janan T. Eppig and The Mouse Genome Database Group
The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
Current members of the Mouse Genome Database Group are: R. M. Baldarelli, M. Baya, J. S. Beal, W. J. Boddy, D. W. Bradt, D. J. Burkart, N. E. Butler, J. Campbell, T. Chu, L. E. Corbani, S. Cousins, H. J. Drabkin, D. M. Garippa, C. W. Goldsmith, P. L. Grant, M. Lennon-Pierce, I. Lu, C. M. Lutz, L. J. Maltais, P. Mani, L. M. McKenzie, J. E. Ormsby, A. J. Planchart, S. Ramachandran, D. J. Reed, D. R. Shaw, C. Smith, P. Szauter, L. A. Trombley and T. C. Wiegers
The Mouse Genome Database (MGD) is the community database resource for the laboratory mouse, a key model organism for interpreting the human genome and for understanding human biology and disease (http://www.informatics.jax.org). MGD strives to provide a highly curated, highly integrated information resource that not only includes the consensus view of current knowledge about the mouse, but also provides comparative genomic information particularly for human and rat genomes. MGD includes extensive information about mouse genes, supporting all gene attribute assertions with experimental data, statements of evidence and citation. Detailed information about alleles and mouse mutants includes genotype, molecular variant and phenotype descriptions. Extensive collaboration with other data providers such as NCBI, RIKEN and SWISS-PROT provides standardization of gene:sequence associations and robust interconnections between large information systems based on shared sequence curation. Recent integration of large datasets of mouse full-length cDNAs and radiation-hybrid mapped ESTs, the continued development and use of extensive structured vocabularies and the expansion of the representation of phenotypes highlight this year's developments.
The Mouse Genome Database (MGD; http://www.informatics.jax.org)
is the public community resource representing the
genetics, genomics and phenotypes of the laboratory mouse.
MGD focuses on an integrated representation of genotype
(sequence) to phenotype information including highly curated
information about genes and gene products (1) (Table 1).
Primary foci of integration are through representations of
relationships between genes, sequences and phenotypes. The
annotation pipeline includes extensive curation of the scientific
literature. All annotations in MGD are supported with
experimental evidence and citations. MGD provides official
nomenclature for mouse genes and works closely with human
and rat genome annotation groups to curate relationships
between these genomes and to standardize the representation
of genes and gene families.
MGD provides information about alleles and targeted mutations,
homology data for mammalian orthologs and detailed mapping
data at both the gene and genomic levels. Extensive curation of
sequence to gene associations provides the fundamental
dataset against which computational annotation systems are
calibrated and tested. MGD provides graphical views of the
comparative genomic data from the gene, chromosomal region,
genome or species perspective. Extensive experimental mapping
data, including genetic and physical maps, are available, as are
data that conflict with current consensus map positions.
MGD is part of the Mouse Genome Informatics (MGI)
project effort based at The Jackson Laboratory (http://www.jax.org)
and collaborates closely with the Gene Expression
Database (GXD) (2), the Mouse Genome Sequencing (MGS)
project and the Mouse Tumor Biology (MTB) database (3) to
provide an integrated information resource for the laboratory
mouse. MGD is a founding member of the Gene Ontology (GO)
consortium (4) and contributes particularly to the development of
mammalian components of the GO vocabularies. MGD curators
collaborate extensively with SWISS-PROT (5) and with the
LocusLink project at NCBI (6) to evaluate and update mouse
gene:sequence associations.
IMPROVEMENTS DURING 2001
Expanded allele and mutant phenotype data
For each allele or mutant, MGD provides data describing the
type of allele (e.g. ENU-induced, transgene targeted), the
mouse strain on which it arose, the phenotypic characteristics
of homozygous and heterozygous carriers and its relationship
with human genes and disease. New data available include
molecular details of the allelic change, ES cell lines and cell
line strain (for targeted alleles), promoter details (where
relevant) and expanded citations. In addition, phenotypic
alleles are linked to GXD whenever expression data are
available for mice carrying the allele.
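
The allele attributes listed above can be pictured as a single record. The following is a minimal sketch only: the class name, field names and example values are hypothetical and do not reflect the actual MGD schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AlleleRecord:
    """Hypothetical container for the allele attributes described in the text."""
    symbol: str                       # allele symbol (placeholder in the example)
    allele_type: str                  # e.g. "ENU-induced" or "targeted"
    strain_of_origin: str             # mouse strain on which the allele arose
    phenotypes: Dict[str, List[str]] = field(default_factory=dict)  # keyed by zygosity
    molecular_detail: Optional[str] = None     # molecular nature of the allelic change
    es_cell_line: Optional[str] = None         # targeted alleles only
    es_cell_line_strain: Optional[str] = None  # targeted alleles only
    promoter: Optional[str] = None             # where relevant
    citations: List[str] = field(default_factory=list)
    has_expression_data: bool = False          # would drive a cross-link to GXD


def missing_targeted_fields(rec: AlleleRecord) -> List[str]:
    """Return the ES cell fields that are still unset on a targeted allele."""
    if rec.allele_type != "targeted":
        return []
    wanted = ("es_cell_line", "es_cell_line_strain")
    return [name for name in wanted if getattr(rec, name) is None]


# All values below are invented placeholders, not real MGD data.
example = AlleleRecord(
    symbol="placeholder-allele",
    allele_type="targeted",
    strain_of_origin="placeholder-strain",
    phenotypes={"homozygous": ["hearing defect"], "heterozygous": []},
    es_cell_line="placeholder-ES-line",
)
```

Here `missing_targeted_fields(example)` reports that only the cell line strain has yet to be recorded, which is the kind of completeness check a curation workflow might run.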
Table 1 (row labels only; data values are not present in this copy)
Number of markers (including genes)
Number of genes with sequence data
Number of markers mapped
Number of mouse/human curated orthology reports
Number of genes with links to SWISS-PROT
Number of genes with GO annotations
Number of genes with annotated alleles
Number of annotated alleles
Number of mouse nucleotide sequences curated in MGI system (includes ESTs)
a See text for caveats on this number.

Controlled vocabularies for phenotypes
MGD continues to develop and implement controlled and
structured vocabularies to standardize the annotation of
information for mouse genes and genomic features. A recent
effort to develop and use standard vocabularies for describing
mouse normal and mutant phenotypes will improve searching,
classifying and analyzing phenotype data. The current
structured vocabulary consists of several thousand terms, each
associated with precise definitions and a citation as a source of
the information. Terms are organized hierarchically, from
general to specific, allowing annotation to reflect the state of
knowledge about particular mutants (e.g. a newly discovered
mutation may be annotated only as causing a hearing defect, whereas
for a better studied mutation the defect may be attributed to
degeneration of the organ of Corti). Phenotype vocabulary terms are being
used to annotate the phenotypes of mice carrying heterozygous
or homozygous mutant alleles on particular genetic
backgrounds and these data are presented as part of the MGD allele
and mutant phenotype reports.
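
The benefit of a general-to-specific hierarchy can be illustrated with a short sketch. It assumes, for simplicity, that each term has a single parent; the mutant labels and the dictionary layout are hypothetical, and only the two terms come from the example in the text.

```python
# Toy vocabulary: child term -> broader parent term (None marks a root).
PARENT = {
    "hearing defect": None,
    "degeneration of organ of Corti": "hearing defect",
}

# Hypothetical annotations: each mutant is annotated with the most
# specific term that the current evidence supports.
ANNOTATIONS = {
    "new mutant": "hearing defect",
    "well-studied mutant": "degeneration of organ of Corti",
}


def ancestors(term):
    """Yield the term itself, then each broader term up to the root."""
    while term is not None:
        yield term
        term = PARENT[term]


def mutants_matching(query_term):
    """Return mutants annotated with query_term or any more specific term."""
    return sorted(
        mutant
        for mutant, term in ANNOTATIONS.items()
        if query_term in ancestors(term)
    )
```

A query for the general term `"hearing defect"` retrieves both mutants, while a query for `"degeneration of organ of Corti"` retrieves only the better studied one, so coarse and fine annotations remain searchable together.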
PhenoSlim. PhenoSlim is a subset of the full phenotype vocabulary
consisting of its broadest, high-level terms. It includes
approximately 100 terms and is being used to develop the initial
phenotype query capability in MGD.
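
One way a high-level subset such as PhenoSlim could support querying is by rolling each detailed annotation up to its nearest high-level term. The sketch below is illustrative only: the slim terms `"sensory phenotype"` and `"growth phenotype"`, the term `"reduced body size"` and the single-parent structure are assumptions, not actual PhenoSlim content.

```python
from collections import Counter

# Toy vocabulary: child term -> broader parent term (None marks a root).
PARENT = {
    "sensory phenotype": None,
    "growth phenotype": None,
    "hearing defect": "sensory phenotype",
    "degeneration of organ of Corti": "hearing defect",
    "reduced body size": "growth phenotype",
}

# Hypothetical high-level subset standing in for PhenoSlim.
SLIM = {"sensory phenotype", "growth phenotype"}


def to_slim(term):
    """Walk up the hierarchy and return the first slim term, or None."""
    while term is not None:
        if term in SLIM:
            return term
        term = PARENT[term]
    return None


def slim_counts(annotated_terms):
    """Count annotations per slim term, skipping terms with no slim ancestor."""
    slims = (to_slim(term) for term in annotated_terms)
    return Counter(s for s in slims if s is not None)
```

With this roll-up, a query interface needs to present only the small slim list, while the underlying annotations can remain as specific as the evidence allows.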
Curated orthology assertions and gene family summaries
MGD provides gene family pages that summarize information
about mouse, human and rat orthologs. Each summary report
includes official gene symbols, a representative sequence for each
gene in each species and links to MGD gene reports, human
LocusLink records and, in the near future, links to the Rat Genome
Database (7) gene detail pages. An example of the claudin gene
family pages can be viewed at http://www.informatics.jax.org/
mgihome/nomen/genefamilies/claudin.shtml. These c (...truncated)