MouseIndelDB: a database integrating genomic indel polymorphisms that distinguish mouse strains
D600–D606 Nucleic Acids Research, 2010, Vol. 38, Database issue
doi:10.1093/nar/gkp1046
Published online 20 November 2009
Keiko Akagi1,2, Robert M. Stephens3, Jingfeng Li2,4, Evgenji Evdokimov5,
Michael R. Kuehn5, Natalia Volfovsky3 and David E. Symer2,4,6,7,8,9,*
1Mouse Cancer Genetics Program, Center for Cancer Research, National Cancer Institute, Frederick, MD 21702,
2Department of Molecular Virology, Immunology and Medical Genetics, The Ohio State University Comprehensive
Cancer Center, Columbus, OH 43210, 3Advanced Biomedical Computing Center, Information Systems Program,
SAIC-Frederick, Inc., NCI-Frederick, Frederick, MD 21702, 4Basic Research Laboratory, 5Laboratory of Protein
Dynamics and Signaling, 6Laboratory of Biochemistry and Molecular Biology, Center for Cancer Research,
National Cancer Institute, Frederick, MD 21702, 7Human Cancer Genetics Program, 8Departments of Internal
Medicine and 9Biomedical Informatics, The Ohio State University Comprehensive Cancer Center, Columbus,
OH 43210, USA
Received August 5, 2009; Revised October 23, 2009; Accepted October 27, 2009
ABSTRACT
MouseIndelDB is an integrated database resource
containing thousands of previously unreported
mouse genomic indel (insertion and deletion)
polymorphisms ranging from 100 nt to 10 Kb in
size. The database currently includes polymorphisms identified from our alignment of 26 million
whole-genome shotgun sequence traces from four
laboratory mouse strains mapped against the reference C57BL/6J genome using GMAP. They can be
queried on a local level by chromosomal coordinates, nearby gene names or other genomic
feature identifiers, or in bulk format using categories
including mouse strain(s), class of polymorphism(s)
and chromosome number. The results of such
queries are presented either as a custom track on
the UCSC mouse genome browser or in tabular
format. We anticipate that the MouseIndelDB
database will be widely useful for research in mammalian genetics, genomics, and evolutionary
biology. Access to the MouseIndelDB database is
freely available at: http://variation.osu.edu/.
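To illustrate the query modes and output formats described above, the following minimal Python sketch filters a small in-memory list of indel records by strain, class, chromosome or coordinate range, and then renders the hits as UCSC custom-track text (a `track` line followed by BED lines). The record fields, strain and class labels, and function names are hypothetical; this is not the MouseIndelDB schema or interface.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Indel:
    # Hypothetical record layout, not the actual MouseIndelDB schema.
    chrom: str   # reference chromosome, e.g. "chr1"
    start: int   # 0-based start on the reference (BED convention)
    end: int     # end coordinate, exclusive (BED convention)
    strain: str  # strain in which the variant was observed
    kind: str    # illustrative class label, e.g. "LINE"


def query(
    indels: Iterable[Indel],
    strain: Optional[str] = None,
    kind: Optional[str] = None,
    chrom: Optional[str] = None,
    region: Optional[Tuple[int, int]] = None,
) -> List[Indel]:
    """Return records matching every filter that is not None."""
    hits = []
    for rec in indels:
        if strain is not None and rec.strain != strain:
            continue
        if kind is not None and rec.kind != kind:
            continue
        if chrom is not None and rec.chrom != chrom:
            continue
        # Keep only records that overlap the half-open interval [lo, hi).
        if region is not None and not (rec.start < region[1] and rec.end > region[0]):
            continue
        hits.append(rec)
    return hits


def to_custom_track(indels: Iterable[Indel], name: str = "indels") -> str:
    """Format records as a track line plus four-column BED lines."""
    lines = [f'track name="{name}" description="Illustrative indel track"']
    for rec in sorted(indels, key=lambda r: (r.chrom, r.start)):
        lines.append(f"{rec.chrom}\t{rec.start}\t{rec.end}\t{rec.strain}:{rec.kind}")
    return "\n".join(lines)


# Made-up example data for demonstration only.
SAMPLE = [
    Indel("chr1", 1000, 1300, "strainA", "LINE"),
    Indel("chr1", 5000, 5150, "strainB", "SINE"),
    Indel("chr2", 200, 900, "strainA", "LTR"),
]

if __name__ == "__main__":
    print(to_custom_track(query(SAMPLE, chrom="chr1")))
```

Text in this form can be pasted into the custom-track form of the UCSC genome browser; the tabular output mentioned above would correspond to the same records without the `track` line.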
INTRODUCTION
An ultimate goal of genetics research is to link phenotypic
differences with different genomic variants, and vice versa.
Hundreds of distinct mouse strains are characterized by
wide-ranging functional differences. This extensive
phenotypic variation has helped to make the mouse a
premier model organism, mimicking many aspects of
human diversity and diseases. Understanding the
genomic differences that distinguish different mouse
strains and species will improve the usefulness of different
mouse lineages as model organisms, facilitate further evolutionary analysis of ancestral relationships for mouse
species and strains and shed new light on the genetic
basis for variation among human individuals and in
human diseases (1,2).
Recently, much attention has been given to the types of
variation that exist within or between mammalian species
(3–5), particularly short variations such as single
nucleotide polymorphisms (SNPs) (6,7). Identification
and analysis of such variants has been accomplished by
many groups, as exemplified by the HapMap project
compiling human data (8). These studies have helped to
facilitate the recent discovery of genes associated with certain diseases by genome-wide linkage analyses. In addition
to SNPs, insertion/deletion (indel) polymorphisms are
another important form of variation (9–15). Indels
consist of blocks of nucleotides that are present in
one individual, strain or lineage, but absent at the
orthologous locus in another. In addition to being useful
in genotyping studies, indel polymorphisms can have
direct functional consequences. As they are longer than
SNPs, and may introduce or alter promoters, terminators,
alternative splice sites and/or other determinants of
transcriptional variation (16–19), indel polymorphisms
*To whom correspondence should be addressed. Tel: +1 614 292 0885; Fax: +1 614 292 6108; Email:
Correspondence may also be addressed to Robert M. Stephens, Tel: +1 301 846 5787; Fax: +1 301 846 5762; Email:
Present address:
Evgenji Evdokimov, Food and Drug Administration, Department of Health and Human Services, Bethesda, MD, USA
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
© The Author(s) 2009. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/)
which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
could contribute significantly both to differences in gene
structure and expression, and to various disease processes.
In addition to indel polymorphisms, other important
forms of structural variation, including copy number
variants and polymorphic segmental duplications, also
have been studied extensively (5,20–22).
A rich potential source of information about genomic
variation exists in unassembled, conventional whole-genome shotgun (WGS) sequence traces obtained from
different individuals within or between species. Recently,
such traces have been used to identify human SNPs
(23,24) and simple tandem repeat (STR) and short indel
polymorphisms (10,11,25), as tools to identify such
polymorphisms from sequence traces have been developed
(26). To identify intermediate length (101–10 000 nt) indels
distinguishing between mouse lineages, we recently aligned
26 million WGS traces from four unassembled mouse
strains to the C57BL/6J reference genome assembly
(19,27). Most such mouse indels of this intermediate
length range are made up of repetitive elements. An overwhelming majority of such polymorphisms appears to
have resulted from endogenous retrotransposon integration events (19), a pattern that is clearly distinct from that of human
indels (12,25,28,29).
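The size criterion above can be made concrete with a short Python sketch. It assumes that each trace alignment has already been reduced to a list of gaps, each tagged by whether the unaligned sequence is present in the reference or in the strain trace. The `Gap` type, the function name and the deletion/insertion labelling convention are hypothetical and do not reproduce the pipeline of refs (19,27).

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

MIN_LEN = 101     # nt, lower bound of the intermediate length range
MAX_LEN = 10_000  # nt, upper bound of the intermediate length range


@dataclass(frozen=True)
class Gap:
    # Hypothetical summary of one gap in a trace-to-reference alignment.
    length: int         # gap size in nt
    in_reference: bool  # True if the extra sequence is present in the reference


def intermediate_indels(gaps: Iterable[Gap]) -> List[Tuple[str, int]]:
    """Keep gaps in the intermediate size range and label them.

    Labels are relative to the reference (an assumed convention): sequence
    present in the reference but missing from the strain trace is called a
    "deletion"; the reverse is called an "insertion".
    """
    calls = []
    for gap in gaps:
        if MIN_LEN <= gap.length <= MAX_LEN:
            label = "deletion" if gap.in_reference else "insertion"
            calls.append((label, gap.length))
    return calls
```

Gaps shorter than 101 nt or longer than 10 000 nt are simply dropped by this filter, which mirrors the length range stated in the text but says nothing about how the original alignments were scored or validated.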
There are now several genome browsers and databases
available which provide data on SNPs, STRs and other
forms of variation (23,24,30–33). These browsers are
mostly focused on human variants, although browsers for other
species, including mouse, have been developed (34). Other
databases tabulate forms of structural variation that
distinguish human individuals or populations, including
polymorphic transposon integrants and other indels in
humans, but in some cases lack contextual information
about neighboring genomic features (25,35,36). By
contrast, MouseIndelDB is an integrated searchable
database that presents high-resolution information about
indel polymorphisms that distinguish inbred mouse
strains. Through their presentation as a custom track on
the UCSC mouse genome browser, these mouse indel data
now can be visualized easily in the context of many other
important and regular (...truncated)