SelTarbase, a database of human mononucleotide-microsatellite mutations and their potential impact to tumorigenesis and immunology
Stefan M. Woerner (1,2), Yan P. Yuan (0), Axel Benner (3), Sebastian Korff (1), Magnus von Knebel Doeberitz (1), Peer Bork (0)
(0) Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstr. 1, D-69117 Heidelberg
(1) Applied Tumorbiology, Institute of Pathology, University Hospital Heidelberg, Im Neuenheimer Feld 220/221, D-69120 Heidelberg
(2) Institute for Clinical Chemistry, Medical Faculty of Mannheim of the University of Heidelberg, Theodor-Kutzer-Ufer 1-3, D-68167 Mannheim
(3) Central Unit Biostatistics, German Cancer Research Center, Im Neuenheimer Feld 280, D-69120 Heidelberg, Germany
About 15% of human colorectal cancers and,
to varying degrees, other tumor entities as well as
nearly all tumors related to Lynch syndrome are
hallmarked by microsatellite instability (MSI) as
a result of a defective mismatch repair system.
The functional impact of resulting mutations
depends on their genomic localization. Alterations
within coding mononucleotide repeat tracts
(MNRs) can lead to protein truncation and formation
of neopeptides, whereas alterations within
untranslated MNRs can alter transcription level or
transcript stability. These mutations may confer a
selective advantage or disadvantage on affected
cells. They may further influence the biology of
microsatellite-unstable cells, e.g. by generating
immunogenic peptides induced by frameshift
mutations. The Selective Targets database
(http://www.seltarbase.org) is a curated database
of a growing body of public MNR mutation
data in microsatellite-unstable human tumors.
Regression calculations for various MSI-H tumor
entities, which flag statistically deviant mutation
frequencies, predict TGFBR2, BAX, ACVR2A and
other genes that are shown or strongly suspected to be
involved in MSI tumorigenesis. Tools
for further analysis of genomic DNA, derived
wild-type and mutated cDNAs and peptides are
integrated. A comprehensive database of all
human coding, untranslated, non-coding RNA and
intronic MNRs (MNR_ensembl) is also included.
SelTarbase thus serves as a versatile
instrument for MSI-carcinogenesis-related research,
diagnostics and therapy.
The completion of the human genome project in 2003
provided the data basis for genome-wide analyses (1).
It thereby became feasible to search the whole human
genome systematically for sequence motifs or
structures by computer-assisted analysis, and to clarify
the association of genome variation or mutation with
certain human diseases using the human genome draft as
a consensus. Currently, more than 22 000 known
protein-coding genes are annotated within the approximately
3 billion base pairs of Human Ensembl (rel. 55.37,
http://www.ensembl.org/Homo_sapiens/), giving rise to more than
100 000 transcripts. Sequence motifs of special interest
include single nucleotide polymorphisms (SNPs), splice
site recognition patterns, promoter motifs, regulatory
motifs and binding sites.
The human genome sequence also facilitated the
systematic search for human microsatellites that had been
started earlier based on EMBL DNA and mRNA data
(2). Microsatellites are especially prone to deletion and
insertion mutations during DNA replication, and their
mutability depends strongly on their length (3).
They are distributed non-randomly throughout the
whole human genome within non-coding and coding
regions (4). Their function, however, is largely unknown.
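A systematic microsatellite search of the kind described above can be illustrated, for the mononucleotide case, with a short scan over a DNA string. This is only a minimal sketch and not the procedure used for SelTarbase or MNR_ensembl: the function name `find_mnrs`, the minimum tract length of 6 and the example sequence are assumptions chosen for illustration.

```python
import re

def find_mnrs(seq, min_len=6):
    """Return (start, base, length) for each mononucleotide run of >= min_len.

    min_len=6 is an illustrative threshold, not a value taken from the paper.
    """
    # One base followed by at least (min_len - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]

# Hypothetical sequence containing an A10 and a G7 tract.
example = "GCCT" + "A" * 10 + "GCCTGGT" + "G" * 7 + "CT"
print(find_mnrs(example))  # [(4, 'A', 10), (21, 'G', 7)]
```

Start positions are 0-based. Classifying a tract as coding, untranslated or intronic would additionally require gene annotation, which this sketch does not cover.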
Mononucleotide repeats (MNRs) seem to represent
the most interesting kind of microsatellites. The length
of coding MNRs (cMNRs) is conserved (5). Length
alterations of cMNRs by 1 or 2 nucleotides lead to
frameshift mutations. The length of non-coding MNRs,
by contrast, can vary widely from individual to individual.
Nevertheless, there are also a number of so-called
quasimonomorphic MNRs of greater length (20-40 bp) within
non-coding regions that show a significantly restricted
length variation within the human population, which
may indicate functional relevance of
these non-coding MNRs. It is well known that alterations
in polypyrimidine MNRs in the 5′ local neighborhood of
splice donor sites can lead to exon skipping (6,7), which
results in a frameshift in about two-thirds of cases (8). In
addition, shortening or elongation of MNRs within 5′
UTRs can affect the transcription level, and
that of MNRs in the 3′ UTR the transcript stability of the respective
mRNA (9).
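The effect of a one-nucleotide length alteration in a coding MNR can be made concrete with a small translation example. The sketch below is purely illustrative: the coding sequence is invented, the helper `translate` is a hypothetical name, and only the standard genetic code is assumed. Deleting one A from an A8 tract shifts the reading frame, which here yields one novel residue followed by a premature stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate a coding sequence up to the first stop codon."""
    peptide = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        peptide.append(aa)
    return "".join(peptide)

# Invented coding sequence with an A8 tract (not a real gene).
wild_type = "ATGGCC" + "A" * 8 + "GCTGACTGGTTAA"
# Same sequence after a 1-nt deletion in the tract (A8 -> A7).
mutant = "ATGGCC" + "A" * 7 + "GCTGACTGGTTAA"

print(translate(wild_type))  # MAKKKLTG
print(translate(mutant))     # MAKKS
```

In this toy case the mutant peptide shares the first four residues with the wild type, then carries a single frameshift-derived residue (S) before terminating early. In real genes the frameshift-derived tail can be considerably longer.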
Microsatellite alterations are corrected by the DNA
mismatch repair (MMR) system. Functional
inactivation of the MMR system leaves such alterations
unrepaired, a condition termed microsatellite
instability (MSI). The MSI phenotype is found in >90%
of tumors developing in MMR germline mutation carriers
among hereditary non-polyposis colorectal cancer
(HNPCC) or Lynch syndrome patients and in 15% of
sporadic cancers (10). Colorectal MSI-H tumors are
characterized by certain clinico-histopathological
properties such as a better prognosis compared to tumors
of the chromosomal instability (CIN) phenotype (11-13). (...truncated)