Centromere identity from the DNA point of view
Miroslav Plohl
Nevenka Meštrović
Brankica Mravinac
The centromere is a chromosomal locus responsible for the faithful segregation of genetic material during cell division. It has become evident that centromeres can be established on virtually any DNA sequence, and the possible synergy between DNA sequences and the most prominent centromere identifiers, protein components, and epigenetic marks remains uncertain. However, some evolutionary preferences seem to exist, and long-term established centromeres are frequently formed on long arrays of satellite DNAs and/or transposable elements. Recent progress in understanding functional centromere sequences is based largely on the high-resolution DNA mapping of sequences that interact with the centromere-specific histone H3 variant, the most reliable marker of active centromeres. In addition, sequence assembly and mapping of large repetitive centromeric regions, as well as comparative genome analyses, offer insight into their complex organization and evolution. The rapidly advancing field of transcription in centromere regions highlights the functional importance of centromeric transcripts. Here, we comprehensively review the current state of knowledge on the composition and functionality of DNA sequences underlying active centromeres and discuss their contribution to the functioning of different centromere types in higher eukaryotes.
An essential function of genetic material in any living
organism is its faithful segregation, a role that in eukaryotes is
determined by the centromere. The centromere includes the
core or functional centromere domain, a specialized locus at
which microtubules attach to the complex multiprotein
structure of the kinetochore in order to segregate chromosomes in
mitosis and meiosis. The core centromere domain is
surrounded by large blocks of pericentromeric
heterochromatin (also called the pericentromere), primary sites of sister
chromatid cohesion. Centromere functionality is vital for all
eukaryotic organisms. In addition to understanding its role as a
biological structure, studying the centromere is also highly
relevant from a biomedical point of view, because
abnormalities in centromeric function are often lethal or associated with
various congenital and acquired diseases, such as cancer,
infertility, and birth disorders (reviewed in Thompson et al.
2010).
Centromeres are considered to be shaped by both
genomic and epigenetic mechanisms, but the synergy between
DNA sequences, protein components, and epigenetic
marks is still not well understood. In the absence of a
universal DNA sequence, species-specific histone H3
variant CENH3 (CENP-A in mammals, CID in Drosophila
melanogaster, Cse4 in Saccharomyces cerevisiae) is the
most prominent protein identifier of centromere function.
Related forms of this protein have been detected in all
studied active centromeres of single-cell and multicellular
eukaryotes (Black and Bassett 2008; Malik and Henikoff
2009). CENH3 replaces the canonical histone H3 in such a
way that arrays of CENH3-based nucleosomes alternate
with those containing canonical H3 (Blower et al. 2002;
Sullivan and Karpen 2004). In humans and flies, canonical
H3 is in turn epigenetically modified in the centromere by
dimethylation at lysine 4 (H3K4me2), and is thus distinct
from the histone H3 in adjacent pericentromeric
heterochromatin, which is marked by methylation at lysine 9
(H3K9me). These differences qualify centromeric
chromatin as a unique chromatin type, centrochromatin (Sullivan
and Karpen 2004).
In the budding yeast S. cerevisiae, centromere function
depends on a short DNA sequence motif about 100 bp long.
These centromeres are referred to as simple or point
centromeres (Hyman and Sorger 1995). In all other eukaryotes,
centromeres are founded on repetitive DNA arrays of several
hundred kilobases and are commonly known as complex or regional
centromeres (Pluta et al. 1995). A single centromere is
normally formed on each chromosome, at a locus that is
recognized cytogenetically as the primary constriction of the
monocentric chromosome. However, there are exceptions,
and some organisms have holocentric chromosomes that lack
a primary constriction and have a centromere dispersed
in many subdomains along the entire chromosome length
(Dernburg 2001).
Mostly due to limitations in the sequencing and assembly of
long arrays of nearly identical repeats, our knowledge of the
long-range functional organization of centromeric DNA is
rather limited, and centromeres still represent the last frontier
in genome assemblies and sequence annotations (Hayden and
Willard 2012). Here, we review the rapidly progressing field
of functional centromere genomics. We present data relating
DNA sequences and their functional interactions in different
centromere types of higher eukaryotes, and point to the
significance of transcriptional potential of centromeric
sequences.
Repetitive DNA sequences are the most common
centromere components
Two classes of highly abundant repetitive sequences, satellite
DNAs (satDNAs) and transposable elements (TEs), represent
major DNA components of many centromeric regions. Both
groups of sequences are extremely divergent, and
understanding the mechanisms of their accumulation, diversification,
protein-binding capacity, and linear distribution is essential
for a complete picture of centromere genomics, both from a
structural and functional perspective. Characteristics of
functional DNA sequences and other abundant DNAs contributing
to the centromere regions of the most common model organisms
among higher eukaryotes are presented in Table 1.
SatDNAs are a class of diverse tandemly repeated DNA
sequences that form long arrays localized in tightly
packed heterochromatin. Features of satDNA sequences in
centromeric regions have already been reviewed in detail
(Plohl et al. 2008, 2012). A recent comprehensive
bioinformatic analysis of centromeric satDNAs in a number of animal
and plant species confirmed the rapid evolution of DNA
sequences in these areas (Melters et al. 2013). Despite the
extreme diversity of satDNA sequences, some sequence
segments can be shared among heterologous repeats. The
best-known example is the conserved 17-bp sequence motif,
the CENP-B box, which is specific for alpha-satDNA in
humans (Ohzeki et al. 2002), as well as in various subclasses
of alphoid repeats in mammalian species (Alkan et al. 2011).
This motif is a binding site for the protein CENP-B, which
probably facilitates kinetochore formation (Masumoto et al.
2004), but might also play a role in rearrangements of satDNA
sequences (Kipling and Warburton 1997). The presence of
CENP-B box-like motifs in unrelated satDNAs of some
distant invertebrates and plants suggests its potential functional
relevance in non-mammalian organisms (Mravinac et al.
2005; Canapa et al. 2000; Meštrović et al. 2013; Gindullis
et al. 2001).
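Searching unrelated satDNAs for CENP-B box-like motifs, as described above, amounts to scanning both strands of a sequence for a degenerate 17 bp pattern. The following is a minimal sketch of such a scan, not the procedure used in the cited studies. The pattern NTTCGNNNNANNCGGGN is an assumption: it is the set of core positions commonly quoted for the CENP-B box in the literature and is not given in this text. The function names are hypothetical.

```python
import re

# IUPAC nucleotide codes used here, mapped to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

# ASSUMPTION: core positions commonly quoted for the 17 bp CENP-B box.
# This pattern is illustrative and is not taken from the text above.
CENP_B_BOX_CORE = "NTTCGNNNNANNCGGGN"


def reverse_complement(seq):
    """Reverse complement of a sequence written with A, C, G, T, R, Y, N."""
    return seq.translate(str.maketrans("ACGTRYN", "TGCAYRN"))[::-1]


def motif_to_regex(motif):
    """Compile a degenerate motif; the lookahead reports overlapping hits."""
    body = "".join(IUPAC[base] for base in motif)
    return re.compile("(?=(" + body + "))")


def find_motif(seq, motif=CENP_B_BOX_CORE):
    """Return sorted (start, strand, matched_text) tuples for both strands.

    Start positions are 0-based on the given sequence; a '-' hit means the
    reverse complement of the motif matched at that position.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        for match in motif_to_regex(pattern).finditer(seq):
            hits.append((match.start(), strand, match.group(1)))
    return sorted(hits)
```

For example, `find_motif("gggCTTCGTTGGAAACGGGAttt")` reports a single plus-strand hit starting at position 3. Any hit from such a scan is only a candidate: a sequence match alone does not show that CENP-B or a related protein binds there.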
SatDNAs evolve according to the principles of concerted
evolutio (...truncated)