Centromere identity from the DNA point of view

https://link.springer.com/content/pdf/10.1007%2Fs00412-014-0462-0.pdf

Miroslav Plohl, Nevenka Meštrović, Brankica Mravinac

The centromere is a chromosomal locus responsible for the faithful segregation of genetic material during cell division. It has become evident that centromeres can be established on literally any DNA sequence, and the possible synergy between DNA sequences and the most prominent centromere identifiers, protein components, and epigenetic marks remains uncertain. However, some evolutionary preferences seem to exist, and long-term established centromeres are frequently formed on long arrays of satellite DNAs and/or transposable elements. Recent progress in understanding functional centromere sequences is based largely on the high-resolution DNA mapping of sequences that interact with the centromere-specific histone H3 variant, the most reliable marker of active centromeres. In addition, sequence assembly and mapping of large repetitive centromeric regions, as well as comparative genome analyses, offer insight into their complex organization and evolution. The rapidly advancing field of transcription in centromere regions highlights the functional importance of centromeric transcripts. Here, we comprehensively review the current state of knowledge on the composition and functionality of DNA sequences underlying active centromeres and discuss their contribution to the functioning of different centromere types in higher eukaryotes.

An essential function of genetic material in any living organism is its faithful segregation, a role that in eukaryotes is determined by the centromere. The centromere includes the core or functional centromere domain, a specialized locus at which microtubules attach to the complex multiprotein structure of the kinetochore in order to segregate chromosomes in mitosis and meiosis. The core centromere domain is surrounded by large blocks of pericentromeric heterochromatin (also called the pericentromere), the primary sites of sister chromatid cohesion.
Centromere functionality is vital for all eukaryotic organisms. In addition to understanding its role as a biological structure, studying the centromere is also highly relevant from a biomedical point of view, because abnormalities in centromeric function are often lethal or associated with various congenital and acquired diseases, such as cancer, infertility, and birth disorders (reviewed in Thompson et al. 2010). Centromeres are considered to be shaped by both genomic and epigenetic mechanisms, but the synergy between DNA sequences, protein components, and epigenetic marks is still not well understood. In the absence of a universal DNA sequence, the species-specific histone H3 variant CENH3 (CENP-A in mammals, CID in Drosophila melanogaster, Cse4 in Saccharomyces cerevisiae) is the most prominent protein identifier of centromere function. Related forms of this protein have been detected in all studied active centromeres of single-cell and multicellular eukaryotes (Black and Bassett 2008; Malik and Henikoff 2009). CENH3 replaces the canonical histone H3 in such a way that arrays of CENH3-based nucleosomes alternate with those containing canonical H3 (Blower et al. 2002; Sullivan and Karpen 2004). In humans and flies, canonical H3 in the centromere is in turn epigenetically modified by dimethylation at lysine 4 (H3K4me2), and is thus distinct from the histone H3 in adjacent pericentromeric heterochromatin, which is marked by methylation at lysine 9 (H3K9me). These differences qualify centromeric chromatin as a unique chromatin type, centrochromatin (Sullivan and Karpen 2004).

In the budding yeast S. cerevisiae, centromere function depends on a short DNA sequence motif about 100 bp long. These centromeres are referred to as simple or point centromeres (Hyman and Sorger 1995). In all other eukaryotes, centromeres are founded on repetitive DNA arrays of several hundred kilobases, commonly known as complex or regional centromeres (Pluta et al. 1995).
A single centromere is normally formed on each chromosome, at a locus that is recognized at the cytogenetic level as the primary constriction of the monocentric chromosome. However, there are exceptions, and some organisms have holocentric chromosomes that lack a primary constriction and have a centromere dispersed in many subdomains along the entire chromosome length (Dernburg 2001). Mostly due to limitations in sequencing and assembly of long arrays of nearly identical repeats, our knowledge of the long-range functional organization of centromeric DNA is rather limited, and centromeres still represent the last frontiers in genome assemblies and sequence annotations (Hayden and Willard 2012). Here, we review the rapidly progressing field of functional centromere genomics. We present data relating DNA sequences and their functional interactions in different centromere types of higher eukaryotes, and point to the significance of the transcriptional potential of centromeric sequences.

Repetitive DNA sequences are the most common centromere components

Two classes of highly abundant repetitive sequences, satellite DNAs (satDNAs) and transposable elements (TEs), represent major DNA components of many centromeric regions. Both groups of sequences are extremely divergent, and understanding the mechanisms of their accumulation, diversification, protein-binding capacity, and linear distribution is essential for a complete picture of centromere genomics, from both a structural and a functional perspective. Characteristics of functional DNA sequences and other abundant DNAs contributing to the centromere regions of the most common model organisms of higher eukaryotes are presented in Table 1. SatDNAs are a class of diverse tandemly repeated DNA sequences that form long arrays localized in tightly packed heterochromatin. Features of satDNA sequences in centromeric regions have already been reviewed in detail (Plohl et al. 2008, 2012).
A recent comprehensive bioinformatic analysis of centromeric satDNAs in a number of animal and plant species confirmed the rapid evolution of DNA sequences in these areas (Melters et al. 2013). Despite the extreme diversity of satDNA sequences, some sequence segments can be shared among heterologous repeats. The best known example is the conserved 17 bp long sequence motif, the CENP-B box, which is specific for alpha-satDNA in humans (Ohzeki et al. 2002), as well as in various subclasses of alphoid repeats in mammalian species (Alkan et al. 2011). This motif is a binding site for the protein CENP-B, which probably facilitates kinetochore formation (Masumoto et al. 2004), but might also play a role in rearrangements of satDNA sequences (Kipling and Warburton 1997). The presence of CENP-B box-like motifs in unrelated satDNAs of some distant invertebrates and plants suggests its potential functional relevance in non-mammalian organisms (Mravinac et al. 2005; Canapa et al. 2000; Meštrović et al. 2013; Gindullis et al. 2001). SatDNAs evolve according to the principles of concerted evolutio (...truncated)