Whole-genome bisulfite sequencing maps from multiple human tissues reveal novel CpG islands associated with tissue-specific regulation

Human Molecular Genetics, Dec 2015

CpG islands (CGIs) are one of the most widely studied regulatory features of the human genome, with critical roles in development and disease. Despite such significance and the original epigenetic definition, currently used CGI sets are typically predicted from DNA sequence characteristics. Although CGIs are deeply implicated in practical analyses of DNA methylation, recent studies have shown that such computational annotations suffer from inaccuracies. Here we used whole-genome bisulfite sequencing from 10 diverse human tissues to identify a comprehensive, experimentally obtained, single-base resolution CGI catalog. In addition to the unparalleled annotation precision, our method is free from potential bias due to arbitrary sequence features or probe affinity differences. In addition to clarifying substantial false positives in the widely used University of California Santa Cruz (UCSC) annotations, our study identifies numerous novel epigenetic loci. In particular, we reveal significant impact of transposable elements on the epigenetic regulatory landscape of the human genome and demonstrate ubiquitous presence of transcription initiation at CGIs, including alternative promoters in gene bodies and non-coding RNAs in intergenic regions. Moreover, coordinated DNA methylation and chromatin modifications mark tissue-specific enhancers at novel CGIs. Enrichment of specific transcription factor binding from ChIP-seq supports mechanistic roles of CGIs on the regulation of tissue-specific transcription. The new CGI catalog provides a comprehensive and integrated list of genomic hotspots of epigenetic regulation.
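
As a point of reference for what "predicted from DNA sequence characteristics" can mean in practice, the following minimal sketch scores a sequence on length, GC fraction and observed/expected CpG ratio. The thresholds (200 bp, 0.5 and 0.6) are illustrative assumptions of the kind commonly used in sequence-based CGI definitions; they are not stated in this text and are not the method of this study. The function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected CpG = (#CpG * length) / (#C * #G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def passes_sequence_criteria(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Sequence-only test; thresholds are assumed, illustrative defaults."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return (
        len(seq) >= min_len
        and gc_fraction >= min_gc
        and obs_exp >= min_obs_exp
    )
```

A test of this kind looks only at base composition, so it cannot tell whether a region is actually hypomethylated in any tissue, which is the limitation the abstract points to.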

Human Molecular Genetics, 2016, Vol. 25, No. 1, 69–82
doi: 10.1093/hmg/ddv449
Advance Access Publication Date: 28 October 2015
Original Article

Isabel Mendizabal1,2 and Soojin V. Yi1,*

1School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA and 2Department of Genetics, Physical Anthropology and Animal Physiology, University of the Basque Country UPV/EHU, Barrio Sarriena s/n, 48940 Leioa, Spain

*To whom correspondence should be addressed at: School of Biology, Georgia Institute of Technology, 950 Atlantic Drive, Atlanta, GA 30332, USA. Tel: +1 4043856084; Fax: +1 4048942295; Email:

Received: May 21, 2015. Revised: October 2, 2015. Accepted: October 21, 2015

© The Author 2015. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction

Since their initial discovery almost three decades ago (1–3), numerous studies have established the critical importance of CpG islands (CGIs) in fundamental regulatory and developmental processes (4–8). Originally defined as hypomethylated stretches of CpG-rich sequences (1–3), CGIs punctuate otherwise heavily methylated, CpG-depleted mammalian genomes (9–13). Cell type- and tissue-specific CGI methylation is a key regulatory signal for genomic imprinting (14), gene expression regulation (4) and developmental programming (5,7,11,15). Aberrant CGI methylation is implicated in numerous diseases, particularly cancers (16,17) and neurodevelopmental disorders (18).

Even though CGIs were originally experimentally defined (1), subsequent annotations of CGIs relied on sequence-based computational algorithms, due to the lack of actual DNA methylation data (2,19–21). These computational algorithms have been extremely valuable for almost two decades. However, whether computationally identified CGIs truly represent hypomethylated CpG clusters has recently been called into question by genomewide methylation surveys. For example, substantial numbers of computationally defined CGIs are consistently hypermethylated in several tissues (5,22,23) (i.e. false positives). Moreover, many hypomethylated CpG-rich sequences (representing the very definition of CGIs) are missing from the computationally annotated CGI sets (5,24) (i.e. false negatives). Furthermore, a considerable fraction of CGIs has undergone CpG loss during recent evolution, suggesting that they are constitutively methylated and are not bona fide CGIs (25).

With the developments of techniques to identify different types of hypomethylated genomic regions (26–28), it is feasible that the term CpG island itself may even be replaced with some other terms in the future. Nevertheless, CGIs are still one of the most widely analyzed genomic elements in epigenetic analyses, and many commercial toolkits preferentially target them (23). Consequently, re-visiting the epigenetic definition of CGIs and providing an experimentally defined CGI catalog that overcomes the limitations of computational predictions will off (...truncated)

Indeed, important efforts have previously been made to generate an accurate CGI data set (5,22,24). However, these early studies lacked DNA methylation maps with nucleotide-level resolution. They were also limited to only a few tissue types. Here, we utilize whole-genome bisulfite sequencing data sets (11,15,29–34) generated from diverse cell types, including embryonic stem cells (ESCs), germ cells, fetal tissues and six adult somatic tissues spanning all three germ layers (Fig. 1A). From this comprehensive collection of whole-genome methylation maps, we identified more than 50 000 experimentally supported CGIs (‘eCGIs’). The eCGI catalog presented here is the most comprehensive experimentally defined bona fide CGI catalog to date, revealing a large number of novel CGIs that were previously undetected. This experimental definition allows for the discovery of hypomethylated CpG clusters associated with constitutively expressed genes, thereby expanding the list of CGI genes. Moreover, in contrast to the housekeeping nature of classical promoter CGIs, many novel eCGIs show promoter- and enhancer-like chromatin features and associate with facultative transcription factors (TFs) to putatively regulate tissue-specific coding and non-coding transcription.

Figure 1. (A) Tissues analyzed for eCGI identification, including embryonic, gonad, germ line and fetal tissues, as well as six adult somatic tissues of distinct developmental origins. These were selected to have the highest cell type diversity with respect to gene expression patterns (68) while avoiding overly cell heterogeneous tissues. Ovaries comprise germ-line cells and endoderm-derived tissue. The adrenal gland has both ectodermal (medulla) and mesodermal (cortex) origins. (B) The genomic distribution of eCGIs. (C) The correlation between the numbers of protein-coding genes and eCGIs on each chromosome. (D) Distribution of eCGIs and cCGIs across tissues.
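
The notion of a "false positive" described in the Introduction (a computationally predicted CGI that is consistently hypermethylated across tissues) can be illustrated with a small sketch. It assumes per-CpG bisulfite read counts given as (methylated reads, total reads) pairs for one region in each tissue. The 0.7 cutoff, the input layout and the function names are hypothetical choices for illustration, not the procedure used in this study.

```python
def region_methylation(cpg_counts):
    """Pooled fractional methylation of a region.

    cpg_counts: list of (methylated_reads, total_reads), one pair per CpG.
    Returns None when the region has no read coverage.
    """
    methylated = sum(m for m, _ in cpg_counts)
    total = sum(t for _, t in cpg_counts)
    return methylated / total if total else None


def is_consistently_hypermethylated(per_tissue_counts, threshold=0.7):
    """True if every covered tissue is at or above the (assumed) threshold.

    per_tissue_counts: dict mapping tissue name -> list of per-CpG counts.
    """
    levels = [region_methylation(c) for c in per_tissue_counts.values()]
    covered = [level for level in levels if level is not None]
    return bool(covered) and all(level >= threshold for level in covered)


# Hypothetical counts for one predicted CGI in two tissues
example = {
    "brain": [(9, 10), (8, 10)],
    "liver": [(10, 10), (7, 10)],
}
flagged = is_consistently_hypermethylated(example)
```

Pooling reads across the CpGs of a region weights each site by its coverage; averaging per-CpG fractions instead is an equally plausible choice and can give different values when coverage is uneven.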


Article home page: http://hmg.oxfordjournals.org/content/25/1/69.abstract

Isabel Mendizabal, Soojin V. Yi. Whole-genome bisulfite sequencing maps from multiple human tissues reveal novel CpG islands associated with tissue-specific regulation, Human Molecular Genetics, 2016, pp. 69-82, 25/1, DOI: 10.1093/hmg/ddv449