Enhancer identification in mouse embryonic stem cells using integrative modeling of chromatin and genomic features

BMC Genomics, Apr 2012

Background: Epigenetic modifications, transcription factor (TF) availability and differences in chromatin folding influence how the genome is interpreted by the transcriptional machinery responsible for gene expression. Enhancers buried in non-coding regions are associated with significant differences in histone marks between cell types. In contrast, gene promoters show more uniform modifications across cell types. Here we used histone modification and chromatin-associated protein ChIP-Seq data sets in mouse embryonic stem (ES) cells, together with genomic features, to identify functional enhancer regions. Using co-bound sites of OCT4, SOX2 and NANOG (co-OSN, validated enhancers) as the positive training set and co-bound sites of MYC and MYCN (limited enhancer activity) as the negative training set, we performed multinomial logistic regression with LASSO regularization to identify key features.

Results: Cross-validation revealed a combination of p300, H3K4me1, MED12 and NIPBL features to be the top signatures of co-OSN regions. Using a model built from 10 signatures, 83% of the top 1277 putative 1 kb enhancer regions (probability greater than or equal to 0.8) overlapped with at least one TF peak from 7 mouse ES cell ChIP-Seq data sets. These putative enhancers are associated with increased expression of neighbouring genes and are significantly enriched in loci bound by multiple TFs, in agreement with combinatorial models of TF binding. Furthermore, we identified several motifs of known TFs that are significantly enriched in putative enhancer regions compared to random promoter regions and background. Comparison with the active H3K27ac mark in various cell types confirmed the cell type-specificity of these enhancers.

Conclusions: The top enhancer signatures we identified (p300, H3K4me1, MED12 and NIPBL) will allow for the identification of cell type-specific enhancer regions in diverse cell types.
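The feature-selection idea in the abstract, logistic regression with a LASSO (L1) penalty that drives uninformative coefficients to zero, can be illustrated with a minimal sketch. This is not the authors' implementation: it is a binary rather than multinomial model, it uses a plain proximal gradient loop instead of a dedicated solver, and the three-column toy data (one informative feature standing in for something like a p300 signal, plus two noise features) are entirely hypothetical, as are all function and parameter names.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrink w toward zero by t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.1, step=0.5, n_iter=500):
    """Binary logistic regression with an L1 penalty (proximal gradient descent)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        # Gradient step on the log-loss, then soft-threshold the weights.
        # The intercept is left unpenalized.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def predict(X, w, b):
    # Class 1 ("enhancer-like" in this toy setting) when probability >= 0.5.
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]


def make_toy_data(n=200, seed=0):
    # Hypothetical data: column 0 separates the classes, columns 1-2 are noise.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(1.5 if label else -1.5, 1.0),
                  rng.gauss(0.0, 1.0),
                  rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


X, y = make_toy_data()
weights, bias = fit_l1_logistic(X, y)
accuracy = sum(p == t for p, t in zip(predict(X, weights, bias), y)) / len(y)
print("weights:", weights, "bias:", bias, "training accuracy:", accuracy)
```

In a sketch like this, the penalty strength `lam` controls how many features survive. In practice it would be chosen by cross-validation, which is how the abstract describes the top signatures being identified.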



Chih-yu Chen (0), Quaid Morris (1), Jennifer A Mitchell (0)

(0) Department of Cell and Systems Biology, University of Toronto, 25 Harbord Street, Toronto, ON, M5S 3G5, Canada
(1) Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada

Keywords: Enhancer; Embryonic stem cells; Transcription factor; ChIP-Seq; Histone methylation; Regulation of gene expression

Background

Chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) has enabled genome-wide investigation of chromatin features and epigenetic modifications within the non-coding regions of mammalian genomes in high resolution [1]. ChIP-Seq provides the opportunity to characterise and begin to understand on a genome-wide scale how genes are regulated in a cell type-specific manner by sequence-specific DNA-binding transcription factors (TFs). However, identifying regulatory regions within the genome and linking these regions to the regulation of specific genes remains a challenge. Distal regulatory elements have been identified which regulate gene transcription from several kilobases (kb) away and have even been found to regulate genes located on separate chromosomes [2-4]. Functional characterisation of these regulatory elements can be done by identifying bound TFs and investigating whether or not they act as enhancers, increasing transcription of a gene in a position- and orientation-independent manner. ChIP-Seq analysis for several TFs has revealed that a significant fraction (40-60%) of the binding sites for most TFs are located in intergenic regions >10 kb from transcription start sites (TSSs) of annotated genes [5-7].
In addition, enhancer regions are associated with significant epigenetic differences between cell types, while gene promoters show more uniform modifications across different cell types [8,9]. These findings suggest that enhancers, which can be located at great distances from the genes they regulate, play a larger role in regulating tissue-specific gene expression than the sequences proximal to gene promoters. Moreover, mutations in DNA sequences of distant-acting enhancers contribute to various diseases [10], further stressing their importance in regulating gene expression. Prior to the availability of ChIP-Seq and ChIP-chip data, computational approaches based solely on genomic sequences were used to identify enhancer regions. Initially these approaches compared the genomic sequence with TF binding motifs represented by position specific scoring matrices (PSSM) from TRANSFAC [11] and JASPAR [12]. TF motif clustering and comparative genomics improved the predictive power of these approaches [13-16]. In addition, intergenic regions with high sequence conservation between human and Fugu or ultra-conserved regions between human-mouse-rat (>200 bp of 100% identity) are predictive of regulatory regions involved in conserved processes such as embryonic development [17,18]. As many enhancer regio (...truncated)
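The sequence-only approach mentioned above, scoring genomic sequence against a position-specific scoring matrix, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the 4-position count matrix is hypothetical (it is not taken from TRANSFAC or JASPAR), the background is uniform, only the forward strand is scanned, and the function names are made up for this example.

```python
import math

# Assumed uniform background base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Hypothetical count matrix: one dict per motif position (consensus "ATGC").
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 1, "G": 0, "T": 9},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 8, "G": 0, "T": 1},
]


def counts_to_pssm(counts, background, pseudocount=1.0):
    # Convert counts to log2-odds scores against the background,
    # using a pseudocount so unseen bases do not give log(0).
    pssm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(column)
        pssm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background[base])
            for base in column
        })
    return pssm


def scan(sequence, pssm):
    # Slide the matrix along the forward strand and score every window.
    width = len(pssm)
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][base] for i, base in enumerate(window))
        scores.append((start, score))
    return scores


pssm = counts_to_pssm(COUNTS, BACKGROUND)
hits = scan("CCATGCAA", pssm)
best_start, best_score = max(hits, key=lambda hit: hit[1])
print("best window starts at", best_start, "with log-odds score", round(best_score, 2))
```

A real scan would also score the reverse complement and apply a score threshold. As the surrounding text notes, clustering of motif hits and comparative genomics were later used to improve on single-motif matching of this kind.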


This is a preview of a remote PDF: http://www.biomedcentral.com/content/pdf/1471-2164-13-152.pdf

Chih-yu Chen, Quaid Morris, Jennifer A Mitchell. Enhancer identification in mouse embryonic stem cells using integrative modeling of chromatin and genomic features. BMC Genomics, 2012, 13:152. DOI: 10.1186/1471-2164-13-152