Dissecting Nucleosome Free Regions by a Segmental Semi-Markov Model

http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0004721&type=printable

Citation: Sun W, Xie W, Xu F, Grunstein M, Li K-C. Dissecting Nucleosome Free Regions by a Segmental Semi-Markov Model.

Authors: Wei Sun, Wei Xie, Feng Xu, Michael Grunstein, Ker-Chau Li

Editor: Enrico Scalas, University of East Piedmont, Italy

Affiliations: 1 Department of Biostatistics, Carolina Center for Genome Science, University of North Carolina, Chapel Hill, North Carolina, United States of America, 2 Department of Genetics, Carolina Center for Genome Science, University of North Carolina, Chapel Hill, North Carolina, United States of America, 3 Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California, United States of America, 4 Department of Statistics, University of California Los Angeles, Los Angeles, California, United States of America, 5 Institute of Statistical Science, Genomics Research Center, Academia Sinica, Taipei, Taiwan

Background: Nucleosome free regions (NFRs) play important roles in diverse biological processes including gene regulation. A genome-wide quantitative portrait of each individual NFR, including its start and end positions, length, and degree of nucleosome depletion, is critical for revealing the heterogeneity of gene regulation and chromatin organization. By averaging nucleosome occupancy levels, previous studies have identified the presence of NFRs in the promoter regions across many genes. However, evaluating the quantitative characteristics of individual NFRs requires an NFR calling method.

Methodology: In this study, we propose a statistical method to identify the patterns of NFRs from a genome-wide measurement of nucleosome occupancy. This method is based on an appropriately designed segmental semi-Markov model, which can capture each NFR pattern and output its quantitative characterizations. Our results show that the majority of the NFRs are located in intergenic regions or promoters, with a length of about 400–600 bp and varying degrees of nucleosome depletion.
Our quantitative NFR mapping allows for an investigation of the relative impacts of transcription machinery and DNA sequence in evicting histones from NFRs. We show that while both factors have significant overall effects, their specific contributions vary across different subtypes of NFRs.

Conclusion: The emphasis of our approach on the variation rather than the consensus of nucleosome free regions sets the stage for exploring many subtler dynamic aspects of chromatin biology.

Funding: This work is supported by NIH grants to M.G. and by NSF grants DMS-0201005, DMS-0406091, and DMS-0707160 to K.C.L. K.C.L.'s work is also supported in part by MIB, Institute of Statistical Science, Academia Sinica and grants NSC95-3114-P-002-005-Y and NSC97-2627-P001-003. W.S.'s work is also supported in part by the United States Environmental Protection Agency grant (RD833825). However, the research described in this article has not been subjected to the Agency's peer review and policy review and therefore does not necessarily reflect the views of the Agency, and no official endorsement should be inferred. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

These authors contributed equally to this work.

Nucleosomes, the building blocks of chromatin, are critical regulators in many biological processes, such as transcription, DNA repair and replication [1]. In many cases, the presence of nucleosomes hinders access of the transcriptional machinery to the underlying DNA. Conversely, nucleosome depletion allows transcriptional regulators to access DNA sequences [2–6]. This underlines the importance of locating nucleosome positions, a goal that has been attained by several groups in yeast [2–6] and mammals [7–8].
Despite the availability of genome-wide nucleosome distribution profiles, several fundamental questions regarding the nature of nucleosome free regions (NFRs) remain unanswered. First, it is not clear whether NFRs occur exclusively at promoter regions. NFRs in non-promoter regions (including coding regions) may have functions that have not been identified. Second, it remains controversial whether histones are depleted only from the promoters of activated genes. Several studies suggested the existence of transcription-independent NFRs at individual promoters [7–10]. Finally, the transcriptional machinery and DNA sequence have both been shown to be involved in histone eviction [9–11]; however, they may have distinct effects on different subtypes of NFRs.

To investigate the above issues, it is important to bring out the dynamic aspects of NFRs. Due to the complex interplay between gene regulation and chromatin remodeling, the lengths of NFRs may differ from one another. Likewise, the degree of nucleosome depletion (DoND) in each NFR is likely to vary as well. However, while many previous studies have described nucleosome occupancy in quantitative terms, most of them focused on ensemble properties of NFRs. For instance, a representative nucleosome occupancy profile for promoter regions has been reported by averaging the enrichment of nucleosomes across all genes aligned by the start codons of ORFs or by transcription start sites (TSSs) [5,6]. Although this representative NFR reveals a pattern of nucleosome depletion shared by many genes, it also masks characteristics specific to individual NFRs. On the other hand, reports such as Fig. 4 of Lee et al. [6] and Fig. 2 of Whitehouse et al. [14] did indicate length variation among NFRs. However, the location and quantitative features of each individual NFR have not been systematically explored.
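The masking effect of ensemble averaging described above can be illustrated with a small toy example. The sketch below is not the method of this paper and uses no real data: the four NFRs (start, length, depletion), the probe grid, the threshold of -0.25, and the helper names `simulate_profile` and `called_length` are all assumptions made up for illustration. It builds idealized, noise-free occupancy profiles aligned at a common reference point, averages them, and compares the crude length and depth read off the averaged profile with those of the individual profiles.

```python
import numpy as np

def simulate_profile(n_probes, nfr_start, nfr_length, depletion, baseline=0.0):
    """Toy occupancy profile: `baseline` everywhere, lowered by `depletion` inside the NFR."""
    profile = np.full(n_probes, baseline, dtype=float)
    profile[nfr_start:nfr_start + nfr_length] -= depletion
    return profile

def called_length(profile, threshold):
    """Number of probes below `threshold` (a crude stand-in for NFR length)."""
    return int((profile < threshold).sum())

# Hypothetical NFRs for four genes: (start probe, length in probes, depletion).
# These values are invented for illustration only.
toy_nfrs = [(40, 10, 2.0), (35, 25, 1.0), (45, 5, 3.0), (30, 40, 0.5)]
profiles = np.array([simulate_profile(100, s, l, d) for s, l, d in toy_nfrs])

# The "representative" profile obtained by averaging across aligned genes.
ensemble = profiles.mean(axis=0)

threshold = -0.25  # arbitrary cutoff, chosen only for this toy example
individual_lengths = [called_length(p, threshold) for p in profiles]
individual_depths = [float(p.min()) for p in profiles]
ensemble_length = called_length(ensemble, threshold)
ensemble_depth = float(ensemble.min())

print("individual lengths:", individual_lengths)
print("individual depths: ", individual_depths)
print("ensemble length:   ", ensemble_length)
print("ensemble depth:    ", ensemble_depth)
```

In this toy setting the averaged profile yields a single length and a single depth, while the individual profiles differ several-fold in both, which is the kind of per-NFR variation that an NFR calling method is meant to recover. Real array data are noisy, so simple thresholding as used here would not be adequate in practice.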
To examine individual NFRs across the whole genome, an automatic NFR calling algorithm is required that can dissect an NFR pattern from a noisy background. Currently, the major existing algorithms for analyzing genome-wide DNA-protein interaction data were adapted from those initially designed for detecting the binding of transcription factors (TFs) [12–16]. These algorithms, however, are inadequate for capturing NFRs for the following reasons. First, as most TFs are sparsely localized across the genome, many algorithms for identifying TF binding sites (TFBSs) are designed under the assumption that TF binding is an uncommon event, so that the majority of array data is considered background noise. This assumption becomes problematic for exploring epigenetic events that are often abundant, including the occupancy of nucleosomes and a variety of histone modifications. Second, the signal of TF binding obtained from microarrays typically occurs within a short region and tends to form a sharp peak (Supplementary Figure 1a in Supplementary Materials S1). In contrast, the pattern of nucleosome occupancy or the occurrence of histone modifications can be much longer with various lengths (Supplementary Figure 1b in (...truncated)