Dissecting Nucleosome Free Regions by a Segmental Semi-Markov Model
Citation: Sun W, Xie W, Xu F, Grunstein M, Li K-C (
Wei Sun
Wei Xie
Feng Xu
Michael Grunstein
Ker-Chau Li
Enrico Scalas, University of East Piedmont, Italy
1 Department of Biostatistics, Carolina Center for Genome Science, University of North Carolina, Chapel Hill, North Carolina, United States of America, 2 Department of Genetics, Carolina Center for Genome Science, University of North Carolina, Chapel Hill, North Carolina, United States of America, 3 Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California, United States of America, 4 Department of Statistics, University of California Los Angeles, Los Angeles, California, United States of America, 5 Institute of Statistical Science, Genomics Research Center, Academia Sinica, Taipei, Taiwan
Background: Nucleosome free regions (NFRs) play important roles in diverse biological processes, including gene regulation. A genome-wide quantitative portrait of individual NFRs, with their starting and ending positions, lengths, and degrees of nucleosome depletion, is critical for revealing the heterogeneity of gene regulation and chromatin organization. By averaging nucleosome occupancy levels, previous studies have identified the presence of NFRs in the promoter regions of many genes. However, evaluating the quantitative characteristics of individual NFRs requires an NFR calling method. Methodology: In this study, we propose a statistical method to identify the patterns of NFRs from a genome-wide measurement of nucleosome occupancy. This method is based on an appropriately designed segmental semi-Markov model, which can capture each NFR pattern and output its quantitative characterization. Our results show that the majority of NFRs are located in intergenic regions or promoters, with a length of about 400-600 bp and varying degrees of nucleosome depletion. Our quantitative NFR mapping allows for an investigation of the relative impacts of the transcription machinery and DNA sequence in evicting histones from NFRs. We show that while both factors have significant overall effects, their specific contributions vary across different subtypes of NFRs. Conclusion: The emphasis of our approach on the variation rather than the consensus of nucleosome free regions sets the tone for exploring many subtler dynamic aspects of chromatin biology.
Funding: This work is supported by NIH grants to M.G. and by NSF grants DMS-0201005, DMS-0406091, and DMS-0707160 to K.C.L. K.C.L.'s work is also
supported in part by MIB, Institute of Statistical Science, Academia Sinica and grants NSC95-3114-P-002-005-Y and NSC97-2627-P001-003. W.S.'s work is also
supported in part by the United States Environmental Protection Agency grant (RD833825). However, the research described in this article has not been subjected
to the Agency's peer review and policy review and therefore does not necessarily reflect the views of the Agency, and no official endorsement should be inferred.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
These authors contributed equally to this work.
Nucleosomes, the building blocks of chromatin, are critical
regulators of many biological processes, such as transcription,
DNA repair and replication [1]. In many cases, the presence of
nucleosomes hinders access of the transcriptional machinery to
the underlying DNA. Conversely, nucleosome
depletion allows access of transcriptional regulators to DNA
sequences [2-6]. This underlines the importance of locating
nucleosome positions, a goal that has been attained by several
groups in yeast [2-6] and mammals [7,8].
Despite the availability of genome-wide nucleosome distribution
profiles, several fundamental questions regarding the nature of
nucleosome free regions (NFRs) remain unanswered. First, it is not
clear whether NFRs occur exclusively at promoter regions.
NFRs in non-promoter regions (including coding regions) may have
functions that have not been identified. Second, it has been
controversial whether histones are depleted only from the promoters
of activated genes. Several studies suggested the existence of
transcription-independent NFRs at individual promoters [7-10].
Finally, the transcriptional machinery and DNA sequence have both
been shown to be involved in histone eviction [9-11]; however, they
may have distinct effects on different subtypes of NFRs.
To investigate the above issues, it is important to bring out the
dynamic aspects of NFRs. Due to the complex interplay between
gene regulation and chromatin remodeling, the lengths of NFRs
may differ from one another. Likewise, the degree of nucleosome
depletion (DoND) in each NFR is likely to vary as well. However,
while many previous studies have described nucleosome
occupancy in quantitative terms, most of them focused on ensemble
properties of NFRs. For instance, a representative nucleosome
occupancy profile in the promoter regions has been reported by
averaging the enrichments of nucleosomes across all genes aligned
by the start codons of ORFs or transcription start sites (TSSs)
[5,6]. Although this representative NFR reveals a shared pattern
of nucleosome depletion for many genes, it also masks
characteristics specific to individual NFRs. On the other hand, reports such
as Fig. 4 of Lee et al. [6] and Fig. 2 of Whitehouse et al. [14] did
indicate length variation among NFRs. However, the location and
quantitative features of each individual NFR have not been
systematically explored.
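
To make the masking effect of averaging concrete, the following is a minimal sketch on purely synthetic data. It is not the method proposed in this paper: the probe count, NFR lengths, depletion depths, noise level, threshold, and all function names are assumptions chosen for illustration, and the per-gene length estimate is a crude threshold rule rather than a segmental semi-Markov model.

```python
import random

def simulate_profile(n_probes, nfr_start, nfr_len, depth, noise_sd, rng):
    """Hypothetical occupancy profile: 0 in the background, -depth inside the NFR, plus noise."""
    profile = []
    for i in range(n_probes):
        base = -depth if nfr_start <= i < nfr_start + nfr_len else 0.0
        profile.append(base + rng.gauss(0.0, noise_sd))
    return profile

def average_profile(profiles):
    """Position-wise mean across aligned profiles (the 'representative' NFR)."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]

def longest_depleted_run(profile, threshold):
    """Crude per-gene NFR length: longest run of consecutive probes below the threshold."""
    best = run = 0
    for value in profile:
        run = run + 1 if value < threshold else 0
        best = max(best, run)
    return best

rng = random.Random(0)   # fixed seed so the sketch is reproducible
n_probes = 60            # assumed number of probes per aligned promoter window
tss_index = 30           # assumed alignment point; every simulated NFR ends here

genes, true_lengths = [], []
for _ in range(200):
    length = rng.randint(5, 30)      # NFR length in probes (assumed range)
    depth = rng.uniform(1.0, 3.0)    # degree of depletion (assumed range)
    true_lengths.append(length)
    genes.append(simulate_profile(n_probes, tss_index - length, length, depth, 0.2, rng))

avg = average_profile(genes)
per_gene_lengths = [longest_depleted_run(g, -0.5) for g in genes]

print("average occupancy just upstream of the alignment point:", round(avg[tss_index - 1], 2))
print("range of per-gene NFR lengths (probes):", min(per_gene_lengths), "to", max(per_gene_lengths))
```

In this toy setting the averaged profile shows a single smooth dip near the alignment point, while the per-gene run lengths span a wide range that the average cannot reveal.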
To examine individual NFRs across the whole
genome, an automatic NFR calling algorithm is required that
can dissect an NFR pattern from a noisy background. Currently,
the major existing algorithms for analyzing
genome-wide DNA-protein interaction data were adapted from those
initially designed for detecting the binding of transcription factors
(TFs) [12-16]. These algorithms are, however, inadequate for
capturing NFRs for the following reasons. First, because most TFs are
sparsely localized across the genome, many algorithms for
identifying TF binding sites (TFBSs) are designed under the
assumption that TF binding is an uncommon event, so that the
majority of array data is considered background noise. However,
this assumption becomes problematic for exploring epigenetic
events that are often abundant, including the occupancy of
nucleosomes and a variety of histone modifications. Second, the
signal of TF binding obtained from microarrays typically occurs
within a short region and tends to form a sharp peak
(Supplementary Figure 1a in Supplementary Materials S1). In
contrast, the pattern of nucleosome occupancy or the occurrence
of histone modifications can extend over much longer regions of varying length
(Supplementary Figure 1b in (...truncated)