Methylation-level inferences and detection of differential methylation with MeDIP-seq data

PLOS ONE, Sep 2018

DNA methylation is an essential epigenetic modification involved in regulating the expression of mammalian genomes. A variety of experimental approaches to generate genome-wide or whole-genome DNA methylation data have emerged in recent years. Methylated DNA immunoprecipitation followed by sequencing (MeDIP-seq) is one of the major tools used in whole-genome epigenetic studies. However, analyzing this data in terms of accuracy, sensitivity, and speed still remains an important challenge. Existing methods, such as BATMAN and MEDIPS, analyze MeDIP-seq data by dividing the whole genome into equal length windows and assume that each CpG of the same window has the same methylation level. More precise work is necessary to estimate the methylation level of each CpG site in the whole genome. In this paper, we propose a Statistical Inferences with MeDIP-seq Data (SIMD) to infer the methylation level for each CpG site. In addition, we analyze a real dataset for DNA methylation. The results show that our method displays improved precision in detecting differentially methylated CpG sites compared to the existing method. To meet the demands of the application, we have developed an R package called “SIMD”, which is freely available in https://github.com/FocusPaka/SIMD.

A PDF file should load here. If you do not see its contents the file may be temporarily unavailable at the journal website or you do not have a PDF plug-in installed and enabled in your browser.

Alternatively, you can download the file locally and open with any standalone PDF reader:

https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0201586&type=printable

Methylation-level inferences and detection of differential methylation with MeDIP-seq data

August Methylation-level inferences and detection of differential methylation with MeDIP-seq data Yan Zhou 0 1 Jiadi Zhu 0 1 Mingtao Zhao 1 Baoxue Zhang 1 Chunfu Jiang 0 1 Xiyan Yang 1 0 College of Mathematics and Statistics, Institute of Statistical Sciences, Shenzhen University , Shenzhen , China , 2 Institute of Statistics and Applied Mathematics, Anhui University of Finance & Economics , Bengbu, Anhui , China , 3 School of Statistics, Capital University of Economics and Business , Beijing , China , 4 School of Financial Mathematics and Statistics, Guangdong University of Finance , Guangzhou , China 1 Editor: Y-h. Taguchi, Chuo University , JAPAN DNA methylation is an essential epigenetic modification involved in regulating the expression of mammalian genomes. A variety of experimental approaches to generate genomewide or whole-genome DNA methylation data have emerged in recent years. Methylated DNA immunoprecipitation followed by sequencing (MeDIP-seq) is one of the major tools used in whole-genome epigenetic studies. However, analyzing this data in terms of accuracy, sensitivity, and speed still remains an important challenge. Existing methods, such as BATMAN and MEDIPS, analyze MeDIP-seq data by dividing the whole genome into equal length windows and assume that each CpG of the same window has the same methylation level. More precise work is necessary to estimate the methylation level of each CpG site in the whole genome. In this paper, we propose a Statistical Inferences with MeDIP-seq Data (SIMD) to infer the methylation level for each CpG site. In addition, we analyze a real dataset for DNA methylation. The results show that our method displays improved precision in detecting differentially methylated CpG sites compared to the existing method. To meet the demands of the application, we have developed an R package called ªSIMDº, which is freely available in https://github.com/FocusPaka/SIMD. - Data Availability Statement: All relevant data are within the paper. Funding: Yan Zhou's research was supported by the National Science Foundation of China (Grant No. 11701385), the National Statistical Research Project (Grant No. 2017LY56), and the Doctor start fund of Guangdong Province (Grant No. 2016A030310062). Mingtao Zhao's research was supported by the National Social Science Foundation of China (15CTJ008). Xiyan Yang's research was supported by the National Science Foundation of China (Grant No. 11601094). Baoxue Zhang's research was supported by the National 1 Introduction Several studies have shown that methylation in DNA is highly related to diverse biological processes and that aberrant methylation results in severe effects, including different types of cancers [ 1, 2 ]. Therefore, the research on epigenetic modifications throughout the human genome is meaningful. Analyzing DNA methylation profiles is now feasible due to the development of next-generation sequencing techniques, such as MethylC-seq, MeDIP-seq, MBD-seq, and MRE-seq. In particular, bisulfite genomic DNA sequencing is the gold standard to profile genome-wide DNA methylation [3]. Although there are several approaches (such as MethylCseq and whole-genome shotgun bisulfit sequencing (WGSBS)) that are reasonable for wholegenome analysis, it is rather expensive. MeDIP-seq can achieve nearly the same results as some Natural Science Foundation of China (Grant No. 11671268). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. more expensive approaches at a lower cost [ 4 ]. We therefore propose a method based on MeDIP-seq data to analyze methylation levels. MeDIP [ 5 ] involves enrichment of the methylated DNA fractions immunoprecipitated by 5-methylcytosine-specific antibodies. Although MeDIP-seq cannot provide base pair-specific profiles, it reflects methylation levels by the number of immunoprecipitated DNA fragments. Compared to MeDIP-seq, which only enriches the methylated portion of the genome, we combine methylation-sensitive restriction enzyme sequencing (MRE-seq) to identify unmethylated CpGs. It utilizes methyl-sensitive restriction enzymes (MREs), such as HpaII (CCà GG), Hin6I (GCà GC), and AciI (CCà GC) to specifically identify unmethylated CpGs [ 6 ]. The integrative method improves accuracy to identify intermediate methylation regions and enables wholegenome identification for epigenetic states [ 7 ]. Several computational tools have been developed for analyzing MeDIP-seq data. BATMAN [ 8 ] defines a coupling factor to measure the varying densities of methylated CpG sites and then implements a Bayesian deconvolution strategy to infer the methylation status at each CpG site. Mattia Pelizzola [ 9 ] believes that the relationship between the MeDIP enrichment estimates and the actual methylation levels are not linear and presented MEDME, which is based on experimental and analytical methods to evaluate the actual relationship and predicted methylation levels. The MEDIPS [ 10 ] approach is similar to the former two methods and produces similar results as BATMAN with higher computational efficiency, which significantly reduces running time for processing MeDIP-seq data. To obtain more information for methylome coverage at a lower cost, R. Alan Harris [ 7 ] has proposed a strategy to combine MeDIPseq and MRE-seq to calculate the methylation scores, which can be used to infer individual CpG methylation status. The M&M [ 11 ] algorithm is another method for analyzing integrative data, and is more accurate than other methods. In this paper, we build a model of MeDIP-seq data based on [ 7 ] to estimate the methylation level for a single CpG site. We attempt to summarize the algorithm into a model, which will enable us to understand how to integrate the MRE-seq data. After MeDIP-seq and MRE-seq experiments, we map two kinds of short reads to the reference genome. It is known that MREseq short reads can be accurately mapped to the CpG site that contributes to it, but MeDIP-seq short reads cannot be. A short read of MeDIP-seq will cover one or more CpG sites (the short reads that cannot cover CpG sites will be discarded). Then, one short read in MeDIP-seq is pulled from only one CpG site or several neighboring CpG sites; however, we do not know which one. In order to identify the actual CpG sites that contribute to the short read and to obtain the actual number of short reads on each CpG site, it is necessary to build a model and provide statistical inferences for the MeDIP-seq data. After obtaining the actual number of short reads on each CpG site, we utilize them to detect differentially methylated CpG sites. The remainder of the paper is organized as follows. In Section 2, we provide a brief description of the model and two possible cases and then propose theorems for those assumptions. In Section 3, we use an example to illustrate the SIMD method. In Section 4, we apply the proposed method to analyze a real dataset and compare it to the existing method. In Section 5, we end with a discussion and the conclusion. 2 Model for MeDIP-seq reads In the MeDIP-seq experiment, genomic DNA is first isolated and sheared by sonication to short fragments of a few hundred basepairs. In this step, the DNA fragments contain both methylated fragments and unmethylated fragments. After immunoprecipitation with an antibody that can specifically bind the DNA methylation sites, the immunoprecipitated DNA fragments will almost only comprise methylated fragments and can be PCR amplified and 2 / 13 sequenced. By aligning the MeDIP-seq short reads to the reference genome, the methylation levels of CpG sites in a region can be estimated based on the read counts in the region. This region-based method can provide insightful answers to numerous important biological questions, but it is of low resolution and cannot provide information about the methylation status of single CpG sites. Though difficult, inferring the methylation level of single CpG sites based on MeDIP-seq data is not impossible. For example, provided that the coverage of MeDIP-seq data is sufficiently large, the methylation level of an isolated CpG site (a CpG site that is far away from other CpG sites) can be easily derived. If there are MeDIP-seq reads covering the site, these reads will not be able to cover other CpG sites, which implies that this CpG site is methylated; otherwise, this site is not methylated. When two or more CpG sites are close to each other (the distance between two neighboring CpG sites is less than the read length), an MeDIP-seq read covering one CpG site will also cover its neighboring CpG site, which makes it very difficult to determine which CpG site is methylated. However, as breakage induced by sonication is random, with sufficient sequencing coverage we may still be able to distinguish between the methylated and unmethylated CpG sites. For example, suppose that only two CpG sites are close to each other and are far away from other CpG sites. If only one of the two CpG sites is methylated, the DNA fragments obtained from the sonication in this region (the region that contains the two CpG sites) can be classified into three categories: the fragments overlapping with both CpG sites, the fragments overlapping only with the methylated CpG sites, or the fragments overlapping only with the unmethylated CpG sites. Because the fragments in the first two (the last) categories contain a methylated (unmethylated) CpG site, they can (cannot) be immunoprecipitated and sequenced. Therefore, we would obtain significantly more MeDIP-seq reads covering the methylated CpG site. Similarly, if both CpG sites are methylated, the number of reads covering both CpG sites should be roughly the same. Based on this observation, we can develop a statistical model to estimate the methylation level of single CpG sites. Considering that a region C consists of G CpG sites, it is supposed that R is a random MeDIP-seq read sequenced from the region overlapping with some of the G CpG sites. From the MeDIP-seq experimental flow, we know that this read is sequenced because it contains at least one methylated CpG site and therefore allows an antibody to bind to its methylated CpG sites, which in turn makes it immunoprecipitated and sequenced. Let XRj = 1 (j = 1, , G) if the jth CpG site contributes to the sequencing of the short read R, or in other words, if the short read R contains the jth CpG site and an antibody binds to the jth CpG site, thereby allowing the immunoprecipitation and sequencing of short read R. Otherwise, we denote XRj = 0. Note that because R is a random read, XRj is a random variable taking values of {0, 1}. Assume that we have n short reads overlapping with the region C. Let Xij (i = 1, , n and j = 1, , G) be the random variable as introduced above for the ith read. We denote Xi = (Xi1, Xi2, Xij, XiG). We make the following assumptions about the random variables Xij. Assumption 1. Assume that Xi1, Xi2, Xij, XiG are independent and follow two-point distribution, that is, Xij 8 >>>< 1 antibody actually binds to this CpG site and thus allows immunoprecipitation and sequencing the short read. This assumption essentially tells us that the random vectors Xi are independently and identically distributed (i.i.d.) and that their components are independent. The i.i.d. assumption of the random vectors Xi is reasonable because the short reads can be safely viewed as independently sampled with the same sampling procedure. The independence assumption of the components of Xi is relatively strong because if a read contains two methylated CpG sites, the antibody binding to one CpG site may influence the binding to the other CpG site. Assumption 2. Assume one short read in MeDIP-seq is pulled from only one CpG site (the case is the same as Ting's algorithm), that is, SjGˆ1Xij ˆ 1. Assumption 3. Assume one short read in MeDIP-seq is pulled from not less than one CpG site (each observed short read must be associated with not less than one CpG site on the genome), that is, SjGˆ1Xij 1. Under the above assumptions, there are some theoretical results. Theorem 1. Under Assumptions 1 and 2, the joint distribution of Xi1, Xi2, , Xij, , XiG is a multinomial distribution with probability P = (p1, p2, , pj, , pG), where pj ˆ lj=SjGˆ1lj. That is, 3 An example of the model , 6. Therefore, the combination distribution of all CpG sites 4 / 13 Fig 1. Short reads mapped to reference regions. For example, there are six short reads that cover five CpG sites. In the model, there will be two cases. 3.1 Only a single CpG contributes to a short read If one short read covers several CpG sites, it actually only comes from one of them, even though we do not know which one it is. That is, given Sj5ˆ1Xij ˆ 1, the joint distribution of Xi1, Xi2, Xi3, Xi4, and Xi5 is a multinomial distribution with probability P = (p1, p2, p3, p4, p5), where pj ˆ lj=Sj5ˆ1lj. A short read is an observation that is Xi = (xi1, xi2, xi3, xi4, xi5), where xij = 0 or 1. A note is that only one element of Xi is 1 and the others are 0. The xij will be 0 when the jth CpG is not covered by the ith short reads. That is, Xi j Sj5ˆ1Xij ˆ 1 Multinomial…P†: Then, the profile log likelihood is: We know the observation from Fig 1 is: 0 1 In the second short read, we know that one of x21 and x22 is 1, but we do not know which one. Therefore, we consider x21 and x22 as latent variables and estimate P using the EM algorithm, which is given below: 5 / 13 E-step: Given the current estimation P− for P, the conditional expectation of the log complete data likelihood is: Given this observation, E-step [ 12 ] consists of computing the following quantities: therefore, Then, we solve the following equation system to obtain updated parameter estimates: Therefore, the update formula of P changed, as follows: p^1 ˆ …1 ‡ p1 p‡1 p2 ‡ p1 ‡ pp12 ‡ p3 p^2 p1 ‡ pp22 ‡ p3 †=6; p^3 ‡ p3 p‡3 p4 ‡ p^4 p3 ‡ pp44 ‡ p5 †=6; ˆ … p2 ‡ p1 ‡ p2 ˆ … p3 p1 ‡ p2 ‡ p3 ˆ … p4 ‡ p3 ‡ p4 ˆ … p5 p3 ‡ p4 ‡ p5 ‡ 1†=6: p^5 †=6; p3 p3 ‡ p4 ‡ p5 †=6; 6 / 13 The iteration process is the same as Ting's algorithm when we give equal value to the starting value for pi…0†. In fact, the starting value does not affect the convergence value. Then, [6P] is the number of short reads at each CpG site. 3.2 At least a CpG contributes to a short read If one short read covers several CpG sites, it actually comes from at least one of them, even though we do not know which CpG sites they are. Then, under the condition Sj5ˆ1Xij 1, the distribution of Xi = (Xi1, Xi2, Xi3, Xi4, Xi5) is: p…Xi1; Xi2; Xi3; Xi4; Xi5 j Sj5ˆ1Xij 1†; ˆ ˆ p…Xi1; Xi2; Xi3; Xi4; Xi5† Qj5ˆ1 qQjXijj5…ˆ11…1 qj†1 Xij 1 qj† ; p…Sj5ˆ1Xij p…Xi1; Xi2; Xi3; Xi4; Xi5; Sj5ˆ1Xij ˆ 0† 1† where qj is defined in formula (1). We know the observation is expressed in the matrix as (2). In the second read, we know that some of x are indeterminate. Therefore, we consider the missed values of x as latent variables and estimate q = (q1, q5) using the EM algorithm. E-step: Given the current estimation q− for q, the conditional expectation of the log complete data likelihood is given as: Q…q j q † ˆ E…l…q j x…obs††† j q † ˆ X5 X6 where x~ij is replaced by the condition expectation. for example, x~21 ˆ E…X21 j P ; x23 ˆ 0; x24 ˆ 0; x25 ˆ 0† ˆ 1 Pj2ˆq11…1 qj †. x~ij ˆ E…Xij j q ; x…obs††; Therefore, During the M-step, the goal is to maximize Q(q j q−) with respect to q, which requires solving @Q(q j q−)/@q = 0. That is, Then, we solve the following equation system to obtain updated parameter estimates: Q ˆ Q…q j q † l 1†: X 5 jˆ1 qj ˆ 0: Therefore, the update formula of q is changed, as follows: q^1 q^2 q^3 q^4 q^5 ˆ 1 ‡ qj † qj † qj † †…1 ‡ †…1 1 4 Real data analysis To evaluate the performance of the proposed method, we compare it with the existing method (Raw) that directly uses the observation fragments. The data comes from paper [ 11 ], which includes 19 human samples. In this paper, we only consider two samples, embryonic stem cells (the ES cell line H1) and human fetal neural stem cells (NSCs) culture (HuFNSC02, neurosphere cultured cells, ganglionic eminence derived, fetal age of 21 weeks). Then, we obtained the MeDIP-seq and MRE-seq data for each sample. Similar to the analysis procedure for the M&M method, we test the performance of SIMD and the existing method by pair-wise comparisons between two H1-ESC replicates and between H1-ESCs and fetal NSCs. The difference between the two tests is that we detect differentially methylated CpG sites in this paper; however, the test of the M&M method is to determine differentially methylated regions (DMRs). We consider the false positive rates for two methods. We apply SIMD and the existing method to the two H1-ESC replicates and use the hypothesis test to obtain the P-values for each CpG site. Because the H1-ESC samples are biological replicates, we consider the different methylated CpG sites as the false discovery sites at any P-value cutoff. The results are represented in Table 1. It is evident from the table that at the same P-value cutoff, SIMD usually reports fewer differentially methylated CpG sites than the exisiting method; for example, when the P-value cutoff equals 10−5, the number of differentially methylated CpG sites for the existing method is seven times more than for SIMD. There are approximately 1751273 CpG sites in 8 / 13 1e-3 chromosome 1 of the human reference sequence (excluding blacklist regions). We can then calculate the false positive rates for two methods at any P-value cutoff. Obviously, from Table 1, we can see that false positive rates of SIMD are significantly less than those of the existing method. Next, we consider the false discovery rates(FDRs) for two methods. We compare two different cell types, H1-ESC and fetal NSCs, and use the same P-value cutoffs as the first test. We obtain the number of differentially methylated CpG sites for two methods. Combining the results of the two H1-ESC replicates for analysis, we obtain the false discovery rates at any Pvalue cutoff. From Table 2, we can see that the number and FDRs of SIMD are no better than the existing method when the cutoffs are larger than 10−6. However, when cutoffs are smaller than 10−6, the FDRs of SIMD are obviously less than those of the existing method. Next, we further consider the q-value cutoffs in Table 3, similar to the P-value cutoff, and find that the number of differentially methylated CpG sites of SIMD is far less than in the method at each q-value level (approximately 1/20 of the existing method). However, the FDRs of the existing method are larger than the overall SIMD. 9 / 13 5 Discussion Identifying differentially methylated CpG sites across a whole genome is an effective way to study epigenetic modification. In dealing with the data integrated by MeDIP-seq and MREseq, estimating the methylation level is the first choice. In this paper, we proposed a SIMD method that considers the possible structure whereby immunoprecipitated short reads are mapped to the methylated CpG sites. We then proposed two cases based on it, one in which only a single CpG site contributes to a short read and another in which more than one CpG site contributes to a short read. By applying the SIMD method, we can obtain the real number of short reads in each CpG site. Last, we employ the hypothesis test framework to detect the differentially methylated CpG sites. In real data analysis, we compare the proposed SIMD method with the existing method (Raw). The results demonstrate that although the number of differentially methylated CpG sites detected by the SIMD method is less than those detected by the existing method, the FDRs of the SIMD are much smaller than those of the existing method. The conclusion is that the proposed method performs better than the existing method. There are still some problems, such as the assumption of independence between the short reads. When the independence condition cannot be satisfied, the proposed method may work not very well. Therefore, in our future work, we will take the correlation between the short reads that are mapped to the neighboring CpG sites into account. Appendix A: Proof of Theorem 1 Under Assumption 1, the joint distribution of (Xi1, Xi2, , Xij, , XiG) is ; XiG j SjGˆ1Xij ˆ 0† ˆ 0 in real data. This is the end of the proof. Appendix C: EM algorithm for Theorems 1 and 2 We know the observation is 0 0 B B 0 B B B X ˆ BBB 0 B B @ 0 0 0 x1k1 x2k2 0 xiki 0 xnkn x1l1 x2l2 xili xnln 0 0 0 0 warhee0r.eTxhikei;refor;ex,ilithaereombsiesrsveadtidoantao. fHeoacwhevreear,dwisexk…nobos†wˆso…xm1…oebs†o;fxx2…oikbis;†; ;;xxilin…oabsr†e†,1wahnedreothers xi…obs† ˆ …xi1 ˆ 0; ; xi…ki 1† ˆ 0; xi…li‡1† ˆ 0; ; xiG ˆ 0†. There are two cases. (1) Only a single CpG contributes to a short read: If one short read covers several CpG sites, it actually only comes from one of them, even though we do not know which one it is. That is, we have known that SjGˆ1Xij ˆ 1, the joint distribution of Xi1, , and XiG is a multinomial distribution with probability P = (p1, , pG), where pj ˆ lj=SjGˆ1lj. A short read is an observation that is Xi = (xi1, xi2, , xiG), where xij = 0 or 1. A note is that only one element of Xi is 1 and the others are 0. The xij will be 0 when the jth CpG is not covered by the ith short reads. That is, Then, the profile log likelihood is: Given this, the E-step [ 13 ] consists of computing the following quantities: ( Q…P j P † E…l…P j x…obs††† j P † ˆ ˆ XG Xn M-Step: During the M-step, the goal is to maximize Q(P j P−) with respect to P, which requires solving @Q(P j P−)/@p = 0 subject to PjGˆ1 pj ˆ 1. That is, ; xili is also a multinomial distribution or Then, we solve the following equation system to obtain updated parameter estimates: Thus, given below is the updated formula of P: ˆ 0: X n iˆ1 p^j ˆ x~ij…P †=n: The iteration process is the same as Ting's algorithm when we give equal value to the starting value for pi…0†. In fact, the starting value does not affect the convergence value. Then, [nP] is the number of short reads at each CpG site. (2) The case that at least a CpG contributes to a short read: The process of the proof is the same as (1). Acknowledgments Yan Zhou's research was supported by the National Science Foundation of China (Grant No. 11701385), the National Statistical Research Project (Grant No. 2017LY56), and the Doctor start fund of Guangdong Province (Grant No. 2016A030310062). Mingtao Zhao's research was supported by the National Social Science Foundation of China (15CTJ008). Xiyan Yang's research was supported by the National Science Foundation of China (Grant No. 11601094). Baoxue Zhang's research was supported by the National Natural Science Foundation of China (Grant No. 11671268). Author Contributions Conceptualization: Jiadi Zhu, Mingtao Zhao. Methodology: Yan Zhou, Baoxue Zhang, Chunfu Jiang. 12 / 13 Software: Xiyan Yang. Supervision: Yan Zhou. Writing ± original draft: Yan Zhou. 1. Irizarry RA , Acosta CL , Wen B , Wu ZJ , Montano C , Onyango P , et al. The human colon cancer methylome shows similar hy-po and hypermethylation at conserved tissue-specific GpG island shores . Nature Genetics . 2009 ; 41 : 178 ± 186 . https://doi.org/10.1038/ng.298 PMID: 19151715 2. Keshet I , Schlesinger Y , Farkash S , Rand E , Hecht M , Segal E , et al. Evidence for an instructive mechanism of de novo methylation in cancer cells . Nature Genetics . 2006 ; 38 : 149 ± 153 . https://doi.org/10. 1038/ng1719 PMID: 16444255 3. Esteller M. Relevance of DNA methylation in the management of cancer . The Lancet Oncology . 2003 ; 4 : 351 ± 358 . https://doi.org/10.1016/S1470-2045( 03 ) 01115 -X PMID: 12788407 4. Laird PW . Principles and challenges of genome-wide DNA methylation analysis . Nature Reviews Genetics . 2010 ; 11 : 191 ± 203 . https://doi.org/10.1038/nrg2732 PMID: 20125086 5. Weber M , Davies JJ , Wittig D , Oakeley EJ , Haase M , Lam WL , et al. Chromosome-wide and promoterspecific analyses identify sites of differential DNA methylation in normal and transformed human cells . Nature Genetics . 2005 ; 37 : 853 ± 862 . https://doi.org/10.1038/ng1598 PMID: 16007088 6. Maunakea AK , Nagarajan RP , Bilenky M , Ballinger TJ , Souza CD , Fouse SD , et al. Conserved role of intragenic DNA methylation in regulating alternative promoters . Nature . 2010 ; 466 : 253 ± 257 . https:// doi.org/10.1038/nature09165 PMID: 20613842 7. Harris RA , Wang T , Coarfa C , Nagarajan RP , Hong C , Downey SL , et al. Comparison of sequencingbased methods to profile DNA methylation and identification of monoallelic epigenetic modifications . Nature Biotechnology . 2010 ; 28 : 1097 ± 1105 . https://doi.org/10.1038/nbt.1682 PMID: 20852635 8. Down TA , Rakyan VK , Turner DJ , Flicek P , Li H , Kulesha E , et al. A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis . Nature Biotechnology . 2008 ; 26 : 779 ± 785 . https://doi.org/10.1038/nbt1414 PMID: 18612301 9. Pelizzola M , Koga Y , Urban AE , Krauthammer M , Weissan S , Halaban R , et al. MEDME: An experimental and analytical methodology for the estimation of DNA methylation levels based on microarray derived MeDIP-enrichment . Genome Research . 2008 ; 18 : 1652 ± 1659 . https://doi.org/10.1101/gr. 080721.108 PMID: 18765822 10. Chavez L , Jozefczuk J , Grimm C , Dietrich J , Timmermann B , Lehrach H , et al. Computational analysis of genome-wide DNA methylation during the differentiation of human embryonic stem cells along the endodermal lineage . Genome Research . 2010 ; 20 : 1441 ± 1450 . https://doi.org/10.1101/gr.110114.110 PMID: 20802089 11. Zhang B , Zhou Y , Lin N , Lowdon RF , Hong C , Nagarajan RP , et al. Functional DNA methylation differences between tissues, cell types, and across individuals discovered using the M&M algorithm . Genome Research . 2013 ; 23 : 1522 ± 1540 . https://doi.org/10.1101/gr.156539.113 PMID: 23804400 12. McLachlan G.J. and Peel D. Finite Mixture Models. Wiley-Interscience, New York. 2000 . 13. Lister R , Pelizzola M , Dowen RH , Hawkins RD , Hon G , Tonti-Filippini J , et al. Human DNA methylomes at base resolution show widespread epigenomic differences . Nature . 2009 ; 462 : 315 ± 322 . https://doi. org/10.1038/nature08514 PMID: 19829295


This is a preview of a remote PDF: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0201586&type=printable

Yan Zhou, Jiadi Zhu, Mingtao Zhao, Baoxue Zhang, Chunfu Jiang, Xiyan Yang. Methylation-level inferences and detection of differential methylation with MeDIP-seq data, PLOS ONE, 2018, DOI: 10.1371/journal.pone.0201586