DMAP: differential methylation analysis package for RRBS and WGBS data

Bioinformatics, Jul 2014

BIOINFORMATICS ORIGINAL PAPER
Genome analysis
Vol. 30 no. 13 2014, pages 1814–1822
doi:10.1093/bioinformatics/btu126
Advance Access publication March 7, 2014

Peter A. Stockwell1,*,†, Aniruddha Chatterjee2,3,†, Euan J. Rodger2 and Ian M. Morison2

1 Department of Biochemistry, University of Otago, 710 Cumberland Street, Dunedin 9054, New Zealand
2 Department of Pathology, Dunedin School of Medicine, University of Otago, 270 Great King Street, Dunedin 9054, New Zealand
3 Gravida: National Centre for Growth and Development, 2-6 Park Ave, Grafton, Auckland 1142, New Zealand

*To whom correspondence should be addressed.
†The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

ABSTRACT

Motivation: The rapid development of high-throughput sequencing technologies has enabled epigeneticists to quantify DNA methylation on a massive scale. Progressive increases in sequencing capacity present challenges in processing, analysing and interpreting the large amounts of data; investigating differential methylation between genome-scale data from multiple samples highlights this challenge.

Results: We have developed a differential methylation analysis package (DMAP) to generate coverage-filtered reference methylomes and to identify differentially methylated regions across multiple samples from reduced representation bisulphite sequencing and whole genome bisulphite sequencing experiments. We introduce a novel fragment-based approach for investigating DNA methylation patterns for reduced representation bisulphite sequencing data. Further, DMAP provides the identity of gene and CpG features and distances to the differentially methylated regions in a format that is easily analyzed with limited bioinformatics knowledge.

Availability and implementation: The software has been implemented in C and has been written to ensure portability between different platforms. The source code and documentation are freely available (DMAP: as compressed TAR archive folder) from http://biochem.otago.ac.nz/research/databases-software/. Two test datasets are also available for download from the Web site. Test dataset 1 contains reads from chromosome 1 of a patient and a control, which is used for comparative analysis in the current article. Test dataset 2 contains reads from a part of chromosome 21 of three disease and three control samples for testing the operation of DMAP, especially for the analysis of variance. Example commands for the analyses are included.

Contact: or aniruddha.chatterjee@otago.ac.nz

Supplementary information: Supplementary data are available at Bioinformatics online.

Received on May 14, 2013; revised on January 28, 2014; accepted on March 2, 2014

1 INTRODUCTION

DNA methylation is arguably the most stable epigenetic mark and plays a key role in regulating development and disease (Baylin and Bestor, 2002; Law and Jacobsen, 2010). One of the most fundamental challenges for epigeneticists is to identify DNA methylation differences between genomes. For instance, differential methylation between diseased and normal samples, interindividual variation within a population, and differences between tissues or species are of biological and clinical relevance. The rapid improvement in next-generation sequencing technologies now provides opportunities to interrogate DNA methylation at single-base resolution with high coverage across multiple samples. Bisulphite treatment converts unmethylated cytosines to uracils (and ultimately to thymines after amplification), while leaving methylated cytosines unchanged. Therefore, bisulphite treatment combined with next-generation sequencing (BS-Seq) has become a preferred method to generate base-resolution DNA methylation maps.
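The conversion chemistry described above can be illustrated with a small Python sketch. It is not part of DMAP: the function names are hypothetical, and the model is deliberately simplified (a single strand, complete conversion, no sequencing errors). It shows why a C that survives in a read indicates methylation, whereas a T at a reference C indicates an unmethylated cytosine.

```python
def bisulphite_convert(seq, methylated_positions):
    """Simulate bisulphite conversion plus amplification on one strand.

    Unmethylated C is read as T; a C at any 0-based index listed in
    methylated_positions is protected and stays C.
    (Hypothetical helper for illustration only.)
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, read):
    """Compare a converted read with the reference at every reference C.

    Returns {position: '+'} for a retained C (methylated) and
    {position: '-'} for a C read as T (unmethylated).
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read)):
        if ref_base == "C":
            if read_base == "C":
                calls[i] = "+"
            elif read_base == "T":
                calls[i] = "-"
    return calls


# Toy reference with cytosines at indices 1, 4 and 5; only index 1 is methylated.
reference = "ACGTCCGA"
read = bisulphite_convert(reference, methylated_positions=[1])
calls = call_methylation(reference, read)
```

Real aligners must additionally handle both strands and the reduced sequence complexity after conversion, which is the asymmetric mapping problem that dedicated bisulphite aligners address.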
Because whole-genome bisulphite sequencing (WGBS) is still expensive and generates challenging amounts of raw data, reduced representation bisulphite sequencing (RRBS) provides a cost-effective alternative for whole-genome methylation sequencing. RRBS has been widely used by several groups worldwide to interrogate functionally important genomic regions at high sequencing coverage and sensitivity (Baranzini et al., 2010; Bock et al., 2011; Chatterjee et al., 2012; Gertz et al., 2011; Gu et al., 2010; Smallwood et al., 2011; Steine et al., 2011; Xi et al., 2012).

During the past few years, several alignment tools have been developed to cope with the asymmetric mapping issues of bisulphite-converted sequenced reads and to map millions of reads to the reference genome with reasonable speed. Some of these aligners are RMAP (Smith et al., 2009), BS Seeker (Chen et al., 2010), Bismark (Krueger and Andrews, 2011), RRBSMAP (Xi et al., 2012), BatMeth (Lim et al., 2012) and PASS-bis (Campagna et al., 2013). Recent comparative analyses have improved our understanding of the efficiency, accuracy and algorithms of these aligners (Chatterjee et al., 2012; Kunde-Ramamoorthy et al., 2014).

Additionally, tools have been developed for generating methylation calls and visualization. Integrated Genome Viewer (Thorvaldsdottir et al., 2013) and MethVisual (Sun et al., 2013) allow visualization of sequenced reads and regional analysis. BiQ Analyzer HT allows site-specific DNA methylation analysis (Schmieder and Edwards, 2011), and SAAP-RRBS can perform alignment, methylation calls, annotation of CpG sites and visualization (Ziller et al., 2013). methylKit (Akalin et al., 2012a), an R package, enables detection of differentially methylated CpG sites (DMCs). methylKit

© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email:
Associate Editor: Inanc Birol

2 METHODS AND ALGORITHMS

2.1 DMAP package and input data

DMAP contains two main programs.

(i) diffmeth: The input files to diffmeth are either SAM files from Bismark alignment (Krueger and Andrews, 2011) or the older native format produced by the Bismark methylation_extractor program, comprising a single line for each mapped CpG giving the chromosome, the CpG position and the methylation status (+/-). Alternatively, if other aligners (such as BSMAP and RMAPBS) are used, then the files (BED files or text files) can be processed by the rmapbscpg2 ancillary program before analysis with diffmeth. By default, diffmeth does not impose any P-value cutoff for identifying differentially methylated regions (DMRs); it returns a P-value for each investigated region/fragment so that users can choose their own threshold P-values and apply multiple test corrections independently.

(ii) The final output file from the diffmeth program can then be used in the second main program of DMAP, identgeneloc, to identify proximal genes and features (transcription start sites, exons/introns, etc.), the relationship to CpG features (CpG island core/shore/shelf) and distances from each feature (Fig. 1). This oper (...truncated)
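Because diffmeth reports an unthresholded P-value per region or fragment, any multiple test correction is applied downstream by the user. The Python sketch below shows one common choice, the Benjamini-Hochberg procedure. The text does not prescribe a particular correction, so this is an assumption made for illustration; the region labels, P-values and the 0.05 cutoff are invented and do not reflect the actual diffmeth output format.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values, sorted from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values stay
    # monotonic: each is the minimum of p * m / rank over its own rank
    # and all higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical (region, P-value) pairs standing in for per-fragment results.
regions = [
    ("chr1:1000-1120", 0.001),
    ("chr1:5000-5090", 0.04),
    ("chr1:9000-9150", 0.03),
    ("chr1:12000-12100", 0.6),
]

adjusted = benjamini_hochberg([p for _, p in regions])

# Keep regions whose adjusted P-value passes a user-chosen threshold.
threshold = 0.05
significant = [name for (name, _), q in zip(regions, adjusted) if q <= threshold]
```

In this toy input, two regions with raw P-values below 0.05 no longer pass after adjustment, which is why leaving the cutoff to the user matters when many fragments are tested.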


Article home page: https://academic.oup.com/bioinformatics/article/30/13/1814/2422202

Stockwell, Peter A., Chatterjee, Aniruddha, Rodger, Euan J., Morison, Ian M. DMAP: differential methylation analysis package for RRBS and WGBS data, Bioinformatics, 2014, pp. 1814-1822, Volume 30, Issue 13, DOI: 10.1093/bioinformatics/btu126