DMAP: differential methylation analysis package for RRBS and WGBS data
BIOINFORMATICS
ORIGINAL PAPER
Genome analysis
Vol. 30 no. 13 2014, pages 1814–1822
doi:10.1093/bioinformatics/btu126
Advance Access publication March 7, 2014
Peter A. Stockwell1,*,†, Aniruddha Chatterjee2,3,†, Euan J. Rodger2 and Ian M. Morison2
1Department of Biochemistry, University of Otago, 710 Cumberland Street, Dunedin 9054, New Zealand, 2Department of
Pathology, Dunedin School of Medicine, University of Otago, 270 Great King Street, Dunedin 9054, New Zealand and
3Gravida: National Centre for Growth and Development, 2-6 Park Ave, Grafton, Auckland 1142, New Zealand
ABSTRACT
Motivation: The rapid development of high-throughput sequencing
technologies has enabled epigeneticists to quantify DNA methylation
on a massive scale. The progressive increase in sequencing capacity presents challenges in the processing, analysis and interpretation of
the large amount of data; investigating differential methylation between
genome-scale data from multiple samples highlights this challenge.
Results: We have developed a differential methylation analysis package (DMAP) to generate coverage-filtered reference methylomes and
to identify differentially methylated regions across multiple samples
from reduced representation bisulphite sequencing and whole
genome bisulphite sequencing experiments. We introduce a novel
fragment-based approach for investigating DNA methylation patterns
for reduced representation bisulphite sequencing data. Further, DMAP
provides the identity of gene and CpG features and distances to the
differentially methylated regions in a format that is easily analyzed with
limited bioinformatics knowledge.
Availability and implementation: The software has been implemented in C and has been written to ensure portability between different platforms. The source code and documentation are freely available
(DMAP: as a compressed TAR archive) from http://biochem.otago.ac.nz/research/databases-software/. Two test datasets are also available for download from the Web site. Test dataset 1 contains reads
from chromosome 1 of a patient and a control, which is used for comparative analysis in the current article. Test dataset 2 contains reads
from a part of chromosome 21 of three disease and three control samples for testing the operation of DMAP, especially for the analysis of
variance. Example commands for the analyses are included.
Contact: or aniruddha.chatterjee@otago.ac.nz
Supplementary information: Supplementary data are available at
Bioinformatics online.
Received on May 14, 2013; revised on January 28, 2014; accepted on
March 2, 2014
1 INTRODUCTION
DNA methylation is arguably the most stable epigenetic mark
that plays a key role in regulating development and disease
(Baylin and Bestor, 2002; Law and Jacobsen, 2010). One of the
most fundamental challenges for epigeneticists is to identify
DNA methylation differences between genomes. For instance,
differential methylation between diseased and normal samples,
interindividual variation within a population, and differences between tissues or species are all of biological and clinical
relevance.
*To whom correspondence should be addressed.
†The authors wish it to be known that, in their opinion, the first two
authors should be regarded as joint First Authors.
The rapid improvement in next-generation sequencing technologies now provides opportunities to interrogate DNA methylation at single base resolution with high coverage across
multiple samples. Bisulphite treatment converts unmethylated
cytosines to uracils (and ultimately to thymines after amplification), while leaving methylated cytosines unchanged.
Therefore, bisulphite treatment combined with next-generation
sequencing (BS-Seq) has become a preferred method to generate
base-resolution DNA methylation maps. Because whole-genome
bisulphite sequencing (WGBS) is still expensive and generates
challenging amounts of raw data, reduced representation bisulphite sequencing (RRBS) provides a cost-effective alternative to
whole-genome methylation sequencing. RRBS has been widely
used to interrogate functionally important genomic regions at high sequencing coverage and sensitivity (Baranzini et al., 2010; Bock et al., 2011; Chatterjee et al.,
2012; Gertz et al., 2011; Gu et al., 2010; Smallwood et al., 2011;
Steine et al., 2011; Xi et al., 2012).
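The conversion rule described above can be illustrated with a minimal sketch. It assumes a forward-strand read that aligns to the reference without gaps or mismatches other than C-to-T conversion; the function names and the toy sequence are hypothetical and are not part of DMAP or of any aligner.

```python
def bisulphite_convert(seq, methylated_positions):
    """Simulate bisulphite treatment plus amplification on the forward strand.

    Unmethylated C is read as T; methylated C (listed by 0-based index in
    methylated_positions) stays C. All other bases are unchanged.
    """
    converted = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            converted.append("T")  # unmethylated C -> U -> T after amplification
        else:
            converted.append(base)
    return "".join(converted)


def call_cpg_methylation(reference, read):
    """Return {position: '+' or '-'} for each CpG cytosine in the reference.

    '+' means the read still shows C (methylated); '-' means it shows T
    (unmethylated). Any other base at a CpG position is left uncalled.
    """
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            if read[i] == "C":
                calls[i] = "+"
            elif read[i] == "T":
                calls[i] = "-"
    return calls


# Toy example (assumed data): only the CpG at index 1 is methylated.
reference = "ACGTTCGACCGA"
read = bisulphite_convert(reference, methylated_positions={1})
calls = call_cpg_methylation(reference, read)
print(read)
print(calls)
```

In this toy case the non-CpG cytosine at index 8 is also converted to T, which is why methylation is called by comparing the read with the reference rather than by inspecting the read alone.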
During the past few years, several alignment tools have been
developed to cope with asymmetric mapping issues of bisulphite-converted
sequence reads and to map millions of reads with
reasonable speed to the reference genome. Some of these aligners
are RMAP (Smith et al., 2009), BS Seeker (Chen et al., 2010),
Bismark (Krueger and Andrews, 2011), RRBSMAP (Xi et al.,
2012), BatMeth (Lim et al., 2012) and PASS-bis (Campagna
et al., 2013). Recent comparative analyses have improved our
understanding of the efficiency, accuracy and algorithms of
these aligners (Chatterjee et al., 2012; Kunde-Ramamoorthy
et al., 2014). Additionally, tools have been developed for generating methylation calls and visualization. Integrative Genomics
Viewer (Thorvaldsdottir et al., 2013) and MethVisual (Sun
et al., 2013) allow visualization of sequenced reads and regional
analysis. BiQ Analyzer HT allows site-specific DNA methylation
analysis (Schmieder and Edwards, 2011), and SAAP-RRBS can
perform alignment, methylation calls, annotation of CpG sites
and visualization (Ziller et al., 2013).
methylKit (Akalin et al., 2012a), an R package, enables detection of differentially methylated CpG sites (DMCs). methylKit
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email:
Associate Editor: Inanc Birol
2 METHODS AND ALGORITHMS
2.1 DMAP package and input data
DMAP contains two main programs. (i) diffmeth: The input files
to diffmeth are either SAM files from Bismark alignment
(Krueger and Andrews, 2011) or the older native format produced by the Bismark methylation_extractor program, comprising a single line for each mapped CpG giving the chromosome,
the CpG position and the methylation status (+/-).
Alternatively, if other aligners (such as BSMAP and
RMAPBS) are used, then the files (BED file or text files) can
be processed by the rmapbscpg2 ancillary program before analysis with diffmeth. By default, diffmeth does not impose any
P-value cutoff for identifying DMRs; it returns a P-value for
each investigated region/fragment to allow user-specified threshold P-values and independent application of multiple test
corrections.
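As an illustration of what an independently applied multiple test correction might look like, the following sketch applies the Benjamini-Hochberg procedure to a set of per-fragment P-values. The choice of Benjamini-Hochberg is an assumption made here for illustration; the text above does not prescribe a particular correction. The fragment names and P-values are invented and are not output from diffmeth.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-fragment P-values (illustrative only).
regions = {"frag_1": 0.001, "frag_2": 0.04, "frag_3": 0.03, "frag_4": 0.20}

names = list(regions)
adjusted = dict(zip(names, benjamini_hochberg([regions[n] for n in names])))

threshold = 0.05  # user-specified cutoff
raw_hits = [n for n in names if regions[n] <= threshold]
significant = [n for n in names if adjusted[n] <= threshold]
print(raw_hits)
print(significant)
```

With these invented values, three fragments pass the raw 0.05 cutoff but only one remains after adjustment, which shows why reporting unfiltered P-values leaves the choice of threshold and correction to the user.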
(ii) The final output file from the diffmeth program can then be
used in the second main program of DMAP, identgeneloc, to
identify proximal genes and features (transcription start sites,
exons/introns, etc.), relationship to CpG features (CpG island
core/shore/shelf) and distances from each feature (Fig. 1). This
oper (...truncated)