The methylomes of six bacteria
11450–11462 Nucleic Acids Research, 2012, Vol. 40, No. 22
doi:10.1093/nar/gks891
Published online 2 October 2012
Iain A. Murray1, Tyson A. Clark2, Richard D. Morgan1, Matthew Boitano2,
Brian P. Anton1, Khai Luong2, Alexey Fomenkov1, Stephen W. Turner2,
Jonas Korlach2,* and Richard J. Roberts1,*
1New England Biolabs, 240 County Road, Ipswich, MA 01938 and 2Pacific Biosciences, 1380 Willow Road,
Menlo Park, CA 94025, USA
Received August 1, 2012; Revised August 31, 2012; Accepted September 3, 2012
ABSTRACT
Six bacterial genomes, Geobacter metallireducens GS-15,
Chromohalobacter salexigens, Vibrio breoganii 1C-10,
Bacillus cereus ATCC 10987, Campylobacter jejuni subsp.
jejuni 81-176 and C. jejuni NCTC 11168, all of which had
previously been sequenced using other platforms, were
re-sequenced using single-molecule, real-time (SMRT)
sequencing specifically to analyze their methylomes. In
every case a number of new N6-methyladenine (m6A) and
N4-methylcytosine (m4C) methylation patterns were
discovered and the DNA methyltransferases (MTases)
responsible for those methylation patterns were assigned.
In 15 cases, it was possible to match MTase genes with
MTase recognition sequences without further sub-cloning.
Two Type I restriction systems required sub-cloning to
differentiate their recognition sequences, while four MTase
genes that were not expressed in the native organism were
sub-cloned to test for viability and recognition sequences.
Two of these proved active. No attempt was made to detect
5-methylcytosine (m5C) recognition motifs from the SMRT
sequencing data because this modification produces weaker
signals using current methods. However, all predicted m6A
and m4C MTases were detected unambiguously. This study
shows that the addition of SMRT sequencing to traditional
sequencing approaches gives a wealth of useful functional
information about a genome, showing not only which MTase
genes are active but also revealing their recognition
sequences.
INTRODUCTION
We are becoming accustomed to the ever-increasing speed
and reduced cost with which DNA can be sequenced.
However, what is often lost in this frenzy of sequencing
is the fact that DNA consists of more than just four bases.
In eukaryotes, we have known for a long time about the
epigenetic role of 5-methylcytosine (m5C), sometimes
called the fifth base, and more recently it has been found
that 5-hydroxymethylcytosine, 5-formylcytosine and
5-carboxylcytosine are also present (1–4). However, two
more modified bases, N6-methyladenine (m6A) and
N4-methylcytosine (m4C), are also common in bacterial
genomes, where they function as components of restriction–modification (RM) systems (5). Until recently, these
have usually been ignored because of the lack of simple
methods to determine their locations. However, with the
advent of single-molecule, real-time (SMRT) sequencing
(6–8), it has suddenly become possible to detect these
modified bases as a part of the routine sequencing
procedure.
The methylated bases that are found in bacterial and
archaeal genomes serve important functions as part of
RM systems, where they protect the host chromosome
against the otherwise deleterious action of the partner
restriction enzyme(s), which are needed to destroy
unwanted incoming transmissible DNA elements such as
phages (9). However, in some cases these methyltransferases (MTases) also serve regulatory roles as with
the Dam MTase of Escherichia coli, which introduces m6A
residues that play a key role in DNA repair and also have
important effects during the initiation of replication (10).
Several studies have also implicated MTases in regulating
gene expression, phase variation and pathogenicity
(11,12). Given the many DNA MTases that are typically
found in prokaryotic genomes, it seems likely that they
will have hitherto undocumented effects aside from their
*To whom correspondence should be addressed. Tel: +978 380 7405; Fax: +978 380 7406; Email:
Correspondence may also be addressed to Jonas Korlach. Tel: +650 521 8006; Fax: +650 323 9420; Email: jkorlach@pacificbiosciences.com
© The Author(s) 2012. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which
permits unrestricted, distribution, and reproduction in any medium, provided the original work is properly cited.
key role in RM systems. To date, there has been no
genome-wide assessment of the extent of DNA methylation by known MTases such as E. coli Dam (10) and Dcm
(13) or the cell cycle MTase, CcrM, of Caulobacter
crescentus (14). It is not known if their methylation
specificities are as precise as the customary recognition
sequences suggest or whether the enzymes are promiscuous. This is particularly interesting to know for RM
systems as there are no obvious selective constraints on
MTase specificity provided that the core recognition
sequence of the restriction enzyme is fully modified.
Recently, we have shown that by cloning an individual
MTase gene into a plasmid and propagating it in an otherwise methylation-deficient strain of E. coli, it is easily
possible through SMRT sequencing to detect all of the
bases modified on the plasmid (15). Precise recognition
sequences were convincingly demonstrated and mostly
matched that of the cognate restriction enzyme when the
MTase was part of an RM system. However, some promiscuous methylation was observed, with the Dam gene of
E. coli being a particularly striking example. There was
one caveat to this interpretation though: because the
MTase genes in that study were cloned on a multi-copy
number plasmid (50–200 copies per cell), it could be that
the observed promiscuity arose because of overexpression.
Given that the results for the plasmids were very clear, it
seemed that it might be possible to perform a direct
analysis of bacterial genomes using the SMRT sequencing
method and thus obtain an accurate estimate of the extent
of methylation in the native organism. Then, by comparing
a bioinformatic analysis of the RM systems with the direct
measurement of just what was methylated, it should be
possible to assign recognition sequences to individual
MTase genes. Of particular interest in this sort of
analysis are the Type I and Type III RM systems, which
have generally been very difficult to analyze by previous,
more tedious techniques (16). In both of these kinds of
systems, the specificity comes from a single subunit of
the enzyme—the S subunit of the Type I enzymes and
the M subunit of the Type III enzymes (16). Thus, it
seemed likely that recognition sequences for both types
of MTases could be discovered relatively easily. To demonstrate the feasibility of this approach, we chose initially
to analyze six genomes with relatively few RM systems
before moving on to more complicated cases.
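The comparison described above, between the sites at which a candidate recognition sequence occurs and the positions actually detected as methylated, can be illustrated with a minimal sketch. It is not the analysis pipeline used in this study: the function names, the forward-strand-only search, the toy sequence and the set of detected positions are all assumptions made for illustration. The motifs used are GATC, the customary recognition sequence of E. coli Dam, and GANTC, that of CcrM.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_sites(seq, motif, offset):
    """Return 0-based positions of the target base for each motif match.

    Hypothetical helper: only the forward strand is searched, and
    `offset` is the index of the methylated base within the motif.
    A lookahead is used so that overlapping matches are all reported.
    """
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")
    return [m.start() + offset for m in pattern.finditer(seq)]


def fraction_methylated(seq, motif, offset, detected):
    """Fraction of motif sites whose target base is in `detected`.

    `detected` is an assumed set of 0-based positions called as
    modified by some upstream detection step.
    """
    sites = motif_sites(seq, motif, offset)
    if not sites:
        return 0.0
    return sum(1 for p in sites if p in detected) / len(sites)


# Toy example: three GATC sites, two of which were detected as modified.
toy_seq = "GATCAAGATCTTGATC"
toy_detected = {1, 7}
dam_sites = motif_sites(toy_seq, "GATC", 1)
dam_fraction = fraction_methylated(toy_seq, "GATC", 1, toy_detected)
```

In this toy case the target adenines lie at positions 1, 7 and 13, and two of the three are marked as detected. A motif whose sites are nearly all modified would be a plausible candidate recognition sequence for an active MTase, whereas a real analysis would also need to examine the reverse strand and the modification type.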
MATERIALS AND METHODS
the culture (...truncated)