The methylomes of six bacteria
Iain A. Murray [1], Tyson A. Clark [0], Richard D. Morgan [1], Matthew Boitano [0], Brian P. Anton [1], Khai Luong [0], Alexey Fomenkov [1], Stephen W. Turner [0], Jonas Korlach [0], Richard J. Roberts [1]
[0] Pacific Biosciences, 1380 Willow Road, Menlo Park, CA 94025, USA
[1] New England Biolabs, 240 County Road, Ipswich, MA 01938
Six bacterial genomes, Geobacter metallireducens GS-15, Chromohalobacter salexigens, Vibrio breoganii 1C-10, Bacillus cereus ATCC 10987, Campylobacter jejuni subsp. jejuni 81-176 and C. jejuni NCTC 11168, all of which had previously been sequenced using other platforms, were re-sequenced using single-molecule, real-time (SMRT) sequencing specifically to analyze their methylomes. In every case, a number of new N6-methyladenine (m6A) and N4-methylcytosine (m4C) methylation patterns were discovered, and the DNA methyltransferases (MTases) responsible for those methylation patterns were assigned. In 15 cases, it was possible to match MTase genes with MTase recognition sequences without further sub-cloning. Two Type I restriction systems required sub-cloning to differentiate their recognition sequences, while four MTase genes that were not expressed in the native organism were sub-cloned to test for viability and recognition sequences. Two of these proved active. No attempt was made to detect 5-methylcytosine (m5C) recognition motifs from the SMRT sequencing data because this modification produces weaker signals using current methods. However, all predicted m6A and m4C MTases were detected unambiguously. This study shows that the addition of SMRT sequencing to traditional sequencing approaches gives a wealth of useful functional information about a genome, showing not only which MTase genes are active but also revealing their recognition sequences.
We are becoming accustomed to the ever-increasing speed
and reduced cost with which DNA can be sequenced.
However, what is often lost in this frenzy of sequencing
is the fact that DNA consists of more than just four bases.
In eukaryotes, we have known for a long time about the
epigenetic role of 5-methylcytosine (m5C), sometimes
called the fifth base, and more recently it has been found
that 5-hydroxymethylcytosine, 5-formylcytosine and
5-carboxylcytosine are also present (1–4). However, two
more modified bases, N6-methyladenine (m6A) and
N4-methylcytosine (m4C), are also common in bacterial
genomes, where they function as components of
restriction-modification (RM) systems (5). Until recently, these
have usually been ignored because of the lack of simple
methods to determine their locations. However, with the
advent of single-molecule, real-time (SMRT) sequencing
(6–8), it has suddenly become possible to detect these
modified bases as a part of the routine sequencing
procedure.
The methylated bases that are found in bacterial and
archaeal genomes serve important functions as part of
RM systems, where they protect the host chromosome
against the otherwise deleterious action of the partner
restriction enzyme(s), which are needed to destroy
unwanted incoming transmissible DNA elements such as
phages (9). However, in some cases these
methyltransferases (MTases) also serve regulatory roles, as with
the Dam MTase of Escherichia coli, which introduces m6A
residues that play a key role in DNA repair and also have
important effects during the initiation of replication (10).
Several studies have also implicated MTases in regulating
gene expression, phase variation and pathogenicity
(11,12). Given the many DNA MTases that are typically
found in prokaryotic genomes, it seems likely that they
will have hitherto undocumented effects aside from their
key role in RM systems. To date, there has been no
genome-wide assessment of the extent of DNA
methylation by known MTases such as E. coli Dam (10) and Dcm
(13) or the cell cycle MTase, CcrM, of Caulobacter
crescentus (14). It is not known if their methylation
specificities are as precise as the customary recognition
sequences suggest or whether the enzymes are
promiscuous. This is particularly interesting to know for RM
systems as there are no obvious selective constraints on
MTase specificity provided that the core recognition
sequence of the restriction enzyme is fully modified.
Recently, we have shown that by cloning an individual
MTase gene into a plasmid and propagating it in an
otherwise methylation-deficient strain of E. coli, it is easily
possible through SMRT sequencing to detect all of the
bases modified on the plasmid (15). Precise recognition
sequences were convincingly demonstrated and mostly
matched that of the cognate restriction enzyme when the
MTase was part of an RM system. However, some
promiscuous methylation was observed, with the Dam gene of
E. coli being a particularly striking example. There was
one caveat to this interpretation though: because the
MTase genes in that study were cloned on a multi-copy
number plasmid (50–200 copies per cell), it could be that
the observed promiscuity arose because of
overexpression.
Given that the results for the plasmids were very clear, it
seemed that it might be possible to perform a direct
analysis of bacterial genomes using the SMRT sequencing
method and thus obtain an accurate estimate of the extent
of methylation in the native organism. By then comparing
a bioinformatic analysis of the RM systems with the direct
measurement of just what was methylated, it should be
possible to assign recognition sequences to individual
MTase genes. Of particular interest in this sort of
analysis are the Type I and Type III RM systems, which
have generally been very difficult to analyze by previous,
more tedious techniques (16). In both of these kinds of
systems, the specificity comes from a single subunit of
the enzyme: the S subunit of the Type I enzymes and
the M subunit of the Type III enzymes (16). Thus, it
seemed likely that recognition sequences for both types
of MTases could be discovered relatively easily. To
demonstrate the feasibility of this approach, we chose initially
to analyze six genomes with relatively few RM systems
before moving on to more complicated cases.
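As an illustration of what a genome-wide estimate of the extent of methylation involves, the following minimal Python sketch locates every occurrence of a recognition motif on both strands and reports the fraction of those sites that carry a detected modification. It is not the analysis pipeline used in this study. The function names (motif_sites, fraction_modified), the (strand, position) representation of detected bases and the toy sequence are all hypothetical, and the input is assumed to be an uppercase A/C/G/T string. The example motif is GATC, the well-known recognition sequence of the E. coli Dam MTase, with the adenine as the modified base.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes, so that
# degenerate motifs (e.g. containing N or R) can be searched as well.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_sites(seq, motif, offset):
    """Yield (strand, position) of the modified base for each motif match.

    offset is the 0-based index of the methylated base within motif.
    Positions are 0-based coordinates on the forward strand.
    """
    # A lookahead is used so that overlapping matches are also reported.
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")
    for m in pattern.finditer(seq):
        yield ("+", m.start() + offset)
    n = len(seq)
    for m in pattern.finditer(reverse_complement(seq)):
        # Map the reverse-strand coordinate back onto the forward strand.
        yield ("-", n - 1 - (m.start() + offset))


def fraction_modified(seq, motif, offset, detected):
    """Fraction of motif sites whose target base is in the detected set."""
    sites = set(motif_sites(seq, motif, offset))
    if not sites:
        return 0.0
    return len(sites & detected) / len(sites)


# Toy example (hypothetical data): two GATC sites, one of them hemimethylated.
genome = "GATCAAGATC"
detected = {("+", 1), ("-", 2), ("+", 7)}
print(fraction_modified(genome, "GATC", 1, detected))
```

Detected bases that fall outside every predicted motif site could be collected in the same way to flag candidate off-target (promiscuous) methylation.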
MATERIALS AND METHODS
Materials
All restriction endonucleases (REases) except Eco147I
(Fermentas; Glen Burnie, MD, USA), Phusion-HF
DNA polymerase, Antarctic Phosphatase, T4-DNA
ligase and E. coli competent cells were from New
England Biolabs Inc. (Ipswich, MA, USA). Synthetic
oligonucleotides were purchased from Integrated DNA
Technologies (Coralville, IA, USA). Geobacter
metallireducens GS-15 ATCC 53774 DNA,
Chromohalobacter salexigens DSM 3043 DNA and
Bacillus cereus ATCC 10987 DNA were obtained from
the culture collections indicated. Vibrio breoganii 1C-10
DNA was a gift from Martin Polz, MIT. Campylobacter
jejuni subsp. jejuni 81-176 and C. jejuni NCTC 11168
DNAs were a gift from Stuart Thompson, Medical
College of Georgia.
SMRT sequencing
SMRTbell template libraries were prepared as previously
described (15,17). Briefly, genomic DNA samples were
sheared to an average size (...truncated)