REPK: an analytical web server to select restriction endonucleases for terminal restriction fragment length polymorphism analysis
W58–W62 Nucleic Acids Research, 2007, Vol. 35, Web Server issue
doi:10.1093/nar/gkm384
Roy Eric Collins and Gabrielle Rocap*
School of Oceanography, University of Washington, Seattle WA, USA
Received January 31, 2007; Revised April 18, 2007; Accepted April 30, 2007
*To whom correspondence should be addressed. Tel: 206 685 9994; Fax: 206 685 6651; Email:
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
ABSTRACT
Terminal restriction fragment length polymorphism
(T-RFLP) analysis is a widespread technique for
rapidly fingerprinting microbial communities. Users
of T-RFLP frequently overlook the resolving power
of well-chosen restriction endonucleases and often
fail to report how they chose their enzymes. REPK
(Restriction Endonuclease Picker) assists in the
rational choice of restriction endonucleases for
T-RFLP by finding sets of four restriction endonucleases that together uniquely differentiate user-designated sequence groups. With REPK, users can
provide their own sequences (of any gene, not just
16S rRNA), specify the taxonomic rank of interest
and choose from a number of filtering options to
further narrow down the enzyme selection. Bug
tracking is provided, and the source code is open
and accessible under the GNU Public License v.2,
at http://code.google.com/p/repk. The web server
is available without access restrictions at
http://rocaplab.ocean.washington.edu/tools/repk.
INTRODUCTION
Terminal restriction fragment length polymorphism
(T-RFLP) analysis is a microbial fingerprinting technique
capable of discriminating microbial communities quickly
and relatively inexpensively (1–3). T-RFLP is increasingly
used in high-throughput studies of microbial communities
in combination with or even in lieu of clone library
analysis (4,5). Briefly, the method involves PCR amplification of a gene of interest (often 16S rRNA genes) with
fluorescent dye-labeled primers, followed by multiple
single restriction digests done in parallel. The resulting
fragments are then separated by capillary electrophoresis
with an internal size standard to determine the lengths of
the terminal (fluorescently labeled) fragments. Each
distinct terminal restriction fragment is considered an
operational taxonomic unit (OTU), thus the choice of
restriction enzymes can impact the number of OTUs
observed in each sample and the calculation of diversity
statistics.
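As an illustration of how a terminal fragment length follows from the position of the first restriction site, the digest step can be sketched in silico. This is a minimal sketch, not the REPK implementation: the function name and the toy amplicon are hypothetical, the label is assumed to sit at the 5' end of the forward strand, and only exact, unambiguous recognition sites are handled.

```python
# Recognition site and cut offset (bases of the site kept on the 5' side).
# These three enzymes cut as C^CGG, GG^CC and AG^CT respectively.
ENZYMES = {
    "MspI": ("CCGG", 1),
    "HaeIII": ("GGCC", 2),
    "AluI": ("AGCT", 2),
}

def terminal_fragment_length(seq, site, cut_offset):
    """Length of the 5'-labeled terminal fragment, or None if the enzyme never cuts."""
    pos = seq.upper().find(site)  # first site downstream of the labeled primer
    if pos == -1:
        return None
    return pos + cut_offset

# Toy amplicon (hypothetical), read from the labeled 5' end.
amplicon = "ATGCATTACCGGATTAGGCCTTAGCTAA"
for name, (site, offset) in ENZYMES.items():
    print(name, terminal_fragment_length(amplicon, site, offset))
# prints: MspI 9, HaeIII 18, AluI 24 (one per line)
```

Two sequences that share the position of their first site for a given enzyme yield the same terminal fragment and therefore collapse into one OTU for that enzyme, which is why the enzyme choice changes the OTU count.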
When analyzing uncharacterized and very diverse
bacterial communities, sufficient community discrimination can often be accomplished with multiple randomly chosen tetrameric restriction enzymes (6). However, a
brief review of the literature indicates that there is still no
standard in even this simplified case. We examined
26 papers (1–5,7–26) that were published between 1997
and 2007 and used T-RFLP. Of those papers, 38% used
universal bacterial primers combined with a single
restriction enzyme, but the choice of enzyme was not
consistent. MspI was used most frequently (four studies),
followed by TaqI (two studies), and one study each used
AluI, CfoI, HhaI and HaeIII. Overall, only three of
the 26 papers gave a rationale for their enzyme
selection (1,2,17).
An alternate approach to T-RFLP can be taken if the
microbial community has been characterized (by clone
library analysis or by prediction from previous studies) or
if a particular taxonomic group is being targeted with
specific primers. In this case, a more reasoned choice of
restriction enzymes can be made. Species or microbial
taxa of interest to the researcher, especially closely
related taxa that may share some restriction sites, can
often be differentiated if appropriate restriction enzymes
are selected.
There are, however, few resources available to narrow
down the selection process. Over 600 Type II restriction
enzymes are commercially available, accounting for 262
distinct specificities (27). Existing computer programs for
assisting in the choice of restriction enzymes include TAP-TRFLP (28), MiCA Enzyme Resolving Power Analysis
(http://mica.ibest.uidaho.edu) and TRF-CUT (29). These
programs perform in silico restriction digestions of a
predefined sequence database or user-provided sequences,
but these results must still be manually examined to
determine which enzymes are best suited to discriminate
that set of sequences. CLEAVER (30), a stand-alone
program, provides the above features as well as the ability
to assign sequences to taxonomic groups at multiple levels
and to search for enzymes that cut one group but not
another group. However, it is limited to comparing only
two groups at once. Restriction Endonuclease Picker
(REPK) addresses this gap by finding enzymes that are
able to discriminate an unlimited number of user-designated sequence groups on the basis of their terminal
restriction fragment lengths. If no single enzyme can
discriminate all groups, REPK reports sets of four
restriction enzymes that together are able to differentiate
the groups of interest. An important component of REPK
is the ability to specify the taxonomic rank of sequences
to be differentiated, which is particularly useful in the case
where a diverse microbial community has been characterized by clone library analysis or there is an existing
database of several subgroups of sequences that amplify
with the same specific primers.
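The search described above can be sketched as a coverage problem over pairs of groups. The code below is a minimal sketch under stated assumptions, not the algorithm used by REPK: it assumes that an enzyme separates two groups when their sets of terminal fragment lengths do not overlap, that a set of enzymes differentiates all groups when every pair of groups is separated by at least one enzyme in the set, and that every enzyme has an entry for every group. Unlike REPK, which reports sets of four, it returns the smallest qualifying sets of up to four enzymes. The function names and fragment lengths are hypothetical.

```python
from itertools import combinations

def separated_pairs(lengths_by_group):
    """Pairs of groups whose terminal fragment lengths never coincide."""
    pairs = set()
    for a, b in combinations(sorted(lengths_by_group), 2):
        if lengths_by_group[a].isdisjoint(lengths_by_group[b]):
            pairs.add((a, b))
    return pairs

def find_enzyme_sets(table, max_size=4):
    """Smallest enzyme sets (up to max_size) that separate every pair of groups."""
    groups = sorted({g for per_group in table.values() for g in per_group})
    all_pairs = set(combinations(groups, 2))
    coverage = {enz: separated_pairs(per_group) for enz, per_group in table.items()}
    for size in range(1, max_size + 1):
        hits = [combo for combo in combinations(sorted(coverage), size)
                if set().union(*(coverage[e] for e in combo)) == all_pairs]
        if hits:
            return hits
    return []

# Hypothetical fragment lengths: enzyme -> group -> set of lengths in that group.
table = {
    "MspI": {"A": {120}, "B": {300}, "C": {300}},
    "HhaI": {"A": {210}, "B": {95}, "C": {210}},
    "AluI": {"A": {60}, "B": {60}, "C": {60}},
}
print(find_enzyme_sets(table))  # [('HhaI', 'MspI')]
```

In this toy table no single enzyme separates all three groups, but MspI (which isolates A) and HhaI (which isolates B) together separate every pair.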
Finally, users can define their own custom enzymes if
they are not included in the standard list. The default
(all standard enzymes) was used for the example in
Figure 1. For computational efficiency, isoschizomers are
grouped by cleavage site.
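Grouping by cleavage site means that enzymes producing identical terminal fragments need to be evaluated only once. A minimal sketch of that grouping follows, assuming each enzyme is described by its recognition sequence and cut offset. The dictionary layout and function name are hypothetical and do not reflect REPK's internal data structures.

```python
from collections import defaultdict

# (recognition sequence, cut offset) for a few well-known enzymes.
ENZYMES = {
    "HhaI": ("GCGC", 3),    # GCG^C
    "CfoI": ("GCGC", 3),    # GCG^C, isoschizomer of HhaI
    "MspI": ("CCGG", 1),    # C^CGG
    "HpaII": ("CCGG", 1),   # C^CGG, isoschizomer of MspI
    "HaeIII": ("GGCC", 2),  # GG^CC
}

def group_by_cleavage_site(enzymes):
    """Map each (site, offset) to the alphabetical list of enzymes sharing it."""
    groups = defaultdict(list)
    for name, site in sorted(enzymes.items()):
        groups[site].append(name)
    return dict(groups)

for site, names in group_by_cleavage_site(ENZYMES).items():
    print(site, names)
```

Here five enzymes reduce to three distinct cleavage sites, so only three in silico digests are needed per sequence.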
The final output is refined by setting several options.
Some of these depend on the specifications and resolving
power of the particular capillary electrophoresis system:
the minimum and maximum allowable fragment lengths, and
the maximum difference in size between two fragments
that will still be considered the ‘same’ fragment.
Users can also set the minimum number of groups each
enzyme must be able to discriminate on its own (the
enzyme stringency), and the number of groups allowed to
remain undifferentiated in the case that no ‘perfect’
enzyme sets are discovered.
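How these options interact can be illustrated with a short sketch. It is an assumption-laden simplification rather than REPK's code: each group is represented by a single fragment length, fragments outside the allowed length range are treated as undetectable, and two fragments are considered the same when they differ by no more than the cutoff. All names and numeric values are hypothetical.

```python
def discriminated_groups(lengths, min_len, max_len, cutoff):
    """Groups whose fragment is detectable and distinct from every other detectable one."""
    detectable = {g: n for g, n in lengths.items()
                  if n is not None and min_len <= n <= max_len}
    result = []
    for g, n in detectable.items():
        others = (m for h, m in detectable.items() if h != g)
        if all(abs(n - m) > cutoff for m in others):
            result.append(g)
    return sorted(result)

def passes_stringency(lengths, stringency, **limits):
    """True if the enzyme discriminates at least `stringency` groups on its own."""
    return len(discriminated_groups(lengths, **limits)) >= stringency

# Hypothetical terminal fragment lengths (bp) for one enzyme.
lengths = {"A": 120, "B": 123, "C": 410, "D": 40}
limits = dict(min_len=50, max_len=900, cutoff=5)
print(discriminated_groups(lengths, **limits))            # ['C']
print(passes_stringency(lengths, stringency=2, **limits)) # False
```

With a 5 bp cutoff, groups A and B are merged and D falls below the minimum length, so the enzyme discriminates only C. Tightening the cutoff to 2 bp would let the same enzyme discriminate A, B and C.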
SITE USAGE AND EXAMPLES
A complete manual and example input files are provided
on the REPK website (http://rocaplab.ocean.washington.edu/tools/repk).
The example shown in Figure 1 was
prepared using REPK v. 1.0, with the following operating
parameters (also the defaults): example sequence file
(alignment5.txt), all commercially available Type IIP
enzymes (REBASE Versio (...truncated)