SNP2CAPS: a SNP and INDEL analysis tool for CAPS marker development
Published online January 2, 2004
Nucleic Acids Research, 2004, Vol. 32, No. 1 e5
DOI: 10.1093/nar/gnh006
Thomas Thiel, Raja Kota, Ivo Grosse, Nils Stein and Andreas Graner*
Institute for Plant Genetics and Crop Plant Research (IPK), Corrensstrasse 3, 06466 Gatersleben, Germany
Received August 12, 2003; Revised and Accepted November 14, 2003
ABSTRACT
INTRODUCTION
Single nucleotide polymorphisms (SNPs) are the most
frequent form of DNA variation in the genome (1). Because
of their abundance, the exploitation of SNPs for marker assays
has the potential to provide answers to a large number of
important biological, genetic, pharmacological and medical
questions. The identification of SNPs has progressed remarkably over the last several years and multiple assays have been
devised (2–5). However, most of these assays require expensive and specialized equipment and chemicals for analysis.
Hence, there is a need for simple and accurate genotyping
assays that can be implemented in laboratories that do not
have access to sophisticated equipment. A solution to this
problem is the detection of a SNP site by an appropriate
restriction endonuclease whose recognition sequence has been
altered or introduced by the SNP. In combination with a PCR
assay, the corresponding SNP can be analysed as a cleaved
amplified polymorphic sequence (CAPS) marker (6). The
MATERIALS AND METHODS
Description of SNP2CAPS
Two input files that contain data about the sequence
alignments and the restriction enzymes are required for
SNP2CAPS. The first input file is a modified FASTA
formatted file that stores one or more multiple alignments of
sequences of different accessions. In order to ensure compatibility with existing alignment tools, 15 additional multiple
alignment formats (e.g. ClustalW, MSF and MEME) can be
imported using the AlignIO handler of the BioPerl module
v1.2 (http://bioperl.org/). The second input file contains data
on the restriction enzymes that can be downloaded in different
*To whom correspondence should be addressed. Tel: +49 39482 5521; Fax: +49 39482 5155; Email:
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Nucleic Acids Research, Vol. 32 No. 1 © Oxford University Press 2004; all rights reserved
With the influx of various SNP genotyping assays in
recent years, there has been a need for an assay
that is robust, yet cost effective, and could be
performed using standard gel-based procedures. In
this context, CAPS markers have been shown to
meet these criteria. However, converting SNPs to
CAPS markers can be a difficult process if done
manually. In order to address this problem, we
describe a computer program, SNP2CAPS, that
facilitates the computational conversion of SNP
markers into CAPS markers. 413 multiple aligned
sequences derived from barley ESTs were analysed
for the presence of polymorphisms in 235 distinct
restriction sites. 282 (90%) of 314 alignments that
contain sequence variation due to SNPs and InDels
revealed at least one polymorphic restriction site.
After reducing the number of restriction enzymes
from 235 to 10, 31% of the polymorphic sites could
still be detected. In order to demonstrate the usefulness of this tool for marker development, we experimentally validated some of the results predicted by
SNP2CAPS.
cost of a CAPS assay is generally low, especially when it
relies on commonly used restriction enzymes.
In order to facilitate a computational conversion of SNPs
into CAPS markers, a program called dCAPS Finder 2.0 was
previously developed (7). The dCAPS Finder program works
on the principle of designing mismatched PCR primers that
would create or remove a restriction recognition site in the
analysed SNP. The conversion of SNP sites into CAPS
markers by the artificial introduction of restriction sites
involves the creation of mismatched primers, whose successful application is not always trivial depending on the number,
positions and types of mismatches.
We present a computer program named SNP2CAPS that
works in a different manner. A simple algorithm involves the
screening of multiply aligned sequences for restriction sites
followed by a selection pipeline that allows the deduction of
CAPS candidates by the identification of putative alternative
restriction patterns. It should be noted that in this algorithm
any primer pair flanking the SNP site may be suited for CAPS
marker analysis.
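The screening idea described above can be illustrated with a minimal Python sketch. It is not the SNP2CAPS implementation: the function names `site_columns` and `caps_candidates`, the dictionary-based inputs and the restriction to palindromic recognition sites (so that only one strand needs to be searched) are assumptions made here for illustration. Each aligned sequence is searched for a recognition site after removing gap characters, hits are mapped back to alignment columns, and an enzyme is reported when the accessions do not all share the same set of sites.

```python
import re


def site_columns(aligned_seq, site):
    """Return the alignment columns at which `site` starts in one sequence.

    Gaps ('-') are removed before searching, so a site interrupted only by
    gap characters is still found; hits are mapped back to alignment columns.
    Assumption: `site` is an upper-case, palindromic recognition sequence.
    """
    # columns[i] is the alignment column of the i-th ungapped residue.
    columns = [i for i, ch in enumerate(aligned_seq) if ch != "-"]
    ungapped = aligned_seq.replace("-", "").upper()
    # A lookahead is used so that overlapping occurrences are reported too.
    pattern = f"(?={re.escape(site)})"
    return {columns[m.start()] for m in re.finditer(pattern, ungapped)}


def caps_candidates(alignment, enzymes):
    """Report enzymes whose site pattern differs between accessions.

    alignment: {accession name: aligned sequence} (hypothetical input layout)
    enzymes:   {enzyme name: recognition sequence}
    Returns {enzyme name: {accession name: frozenset of site columns}}.
    """
    candidates = {}
    for name, site in enzymes.items():
        per_accession = {
            acc: frozenset(site_columns(seq, site))
            for acc, seq in alignment.items()
        }
        # More than one distinct pattern means alternative restriction patterns.
        if len(set(per_accession.values())) > 1:
            candidates[name] = per_accession
    return candidates


# Toy alignment: accession B carries an insertion inside the EcoRI site.
toy_alignment = {"A": "ACGAA-TTCGT", "B": "ACGAAGTTCGT"}
toy_enzymes = {"EcoRI": "GAATTC", "AluI": "AGCT"}
toy_result = caps_candidates(toy_alignment, toy_enzymes)
```

In the toy example only EcoRI is reported, because accession A carries the site and accession B does not, whereas AluI cuts neither sequence. The sketch does not handle ambiguity codes such as N, non-palindromic sites or sequences of unequal coverage; handling those cases would need additional rules.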
In order to evaluate the efficacy of SNP2CAPS, a set of
3045 sequences derived from eight barley accessions based on
413 expressed sequence tags (ESTs) was used and analysed in
terms of (i) the potential number of SNP markers that can be
converted into CAPS markers taking into account all
commercially available restriction enzymes and (ii) the
number of CAPS markers that can be typed if only the 10
most commonly used enzymes are considered. To investigate
the accuracy of this tool, 14 EST-based SNP markers have
been experimentally validated.
Data sets
Sequencing efforts resulted in a set of 413 partially sequenced
genes spanning a total of 153 kb. On average, each locus had a
length of 370 bp. 3045 sequences were obtained by
sequencing three to eight barley accessions per gene locus,
resulting in a total of 1.13 Mb. Sequences were aligned using
the ClustalW program (8).
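The figures in this paragraph can be cross-checked with a few lines of arithmetic. The snippet below is only a consistency check on the numbers quoted in the text; the variable names are chosen here for illustration, and 1 kb is taken as 1000 bp.

```python
# Numbers quoted in the text.
n_loci = 413
total_locus_bp = 153_000      # 153 kb spanned by the 413 partially sequenced genes
n_sequences = 3045
quoted_mean_locus_bp = 370

# Mean locus length implied by the totals (about 370 bp).
mean_locus_bp = total_locus_bp / n_loci

# Total sequence implied by 3045 reads of average length 370 bp (about 1.13 Mb).
total_sequenced_mb = n_sequences * quoted_mean_locus_bp / 1e6

# Mean number of accessions sequenced per locus (between three and eight).
mean_accessions_per_locus = n_sequences / n_loci

print(round(mean_locus_bp), round(total_sequenced_mb, 2), round(mean_accessions_per_locus, 1))
```

The implied mean of roughly seven accessions per locus agrees with the stated range of three to eight.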
For computational restriction analysis the GCG format data
file from the REBASE database (version 304, March 24, 2003)
was used, comprising information on 645 type II restriction
enzymes. In the present study, diagnostic restriction was
investigated using (i) a total of 235 enzymes that are
Figure 1. Illustration of possible scenarios at an EcoRI recognition site
(GAATTC) between two aligned sequences. (a–d) Four different types
that are recognized by the program algorithm: (a) class I, CAPS candidates;
(b) class II, CAPS candidates containing N; (c) class III, false positive
candidates; (d) class IV, no restriction site polymorphisms (due to no or
uniform restriction patterns). (e and f) The role of insertions/deletions for
the analysis of CAPS marker candidates: (e) deletion of the restriction site;
(f) insertion at a restriction site.
non-isoschizomeric and commercially available and (ii) a
subset of the following 10 enzymes that are widely used in
daily bench work: the 4 bp cutters AluI, HpaII, MseI and RsaI
and the 6 bp cutters BamHI, DraI, EcoRI, EcoRV, HindIII and
XbaI.
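For illustration, the 10-enzyme subset can be written as a small lookup table. Apart from EcoRI (GAATTC, given in Figure 1), the recognition sequences below are not stated in the text; they are the commonly published sequences for these enzymes and should be checked against the REBASE file before use. The names `COMMON_ENZYMES` and `reverse_complement` are hypothetical.

```python
# Recognition sequences (5'->3') for the 10 widely used enzymes.
# Assumption: sequences are from general knowledge, not from the article itself.
COMMON_ENZYMES = {
    # 4 bp cutters
    "AluI": "AGCT",
    "HpaII": "CCGG",
    "MseI": "TTAA",
    "RsaI": "GTAC",
    # 6 bp cutters
    "BamHI": "GGATCC",
    "DraI": "TTTAAA",
    "EcoRI": "GAATTC",
    "EcoRV": "GATATC",
    "HindIII": "AAGCTT",
    "XbaI": "TCTAGA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(site):
    """Return the reverse complement of an unambiguous DNA string."""
    return site.translate(_COMPLEMENT)[::-1]


# Group the enzymes by recognition-site length, as in the text.
four_cutters = sorted(n for n, s in COMMON_ENZYMES.items() if len(s) == 4)
six_cutters = sorted(n for n, s in COMMON_ENZYMES.items() if len(s) == 6)

# All ten sites are palindromic, so searching one strand is sufficient.
all_palindromic = all(s == reverse_complement(s) for s in COMMON_ENZYMES.values())
```

Because every site in this subset equals its own reverse complement, a single-strand text search finds all occurrences; enzymes with asymmetric sites would require searching both orientations.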
Plant material
For experimental verification of the SNP2CAPS results a
genotype set of seven barley (Hordeum vulgare ssp. vulgare)
accessions was used, namely the winter barley cultivars 'Igri'
and 'Franka', the spring barley cultivars 'Steptoe', 'Morex'
and 'Barke', the genetic stocks 'Oregon Wolfe Rec' and
'Oregon Wolfe Dom' and accession HOR11508 of wild barley
(H. vulgare ssp. spontaneum). For these eight accessions
genomic DNA was extracted fr (...truncated)