CRISPRcompar: a website to compare clustered regularly interspaced short palindromic repeats
Published online 28 April 2008
Nucleic Acids Research, 2008, Vol. 36, Web Server issue W145–W148
doi:10.1093/nar/gkn228
Ibtissem Grissa1,*, Gilles Vergnaud1,2 and Christine Pourcel1
1Univ. Paris-Sud 11, CNRS, UMR8621, Institut de Génétique et Microbiologie, 91405 Orsay and 2DGA/D4S Mission pour la Recherche et l’Innovation Scientifique, 7, rue des Mathurins, 00470 Armées, France
Received January 25, 2008; Revised April 6, 2008; Accepted April 11, 2008
ABSTRACT
Clustered regularly interspaced short palindromic repeat (CRISPR) elements are a particular family of tandem repeats present in prokaryotic genomes, in almost all archaea and in about half of bacteria, and they participate in a mechanism of acquired resistance against phages. They consist of a succession of direct repeats (DRs) of 24–47 bp separated by unique sequences of similar size (spacers). In the large majority of cases, the direct repeats are highly conserved, while the number and nature of the spacers are often quite diverse, even among strains of the same species. Furthermore, the acquisition of new units (DR + spacer) has been shown to occur almost exclusively at one end of the locus. CRISPRs are therefore interesting genetic markers for comparative and evolutionary analyses of closely related bacterial strains.
CRISPRcompar is a web service created to assist biologists in the CRISPR typing process. Two tools facilitate the in silico investigation: CRISPRcomparison and CRISPRtionary. This website is freely accessible at http://crispr.u-psud.fr/CRISPRcompar/.
*To whom correspondence should be addressed. Tel: +33 1 69 15 30 01; Fax: +33 1 69 15 66 78; Email:
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
INTRODUCTION
The clustered regularly interspaced short palindromic repeat (CRISPR)-associated system (CASS) comprises the CRISPR repeat array itself, the promoter for its transcription (also called the leader) and a set of cas genes responsible for its maintenance and function (1,2).
It is found in most archaea and in about 40% of bacteria, and is linked to a mechanism of acquired resistance against bacteriophages (3). Some genomes harbour a large number of CRISPRs [18 in Methanocaldococcus jannaschii DSM 2661, with three different direct repeats (DRs)] (4). When different CRISPRs with the same DR are present in a genome, they have very similar leaders but generally different spacers, and only one of them is associated with cas genes (5).
When CRISPRs from different CRISPR families exist in
the same genome, one set of cas genes specific for each
family is present. Finally, within a species, different strains
may have different CRISPRs. The three sequenced strains
of Streptococcus thermophilus illustrate this situation:
three CRISPRs were identified in this species, but only
strain LMD-9 possesses all three of them (4).
CRISPRs evolve through either the deletion or the acquisition
of units (a DR and a spacer), following a mechanism first
proposed by Pourcel et al. (6) and recently confirmed (7–9).
In the majority of cases, new units are added at the end of
the CRISPR adjacent to the leader, whereas deletions of units
can occur at any position. The independent acquisition of the
same spacer twice is possible, but it is rare and easily
detected. Thus, the presence of identical spacers in the same
CRISPR locus in distinct strains reflects shared ancestry.
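Under the polarized model described above, spacers found in both alleles point to a common ancestor, while spacers on the leader side of the first shared spacer are candidates for acquisitions made after the two lineages diverged. The Python sketch below illustrates this reasoning. It assumes each allele is already given as a list of spacer labels ordered from the leader-proximal (newest) end; the function name and the labels are hypothetical and are not part of CRISPRcompar.

```python
def compare_alleles(allele_a, allele_b):
    """Compare two CRISPR alleles given as spacer labels ordered from the
    leader-proximal (newest) end to the leader-distal (oldest) end.

    Hypothetical helper illustrating polarized spacer acquisition.
    """
    shared = set(allele_a) & set(allele_b)

    def leader_side_specific(allele):
        # Spacers preceding the first shared spacer lie at the leader end,
        # where new units are added; under the polarized model they were
        # most likely acquired after the two lineages diverged.
        block = []
        for spacer in allele:
            if spacer in shared:
                break
            block.append(spacer)
        return block

    return {
        "shared": [s for s in allele_a if s in shared],
        "recent_a": leader_side_specific(allele_a),
        "recent_b": leader_side_specific(allele_b),
    }


# Toy labels for illustration only (not data from the paper).
strain1 = ["s9", "s8", "s3", "s2", "s1"]
strain2 = ["s7", "s3", "s1"]
result = compare_alleles(strain1, strain2)
print(result["shared"])    # ['s3', 's1']
print(result["recent_a"])  # ['s9', 's8']
print(result["recent_b"])  # ['s7']
```

In this toy example, s2 lies between two shared spacers in strain1 but is absent from strain2. Because new units are added at the leader end, this pattern is more simply explained by a deletion in strain2 than by an acquisition in strain1.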
The polymorphism of CRISPRs can be used for
molecular typing. The standard and classical technology
developed for Mycobacterium tuberculosis typing (10) is the
spoligotyping, which consists in detecting the presence/
absence of a range of spacers. This technique and other
PCR-based typing methods have been applied in CRISPR
genotyping to study other bacterial species (6,11–16).
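A minimal in silico analogue of spoligotyping can be sketched in Python, assuming the CRISPR locus sequence and the panel of spacer sequences are known. Exact string matching on both strands stands in for the hybridization of PCR products to spacer probes used in the laboratory method. The function names and the toy sequences (including the DR "GATTACA") are hypothetical.

```python
# Complement table for DNA; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def spacer_profile(locus_seq, spacer_panel):
    """Return a presence (1) / absence (0) profile of each panel spacer in a
    CRISPR locus, searching both strands by exact string matching."""
    locus = locus_seq.upper()
    locus_rc = reverse_complement(locus)
    profile = []
    for spacer in spacer_panel:
        probe = spacer.upper()
        profile.append(1 if probe in locus or probe in locus_rc else 0)
    return profile


# Toy sequences: hypothetical DR "GATTACA" and three hypothetical spacers.
locus = "GATTACA" + "CCTTGGAA" + "GATTACA" + "TTAAGGCC" + "GATTACA"
panel = ["CCTTGGAA", "TTAAGGCC", "AGAGAGAG"]
print(spacer_profile(locus, panel))  # [1, 1, 0]
```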
We recently implemented a program, CRISPRFinder, that
identifies CRISPR structures through a thorough
characterization of their components, i.e. the DRs and the
spacers (17). This program is used to analyse public genome
sequences, and the CRISPRs it extracts are stored in a
database, CRISPRdb (4). CRISPRFinder and CRISPRdb are
accessible on the web, together with tools that help
recover spacer and DR sequences and search them against
GenBank with BLAST.
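The division of a locus into DRs and spacers can be illustrated with a short Python sketch. It assumes that the DR consensus is already known and occurs exactly, without overlap. This is not the CRISPRFinder algorithm, which must also discover the DR itself. The function name and toy sequence are hypothetical.

```python
def split_crispr(locus_seq, dr):
    """Split a CRISPR locus into its spacers, given the DR consensus.

    Assumes the DR occurs exactly and without overlap; real repeats may be
    degenerate, which this sketch does not handle.
    """
    if not dr:
        raise ValueError("DR sequence must not be empty")
    locus, dr = locus_seq.upper(), dr.upper()
    starts = []
    pos = locus.find(dr)
    while pos != -1:
        starts.append(pos)
        pos = locus.find(dr, pos + len(dr))
    # Each spacer runs from the end of one DR to the start of the next DR.
    return [locus[a + len(dr):b] for a, b in zip(starts, starts[1:])]


locus = "GATTACA" + "CCTTGGAA" + "GATTACA" + "TTAAGGCC" + "GATTACA"
print(split_crispr(locus, "GATTACA"))  # ['CCTTGGAA', 'TTAAGGCC']
```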
We now report on the development of a new website
dedicated to the comparison of CRISPRs between strains
and the labelling of spacers when multiple alleles are
analysed.
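Labelling spacers across several alleles amounts to maintaining a dictionary that maps each distinct spacer sequence to a label. The Python sketch below illustrates the idea. It assumes that "the same spacer" means an identical sequence and uses a hypothetical spN naming scheme; CRISPRtionary's actual labels and matching criteria may differ.

```python
def label_spacers(alleles, dictionary=None):
    """Give every distinct spacer sequence a label shared across alleles.

    alleles: dict mapping an allele name to its list of spacer sequences.
    dictionary: optional existing mapping of spacer sequence -> label;
        known spacers keep their label and new ones receive fresh labels.
    Returns the labelled alleles and an updated copy of the dictionary.
    """
    # Copy and normalize keys so the caller's dictionary is not modified.
    dictionary = {k.upper(): v for k, v in (dictionary or {}).items()}
    used = set(dictionary.values())
    counter = 1
    labelled = {}
    for name, spacers in alleles.items():
        labels = []
        for seq in spacers:
            key = seq.upper()
            if key not in dictionary:
                # Find the next unused label of the form "spN".
                while f"sp{counter}" in used:
                    counter += 1
                dictionary[key] = f"sp{counter}"
                used.add(dictionary[key])
            labels.append(dictionary[key])
        labelled[name] = labels
    return labelled, dictionary


previous = {"CCTTGGAA": "sp1"}
alleles = {"strainA": ["TTAAGGCC", "CCTTGGAA"],
           "strainB": ["AGAGAGAG", "CCTTGGAA"]}
labelled, updated = label_spacers(alleles, previous)
print(labelled)  # {'strainA': ['sp2', 'sp1'], 'strainB': ['sp3', 'sp1']}
```

Passing a previously built dictionary keeps existing labels stable, so alleles analysed at different times remain comparable.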
CRISPRcompar is freely accessible at http://crispr.u-psud.fr/CRISPRcompar/index.php.
METHODS AND IMPLEMENTATION
Input
The CRISPRcompar program automatically retrieves from
CRISPRdb all strains containing a CRISPR and lists them
alphabetically so that any of them can be selected for
comparison (alternatively, all strains from a given genus
can be selected at once using the ‘strain taxonomy
browser’). To compare unpublished sequences and genomes,
a private database modelled on CRISPRdb (4) must first be
created (http://crispr.u-psud.fr/CRISPRcompar/private/).
Additional sequences from the private database can then
Output
For the CRISPRcomparison application, the result is shown in a table in which CRISPRs are grouped. Figure 1 shows the result of comparing three S. thermophilus strains. The table gives the position of each CRISPR and its number of repeats (Figure 1A), and a link to the corresponding CRISPR in CRISPRdb is provided.
When two or more alleles of a given CRISPR are found, their flanking sequences can be aligned, and a link is provided to the second application, ‘CRISPRtionary’, to annotate and classify the spacers. Clicking the ‘compare spacer’ button displays a table in which the CRISPR sequences are provided in FASTA format (Figure 1B). At this step, a previously built dictionary of spacers can be uploaded, against which the spacers of the new CRISPR alleles will be compared. If no such dictionary exists, one is created in the following steps.
With the FindCRISPR button, the CRISPRFinder program is used to identify DRs and spacers. More than one DR candidate is often proposed, for two reasons. First, several DRs may be possible, especially for short arrays (fewer than four units). Second, the alleles may lie in different orientations on the genome: when the submitted alleles are in different orientations, two DR sequences are proposed, one being the reverse complement of the other. The user should therefore select the appropriate consensus DR or enter a DR sequence. The ‘find spacer’ button
leads to a page where spacers are labelled (Figure 1C)
and different files can be recovered: (i) diffe (...truncated)