CRISPRcompar: a website to compare clustered regularly interspaced short palindromic repeats
Published online 28 April 2008
Nucleic Acids Research, 2008, Vol. 36, Web Server issue W145–W148
doi:10.1093/nar/gkn228
Ibtissem Grissa1,*, Gilles Vergnaud1,2 and Christine Pourcel1
1Univ. Paris-Sud 11, CNRS, UMR8621, Institut de Génétique et Microbiologie, 91405 Orsay and 2DGA/D4S Mission pour la Recherche et l’Innovation Scientifique, 7, rue des Mathurins, 00470 Armées, France
Received January 25, 2008; Revised April 6, 2008; Accepted April 11, 2008
ABSTRACT
Clustered regularly interspaced short palindromic
repeat (CRISPR) elements are a particular family
of tandem repeats present in prokaryotic genomes,
in almost all archaea and in about half of bacteria,
and which participate in a mechanism of acquired
resistance against phages. They consist of a
succession of direct repeats (DR) of 24–47 bp
separated by similar sized unique sequences
(spacers). In the large majority of cases, the direct
repeats are highly conserved, while the number and
nature of the spacers are often quite diverse, even
among strains of the same species. Furthermore, the
acquisition of new units (DR + spacer) was shown
to happen almost exclusively on one side of the
locus. Therefore, the CRISPR represents an interesting genetic marker for comparative and evolutionary analysis of closely related bacterial strains.
CRISPRcompar is a web service created to
assist biologists in the CRISPR typing process.
Two tools facilitate the in silico investigation:
CRISPRcomparison and CRISPRtionary. This website is freely accessible at http://crispr.u-psud.fr/CRISPRcompar/.
INTRODUCTION
The clustered regularly interspaced short palindromic
repeat (CRISPR)-associated system (CASS) comprises the
particular repeated element CRISPR itself, the promoter
for its transcription (also called the leader) and a set of cas
genes responsible for its maintenance and function (1,2).
It is found in most Archaea and in 40% of bacteria, and is linked
to a mechanism of acquired resistance against bacteriophages (3). Some genomes harbour a significant number
of CRISPRs [18 in Methanocaldococcus jannaschii DSM
2661 with three different direct repeats (DRs)] (4). When
different CRISPRs with the same DR are present in a
genome, they have a very similar leader, generally different
spacers, and only one is associated with cas genes (5).
When CRISPRs from different CRISPR families exist in
the same genome, one set of cas genes specific for each
family is present. Finally, within a species, different strains
may have different CRISPRs. The example of the three
sequenced strains of Streptococcus thermophilus is very
illustrative of this situation, since three CRISPRs were
identified in this species but only strain LMD-9 possesses
all three of them (4).
CRISPRs evolve either by deletion or acquisition of
units (a DR and a spacer) following a mechanism
first proposed by Pourcel et al. (6) and recently
confirmed (7–9). In the majority of cases, new units are
added at one end of the CRISPR adjacent to the leader,
whereas motif deletions can occur randomly. The independent acquisition of the same spacer twice is possible
but is infrequent and easily detected. Thus, the presence
of identical spacers in the same CRISPR locus in distinct
strains reflects shared ancestry.
The polymorphism of CRISPRs can be used for
molecular typing. The standard technology
developed for Mycobacterium tuberculosis typing (10) is
spoligotyping, which consists of detecting the presence/absence
of a range of spacers. This technique and other
PCR-based typing methods have been applied in CRISPR
genotyping to study other bacterial species (6,11–16).
We recently implemented a program (CRISPRFinder)
allowing the identification of a CRISPR structure based
on a thorough characterization of its components, i.e.
the DR and the spacers (17). Using this program, public
genome sequences are analysed and the extracted
CRISPRs are stored in a database (CRISPRdb) (4).
CRISPRFinder and CRISPRdb are accessible on the web
together with different tools that assist in recovering
spacers and DR sequences, and blasting them against
Genbank.
We now report on the development of a new website
dedicated to the comparison of CRISPRs between strains
and the labelling of spacers when multiple alleles are
analysed.
CRISPRcompar is freely accessible at http://crispr.u-psud.fr/CRISPRcompar/index.php.
*To whom correspondence should be addressed. Tel: +33 1 69 15 30 01; Fax: +33 1 69 15 66 78; Email:
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
METHODS AND IMPLEMENTATION
CRISPRcompar is a user-friendly web resource offering
tools to compare CRISPRs between strains of a given
species or between closely related species, and to classify
the spacers. Its core routines were developed in Perl
under Debian Linux. It is composed of two main
applications: CRISPRcomparison and CRISPRtionary.
CRISPRcomparison identifies and compares the
CRISPRs of two or more genomes (complete or partial
sequences). It is particularly useful when strains of a
species possess several CRISPRs whose positions on
the genome might vary, for instance as a result of large-scale genome rearrangements or of presence–absence
polymorphism of CRISPR loci in the genomes of interest.
The similarity criteria are based on having an identical
consensus DR and similar flanking sequences. The flanking sequences are compared by ClustalW alignment
of the 200 bp sequences adjacent to the CRISPR, with
a similarity threshold of 90%. In the majority of cases,
when multiple CRISPRs with the same DR are present in
a genome, only one flanking sequence is similar, the one
corresponding to the leader.
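
As an illustration, the similarity criteria described above can be sketched in a few lines of Python. This is a minimal sketch and not the CRISPRcomparison code, which is written in Perl and relies on ClustalW. The class `CrisprLocus`, the functions `flank_similarity` and `same_locus`, and the toy sequences are hypothetical names and data introduced here. Two assumptions are made: `difflib.SequenceMatcher` stands in for the ClustalW alignment score, and a single similar flank (typically the leader side) is taken to be sufficient, in line with the observation that often only one flanking sequence is similar.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.90  # 90% similarity threshold, as stated in the text


@dataclass
class CrisprLocus:
    """Hypothetical record: consensus DR plus the 200 bp flanks of one CRISPR."""
    consensus_dr: str
    left_flank: str   # assumed to be already trimmed to the adjacent 200 bp
    right_flank: str  # assumed to be already trimmed to the adjacent 200 bp


def flank_similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1].

    Stand-in for a ClustalW alignment score (assumption). autojunk is
    disabled because the default heuristic treats frequent characters as
    junk, which is unsuitable for a four-letter DNA alphabet.
    """
    return SequenceMatcher(None, a.upper(), b.upper(), autojunk=False).ratio()


def same_locus(x: CrisprLocus, y: CrisprLocus,
               threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Apply the two criteria: identical consensus DR and a similar flank."""
    if x.consensus_dr.upper() != y.consensus_dr.upper():
        return False
    # Assumption: one similar flank is enough to pair the two loci.
    return (flank_similarity(x.left_flank, y.left_flank) >= threshold
            or flank_similarity(x.right_flank, y.right_flank) >= threshold)


# Toy data (made up for illustration, not real genome sequences).
dr = "GTTCACTGCCGTACAGGCAGCTTAGAAA"
leader = "ACGTTGCA" * 25  # 200 bp
locus_a = CrisprLocus(dr, leader, "A" * 200)
locus_b = CrisprLocus(dr, leader, "C" * 200)      # same DR, same leader flank
locus_c = CrisprLocus(dr, "G" * 200, "T" * 200)   # same DR, unrelated flanks
locus_d = CrisprLocus("GGTTTATCCCCGCTGGCGCGGGGAACAC", leader, "A" * 200)
```

With these toy loci, `same_locus(locus_a, locus_b)` pairs the two CRISPRs through their shared leader-side flank, whereas `locus_c` (unrelated flanks) and `locus_d` (different DR) are not paired with `locus_a`. The sketch ignores strand orientation, which a real comparison would need to handle.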
CRISPRtionary lists the spacers from different alleles
derived from the same CRISPR locus and annotates them
in a polarized fashion. Such data will be produced for
instance when investigating the diversity (evolution) of
CRISPRs within a species by sequencing the locus in
different isolates. This tool can then be used to automatically number spacers, produce a ‘dictionary’ or
repertoire of spacers and code the alleles using this
dictionary. CRISPRFinder is used to identify the DR
and order the spacers according to the DR sequence.
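
The numbering and coding steps just described can be illustrated with a short Python sketch. It is not the CRISPRtionary implementation: the function names `build_spacer_dictionary` and `encode_alleles`, the strain labels and the toy spacer sequences are hypothetical. It assumes that the spacers of each allele are already listed in a consistent orientation, and that numbers are assigned in order of first appearance, a numbering rule chosen here only for illustration.

```python
from typing import Dict, List


def build_spacer_dictionary(alleles: Dict[str, List[str]]) -> Dict[str, int]:
    """Number each distinct spacer in order of first appearance (assumption)."""
    dictionary: Dict[str, int] = {}
    for spacers in alleles.values():
        for spacer in spacers:
            key = spacer.upper()  # compare sequences case-insensitively
            if key not in dictionary:
                dictionary[key] = len(dictionary) + 1
    return dictionary


def encode_alleles(alleles: Dict[str, List[str]],
                   dictionary: Dict[str, int]) -> Dict[str, List[int]]:
    """Recode every allele as the ordered list of its spacer numbers."""
    return {
        name: [dictionary[spacer.upper()] for spacer in spacers]
        for name, spacers in alleles.items()
    }


# Toy alleles (made-up short sequences standing in for real spacers).
alleles = {
    "strain_A": ["AAAC", "CCGT", "GGTA"],
    "strain_B": ["CCGT", "GGTA", "TTAG"],
    "strain_C": ["aaac", "GGTA"],
}
spacer_dictionary = build_spacer_dictionary(alleles)
coded_alleles = encode_alleles(alleles, spacer_dictionary)
```

In this toy example, spacers shared between strains receive the same number, so the coded alleles can be compared directly to spot shared, missing or newly acquired spacers.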
When sequencing PCR products, the first few nucleotides
may be missed or the data may be of poor quality.
In addition, the first DR, often partial and degenerate
(up to 50% differences have been observed), may be
missed by CRISPRFinder in this context. For this reason,
a filter exploring the existence of stretches of additional
DR in the flanking sequence was added so as to correctly
i (...truncated)