CRISPRcompar: a website to compare clustered regularly interspaced short palindromic repeats

Published online 28 April 2008
Nucleic Acids Research, 2008, Vol. 36, Web Server issue W145–W148
doi:10.1093/nar/gkn228

Ibtissem Grissa1,*, Gilles Vergnaud1,2 and Christine Pourcel1

1Univ. Paris-Sud 11, CNRS, UMR8621, Institut de Génétique et Microbiologie, 91405 Orsay and 2DGA/D4S Mission pour la Recherche et l’Innovation Scientifique, 7, rue des Mathurins, 00470 Armées, France

Received January 25, 2008; Revised April 6, 2008; Accepted April 11, 2008

ABSTRACT

Clustered regularly interspaced short palindromic repeat (CRISPR) elements are a particular family of tandem repeats found in prokaryotic genomes (in almost all archaea and in about half of bacteria) that participate in a mechanism of acquired resistance against phages. They consist of a succession of direct repeats (DRs) of 24–47 bp separated by similarly sized unique sequences (spacers). In the large majority of cases, the direct repeats are highly conserved, while the number and nature of the spacers are often quite diverse, even among strains of the same species. Furthermore, the acquisition of new units (DR + spacer) was shown to happen almost exclusively on one side of the locus. Therefore, the CRISPR is an interesting genetic marker for comparative and evolutionary analysis of closely related bacterial strains. CRISPRcompar is a web service created to assist biologists in the CRISPR typing process. Two tools facilitate the in silico investigation: CRISPRcomparison and CRISPRtionary. This website is freely accessible at http://crispr.u-psud.fr/CRISPRcompar/.

INTRODUCTION

The clustered regularly interspaced short palindromic repeat (CRISPR)-associated system (CASS) comprises the particular repeated element CRISPR itself, the promoter for its transcription (also called the leader) and a set of cas genes responsible for its maintenance and function (1,2).
It is found in most Archaea and 40% of bacteria, and is linked to a mechanism of acquired resistance against bacteriophages (3). Some genomes harbour a significant number of CRISPRs [18 in Methanocaldococcus jannaschii DSM 2661 with three different direct repeats (DRs)] (4). When different CRISPRs with the same DR are present in a genome, they have a very similar leader, generally different spacers, and only one is associated with cas genes (5). When CRISPRs from different CRISPR families exist in the same genome, one set of cas genes specific for each family is present. Finally, within a species, different strains may have different CRISPRs. The three sequenced strains of Streptococcus thermophilus illustrate this situation well: three CRISPRs were identified in this species, but only strain LMD-9 possesses all three of them (4).

CRISPRs evolve either by deletion or acquisition of units (a DR and a spacer) following a mechanism first proposed by Pourcel et al. (6) and recently confirmed (7–9). In the majority of cases, new units are added at one end of the CRISPR adjacent to the leader, whereas motif deletions can occur randomly. The independent acquisition of the same spacer twice is possible but is infrequent and easily detected. Thus, the presence of identical spacers in the same CRISPR locus in distinct strains reflects shared ancestry.

The polymorphism of CRISPRs can be used for molecular typing. The standard technology, developed for Mycobacterium tuberculosis typing (10), is spoligotyping, which consists of detecting the presence/absence of a range of spacers. This technique and other PCR-based typing methods have been applied in CRISPR genotyping to study other bacterial species (6,11–16).

We recently implemented a program (CRISPRFinder) allowing the identification of a CRISPR structure based on a thorough characterization of its components, i.e. the DR and the spacers (17).
Using this program, public genome sequences are analysed and the extracted CRISPRs are stored in a database (CRISPRdb) (4). CRISPRFinder and CRISPRdb are accessible on the web together with different tools that assist in recovering spacer and DR sequences and in running BLAST searches with them against GenBank. We now report on the development of a new website dedicated to the comparison of CRISPRs between strains and the labelling of spacers when multiple alleles are analysed. CRISPRcompar is freely accessible at http://crispr.u-psud.fr/CRISPRcompar/index.php.

*To whom correspondence should be addressed. Tel: +33 1 69 15 30 01; Fax: +33 1 69 15 66 78; Email:

© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

METHODS AND IMPLEMENTATION

CRISPRcompar is a user-friendly web resource offering tools to compare CRISPRs between strains of a given species or between closely related species, and to classify the spacers. Its core routines were developed in Perl under Debian Linux. It is composed of two main applications: CRISPRcomparison and CRISPRtionary.

CRISPRcomparison identifies and compares the CRISPRs of two or more genomes (complete or partial sequences). It is particularly useful when strains of a species possess several CRISPRs whose positions in the genome may vary, as a result, for instance, of large-scale genome rearrangements or of presence–absence polymorphism of CRISPR loci in the genomes of interest. Two CRISPRs are considered similar when they have an identical consensus DR and similar flanking sequences. The flanking sequences are compared by ClustalW alignment of the 200 bp adjacent to the CRISPR, with a similarity threshold of 90%.
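
The two similarity criteria can be illustrated with a minimal Python sketch. It is not the CRISPRcompar implementation (whose core routines are written in Perl and rely on ClustalW): here the alignment step is replaced by a plain edit-distance similarity, the `CrisprLocus` container and all function names are hypothetical, and treating a single similar flank as sufficient is an assumption made for illustration. Only the 200 bp flank length and the 90% threshold come from the text.

```python
from dataclasses import dataclass

FLANK_LENGTH = 200           # bp of adjacent sequence compared (from the text)
SIMILARITY_THRESHOLD = 0.90  # 90% similarity threshold (from the text)


@dataclass
class CrisprLocus:
    """Hypothetical container for one CRISPR locus; not a CRISPRcompar type."""
    consensus_dr: str
    left_flank: str   # sequence upstream of the CRISPR
    right_flank: str  # sequence downstream of the CRISPR


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming (stand-in for ClustalW)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity approximated as 1 - edit distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 1.0 - edit_distance(a, b) / longest


def same_locus(x: CrisprLocus, y: CrisprLocus) -> bool:
    """Identical consensus DR and at least one similar flank (assumption)."""
    if x.consensus_dr.upper() != y.consensus_dr.upper():
        return False
    # Keep only the 200 bp immediately adjacent to the CRISPR on each side.
    left = similarity(x.left_flank[-FLANK_LENGTH:].upper(),
                      y.left_flank[-FLANK_LENGTH:].upper())
    right = similarity(x.right_flank[:FLANK_LENGTH].upper(),
                       y.right_flank[:FLANK_LENGTH].upper())
    return max(left, right) >= SIMILARITY_THRESHOLD
```

With this sketch, two loci sharing a DR and a left flank that differs at a handful of positions are reported as the same locus, while loci with different DRs or unrelated flanks are not. A real comparison would also need to handle loci annotated on opposite strands, which this sketch ignores.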
In the majority of cases, when multiple CRISPRs with the same DR are present in a genome, only one flanking sequence is similar, the one corresponding to the leader.

CRISPRtionary lists the spacers from different alleles derived from the same CRISPR locus and annotates them in a polarized fashion. Such data are produced, for instance, when investigating the diversity (evolution) of CRISPRs within a species by sequencing the locus in different isolates. This tool can then be used to automatically number spacers, produce a ‘dictionary’ or repertoire of spacers and code the alleles using this dictionary. CRISPRFinder is used to identify the DR and order the spacers according to the DR sequence. When sequencing PCR products, the first few nucleotides may be missed or the data may be of poor quality. In addition, the first, often partial and degenerate DR (up to 50% differences have been observed) may be missed by CRISPRFinder in this context. For this reason, a filter exploring the existence of stretches of additional DR in the flanking sequence was added so as to correctly i (...truncated)