Asymmetric positioning of Cas1–2 complex and Integration Host Factor induced DNA bending guide the unidirectional homing of protospacer in CRISPR-Cas type I-E system

Nucleic Acids Research, Jan 2017

The CRISPR–Cas system epitomizes the prokaryote-specific adaptive defense machinery that limits genome invasion by mobile genetic elements. It confers adaptive immunity to bacteria by capturing a protospacer fragment from invading foreign DNA, which is later inserted into the leader-proximal end of the CRISPR array and serves as immunological memory to recognize recurrent invasions. The universally conserved Cas1 and Cas2 form an integration complex that is known to mediate protospacer invasion into the CRISPR array. However, the mechanism by which this protospacer fragment is integrated in a directional fashion into the leader-proximal end is elusive. Here, we employ CRISPR/dCas9-mediated immunoprecipitation and genetic analysis to identify Integration Host Factor (IHF) as an indispensable accessory factor for spacer acquisition in Escherichia coli. Further, we show that the leader region abutting the first CRISPR repeat localizes IHF and the Cas1–2 complex. IHF binding to the leader region induces bending by about 120°, which in turn engenders the regeneration of the cognate binding site for the protospacer-bound Cas1–2 complex and brings it into proximity with the first CRISPR repeat. This appears to guide the Cas1–2 complex to orient the protospacer invasion towards the leader-repeat junction, thus driving the integration in a polarized fashion.

https://nar.oxfordjournals.org/content/45/1/367.full.pdf

K.N.R. Yoganand, R. Sivathanu, Siddharth Nimkar, B. Anand

Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, Guwahati 781039, Assam, India

Archaea and Bacteria defend themselves from the assault of phages and plasmids by employing the CRISPR–Cas adaptive immune system (1–5).
CRISPR constitutes an array of direct repeats (each of ∼30–40 bp) that are interspersed with similarly sized variable spacer sequences. The spacer sequences are captured from the invading foreign DNA and they serve as immunological memory, akin to antibodies in higher organisms, to mount retaliation during recurrent infection (6,7). Several studies in the recent past revealed that CRISPR interference proceeds via three stages: (i) adaptation, (ii) maturation and (iii) interference. Immunological memory is generated during adaptation, wherein short stretches of DNA from invaders (protospacers) are acquired and incorporated into the CRISPR locus. This is followed by the transcription and processing of the pre-CRISPR RNA transcript that generates the mature CRISPR RNA (crRNA), onto which several Cas proteins assemble to form a ribonucleoprotein (RNP) surveillance complex. The crRNA within the RNP guides target recognition by base complementarity, whereas the protein components facilitate the cleavage of the target DNA (2–5,8). While adaptation constitutes the cornerstone of the CRISPR–Cas system by expanding the immunological memory, it is also less well understood than the other two stages (8,9). The adaptation process can be envisaged to encompass two subsets of events: the uptake of protospacer fragments from the foreign DNA and their subsequent insertion into the CRISPR array. The generation of protospacer fragments from the foreign DNA in Escherichia coli (Type I-E) involves the RecBCD nuclease activity (10); however, it appears that only those fragments of about 33 bp DNA that border the protospacer adjacent motif (PAM) are captured and integrated into the CRISPR array (11–15). In Type-I, Type-II and Type-V CRISPR–Cas systems (4), the PAM comprises a short stretch (2–5 nucleotides) of conserved sequence present either upstream or downstream of the acquired protospacer element (9,13,16–19).
This sequence varies among species and assists in discriminating self from non-self during the interference step. Point mutations in the PAM and protospacer of invading nucleic acid elements lead to imperfect pairing and abrogate target cleavage by the interference complex. This mismatched priming leads to more rapid and efficient acquisition of new spacers from the mutated invader by a process termed ‘primed acquisition’. This feedback loop, in addition to naïve (or non-primed) adaptation, effectively aids the bacteria in countering mutated phages (9,11,19,20). Two of the highly conserved Cas proteins, Cas1 and Cas2, form a complex (Cas1–2 complex) that captures the protospacer element and promotes its insertion into the CRISPR array. Here, Cas1 is shown to function like an integrase and Cas2 provides a structural scaffold that stimulates the catalytic activity of Cas1 (21–23). This complex acts as a molecular ruler that appears to determine the length of the acquired protospacer element (21,23). Nucleophilic attack mediated by the free 3′-OH ends of protospacers integrates them into the repeat-spacer array (21,22,24). In Type I-E, the Cas1–2 complex alone is sufficient to mediate naïve adaptation, whereas the active interference complex is indispensable for primed acquisition (11,13). In contrast, Type I-B, Type I-F and Type-II systems require all the Cas proteins (including maturation and interference proteins) for the incorporation of new spacers in vivo (25–28). In addition to the involvement of Cas proteins, recent studies have highlighted the importance of host-encoded proteins in CRISPR immunity. The nucleoid protein H-NS was shown to control CRISPR immunity by regulating Cas operon expression (29). A recent study also demonstrated the requirement of genome stability proteins such as RecG helicase and PriA in E. coli primed acquisition (30). Physical and genetic interaction studies performed on E.
coli Cas1 revealed its interaction with various DNA repair pathway proteins, viz., RuvB, RecB, RecC and others (31). While the Cas1–2 complex seems to be essential, it is not sufficient for spacer uptake in vivo. Sequences upstream of the first CRISPR repeat (referred to as the leader) are shown to harbor DNA elements critical for the adaptation process (13,32,33). Despite the presence of several repeat-spacer units in the CRISPR array, the site of integration of a new protospacer has always been at the leader-repeat junction, resulting in the integration of the protospacer and concomitant duplication of the first repeat (11–15). This polarization preserves the chronology of the integration events such that the newest protospacer is closest to the leader-proximal end and the oldest protospacer is at the distal end. Intriguingly, while the Cas1–2 complex alone is sufficient for the integration of protospacer elements in E. coli and is shown to have intrinsic sequence specificity in vitro (24), it lacks homing site specificity towards the leader-repeat junction, leading to integration at all CRISPR repeats (22). This hints at the involvement of accessory factors that confer specificity towards the integration site for the invading protospacers. Recently, Integration Host Factor (IHF) was shown to act as an essential accessory factor that determines the specificity of protospacer acquisition in E. coli (34). Here, based on CRISPR/dCas9-mediated immunoprecipitation (35) and biochemical analysis, we were independently led to identify IHF as an essential factor in protospacer acquisition. We further show that the leader region harbours a binding site for IHF and that IHF participates in protospacer acquisition by bending the leader region by about 120° to produce a reversal in the direction of the DNA.
This brings the Cas1–2 complex, which is also localized adjacent to IHF, into proximity with the first CRISPR repeat, favouring the nucleophilic attack of the invading protospacer on the leader-repeat junction.

MATERIALS AND METHODS

Construction of bacterial strains and plasmids

Descriptions of the strains, plasmids and oligonucleotides are listed in Supplementary Tables S2–S4, respectively. Escherichia coli IYB5101 (referred to as Wt) (13) was used as the parental strain for all genomic manipulations, unless specified otherwise. Knock-out strains of ihfα (ΔIHFα) and ihfβ (ΔIHFβ) were created using Red recombineering (36). Keio collection strains (37) carrying deletions of ihfα and ihfβ were used as templates for amplification of kanamycin resistance cassettes along with 100–130 bp of flanking sequence. Amplified cassettes were used to transform Red recombinase-expressing E. coli IYB5101 to create the ΔIHFα and ΔIHFβ strains. pdCas9-bacteria (38) was modified with the construct encoding 3XFLAG-dCas9-StrepII. Overlap extension PCR was used to generate a 166 bp DNA fragment encoding a gRNA complementary to a region that is 86 bp upstream of the first CRISPR repeat in E. coli BL21-AI (NCBI accession: NC_012947.1, nucleotide positions: 1002800–1003800). This region was inserted between the SpeI and HindIII sites of pgRNA-bacteria (38) to create pgRNA-leader. pCSIR-T (14) was used as the template to amplify the Wt array, IHF binding site mutants (IBS and IBS) as well as Cas binding site variants of the leader region denoted as CBS1 (−34 to −45 nt), CBS2 (−46 to −57 nt), CBS3 (−58 to −69 nt), CBS2(L) (−54 to −57 nt), CBS2(C) (−50 to −53 nt) and CBS2(R) (−46 to −49 nt); the nucleotide positions are numbered from the leader-repeat junction. All the amplified Wt and mutant arrays were individually inserted between the KpnI/PstI sites in pOSIP-CT (39) and subsequently integrated into the Phi 21 (P21) locus of E.
coli IYB5101 strain by a one-step process of cloning and integration into the attB locus termed ‘clonetegration’. To generate pBend-Wt and pBend-CBS2, 81 bp complementary oligos (encompassing 69 bp of leader sequence) corresponding to the Wt and CBS2 leader sequences were annealed and end-filled by PCR, phosphorylated using T4 polynucleotide kinase and inserted into pBend5 using the HpaI site (40). Escherichia coli K-12 MG1655 genomic DNA was used as the template to amplify the genes encoding IHFα, IHFβ, Cas1 and Cas2. To generate p8R-IHF and p1R-IHF, a bicistronic cassette encoding IHFα and IHFβ was amplified and inserted into p8R and p1R using the SspI site. p13SR-Cas1 and p1S-Cas2 were generated by inserting the regions encoding Cas1 and Cas2 into p13SR and p1S, respectively, using the SspI site. All constructs were verified by sequencing.

Expression and purification of proteins

Escherichia coli BL21(DE3) harbouring p1R-IHF was grown in terrific broth supplemented with 100 µg/ml kanamycin at 37°C until the OD600 reached 0.6. At this point, IHF expression was induced by the addition of 0.5 mM IPTG and the cells were allowed to grow for 4 h at 37°C. Thereafter, the cells were harvested and resuspended in IHF binding buffer (20 mM Tris–Cl pH 8, 150 mM NaCl, 10% glycerol, 1 mM PMSF and 6 mM β-mercaptoethanol). The cells were then lysed by sonication and the clarified soluble extract was loaded onto a 5 ml StrepTrap HP column (GE Healthcare). After loading, the column was washed with IHF binding buffer and proteins were eluted with IHF binding buffer containing 2.5 mM D-desthiobiotin (Sigma). Eluted protein fractions were pooled and loaded onto a 5 ml HiTrap Heparin HP column (GE Healthcare). The column was washed with IHF binding buffer and bound proteins were eluted with a linear gradient of 0.15–2 M NaCl in IHF binding buffer. Purified fractions were pooled and dialyzed against IHF binding buffer. Dialyzed protein was concentrated, flash frozen and stored at −80°C until required.
In order to express Cas1, E. coli BL21(DE3) harbouring p13SR-Cas1 was grown until OD600 = 0.6 at 37°C in auto-induction media supplemented with 100 µg/ml spectinomycin. Thereafter, growth and induction were continued for a further 16 h at 16°C. Subsequently, cells were harvested and resuspended in Buffer 1A (20 mM HEPES–NaOH pH 7.4, 500 mM KCl, 10% glycerol, 1 mM PMSF and 1 mM DTT) and lysed by sonication. The clarified soluble cell extract was loaded onto a 5 ml StrepTrap HP column, which was then washed with Buffer 1A. Proteins were eluted with Buffer 1A containing 2.5 mM D-desthiobiotin. Eluted protein fractions were dialyzed against Buffer 1B (20 mM HEPES–NaOH pH 7.4, 50 mM KCl, 10% glycerol and 1 mM DTT) and loaded onto a 5 ml HiTrap Heparin HP column. The column was washed with Buffer 1B and bound proteins were eluted with a linear gradient of 0.05–2 M KCl in Buffer 1B. Purified fractions were pooled and dialyzed against buffer containing 20 mM HEPES–NaOH pH 7.4, 150 mM KCl, 10% glycerol and 1 mM DTT. Dialyzed protein was concentrated, snap frozen and stored at −80°C until required.

For purification of C-terminal Strep-II tagged Cas2, E. coli BL21(DE3) harbouring p1S-Cas2 was grown until OD600 = 0.6 in auto-induction media supplemented with 100 µg/ml kanamycin at 37°C. Thereafter, growth and induction were continued for a further 16 h at 16°C. Subsequently, the cells were harvested and resuspended in Buffer 2A (20 mM HEPES–NaOH pH 7.4, 500 mM KCl, 10% glycerol, 10 mM imidazole, 1 mM PMSF and 1 mM DTT) and lysed by sonication. The clarified soluble extract was loaded onto a 5 ml HiTrap IMAC HP column (GE Healthcare). After loading, the column was washed with Buffer 2A and proteins were eluted using a linear gradient of imidazole (0.01–0.5 M) in Buffer 2A. Purified fractions were pooled and mixed with TEV protease (in a 10:1 ratio of His-SUMO-Cas2-Strep:TEV) and incubation was continued during dialysis against Buffer 2A at 4°C overnight.
The dialyzed protein mixture was passed five times through a 5 ml HiTrap IMAC HP column to allow binding of His-tagged SUMO-Cas2-Strep, SUMO and TEV protease. Subsequently, a 5 ml StrepTrap HP column was connected in tandem and the protein mixture was passed through five more times. The C-terminal Strep-tagged Cas2 was later eluted with Buffer 2B (20 mM HEPES–NaOH pH 7.4, 500 mM KCl, 10% glycerol, 2.5 mM D-desthiobiotin and 1 mM DTT). Eluted fractions were pooled and dialyzed against buffer containing 20 mM HEPES–NaOH pH 7.4, 150 mM KCl, 10% glycerol and 1 mM DTT. Dialyzed protein was concentrated, snap frozen and stored at −80°C until required.

CRISPR/dCas9 mediated immunoprecipitation

Escherichia coli BL21-AI was transformed with p3XFdCas9, pgRNA-leader and pCas1–2[K] (14) and was grown in a shaker operated at 180 rpm until OD600 = 0.6 at 37°C in LB media supplemented with 0.2% L-arabinose, 0.1 mM IPTG, 25 µg/ml chloramphenicol, 100 µg/ml ampicillin and 50 µg/ml spectinomycin. Anhydrotetracycline (100 ng/ml) was added to induce the expression of 3× FLAG-tagged dCas9 and growth was continued for four more hours to allow the dCas9–gRNA complex to anchor on its target site, i.e. the upstream region of the CRISPR leader. Chemical crosslinking and cell lysis were performed as described previously (41) with a few modifications. Formaldehyde was added to a final concentration of 1% to crosslink proximally interacting nucleic acids and proteins. Crosslinking was continued for 20 min at 25°C with gentle rocking. Glycine was added to a final concentration of 0.5 M and incubation was continued for 5 min at 25°C to quench the crosslinking reaction. Cells (10 ml) were centrifuged at 2500 g at 4°C for 5 min and the pellet was washed twice with an equal volume of buffer W (20 mM Tris–Cl pH 7.5 and 150 mM NaCl). Pelleted cells were resuspended in 1 ml of buffer L (10 mM Tris–Cl pH 8.0, 20% sucrose, 50 mM NaCl, 10 mM EDTA, 10 mg/ml lysozyme) and incubated at 37°C for 30 min.
The lysate was resuspended in 4 ml of buffer R (50 mM HEPES–KOH pH 7.5, 150 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% sodium deoxycholate, 1 mM PMSF and 0.1% SDS). The cells were subjected to sonication for four rounds of 15 × 1 s pulses with a 2 min pause between rounds in a Vibra-Cell probe sonicator set at 33% amplitude. The clarified supernatant containing sheared DNA-protein complexes was separated by centrifugation. 800 µl of supernatant was mixed with 200 µl of Protein G Dynabeads (Life Technologies) conjugated with 20 µg of anti-FLAG M2 antibody (Sigma) and rocked gently at 4°C overnight. Incubated beads were separated by centrifugation and washed twice each with 1 ml of Low Salt Wash Buffer (20 mM Tris–Cl pH 8.0, 150 mM NaCl, 2 mM EDTA, 1% Triton X-100, 0.1% SDS), High Salt Wash Buffer (20 mM Tris–Cl pH 8.0, 500 mM NaCl, 2 mM EDTA, 1% Triton X-100, 0.1% SDS), LiCl Wash Buffer (10 mM Tris–Cl pH 8.0, 250 mM LiCl, 1 mM EDTA, 0.5% Nonidet P-40 (NP-40), 0.5% sodium deoxycholate), and TBS Buffer (50 mM Tris pH 7.5, 150 mM NaCl) with 0.1% NP-40 as described previously (35). In the final step, beads were separated by centrifugation and resuspended in 100 µl of buffer containing 20 mM Tris–Cl pH 8 and 150 mM NaCl. 30 µl of resuspended beads were mixed with 10 µl of 4× SDS sample buffer and heated at 95°C for 30 min to reverse the crosslinks and denature the proteins. The heated mixture was loaded onto an SDS-PAGE gel and electrophoresed until it entered the stacking gel. The part of the stacking gel containing the proteins was sliced and analyzed by mass spectrometry to identify the protein factors in the sample.

Spacer acquisition assays

In vivo acquisition assays were performed as described earlier (13). Briefly, three cycles of growth and induction were performed with E. coli IYB5101 (Wt) or its variants carrying pCas1–2[K] (14) at 37°C for 16 h in LB media supplemented with 50 µg/ml spectinomycin, 0.2% L-arabinose and 0.1 mM IPTG.
Between cycles, cultures were diluted 1:300 with fresh LB media containing the aforementioned supplements and growth was continued for 16 h. For IHF complementation experiments, the ΔIHFα and ΔIHFβ strains were transformed with p8R-IHF and pCas1–2[K] and three cycles of induction were performed as described above. To monitor CRISPR array expansion, 200 µl of induced cells were collected after cycle 3, washed thrice and resuspended in distilled water. These cells were used as the template for PCR to monitor CRISPR array expansion either in the CRISPR 2.1 array (in the case of the Wt, ΔIHFα and ΔIHFβ strains) or in the P21 locus-integrated CRISPR DNA (in the case of the Wt, IBS, IBS, CBS1–3, CBS2(L), CBS2(C) and CBS2(R) strains). All the PCR-amplified samples were separated on 1.5% agarose gels to identify parental and expanded arrays (parental array + 61 bp).

Electrophoretic mobility shift assays

Wt or mutant leader DNA (IBS, IBS and CBS1–3) was PCR amplified from the strain carrying the respective construct integrated into the P21 locus. 14 nM of amplified DNA was incubated with increasing concentrations of purified IHF (0, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 1.2, 1.4 and 1.6 µM) in buffer containing 0.5× TBE (50 mM Tris–Cl pH 8.3, 50 mM boric acid and 1 mM EDTA), 100 mM KCl, 10% glycerol and 5 µg/ml BSA for 30 min at 25°C. Post-incubation samples were directly loaded on an 8% native acrylamide gel and electrophoresed in 1× TBE at 4°C. Gels were post-stained with ethidium bromide (EtBr) and DNA bands were visualized in a gel documentation system (Bio-Rad).

FRET-based monitoring of DNA bending

A 35 bp DNA encompassing the leader sequence (−4 to −38 from the leader-repeat junction) of Wt (or Wt without quencher, or IBS) was assembled from three oligos by annealing and end labeled with 3′ 6-FAM and 5′ Iowa Black (IDT).
222 nM of DNA was incubated with increasing concentrations of purified IHF (0, 0.3, 0.6, 0.9, 1.2, 1.5 and 1.8 µM) in buffer containing 0.5× TBE, 100 mM KCl, 10% glycerol and 5 µg/ml BSA for 20 min at 25°C. Post-incubation samples were excited at 495 nm and emission was monitored from 500 to 600 nm, with averaging over three scans, in a FluoroMax-4 spectrofluorometer (Horiba Scientific, Edison, NJ, USA). The slit widths used for excitation and emission were 2 and 7 nm, respectively. To further ascertain that the enhanced quenching is due to IHF-mediated DNA bending, a fluorescence recovery assay was designed. In this assay, buffer (0.5× TBE, 100 mM KCl, 10% glycerol and 5 µg/ml BSA) containing 222 nM DNA was excited at 495 nm and emission was captured for 200 s at 520 nm. To this sample, IHF was added to a final concentration of 1.8 µM and fluorescence emission was recorded until 600 s. Thereafter, IHF degradation and DNA release were initiated by the addition of proteinase K to a final concentration of 1 mg/ml and emission was monitored for another 400 s. After background correction, the fluorescence intensity of DNA in the presence of IHF was normalized relative to that of DNA alone.

Estimation of bending angles by circular permutation gel retardation assay

pBend-Wt and pBend-CBS2 were digested with HindIII and EcoRI to produce a 329 bp DNA fragment. This fragment was gel purified as per the manufacturer’s instructions (Qiagen) and digested with BamHI, KpnI, SspI, EcoRV, SpeI, BglII and MluI in separate reactions. All the digested DNA samples were further purified (Qiagen) and 21 nM of each DNA was incubated individually with 0.7 µM IHF in buffer containing 0.5× TBE, 100 mM KCl, 10% glycerol and 5 µg/ml BSA for 30 min at 25°C. Post-reaction samples were directly loaded on an 8% native acrylamide gel and electrophoresed in 1× TBE at 4°C. Gels were post-stained with EtBr and DNA bands were visualized in a gel documentation system.
IHF bending angles were calculated as described previously (42). Mobilities of the IHF-bound DNA complex (Rb) and the respective free DNA (Rf) were calculated for all the restriction-digested fragments. Rb values were normalized to the respective Rf values and plotted against the flexure displacement (length from the middle of the binding site to the 5′ end of the restriction fragment/total restriction fragment length). The resulting plot was fitted to a quadratic equation: y = ax² − bx + c, where x and y denote the flexure displacement and Rb/Rf, respectively. The bending angle (α) was calculated using the relationship a = −b = 2c(1 − cos α). Here, we have represented the bending angle (α) as the average of the values calculated from the parameters a and b. The circular permutation gel retardation assay in the presence of Cas1–2 was performed as described above with a few modifications. 210 nM of Cas1–2 was incubated with 21 nM of digested pBend-Wt fragments in the presence or absence of 0.7 µM IHF in buffer containing 20 mM HEPES–NaOH pH 7.5, 25 mM KCl, 10 mM MgCl2 and 1 mM DTT for 30 min at 25°C. Post-reaction samples were directly loaded on an 8% native acrylamide gel and electrophoresed in 1× TBE at 4°C. Gels were post-stained with EtBr and DNA bands were visualized in a gel documentation system.

In vitro integration assay

Wt or mutant leader DNA (IBS, IBS and CBS1–3) was PCR amplified from the strain carrying the respective construct integrated into the P21 locus. The constructs for repeat variants (Wt and Rep1-2) were synthesized (Genscript) and extracted from the plasmid by restriction digestion using the respective restriction sites (EcoRI/BamHI/XhoI). Various types of protospacers (Supplementary Figure S7A) were prepared by annealing the corresponding oligos. In vitro integration assays employing Cas1 and Cas2 were performed as previously described (22) with a few modifications. 210 nM of Cas1 and Cas2 were mixed and incubated at 4°C for 15 min.
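
As an illustration of the bending-angle calculation described for the circular permutation assay, the following is a minimal Python sketch; it is not the authors' analysis script. It fits the normalized mobility (Rb/Rf) against the flexure displacement to a quadratic by ordinary least squares, converts the quadratic and linear coefficients into angles through the relationship 2c(1 − cos α), and averages the two estimates. The function names, the use of coefficient magnitudes (which sidesteps the sign convention of the linear term), and all input values are assumptions made for illustration; the data points are synthetic, not measured.

```python
import math


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a2*x**2 + a1*x + a0; returns (a2, a1, a0)."""
    # Power sums that make up the 3x3 normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix for the unknowns (a0, a1, a2).
    m = [
        [s[0], s[1], s[2], t[0]],
        [s[1], s[2], s[3], t[1]],
        [s[2], s[3], s[4], t[2]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [v - factor * p for v, p in zip(m[row], m[col])]
    a0, a1, a2 = (m[i][3] / m[i][i] for i in range(3))
    return a2, a1, a0


def bending_angle_deg(flexure_displacement, relative_mobility):
    """Average bending angle (degrees) from the quadratic and linear terms."""
    a2, a1, a0 = fit_quadratic(flexure_displacement, relative_mobility)
    angles = []
    for magnitude in (abs(a2), abs(a1)):
        # magnitude = 2c(1 - cos(alpha))  =>  cos(alpha) = 1 - magnitude / (2c)
        cos_alpha = 1.0 - magnitude / (2.0 * a0)
        cos_alpha = max(-1.0, min(1.0, cos_alpha))  # guard against rounding
        angles.append(math.degrees(math.acos(cos_alpha)))
    return sum(angles) / len(angles)


if __name__ == "__main__":
    # Synthetic (hypothetical) data generated from y = 3x^2 - 3x + 1.
    xs = [0.1, 0.2, 0.35, 0.5, 0.65, 0.8, 0.9]
    ys = [3 * x * x - 3 * x + 1 for x in xs]
    print(round(bending_angle_deg(xs, ys), 1))
```

With the synthetic values above, the coefficients have magnitude 3 and c = 1, so the sketch returns an angle of about 120°, because 2c(1 − cos 120°) = 3. Real gel data would be noisy, and the two coefficient-based estimates would generally differ, which is why averaging them is useful.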
550 nM of the desired protospacer DNA was added to the mixture and incubation at 4°C was continued for another 15 min. To this complex, 21 nM of CRISPR DNA substrate (Wt, mutant leader or mutant repeat) was added along with 0.7 µM IHF, in duplicate, and incubated at 37°C for 60 min in buffer containing 20 mM HEPES–NaOH pH 7.5, 25 mM KCl, 10 mM MgCl2 and 1 mM DTT. The first set of reaction mixtures was directly loaded and electrophoresed on an 8% native acrylamide gel in 1× TBE at 4°C, whereas the second set was treated with 1 mg/ml proteinase K for 30 min at 37°C prior to electrophoresis. Electrophoresed gels were post-stained with EtBr and imaged in a gel documentation system. The sizes of the integrated products were analyzed by denaturing capillary electrophoresis. DNA labeled with 6-FAM on the 5′ end of the leader (L*) or on the 5′ end of spacer2 (R*) was used as the substrate (Supplementary Figure S5). Integration reactions were performed as described above. Post-reaction samples were treated with 1 mg/ml proteinase K and separated by capillary electrophoresis. The intensities of the fragments were visualized using GeneMapper (Thermo Fisher Scientific) after loading the corresponding .fsa files.

Spacer disintegration assay

The reaction mixture from the integration assay was purified using a PCR purification kit (Qiagen) as per the manufacturer’s instructions. 210 nM of Cas1 and Cas2 were mixed and incubated at 4°C for 15 min. To this complex, 21 nM of purified integration product was added with or without 0.7 µM IHF and incubated at 37°C for 60 min in buffer containing 20 mM HEPES–NaOH pH 7.5, 25 mM KCl, 10 mM MgCl2 and 1 mM DTT. Subsequently, proteinase K was added to a final concentration of 1 mg/ml and incubated for 30 min at 37°C. The sample was mixed with 6× DNA loading dye and electrophoresed on an 8% native acrylamide gel in 1× TBE at 4°C. Electrophoresed gels were post-stained with EtBr and imaged in a gel documentation system.
Genome analysis

The lists of type I-E and other type I (excluding type I-E) organisms were compiled from a previous study (4). Using IHF- as a query, we initiated a blastp (43) search against the genomes harbouring type I-E and other type I (non-type I-E) CRISPR systems. Hits were considered bona fide if the E-value was less than 0.005 and the alignment coverage with respect to the query was at least 60%. Based on these criteria, we estimated the distribution of IHF across the species. Since HU and IHF are structurally similar and also share similarity at the sequence level (44), we relied on the annotation to distinguish between the two. The multiple sequence alignment corresponding to the leader region for the type I-E CRISPR system was obtained from the CRISPRleader database (45). The conservation profile was generated using WebLogo 3 (46).

CRISPR/dCas9 based immunoprecipitation detects the participation of IHF as accessory factor for adaptation in vivo

In order to identify the potential host factors that are likely to promote the directional insertion of the protospacer fragment, we exploited CRISPR/dCas9-based immunoprecipitation (35). Here, we expressed the Cas1–2 complex together with the inactive form of FLAG-tagged Cas9 (dCas9) and a gRNA targeted towards the leader region of the CRISPR array in E. coli BL21-AI (NCBI accession: NC_012947.1, nucleotide positions: 1002800–1003800). Subsequent to chemical crosslinking, the DNA-bound protein factors localized at the leader region were selectively pulled down using anti-FLAG coated beads against the FLAG-tagged dCas9 (Figure 1A and Supplementary Figure S1). The immunoprecipitated protein factors were analyzed using mass spectrometry to identify the factors associated with the DNA.
We hypothesized that, since the Cas1–2 complex shows integrase-like activity, host factors previously characterized to facilitate the integration of DNA elements would be prospective candidates. Most of the identified factors were ribosomal proteins and translation factors, which reflects their omnipresence owing to their housekeeping functions. Remarkably, a few of the identified factors mapped to Cas proteins, including Cas1 and Cas2, which bolstered the utility of this approach. Among others, we also noted the presence of DNA architectural proteins such as H-NS and IHF and DNA repair proteins such as RecA (see Supplementary Table S1). We filtered out factors that were previously shown not to be involved in, or essential for, protospacer integration (10,12,29) or that were functionally unrelated, such as chaperones, proteases and metabolic enzymes. For example, though the DNA architectural protein H-NS was identified with a higher score than Cas1 and Cas2, it was previously shown that it is not essential for CRISPR adaptation and that it acts as a repressor of the cas operon in E. coli (12,29). Therefore, we did not pursue H-NS further, and a similar rationale was exercised to exclude other factors. On the other hand, though another architectural protein, IHF, scored lower than H-NS, its role in site-specific recombination mediated by integrase as well as in DNA transposition is well characterized (47,48). Given that the Cas1–2 complex functions like an integrase and since the role of IHF in CRISPR adaptation was not previously characterized, we were prompted to probe the involvement of IHF in protospacer acquisition.

IHF is essential for protospacer acquisition into the leader proximal end in vivo

IHF is a heterodimer comprising α and β subunits. To test the involvement of IHF in protospacer acquisition, we created null mutants of IHF devoid of either the α or the β subunit in E.
coli IYB5101. Deletion of either the α or the β subunit abrogated the acquisition of protospacer elements (Figure 1B). To reinforce this, we complemented the null mutants with plasmid-borne IHFα and IHFβ, which restored the expansion of the CRISPR 2.1 array in E. coli IYB5101 (Figure 1B). This strengthened our conjecture that the acquisition of protospacers requires the participation of IHF in vivo. Unlike some of the related DNA architectural proteins such as HU, IHF exhibits sequence-specific DNA binding. It recognizes the consensus sequence 5′-WATCAANNNNTTR-3′ (where W = A/T, N = A/T/G/C, R = A/G). Therefore, we searched for a potential IHF binding site abutting the CRISPR 2.1 locus in E. coli IYB5101 as well as in related strains (Supplementary Figure S2). This search led to the identification of a putative binding site adjacent to the first CRISPR repeat (Figure 1C). We wondered whether this region could act as a binding site for IHF. To test this, we deleted the binding site partially ( IBS in Figure 1C) and assayed for acquisition. Interestingly, we found no expansion of the array (Figure 1D). Similarly, mutation of the key binding nucleotides (IBS in Figure 1C) also abolished acquisition (Figure 1D). This suggests that the identified site for IHF binding indeed impacts the adaptation process, and these findings are in agreement with the recent report (34).

IHF induced bending of the linear DNA facilitates protospacer integration

IHF binding induces bending of the leader region

The structure of the IHF–DNA complex shows that the IHF α and β subunits form an intertwined compact body from which two structures protrude, clamping the DNA (49). This induces bending of the DNA by about 160°, leading to the reversal of the direction of the DNA (Figure 2A). This prompted us to investigate whether IHF binds and bends the putative site (IBS) in the leader region. Towards this, we purified IHF from E.
coli and tested its DNA binding using EMSA (Figure 2B). This showed retardation of DNA mobility in the presence of IHF, indicating that IHF indeed binds the leader region (Figure 2B). In line with the recent study (34), substitution or deletion of key binding nucleotides drastically reduced IHF binding (Supplementary Figure S3).

Having observed IHF binding to the leader region, we asked whether the binding leads to bending of the DNA. To assess this, we designed a FRET-based assay wherein one end of the IHF binding region is tagged with a fluorophore (6-FAM) and the other end with a quencher (Iowa Black). In linear DNA, the fluorophore and the quencher are far apart and hence the fluorescence is not quenched. However, if IHF bends the DNA, the fluorophore and the quencher are brought into proximity, which quenches the fluorescence. Indeed, we observed that addition of IHF led to a drastic reduction in fluorescence intensity (solid line in Figure 2C and Supplementary Figure S4). When a protease was added to the same reaction to remove IHF, the fluorescence was restored (solid line in Figure 2C). In contrast, a similar experiment performed with 6-FAM labeled DNA lacking the quencher showed that the fluorescence intensity remained constant despite the addition of IHF (dotted line in Figure 2C). This allows us to exclude the possibility that the quenching of fluorescence is caused by IHF binding alone; rather, it is the DNA bending effected by IHF that brings the two ends into proximity and quenches the fluorescence.

Having established that IHF bends the leader region, we investigated the extent to which IHF bends the leader DNA. To address this, we utilized the bending vector pBend5, which contains circularly permuted duplicated restriction sites (40).
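The consensus-based search for IHF binding sites described earlier (5′-WATCAANNNNTTR-3′) can be illustrated with a short script. This is only a minimal sketch, not the procedure used in this study: the function names and the toy test sequence are hypothetical, only the IUPAC codes that occur in the consensus are handled, and exact matching on both strands is assumed.

```python
import re

# IUPAC codes that occur in the IHF consensus, as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "R": "[AG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_ihf_sites(seq, consensus="WATCAANNNNTTR"):
    """Return (start, strand, match) for exact consensus matches.

    start is 0-based on the forward strand; a lookahead is used so that
    overlapping matches are also reported.
    """
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in consensus) + "))")
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        # Convert the position on the reverse strand to forward coordinates.
        start = len(seq) - m.start() - len(consensus)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


# Hypothetical toy sequence (not the E. coli leader) with one embedded site.
toy = "GGG" + "AATCAAGCTATTG" + "CCC"
print(find_ihf_sites(toy))
```

In practice such a scan would be run on the leader region next to the first CRISPR repeat; tolerating mismatches would require a scoring scheme that is not sketched here.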
Cloning of the IHF binding site (IBS) into pBend5 and subsequent digestion with the restriction enzymes yields fragments of the same length but with the binding site at different positions, either in the middle or towards the end (Figure 2D). When the DNA bends due to protein binding, the fragment that harbors the binding site in the middle migrates slower than the one with the binding site at the end. From these mobility differences, it is possible to estimate the bending angle, which is defined as the angle by which the DNA deviates from linearity (see Methods). We estimated that IHF bends the DNA by ∼120°, suggesting that this sharp deformation could result in a reversal of the DNA direction (Figure 2E and F).

Since IHF deforms the linear DNA, we were interested in deciphering the mechanism by which it influences the integration of the protospacer into the CRISPR locus. Further, our analysis of CRISPR leader sequences from organisms harbouring the type I-E system showed that the identified IHF binding site (–9 to –35 nt; boxed in solid line in Figure 3A), along with another region (–44 to –59 nt; boxed in dotted line in Figure 3A), is highly conserved across species. Therefore, we designed a linear DNA construct encompassing the above-mentioned leader region and two repeat-spacer units (Figures 1C and 3A). When IHF was added to the CRISPR DNA, we noted a single slow-migrating band, suggesting that the IHF induced DNA bending retards the mobility of the CRISPR DNA (lane 4 in Figure 3B). Subsequent addition of the Cas1–2 complex and the protospacer fragment resulted in the appearance of a super-shifted band (lane 12 in Figure 3B). Strikingly, this band was not observed in the absence of IHF (lane 11 in Figure 3B). This suggests that the Cas1–2 complex associates with the CRISPR DNA only when IHF is present.
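The bending-angle estimate from the circular permutation assay described above can be illustrated numerically. The sketch below assumes the commonly used relation μM/μE = cos(α/2), where μM and μE are the mobilities of the protein–DNA complexes with the binding site in the middle and at the end of the fragment, and α is the deviation from linearity. The exact calculation used in this study is given in its Methods, which are not reproduced here, so the function name and the example mobilities are illustrative assumptions rather than data from the paper.

```python
import math


def bending_angle_deg(mobility_middle, mobility_end):
    """Estimate the DNA bend angle (degrees of deviation from linearity).

    Assumes mu_M / mu_E = cos(alpha / 2), where mu_M is the mobility of the
    complex with the site in the middle of the fragment and mu_E the mobility
    with the site near the end.
    """
    ratio = mobility_middle / mobility_end
    if not 0.0 < ratio <= 1.0:
        raise ValueError("expected 0 < mu_M / mu_E <= 1")
    return math.degrees(2.0 * math.acos(ratio))


# Illustrative mobilities only: a ratio of 0.5 corresponds to a 120 degree bend.
print(round(bending_angle_deg(0.5, 1.0), 1))
```

Under this assumed relation, a bend of ∼120° corresponds to the middle-site complex migrating at about half the relative mobility of the end-site complex, and a ratio of 1 corresponds to unbent DNA.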
When the DNA-bound proteins were removed by proteinase K treatment, we detected a slow-migrating band whose size appeared to be larger than that of the CRISPR DNA (lane 12 in Figure 3C). Remarkably, this band appeared only in the proteinase K treated reaction mixture containing CRISPR DNA, protospacer fragment, IHF and Cas1–2 complex (lane 12 in Figure 3C). To further probe the requirement of IHF for the formation of the super-shifted band, we performed the experiment with IHF binding site variants. Either deletion (ΔIBS) or mutation of the IHF binding site (IBS) completely abolished the appearance of the super-shifted band (lanes 6 and 9 in Figure 3D and E). This suggests that the slow-migrating band represents the protospacer integrated into the CRISPR DNA.

To ascertain protospacer integration, we designed 5′ 6-FAM labeled linear CRISPR DNA constructs and repeated the aforementioned experiment. The proteinase K treated reaction mixture corresponding to the slow-migrating band was resolved by denaturing capillary electrophoresis, and the fragments were sized by comparison with fluorescently labeled standards. When the label was at the 5′ leader proximal end, the fragment sizes were estimated to be ∼161 and 63 nt, whereas when the label was at the 5′ distal end, the fragment size was ∼168 nt (Supplementary Figure S5). Since the 63 nt fragment maps the cleavage to around the leader-repeat junction, this suggests that protospacer integration has taken place proximal to the leader region in the top strand, leading to a half-site integration intermediate (see Supplementary Figure S5). In agreement with recent work (34), this further suggests that IHF indeed stimulates protospacer incorporation into the linear CRISPR DNA.
Further, since it was reported that the half-site integration intermediate is selectively excised by the Cas1–2 complex (22,24), we reasoned that this could serve as an additional diagnostic for the existence of the half-site integration intermediate. Therefore, we purified the reaction mixture containing the half-site integration intermediate and monitored disintegration in the presence of the Cas1–2 complex. Indeed, we observed that the presence of the Cas1–2 complex led to the disappearance of the integrated product (Supplementary Figure S6). Interestingly, the disintegration activity of the Cas1–2 complex is significantly inhibited in the presence of IHF (Supplementary Figure S6). Taken together, these results indicate that protospacer integration occurs at the leader-repeat junction in the top strand. Further, once the site of invasion is marked, it appears that, in line with earlier reports (21,22,24), it is the 3′-OH of the protospacer with a 3′ overhang that mounts the nucleophilic attack on the CRISPR DNA (Supplementary Figure S7).

Cas1–2 complex is localized upstream of IHF binding site

It was shown that a leader segment of up to 60 bp adjoining the first CRISPR repeat is essential for spacer acquisition (13). However, the IHF binding region falls within the 35 bp from the first CRISPR repeat (boxed in solid line in Figure 3A). Given the importance of this region, we wondered what the function of the remaining 25 bp of the leader region could be. Intriguingly, we also noted high sequence conservation upstream of the IHF binding site (boxed in dotted line in Figure 3A). Therefore, we randomly mutated the 36 bp leader region upstream of the IHF binding site, 12 bp at a time (CBS1–3), and tested whether the modified region could support acquisition (Figure 4A). We observed that although CBS1 [–34 to –45] and CBS3 [–58 to –69] had no effect on spacer acquisition, surprisingly, no expansion was seen for CBS2 [–46 to –57] (Figure 4A and B).
This led us to ask whether any of these mutated sequences affects the binding of IHF and thereby abrogates acquisition. Hence, we tested the binding of IHF to leader regions containing CBS1–3. We noted that while CBS1 reduces IHF binding, perhaps owing to its marginal overlap with the cognate binding site (Figure 4A), the other two showed only a minor effect on IHF binding (Supplementary Figure S9). Interestingly, we also found that IHF mediated DNA bending is not affected for CBS2 (Supplementary Figure S10). This suggests that the impairment of spacer acquisition in CBS2 is not due to a defect in IHF binding or bending. Since CBS2 does not significantly impact IHF binding to the leader region, we hypothesized that CBS2 could be a binding site for the Cas1–2 complex. Therefore, we conducted the integration experiments with CBS1–3. The super-shifted band that was seen with wild type CRISPR DNA (lane 3 in Figure 4C) appeared with CBS1 and CBS3 too (lanes 6 and 12 in Figure 4C). Intriguingly, this band was absent when CBS2 was used (lane 9 in Figure 4C). Moreover, the IHF dependent mobility shift was prominent for Wt and CBS2, although it was weak for CBS1. For CBS3, the IHF dependent mobility shift was not prominent despite the presence of the super-shifted band. This suggests that, except for CBS1, which partly overlaps the IHF binding site, IHF binding is not impaired in the other variants. In line with this, all except CBS2 showed the presence of the integration product (Figure 4D).

Highly conserved sub-motif region within the CBS2 is crucial for protospacer integration

To probe CBS2 further, we made three constructs, viz., CBS2(L), CBS2(C) and CBS2(R). In each of these constructs, 4 bp were mutated with respect to CBS2 (Figure 5A). We tested each of these constructs for its ability to support protospacer acquisition in vivo.
Remarkably, we found that, except for CBS2(C), the other two constructs showed expansion of the CRISPR array, suggesting that the 4 bp (GTGG) in the middle of CBS2 (−50 to −53 nt) are crucial for protospacer acquisition (Figure 5B). To assess how these residues affect protospacer acquisition, we conducted integration assays with these constructs. In line with the acquisition assay in vivo, we observed that both CBS2(L) and CBS2(R) showed a shift in the mobility of the CRISPR DNA in the presence of IHF and the Cas1–2 complex (lanes 9 and 15 in Figure 5C). In the case of CBS2(C), however, there was no super-shifted complex even in the presence of IHF and the Cas1–2 complex (lane 12 in Figure 5C). In line with this, the integration product was absent from the proteinase K treated CBS2(C) sample (lane 12 in Figure 5D). In contrast, integration products were observed for CBS2(L) and CBS2(R) (lanes 9 and 15 in Figure 5D). This suggests that the CBS2(C) residues are essential for the integration of the protospacer fragment into the leader-repeat junction. Consistent with this, we also noted high conservation of the residues corresponding to CBS2(C) in organisms harbouring the type I-E system (boxed in dotted line in Figure 3A). Taken together, the disappearance of the super-shifted band despite the presence of the IHF related band in CBS2 and CBS2(C) led us to reason that CBS2(C) is likely to harbour the binding site for the Cas1–2 complex (see Discussion). We refer to the residues (−50 to −53 nt) corresponding to CBS2(C) as the integrase anchoring site (IAS).

DISCUSSION

An outstanding question regarding CRISPR adaptation pertains to the mechanism that directs the specific integration of new protospacers into the leader-repeat junction despite the presence of several repeat regions. Unlike in vivo, the integration of protospacers in vitro occurs at other repeats too (22). This observation led us to hypothesize the involvement of specific host proteins in defining the site of protospacer invasion in vivo.
Our genome-wide search employing CRISPR/dCas9-based immunoprecipitation (35) led us to recognize the participation of the DNA architectural protein IHF in specifying the directional insertion of protospacer elements into the leader proximal end of the CRISPR array. IHF is known to specifically recognize its binding region and induce sharp DNA bends, thereby facilitating site-specific recombination and DNA transposition (48,50,51). By bringing the distantly located low affinity core site and high affinity attachment site of integrase into proximity, IHF facilitates bacteriophage integration into the genome of E. coli (51). Here, IHF utilizes DNA deformation to bring remotely located recognition sites into proximity. Indeed, our experiments with bending vectors showed that IHF bends the linear CRISPR DNA (Figure 2). Supercoiled plasmids carrying CRISPR DNA were shown to act as in vitro substrates for integration, whereas no integration was observed when linearized CRISPR-containing plasmids were used (22). In comparison to linear DNA, supercoiled plasmids are inherently compact and bent, and therefore remotely located recognition sites can be brought into juxtaposition without additional factors. However, in the case of linear DNA, we find that IHF is indispensable and may promote a DNA conformation that is favourable for integration (Figures 1-3). Further, since some transposases such as Tn10 prefer deformed target DNA (52), it is possible that the IHF mediated bent DNA conformation becomes a substrate for the Cas1–2 complex. In addition, the fact that the presence of IHF reduces disintegration of the protospacer implies that IHF induced DNA bending stabilizes the integration intermediate by modulating the integrase/excisionase activity of the Cas1–2 complex (Supplementary Figure S6). This resembles how IHF, along with integrase, promotes integration over excision (51,53).
The involvement of IHF in protospacer acquisition was recently reported (34); however, the precise connection between IHF induced DNA bending and directional integration of the protospacer remained elusive. Further, while IHF binding to linear DNA was shown earlier, the extent to which it deforms the leader region was not clear (34). Our findings suggest that IHF bends the linear CRISPR DNA by ∼120°, which is likely to prompt a reversal in the DNA direction (Figure 2). One possible consequence of this bending could be to bring the leader region into proximity with the first repeat. While pursuing this hypothesis, we discovered that, in addition to the IHF binding site, the leader region also harbours a binding site for the Cas1–2 complex (referred to as IAS) that is located just upstream of the IBS (Figures 4 and 5). We also observed the IAS to be highly conserved within the leader region among the type I-E organisms that harbour IHF (boxed in dotted line in Figure 3A). This raises the attractive possibility that IHF induced DNA bending brings the Cas1–2 complex into proximity with the leader-repeat junction. The higher order nucleoprotein complex (see super-shifted band in Figures 3-5) that appears in the presence of the Cas1–2 complex and IHF is also seen in site-specific recombination catalyzed by integrase and IHF (53,54). However, since the CRISPR DNA is not bound by the Cas1–2 complex in the absence of IHF, it is likely that IHF induced DNA bending precedes the loading of the Cas1–2 complex onto the CRISPR DNA (Figure 3). Moreover, we observed no appreciable change in the bending angle in the presence of the Cas1–2 complex, suggesting that the loading of the Cas1–2 complex does not introduce further DNA deformation (Supplementary Figure S11). Cas1 is reported to have an intrinsic specificity towards the sequences spanning the leader-repeat junction (24).
Within a large genome, Cas1 is likely to encounter such preferred sequences frequently, and hence this intrinsic preference is unlikely to be the principal specificity determinant. Therefore, the role of IHF could be to bias the Cas1–2 complex towards shape-based recognition, as exhibited by homing endonucleases (55). In this context, it is tempting to propose that the Cas1–2 complex prefers a bipartite binding site that is composed of a part of the leader region (IAS) and the leader-repeat junction. This is akin to the distantly located low affinity core site and high affinity attachment site in the case of integrase (51). Proximity of these complementary sites (IAS and leader-repeat junction), mediated by IHF induced DNA bending, is aptly poised to regenerate the cognate binding site for the Cas1–2 complex. The following observations appear to support this conjecture. First, the formation of the higher order nucleoprotein complex requires IHF induced DNA bending, akin to the ‘intasome’ in the case of bacteriophage integration, suggesting that the loading of the Cas1–2 complex onto the CRISPR DNA is contingent upon the proximity of the aforementioned complementary sites (Figures 3-5). Therefore, in the absence of such proximity-induced regeneration of the cognate binding site, the Cas1–2 complex is unlikely to facilitate protospacer integration into the leader proximal end. Second, in line with the above, we could observe IHF binding to linear CRISPR DNA in the absence of the Cas1–2 complex but not vice versa (Figures 3-5). Third, in conjunction with the acquisition assay, we noted that the presence of IHF abolishes the nonspecific nicking activity of the Cas1–2 complex (Supplementary Figure S5). Given this, we propose that loading of the Cas1–2 complex onto the CRISPR DNA is governed by the IHF mediated regeneration of the distantly located bipartite binding site.
While the type I-E system requires an accessory factor for protospacer acquisition, it was shown in vitro that the type II-A system exhibits robust polarized protospacer incorporation into linear CRISPR DNA in the absence of any host factor (56). Further, another study showed that substitution or deletion of the leader region (−1 to −5 from the repeat) bordering the leader-repeat junction (termed the leader-anchoring sequence or LAS) in Streptococcus pyogenes (type II-A) induces ectopic spacer incorporation at the fifth repeat, where the sequence derived from the fourth spacer acts as the LAS (57). In Sulfolobus solfataricus (type I-A), it was observed that CRISPR locus E alone exhibits ectopic spacer incorporation, whereas polarized acquisition was observed in loci C and D (58). CRISPR locus E carries a deletion of −47 to −70 in the leader region (58,59), which could possibly disrupt the accessory factor/Cas1–2 binding site. This in turn may impair bipartite site formation and, since ssoCas1 is shown to have intrinsic sequence specificity (24), could favour integration at regions that closely resemble the leader-repeat junction, thus tuning it towards ectopic acquisition. These studies lend credence to our hypothesis that the distance between the IAS and the leader-repeat junction (the bipartite site for Cas1–2 binding) governs the requirement of accessory factor(s) for protospacer incorporation (see below). In addition to the sequences bordering the leader-repeat junction, modification of the repeat sequences or structure in vivo is also reported to inhibit protospacer integration (32,60,61). In contrast, we noticed that protospacer integration remains unaltered by such modifications in vitro (Supplementary Figure S8). This suggests that the bipartite binding site of the Cas1–2 complex is unlikely to extend deep into the CRISPR repeat region.
Based on fragment analysis, under our experimental conditions, we found that the integration of the protospacer occurs into the top strand, and we found no integration into the bottom strand (Supplementary Figure S5). This allows us to infer that, since the leader region harbouring the IBS and IAS remains intact in such a scenario, modification of the repeat sequences or structure is not expected to inhibit top strand invasion. On the other hand, we speculate that such modifications could reduce the efficiency of bottom strand invasion, in which the integrity of the repeat sequence or structure could play a leading role in determining the specificity towards the repeat1-spacer1 junction. This would lead to unproductive full-site integration, in agreement with the spacer integration assays (32,60,61).

Based on our data and previous reports (13,22,24,32,34,56–58), we present an updated model for CRISPR adaptation (Figure 6). This model can be dichotomized based on the proximity between the IAS and the leader-repeat junction, which allows us to predict the requirement of accessory factor(s). In cases where the IAS and the leader-repeat junction are separated, accessory factor(s) may be required to bring them into proximity for Cas1–2 binding. As exemplified by type I-E, this role is adopted by IHF in E. coli. IHF binding to the leader region of the CRISPR locus (IBS) leads to DNA bending. This deformed conformation brings the distantly located IAS and leader-repeat junction into proximity, which regenerates the cognate binding site for the Cas1–2 integrase complex. Subsequently, this allows the Cas1–2 complex to orient the 3′-OH end of the protospacer fragment suitably for nucleophilic attack on the leader-repeat junction, leading to the first nick on the top strand. This is followed by the second nucleophilic attack on the bottom strand, leading to the full integration of the protospacer.
We analyzed the distribution of IHF in organisms possessing type I CRISPR systems (type I-E and non type I-E systems). Out of 76 organisms encompassing a type I-E CRISPR system, we found that 56 possess IHF (about 73%), and its distribution is predominant among enteric bacteria (Supplementary Table S5). In the case of non type I-E systems, 104 out of 242 organisms (∼43%) carry IHF (Supplementary Table S5). Interestingly, wherever the IBS is conserved in type I-E systems, we also noted a strong correlation with the existence of the IAS, suggesting that these two sites co-evolve to keep CRISPR adaptation active (Figure 3A). However, since a few organisms that harbour the type I-E system in our analysis lack IHF (27%), it is possible to envisage the participation of other DNA architectural proteins such as HU, or of other auxiliary Cas proteins, in facilitating protospacer integration (62,63). It may be noted that HU is structurally similar to IHF; however, unlike IHF, it binds DNA non-specifically. Further inspection of the type I-E organisms that lack IHF showed that a few of them lack the cas operon, suggesting that their CRISPR systems are non-functional, similar to E. coli BL21. In a few others, the type I-E system co-exists with other CRISPR subtypes, including other type I (non type I-E), type II and type III, which suggests that the acquisition machinery may be shared across subtypes. In contrast, if the IAS and the leader-repeat junction lie juxtaposed, as observed in the type II-A system (33,56,57), the requirement of accessory factor(s) may be precluded (Figure 6). Nevertheless, co-opting host proteins during adaptation represents just the tip of the iceberg of the functional diversity embodied in the CRISPR–Cas system.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.
ACKNOWLEDGEMENTS

Vectors pdCas9-bacteria (Addgene #44249) and pgRNA-bacteria (Addgene #44251) were a kind gift from Stanley Qi; pOSIP-CT (Addgene #45981) was a kind gift from Drew Endy and Dr Keith Shearwin; pCas1–2[K] and pCSIR-T were a kind gift from F.J.M. Mojica; pBAD Strep TEV LIC cloning vector (p8R) (Addgene #37506), pET StrepII TEV LIC cloning vector (p1R) (Addgene #29664), pET StrepII TEV co-transformation cloning vector (p13SR) (Addgene #48328) and pET His6 Sumo TEV LIC cloning vector (p1S) (Addgene #29659) were a kind gift from Scott Gradia; pBend5 was a kind gift from Sankar Adhya; pKD46 (CGSC #7739) was a kind gift from Barry L. Wanner; E. coli IYB5101 was a kind gift from Udi Qimron; E. coli strains JW1702 (CGSC #9441) and JW0895 (CGSC #8917) were a kind gift from Hirotada Mori. We acknowledge the generosity of the aforementioned scientists in sharing their plasmids and bacterial strains. We thank Payel Sarkar for technical assistance and all members of the MAB lab for their critical comments and suggestions. We acknowledge the Mass Spectrometry facility at C-CAMP, Bangalore for their services.

FUNDING

Department of Biotechnology (DBT) [BT/08/IYBA/2014-15, BT/406/NE/UEXCEL/2013, BT/PR5511/MED/29/631/2012 and BT/341/NE/TBP/2012]; Science and Engineering Research Board (SERB) [YSS/2014/000286]. The open access publication charge for this paper has been waived by Oxford University Press (NAR).

Conflict of interest statement. None declared.

REFERENCES

1. Barrangou, R., Fremaux, C., Deveau, H., Richards, M., Boyaval, P., Moineau, S., Romero, D.A. and Horvath, P. (2007) CRISPR provides acquired resistance against viruses in prokaryotes. Science, 315, 1709-1712.
2. Horvath, P. and Barrangou, R. (2010) CRISPR/Cas, the immune system of bacteria and archaea. Science, 327, 167-170.
3. Fineran, P.C. and Charpentier, E. (2012) Memory of viral infections by CRISPR-Cas adaptive immune systems: acquisition of new information.
Virology, 434, 202-209.
4. Makarova, K.S., Wolf, Y.I., Alkhnbashi, O.S., Costa, F., Shah, S.A., Saunders, S.J., Barrangou, R., Brouns, S.J., Charpentier, E., Haft, D.H. et al. (2015) An updated evolutionary classification of CRISPR-Cas systems. Nat. Rev. Microbiol., 13, 722-736.
5. Marraffini, L.A. (2015) CRISPR-Cas immunity in prokaryotes. Nature, 526, 55-61.
6. Jansen, R., Embden, J.D., Gaastra, W. and Schouls, L.M. (2002) Identification of genes that are associated with DNA repeats in prokaryotes. Mol. Microbiol., 43, 1565-1575.
7. Pougach, K., Semenova, E., Bogdanova, E., Datsenko, K.A., Djordjevic, M., Wanner, B.L. and Severinov, K. (2010) Transcription, processing and function of CRISPR cassettes in Escherichia coli. Mol. Microbiol., 77, 1367-1379.
8. Wright, A.V., Nunez, J.K. and Doudna, J.A. (2016) Biology and Applications of CRISPR systems: harnessing nature's toolbox for genome engineering. Cell, 164, 29-44.
9. Amitai, G. and Sorek, R. (2016) CRISPR-Cas adaptation: insights into the mechanism of action. Nat. Rev. Microbiol., 14, 67-76.
10. Levy, A., Goren, M.G., Yosef, I., Auster, O., Manor, M., Amitai, G., Edgar, R., Qimron, U. and Sorek, R. (2015) CRISPR adaptation biases explain preference for acquisition of foreign DNA. Nature, 520, 505-510.
11. Datsenko, K.A., Pougach, K., Tikhonov, A., Wanner, B.L., Severinov, K. and Semenova, E. (2012) Molecular memory of prior infections activates the CRISPR/Cas adaptive bacterial immunity system. Nat. Commun., 3, 945.
12. Swarts, D.C., Mosterd, C., van Passel, M.W. and Brouns, S.J. (2012) CRISPR interference directs strand specific spacer acquisition. PLoS One, 7, e35888.
13. Yosef, I., Goren, M.G. and Qimron, U. (2012) Proteins and DNA elements essential for the CRISPR adaptation process in Escherichia coli. Nucleic Acids Res., 40, 5569-5576.
14. Diez-Villasenor, C., Guzman, N.M., Almendros, C., Garcia-Martinez, J. and Mojica, F.J. (2013) CRISPR-spacer integration reporter plasmids reveal distinct genuine acquisition specificities among CRISPR-Cas I-E variants of Escherichia coli. RNA Biol., 10, 792-802.
15. Yosef, I., Shitrit, D., Goren, M.G., Burstein, D., Pupko, T. and Qimron, U. (2013) DNA motifs determining the efficiency of adaptation into the Escherichia coli CRISPR array. Proc. Natl. Acad. Sci. U.S.A., 110, 14396-14401.
16. Deveau, H., Barrangou, R., Garneau, J.E., Labonte, J., Fremaux, C., Boyaval, P., Romero, D.A., Horvath, P. and Moineau, S. (2008) Phage response to CRISPR-encoded resistance in Streptococcus thermophilus. J. Bacteriol., 190, 1390-1400.
17. Mojica, F.J., Diez-Villasenor, C., Garcia-Martinez, J. and Almendros, C. (2009) Short motif sequences determine the targets of the prokaryotic CRISPR defence system. Microbiology, 155, 733-740.
18. Zetsche, B., Gootenberg, J.S., Abudayyeh, O.O., Slaymaker, I.M., Makarova, K.S., Essletzbichler, P., Volz, S.E., Joung, J., van der Oost, J., Regev, A. et al. (2015) Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR-Cas system. Cell, 163, 759-771.
19. Sternberg, S.H., Richter, H., Charpentier, E. and Qimron, U. (2016) Adaptation in CRISPR-Cas systems. Mol. Cell, 61, 797-808.
20. Semenova, E., Savitskaya, E., Musharova, O., Strotskaya, A., Vorontsova, D., Datsenko, K.A., Logacheva, M.D. and Severinov, K. (2016) Highly efficient primed spacer acquisition from targets destroyed by the Escherichia coli type I-E CRISPR-Cas interfering complex. Proc. Natl. Acad. Sci. U.S.A., 113, 7626-7631.
21. Nunez, J.K., Harrington, L.B., Kranzusch, P.J., Engelman, A.N. and Doudna, J.A. (2015) Foreign DNA capture during CRISPR-Cas adaptive immunity. Nature, 527, 535-538.
22. Nunez, J.K., Lee, A.S., Engelman, A. and Doudna, J.A. (2015) Integrase-mediated spacer acquisition during CRISPR-Cas adaptive immunity. Nature, 519, 193-198.
23. Wang, J., Li, J., Zhao, H., Sheng, G., Wang, M., Yin, M. and Wang, Y. (2015) Structural and mechanistic basis of PAM-dependent spacer acquisition in CRISPR-Cas systems. Cell, 163, 840-853.
24. Rollie, C., Schneider, S., Brinkmann, A.S., Bolt, E.L. and White, M.F. (2015) Intrinsic sequence specificity of the Cas1 integrase directs new spacer acquisition. eLife, 4, e08716.
25. Li, M., Wang, R., Zhao, D. and Xiang, H. (2014) Adaptation of the Haloarcula hispanica CRISPR-Cas system to a purified virus strictly requires a priming process. Nucleic Acids Res., 42, 2483-2492.
26. Heler, R., Samai, P., Modell, J.W., Weiner, C., Goldberg, G.W., Bikard, D. and Marraffini, L.A. (2015) Cas9 specifies functional viral targets during CRISPR-Cas adaptation. Nature, 519, 199-202.
27. Vorontsova, D., Datsenko, K.A., Medvedeva, S., Bondy-Denomy, J., Savitskaya, E.E., Pougach, K., Logacheva, M., Wiedenheft, B., Davidson, A.R., Severinov, K. et al. (2015) Foreign DNA acquisition by the I-F CRISPR-Cas system requires all components of the interference machinery. Nucleic Acids Res., 43, 10848-10860.
28. Wei, Y., Terns, R.M. and Terns, M.P. (2015) Cas9 function and host genome sampling in Type II-A CRISPR-Cas adaptation. Genes Dev., 29, 356-361.
29. Westra, E.R., Pul, U., Heidrich, N., Jore, M.M., Lundgren, M., Stratmann, T., Wurm, R., Raine, A., Mescher, M., Van Heereveld, L. et al. (2010) H-NS-mediated repression of CRISPR-based immunity in Escherichia coli K12 can be relieved by the transcription activator LeuO. Mol. Microbiol., 77, 1380-1393.
30. Ivancic-Bace, I., Cass, S.D., Wearne, S.J. and Bolt, E.L. (2015) Different genome stability proteins underpin primed and naive adaptation in E. coli CRISPR-Cas immunity. Nucleic Acids Res., 43, 10821-10830.
31. Babu, M., Beloglazova, N., Flick, R., Graham, C., Skarina, T., Nocek, B., Gagarinova, A., Pogoutse, O., Brown, G., Binkowski, A. et al. (2011) A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair. Mol. Microbiol., 79, 484-502.
32. Arslan, Z., Hermanns, V., Wurm, R., Wagner, R. and Pul, U. (2014) Detection and characterization of spacer integration intermediates in type I-E CRISPR-Cas system. Nucleic Acids Res., 42, 7884-7893.
33. Wei, Y., Chesne, M.T., Terns, R.M. and Terns, M.P. (2015) Sequences spanning the leader-repeat junction mediate CRISPR adaptation to phage in Streptococcus thermophilus. Nucleic Acids Res., 43, 1749-1758.
34. Nunez, J.K., Bai, L., Harrington, L.B., Hinder, T.L. and Doudna, J.A. (2016) CRISPR immunological memory requires a host factor for specificity. Mol. Cell, 62, 824-833.
35. Fujita, T. and Fujii, H. (2013) Efficient isolation of specific genomic regions and identification of associated proteins by engineered DNA-binding molecule-mediated chromatin immunoprecipitation (enChIP) using CRISPR. Biochem. Biophys. Res. Commun., 439, 132-136.
36. Datsenko, K.A. and Wanner, B.L. (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl. Acad. Sci. U.S.A., 97, 6640-6645.
37. Baba, T., Ara, T., Hasegawa, M., Takai, Y., Okumura, Y., Baba, M., Datsenko, K.A., Tomita, M., Wanner, B.L. and Mori, H. (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol., 2, 1-11.
38. Qi, L.S., Larson, M.H., Gilbert, L.A., Doudna, J.A., Weissman, J.S., Arkin, A.P. and Lim, W.A. (2013) Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell, 152, 1173-1183.
39. St-Pierre, F., Cui, L., Priest, D.G., Endy, D., Dodd, I.B. and Shearwin, K.E. (2013) One-step cloning and chromosomal integration of DNA. ACS Synth. Biol., 2, 537-541.
40. Zwieb, C. and Adhya, S. (2009) Plasmid vectors for the analysis of protein-induced DNA bending. Methods Mol. Biol., 543, 547-562.
41. Waldminghaus, T. and Skarstad, K. (2010) ChIP on Chip: surprising results are often artifacts. BMC Genomics, 11, 414.
42. Papapanagiotou, I., Streeter, S.D., Cary, P.D. and Kneale, G.G. (2007) DNA structural deformations in the interaction of the controller protein C.AhdI with its operator sequence. Nucleic Acids Res., 35, 2643-2650.
43. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.
44. Swinger, K.K. and Rice, P.A. (2004) IHF and HU: flexible architects of bent DNA. Curr. Opin. Struct. Biol., 14, 28-35.
45. Alkhnbashi, O.S., Shah, S.A., Garrett, R.A., Saunders, S.J., Costa, F. and Backofen, R. (2016) Characterizing leader sequences of CRISPR loci. Bioinformatics, 32, i576-i585.
46. Crooks, G.E., Hon, G., Chandonia, J.M. and Brenner, S.E. (2004) WebLogo: a sequence logo generator. Genome Res., 14, 1188-1190.
47. Friedman, D.I. (1988) Integration host factor: a protein for all reasons. Cell, 55, 545-554.
48. Chalmers, R., Guhathakurta, A., Benjamin, H. and Kleckner, N. (1998) IHF modulation of Tn10 transposition: sensory transduction of supercoiling status via a proposed protein/DNA molecular spring. Cell, 93, 897-908.
49. Rice, P.A., Yang, S., Mizuuchi, K. and Nash, H.A. (1996) Crystal structure of an IHF-DNA complex: a protein-induced DNA U-turn. Cell, 87, 1295-1306.
50. Leong, J.M., Nunes-Duby, S., Lesser, C.F., Youderian, P., Susskind, M.M.
and Landy , A. ( 1985 ) The phi 80 and P22 attachment sites. Primary structure and interaction with Escherichia coli integration host factor . J. Biol. Chem. , 260 , 4468 - 4477 . 51. Moitoso de Vargas ,L, Kim , S. and Landy , A. ( 1989 ) DNA looping generated by DNA bending protein IHF and the two domains of lambda integrase . Science , 244 , 1457 - 1461 . 52. Pribil , P.A. and Haniford , D.B. ( 2003 ) Target DNA bending is an important specificity determinant in target site selection in Tn10 transposition . J. Mol. Biol ., 330 , 247 - 259 . 53. Segall , A.M. and Nash , H.A. ( 1996 ) Architectural flexibility in lambda site-specific recombination: three alternate conformations channel the attL site into three distinct pathways . Genes Cells , 1 , 453 - 463 . 54. Kim , S. and Landy , A. ( 1992 ) Lambda Int protein bridges between higher order complexes at two distant chromosomal loci attL and attR. Science , 256 , 198 - 203 . 55. Lambert , A.R ., Hallinan , J.P. , Shen , B.W. , Chik , J.K. , Bolduc , J.M. , Kulshina , N. , Robins , L.I. , Kaiser , B.K. , Jarjour , J. , Havens , K. et al. ( 2016 ) Indirect DNA sequence recognition and its impact on nuclease cleavage activity . Structure , 24 , 862 - 873 . 56. Wright , A.V. and Doudna , J.A. ( 2016 ) Protecting genome integrity during CRISPR immune adaptation . Nat. Struct. Mol. Biol ., 23 , 876 - 883 . 57. McGinn , J. and Marraffini , L.A. ( 2016 ) CRISPR-Cas systems optimize their immune response by specifying the site of spacer integration . Mol. Cell ., 64 , 616 - 623 . 58. Erdmann , S. and Garrett , R.A. ( 2012 ) Selective and hyperactive uptake of foreign DNA by adaptive immune systems of an archaeon via two distinct mechanisms . Mol. Microbiol ., 85 , 1044 - 1056 . 59. Garrett , R.A. , Shah , S.A. , Erdmann , S. , Liu , G. , Mousaei , M. , Leon-Sobrino , C. , Peng , W. , Gudbergsdottir , S. , Deng , L. , Vestergaard , G. et al. 
( 2015 ) CRISPR-Cas adaptive immune systems of the sulfolobales: unravelling their complexity and diversity . Life , 5 , 783 - 817 . 60. Goren , M.G. , Doron , S. , Globus , R. , Amitai , G. , Sorek , R. and Qimron, U. ( 2016 ) Repeat size determination by two molecular rulers in the type I-E CRISPR array. Cell Rep ., 16 , 2811 - 2818 . 61. Wang , R. , Li , M. , Gong , L. , Hu , S. and Xiang , H. ( 2016 ) DNA motifs determining the accuracy of repeat duplication during CRISPR adaptation in Haloarcula hispanica . Nucleic Acids Res ., 44 , 4266 - 4277 . 62. Wei , Y. and Terns , M.P. ( 2016 ) CRISPR Outsourcing: Commissioning IHF for Site-Specific Integration of Foreign DNA at the CRISPR Array . Mol. Cell , 62 , 803 - 804 . 63. Dillon , S.C. and Dorman , C.J. ( 2010 ) Bacterial nucleoid-associated proteins, nucleoid structure and gene expression . Nat. Rev. Microbiol. , 8 , 185 - 195 . 64. Lange , S.J. , Alkhnbashi , O.S. , Rose , D. , Will , S. and Backofen , R. ( 2013 ) CRISPRmap: an automated classification of repeat conservation in prokaryotic adaptive immune systems . Nucleic Acids Res ., 41 , 8034 - 8044 . 65. Waterhouse , A.M. , Procter , J.B. , Martin , D.M. , Clamp , M. and Barton , G.J. ( 2009 ) Jalview Version 2-a multiple sequence alignment editor and analysis workbench . Bioinformatics , 25 , 1189 - 1191 .



K.N.R. Yoganand, R. Sivathanu, Siddharth Nimkar, B. Anand. Asymmetric positioning of Cas1–2 complex and Integration Host Factor induced DNA bending guide the unidirectional homing of protospacer in CRISPR-Cas type I-E system, Nucleic Acids Research, 2017, 367-381, DOI: 10.1093/nar/gkw1151