Not all predicted CRISPR–Cas systems are equal: isolated cas genes and classes of CRISPR like elements
© The Author(s). 2017
Received: 28 May 2016
Accepted: 31 January 2017
Published: 6 February 2017
The CRISPR–Cas systems in prokaryotes are RNA-guided immune systems that target and deactivate foreign nucleic acids. A typical CRISPR–Cas system consists of a CRISPR array of repeat and spacer units, and a locus of cas genes. The CRISPR and the cas locus are often located next to each other in the genome. However, there is no quantitative estimate of this co-location. In addition, ad hoc studies have shown that some non-CRISPR genomic elements contain repeat-spacer-like structures and are mistaken for CRISPRs.
Using available genome sequences, we observed that a significant number of genomes have isolated cas loci and/or CRISPRs. We found that 11%, 22% and 28% of the type I, II and III cas loci, respectively, are isolated (either with no CRISPRs in the same genome or with CRISPRs located at a distance in the genome). We identified a large number of genomic elements that superficially resemble CRISPRs but do not contain diverse spacers and have no companion cas genes. We called these elements false-CRISPRs and further classified them into groups, including tandem repeats and Staphylococcus aureus repeat (STAR)-like elements.
This is the first systematic study to collect and characterize false-CRISPR elements. We demonstrated that false-CRISPRs could be used to reduce the false annotation of CRISPRs, therefore showing them to be useful for improving the annotation of CRISPR–Cas systems.
Keywords: CRISPR–Cas system, false-CRISPR, tandem repeat, STAR-like element
Phages are believed to largely outnumber their bacterial hosts in ecosystems [1, 2] and thus have a significant impact on the diversification of bacteria. Bacteria, in turn, have developed various defense mechanisms, such as innate and adaptive immunity, to protect themselves against invading nucleic acids, including phages and other elements such as plasmids and genomic islands. The CRISPR–Cas (clustered, regularly interspaced short palindromic repeats–CRISPR-associated proteins) adaptive immune system is one of the mechanisms that prokaryotes have evolved to defend against invaders. The CRISPR–Cas systems are widespread in prokaryotes, and have been found in most archaeal species and about half of the bacterial species [3–5].
The typical genomic architecture of a CRISPR–Cas locus is composed of a CRISPR array, a locus of cas genes, and a leader region. Generally, in a CRISPR array, nearly identical repeats (21 to 47 bp long) are separated by spacers of similar sizes; the spacers are unique fragments acquired from foreign nucleic acid sequences. The leader sequence is an AT-rich sequence of ~100–500 bp, and it is believed to serve as a promoter element for transcription of its adjacent CRISPR (internal promoters are also found within some CRISPRs [7, 8]). The defense activity of the CRISPR-Cas systems involves three steps: the acquisition of new spacers (the adaptation stage), the biogenesis of crRNAs (the CRISPR transcripts), and the interference against cognate invaders guided by crRNAs. During the adaptation stage, the targeted nucleic acid sequence from the invader is integrated into the CRISPR array with the help of Cas proteins, such as the nuclease proteins Cas1 and Cas2. During the expression and interference stages, the CRISPR locus is transcribed into a precursor (pre-crRNA) and processed into short mature CRISPR RNAs (crRNAs). Together with a Cas protein complex or a single Cas protein, depending on the type of interference mechanism (see below), the crRNA guides the detection and degradation of the target DNA or RNA that contains the sequence complementary to the spacer [4, 11–13].
At the broadest level, the CRISPR-Cas systems can be divided into two classes. The class 1 systems perform their function with a multisubunit Cas protein complex, whereas the class 2 systems require only a single Cas protein (Cas9 or Cpf1) in the crRNA-effector complex. Class 1 includes the type I, III, and IV systems, and class 2 includes the type II and V systems. The signature genes of the type I–V systems are cas3, cas9, cas10, csf1, and cpf1, respectively. The five main types can be further divided into 16 distinct subtypes, based on the different combinations of additional cas genes: types I A–F and U, types II A–C, types III A–D, a type IV and a type V [4, 14, 15]. Type I and II CRISPR-Cas systems provide immunity against DNA [16, 17], whereas type III CRISPR-Cas systems are believed to target either DNA or RNA (e.g., the Streptococcus thermophilus DGCC8004 Csm (III-A) complex (StCsm) has been demonstrated to target RNA). The Cpf1-family protein found in type V (class 2) CRISPR-Cas systems has been experimentally demonstrated to perform DNA interference in a recent study.
The cas genes are usually believed to be present in the direct vicinity of CRISPR loci; when multiple CRISPR arrays exist, some may be distant from the cas genes. Isolated CRISPRs, which lack nearby cas genes, were identified in a few species including Listeria monocytogenes, Aggregatibacter actinomycetemcomitans, and Enterococcus faecalis. Some of these isolated CRISPRs were observed to be expressed but not processed into small crRNAs (e.g., in L. monocytogenes), which indicates that they may be the remnants of previously functional CRISPR–Cas systems or be involved in bacterial autoimmunity. The spacer sequences in the orphan CRISPRs found in A. actinomycetemcomitans were antisense to bacterial self-coding genes, which further suggests that the existence of orphan CRISPRs is related to the regulation of the expression of other genes. In Haloferax volcanii, which contains three CRISPR loci with almost identical repeat sequences, all three CRISPR loci were expressed, producing CRISPR RNA (crRNA); however, it was found that not all crRNAs can trigger successful interference.
Here we systematically examined the genomic location of the CRISPR–Cas systems in complete and draft bacterial genomes to quantify the tendency of co-localization of CRISPR arrays and cas genes, taking advantage of the recently updated classification of Cas proteins by Koonin and colleagues. We further explored possible explanations for the existence of isolated cas loci using representative species. From isolated CRISPRs (without companion cas genes), we collected highly suspicious CRISPRs that lack any spacer diversity (and are therefore unlikely to be real CRISPRs) and named them false-CRISPR elements. It has been shown that some tandem repeats may be confused with CRISPRs because they contain “repeat-spacer”-like structures, and that Staphylococcus aureus repeat (STAR-like) elements (GC-rich direct repeats) could be confused with CRISPRs in Staphylococcus aureus [27, 28]. No study, however, has been carried out to systematically characterize these false-CRISPRs. We therefore classified the false-CRISPRs we identified into three categories based on their distribution in the genomes and “spacer” diversity: tandem repeats, STAR-like elements, and simple repeats. We note that some false-CRISPR elements were reported as CRISPRs in previous studies [29–32]. We believe this would pose a severe problem if they were propagated into downstream analyses and annotations.
Identifying CRISPR-Cas systems in bacterial genomes
We first used MetaCRT, which we modified from CRT (to allow detection of partial repeats at the ends of CRISPR arrays), to predict the CRISPR arrays in complete bacterial and archaeal genomes. The genomes were downloaded in October 2016 from the NCBI ftp website (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq). We focused on complete reference genomes in this study, as the components of a CRISPR–Cas system may fall on separate contigs when draft genomes are used. However, for a few species that we analyzed in detail, we augmented the list of genomes with draft genomes: 13 draft genomes for Streptococcus thermophilus and 4055 draft genomes for Staphylococcus aureus. In some cases, a long CRISPR may be split into multiple predictions because of repeats containing excessive mutations or because of long spacers. To avoid such cases, CRISPRs that are close to each other (<=200 bp) and share very similar repeat sequences were considered to be in the same locus. We then collected the consensus repeat for each putative CRISPR array and clustered these consensus repeats at 90% sequence identity using CD-HIT-EST. In this way, a “cluster” contains at least two CRISPR arrays, and a “singleton” refers to a repeat found exclusively within its own CRISPR array.
We then used hmmscan to search putative proteins found in the genomes against a collection of Cas families to predict putative Cas proteins (using the gathering cutoff). In total, the collection contains 403 Cas families, among which eight were identified from human microbiomes (using a combination of context-based and similarity-search approaches), and 395 were from a recent study. Since Koonin and colleagues did not build models for the Cas families they curated, we used hmmbuild to construct HMMs for all of their families. Because gene prediction is far from perfect for many genomes, for the genomes/contigs that contain CRISPRs but lack cas genes we re-predicted the genes using FragGeneScan, a gene predictor we developed for predicting complete as well as fragmented genes in genomic sequences, and then repeated the cas gene prediction. This rules out the possibility that cas genes were missed because the genes were not predicted in the first place.
A cas locus as defined in this study should contain at least three cas genes, at least one of which belongs to the universal cas genes for CRISPR adaptation (cas1 and cas2) or to the main components of the interference module, including cas7, cas5, cas8, cas10, csf1, cas9, and cpf1.
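The merging step described above can be illustrated with a minimal sketch. The names `PredictedArray`, `ungapped_identity`, and `merge_nearby`, as well as the 0.9 repeat-identity threshold, are assumptions made for illustration; the text only states that merged arrays are at most 200 bp apart and share "very similar" repeats.

```python
from dataclasses import dataclass


@dataclass
class PredictedArray:
    start: int   # 1-based start coordinate of the predicted array
    end: int     # inclusive end coordinate
    repeat: str  # consensus repeat sequence


def ungapped_identity(a: str, b: str) -> float:
    """Fraction of matching positions; repeats of unequal length score 0."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def merge_nearby(arrays, max_gap=200, min_identity=0.9):
    """Merge consecutive arrays that are <= max_gap bp apart and whose
    consensus repeats are similar (min_identity is an assumed threshold)."""
    merged = []
    for arr in sorted(arrays, key=lambda a: a.start):
        if merged:
            prev = merged[-1]
            gap = arr.start - prev.end - 1
            similar = ungapped_identity(prev.repeat, arr.repeat) >= min_identity
            if gap <= max_gap and similar:
                # Extend the previous locus instead of starting a new one.
                merged[-1] = PredictedArray(prev.start, max(prev.end, arr.end), prev.repeat)
                continue
        merged.append(arr)
    return merged
```

In this sketch, two predictions separated by 149 bp with identical repeats collapse into one locus, whereas a third prediction several kilobases away stays separate.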
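The cas locus definition can be written as a simple predicate. This is a sketch under the assumption that each gene in a candidate locus has already been assigned a family name such as `cas1`; the function name `is_cas_locus` is hypothetical.

```python
# Universal adaptation genes and main interference-module genes named in the text.
ADAPTATION_GENES = {"cas1", "cas2"}
INTERFERENCE_GENES = {"cas7", "cas5", "cas8", "cas10", "csf1", "cas9", "cpf1"}
CORE_GENES = ADAPTATION_GENES | INTERFERENCE_GENES


def is_cas_locus(gene_names):
    """True if the locus has >= 3 cas genes and at least one core gene."""
    genes = [g.lower() for g in gene_names]
    return len(genes) >= 3 and any(g in CORE_GENES for g in genes)
```

For example, a locus annotated as `cas1`, `cas2`, `cas9` qualifies, while a three-gene locus without any adaptation or interference gene does not.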
Determining the type of CRISPR-Cas loci
A CRISPR (or several CRISPRs), together with its nearby (within 10,000 bp) cas genes, is defined as a CRISPR-Cas locus. A CRISPR that lacks cas genes in its vicinity is defined as an isolated CRISPR locus. Conversely, a cas locus that does not have a nearby CRISPR array is called an isolated cas locus. The type of each CRISPR-Cas locus is determined according to the type signature cas genes. We say the type assignment of a cas locus is confident if it has at least three type-consistent signature cas genes, except for type V. Since only one signature gene, cpf1, is reported for type V, we assign type V based on this single signature gene.
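The 10,000 bp neighborhood rule can be sketched as follows. The sketch assumes that all intervals lie on the same replicon and uses linear coordinates (wrap-around on circular genomes is ignored); `interval_gap` and `label_loci` are hypothetical helper names.

```python
def interval_gap(a, b):
    """Gap in bp between two (start, end) intervals; 0 if they overlap."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def label_loci(crisprs, cas_loci, window=10_000):
    """Label each CRISPR and cas locus as part of a CRISPR-Cas locus or isolated."""
    labels = {}
    for i, crispr in enumerate(crisprs):
        near = any(interval_gap(crispr, cas) <= window for cas in cas_loci)
        labels[("CRISPR", i)] = "CRISPR-Cas" if near else "isolated CRISPR"
    for j, cas in enumerate(cas_loci):
        near = any(interval_gap(cas, crispr) <= window for crispr in crisprs)
        labels[("cas", j)] = "CRISPR-Cas" if near else "isolated cas"
    return labels
```

An isolated cas locus flagged this way may still have a distant CRISPR elsewhere in the genome; distinguishing that case from a genome with no CRISPR at all requires the genome-wide list of arrays.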
Calculating spacer diversity of a CRISPR
Spacers in a true CRISPR array are likely to be distinct (e.g., only two redundant spacers were found among the total of 70 spacers in the long CRISPR array in the Streptococcus mutans NN2025 genome). Spacer diversity, therefore, has been used as one indication of the activity of CRISPR–Cas systems. We consider a CRISPR to contain diverse spacers if at least half of its spacers share no more than 70% sequence identity, as determined by CD-HIT-EST clustering.
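A rough illustration of the spacer diversity criterion is given below. It is not the CD-HIT-EST procedure: it substitutes a simple greedy clustering with ungapped positional identity, and it interprets "at least half of its spacers" as "at least half of the spacers are alone in their cluster". Both choices, and all function names, are assumptions.

```python
def positional_identity(a, b):
    """Ungapped identity: matching positions divided by the longer length."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_clusters(seqs, threshold=0.7):
    """Greedy clustering: a sequence joins the first cluster whose
    representative it matches at more than `threshold` identity."""
    clusters = []  # the first member of each cluster is its representative
    for seq in sorted(seqs, key=len, reverse=True):
        for cluster in clusters:
            if positional_identity(seq, cluster[0]) > threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


def has_diverse_spacers(spacers, threshold=0.7):
    """True if at least half of the spacers are singletons at the threshold."""
    if not spacers:
        return False
    clusters = greedy_clusters(spacers, threshold)
    distinct = sum(1 for cluster in clusters if len(cluster) == 1)
    return distinct >= len(spacers) / 2
```

An array whose "spacers" are all copies of the same sequence, as in a tandem repeat, yields a single cluster and fails the test.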
Phylogenetic tree reconstruction
We built phylogenetic trees for selected species using concatenated sequences of 35 marker genes predicted from their genomes. To construct each phylogenetic tree, we used MUSCLE to align the protein sequences and applied the FastTree program to construct the neighbor-joining trees using the discrete gamma model with 20 rate categories.
Availability of our results and software
We have made our results, including the CRISPRs and false-CRISPRs (and their annotations), available for download at the CRISPRone website (http://omics.informatics.indiana.edu/CRISPRone). The CRISPRone website also provides online prediction of CRISPR–Cas systems for input genomic sequences, using a pipeline with integrated checking of false-CRISPRs.
Distribution of CRISPR-Cas systems in bacterial genomes
A total of 3323 and 370 cas loci (see Methods), with or without CRISPRs in the neighborhood, were identified from 5596 bacterial and 214 archaeal complete genomes, respectively. Overall, 79% (2926 out of 3693) of them were confidently assigned to the five main types (I–V): 2001 (~68%) type I cas loci, 477 (~16%) type II cas loci (no type II cas loci were found in archaeal genomes, as discussed previously), 389 (~13%) type III cas loci, 24 type IV cas loci (no type IV cas loci were found in archaeal genomes), and 35 type V cas loci. These results suggest that the type I CRISPR-Cas system is the major type found in bacterial genomes, which is consistent with the results of previous studies. Since type IV and V CRISPR–Cas systems are rare, we focused on type I, II and III systems in the following analyses.
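The proportions quoted above follow directly from the reported counts; the short calculation below reproduces them (variable names are illustrative only).

```python
# Counts reported in the text.
total_loci = 3323 + 370  # bacterial + archaeal cas loci
typed_counts = {"I": 2001, "II": 477, "III": 389, "IV": 24, "V": 35}

typed_total = sum(typed_counts.values())               # 2926
typed_fraction = round(100 * typed_total / total_loci)  # 79 (%)

# Share of each type among the confidently typed loci, in percent.
type_shares = {t: round(100 * n / typed_total, 1) for t, n in typed_counts.items()}
# type I: 68.4, type II: 16.3, type III: 13.3
```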
Distribution of cas1-cas2 genes pair together with CRISPR in three CRISPR-Cas system types
Prevalence of isolated/orphan cas loci in bacterial genomes
Although cas loci and CRISPRs tend to be clustered in the same genomic neighborhood, isolated cas loci (or CRISPRs) are found in genomes. In this study, we call a cas locus (containing at least three cas genes) isolated if it has no companion CRISPR array within a 10,000 bp window. An isolated cas locus is considered an orphan if its companion CRISPR is lost from the genome. A total of 2739 (including 2555 bacterial and 184 archaeal) species each were found to contain at least one isolated cas locus, resulting in a total of 753 and 101 isolated cas loci in bacterial and archaeal genomes, respectively. 86% (650 out of 753) of bacterial species and 31% (57 out of 184) of archaeal species harbor only one isolated cas locus, although some may contain as many as four such loci. In summary, among the predicted cas loci, 12% (236/2001) of type I, 22% (109/477) of type II, and 28% (109/389) of type III cas loci are found to be isolated. Type III CRISPR–Cas systems have the highest proportion of isolated cas loci.
Isolated cas loci are either remnants of CRISPR–Cas systems without the immunity function, or they function together with remote CRISPR(s) in the same genome. An orphan cas locus, on the other hand, may be non-functional, or may have lost its immunity function but maintained other function(s) (it was shown that some components of the CRISPR–Cas systems have a function in DNA repair). Similarly, isolated CRISPRs can be non-functional (orphan), or work with a distant cas locus in the same genome. Below we present selected examples of the different scenarios.
The second example involves 18 Streptococcus thermophilus strains. A total of four CRISPR-Cas loci (two type II-A loci with different consensus repeats, on different strands; a type III-A system; and a type I-E system) were found in S. thermophilus (Fig. 2b). The activity of the two type II-A CRISPR-Cas loci was demonstrated in previous studies [12, 39, 46, 47], and the type III-A CRISPR-Cas locus has been experimentally demonstrated to target RNA. Diverse spacers are found in the CRISPRs among these 18 isolates, consistent with a previous study. Complete and partial loss (resulting in an isolated cas locus or CRISPR) of the different CRISPR–Cas systems was observed in this species: eight of the “complete” (based on the definition of Makarova et al.) type III-A cas loci lost their companion CRISPRs; by contrast, only three out of 29 type II-A cas loci do not have companion CRISPRs. This is consistent with the statistics based on the CRISPR–Cas systems in all species (see above), which showed that type III cas loci have the lowest tendency to co-locate with their companion CRISPRs among the three types of CRISPR–Cas systems.
In the last example, isolated cas loci found in Zymomonas mobilis are likely to function with remote CRISPR(s) in the same genome. Seven closely related strains (including ATCC 29191, ZM4, NCIMB 11163, ATCC 10988, 2 strains of NRRL_B-12526 and CP4 = NRRL B-14023) each harbor a cas locus containing type I-F signature genes, with CRISPRs distant in the genome. One strain (ATCC 29192), which is phylogenetically more distant from the other strains, contains a type I-E cas locus and a distant CRISPR (Additional file 2). All type I-F CRISPR loci, scattered in the genomes, share the same repeat sequence. The large variety of CRISPR lengths and spacer sequences, together with the “complete” type I cas loci, implies that the type I cas loci together with the remote CRISPR loci may still be active.
Curation of false-CRISPRs
A total of 11,729 putative CRISPRs were predicted, including 10,754 from complete bacterial and 975 from complete archaeal genomes. All CRISPRs were first grouped based on their consensus repeat sequences (by CD-HIT-EST using 90% as the sequence identity cutoff), resulting in a total of 1222 groups, each containing at least two CRISPRs, and 2996 singletons (see Methods). Groups of putative CRISPRs were then evaluated using two criteria. (1) Do the CRISPRs in a group tend to be located near cas genes? If not, are there cas loci in the same genomes, even though they are far from the CRISPRs? (2) Do the CRISPRs contain diverse spacers?
Table: Characterization of the “CRISPR” clusters according to the cas genes and spacer diversity. Columns: # of clusters, # of CRISPRs. Row category: cas genes not found in the genome.
Groups of putative CRISPRs that lack evidence (i.e., without cas genes in the host genomes and/or without spacer diversity) and are not similar to real CRISPRs (containing at least 5 mismatches compared to real CRISPR repeats), on the other hand, are likely to be genomic elements that superficially resemble the repeat-spacer structure of CRISPRs but are not real CRISPRs. As a result, we derived a total of 3224 such elements, called false-CRISPR elements (their consensus “repeat” sequences are shown in Additional file 4), from 366 clusters and 1723 singletons of putative “CRISPRs”.
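The decision rule for calling a false-CRISPR can be sketched as below. The sketch reads "and/or" as a logical OR over the two missing lines of evidence, and counts mismatches without gaps (length differences count as mismatches); both are assumptions, as are the names `count_mismatches` and `is_false_crispr`.

```python
def count_mismatches(a, b):
    """Ungapped mismatch count; any length difference counts as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def is_false_crispr(repeat, has_cas_in_genome, has_diverse_spacers,
                    real_repeats, min_mismatches=5):
    """True if the element lacks supporting evidence and its repeat differs
    from every known real CRISPR repeat by at least `min_mismatches`."""
    lacks_evidence = (not has_cas_in_genome) or (not has_diverse_spacers)
    dissimilar = all(count_mismatches(repeat, r) >= min_mismatches
                     for r in real_repeats)
    return lacks_evidence and dissimilar
```

Requiring dissimilarity to real repeats keeps genuine but orphaned CRISPRs, whose repeats match those of active systems, out of the false-CRISPR set.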
Annotation of false-CRISPR elements
Tandem repeats are sequences that are abundant in prokaryotic genomes. The region containing the tandem repeats is potentially hypermutable, which allows the bacteria to adapt to changing environments without increasing the overall mutation rate [49, 50]. Hypermutable tandem repeats may have a structure very similar to that of CRISPR arrays. In total, 1744 out of 3224 (54%) false-CRISPRs (from 219 clusters and 822 singletons) were predicted to be tandem repeats by Tandem Repeats Finder.
In a previous study, Cramton et al. identified the Staphylococcus aureus repeat (STAR) element, which contains extraordinarily GC-rich repeats; this repetitive element was found in up to 21 copies in a S. aureus genome. The structure of STAR-like elements could easily be confused with that of real CRISPRs. STAR-like elements contain the signature sequence T[G/A/T]TGTTG[G/T]GGCCC[C/A]. We checked for this signature sequence in our collection of false-CRISPRs and found that 139 of them contain it; these were therefore classified as STAR-like elements.
We observed that some of the false-CRISPRs contain short (1–5 bp) low-complexity repeats. Using RepeatMasker (http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker), 56 false-CRISPRs were identified as containing simple DNA repeats. For example, the false-CRISPR found in Burkholderia pseudomallei 668 (genome ID: NC_009074; position 924,901–925,214 bp) contains 12 copies of the sequence pattern GCCGTT. Six false-CRISPRs contain low-complexity sequences; for example, the false-CRISPR in S. aureus TCH60 (genome ID: NC_017342; position 1,242,548–1,242,837 bp), which is neither STAR-like nor a tandem repeat, is identified as an A-rich (43% of the region is adenine), low-complexity region.
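The STAR signature T[G/A/T]TGTTG[G/T]GGCCC[C/A] translates directly into a regular expression. The sketch below also scans the reverse complement; whether both strands were searched in the original analysis is not stated, so that part is an assumption, as is the function name `has_star_signature`.

```python
import re

# Regular-expression form of the STAR signature quoted in the text.
STAR_SIGNATURE = re.compile(r"T[GAT]TGTTG[GT]GGCCC[CA]")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper- or lower-case DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_star_signature(seq):
    """True if the signature occurs on either strand of `seq`."""
    seq = seq.upper()
    return bool(STAR_SIGNATURE.search(seq)
                or STAR_SIGNATURE.search(reverse_complement(seq)))
```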
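A simple-repeat check in the spirit of the GCCGTT example can be sketched with a back-referencing regular expression. This is a stand-in for illustration, not RepeatMasker; the function name and the defaults (`max_unit=6`, chosen so that the 6 bp GCCGTT unit is covered, and `min_copies=3`) are assumptions.

```python
import re


def longest_simple_repeat(seq, max_unit=6, min_copies=3):
    """Return (unit, copies, start) for the longest perfect tandem run of a
    short unit in `seq`, or None if no unit is repeated `min_copies` times."""
    # Lazy unit match so the shortest repeating unit is preferred.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    best = None
    best_length = 0
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        run_length = len(match.group(0))
        if run_length > best_length:
            best = (unit, run_length // len(unit), match.start())
            best_length = run_length
    return best
```

Because only perfect copies are matched, a run interrupted by a single substitution is reported as two shorter runs; dedicated tools tolerate such imperfections.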
Real and false CRISPRs in S. aureus
In total, 219 CRISPRs (in 23 clusters and 17 singletons) were identified by MetaCRT from 123 S. aureus complete genomes (i.e., all these elements have repeat-spacer structures). Six CRISPRs (from 3 clusters) are identified as real CRISPRs in our study. The 213 others are “false” CRISPR elements, among which 53 are tandem repeats and 136 are identified as STAR-like elements. In addition, we identified 26 real CRISPRs in the S. aureus draft genomes, which far outnumber the complete S. aureus genomes.
False-CRISPR elements in existing collections of CRISPRs
Since most existing methods for CRISPR identification are based on finding regions with repeat-and-spacer-like structures, we expect to find false-CRISPRs in the collections of CRISPRs identified using these methods. We checked for the presence of false-CRISPRs in Biswas’ collection, CRISPRBank, CRISPRmap, and the NCBI annotations. Because CRISPRmap only provides repeat sequences (but not the genome and coordinate information of the repeats), we used similarity search to find false-CRISPRs in this collection: a repeat in CRISPRmap that shares 90% sequence identity, covering 90% of its length, with a false-CRISPR we identified is considered a potential false-CRISPR.
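The 90% identity over 90% of the length criterion can be illustrated with a minimal sketch. The search tool used for the comparison is not named in the text; the code below substitutes a simple ungapped sliding comparison and measures coverage relative to the query repeat, both of which are assumptions (as are the function names).

```python
def best_ungapped_hit(query, target):
    """Return (identity, coverage) for the ungapped placement of `query` on
    `target` with the most matches; coverage is relative to the query length."""
    query, target = query.upper(), target.upper()
    best = (0.0, 0.0)
    best_matches = -1
    for offset in range(-len(query) + 1, len(target)):
        q_start = max(0, -offset)
        t_start = max(0, offset)
        overlap = min(len(query) - q_start, len(target) - t_start)
        matches = sum(query[q_start + i] == target[t_start + i]
                      for i in range(overlap))
        if matches > best_matches:
            best_matches = matches
            best = (matches / overlap, overlap / len(query))
    return best


def is_potential_false_crispr(repeat, false_repeats,
                              min_identity=0.9, min_coverage=0.9):
    """True if `repeat` matches any false-CRISPR repeat at both thresholds."""
    for false_repeat in false_repeats:
        identity, coverage = best_ungapped_hit(repeat, false_repeat)
        if identity >= min_identity and coverage >= min_coverage:
            return True
    return False
```

A gapped aligner would additionally catch repeats that differ by small insertions or deletions, which this sketch treats as dissimilar.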
Table: Breakdown of the false-CRISPRs found in existing collections of CRISPRs. Columns, reported per collection (e.g., Biswas’ collection): Total # of CRISPRs, # of clusters, # of singletons.
For the CRISPRmap collection, 98 (out of 3527, 2.8%) repeats are similar to false-CRISPRs, among which 21 and 12 are classified as tandem repeats and STAR-like elements, respectively (Table 3). We further checked the CRISPR annotations provided by the NCBI, which combined CRT and PILER-CR to predict CRISPRs in archaeal and bacterial genomes. Out of 6386 CRISPR arrays (1557 from archaeal and 4829 from bacterial genomes) annotated in the NCBI annotation files, 71 (1%) could be identified as false-CRISPRs.
In this study, we provide an overview of the distribution of the different types (I–V) of CRISPR-Cas systems and also evaluate the tendency of CRISPRs and cas loci to co-locate in currently available archaeal and bacterial complete genomes. Our analysis has shown that isolated CRISPRs and cas loci could be remnants of non-functional CRISPR-Cas systems, or they could function remotely with each other.
The existing, widely used CRISPR detection tools, such as CRISPRFinder and CRT, predict CRISPRs primarily based on the typical structure of CRISPRs (almost identical repeats separated by spacers). However, this structure is easily confused with other kinds of elements such as tandem repeats, STAR-like elements and simple repeats. Combining genomic context analysis and diversity analysis of the “spacers,” we collected 3224 (~27% of the 11,729 predicted “CRISPRs”) suspicious orphan CRISPRs, which we named false-CRISPRs.
Although earlier, simpler prediction methods [26, 34] predict false positives, later methods (e.g., the NCBI annotation in RefSeq and CRISPRDetect) have lower levels of false positives (for example, CRISPRDetect has 0.2% false positives). Our results indicate that predictions of CRISPRs based solely on repeat-spacer structural patterns carry a high risk of false positives; thus the use of additional information (i.e., spacer dissimilarity), proposed both in our study and in recently developed approaches including CRISPRDetect, could greatly improve the identification of real CRISPRs. Since about 50% of our false-CRISPR elements are identified as tandem repeats, we believe it is a useful step to run Tandem Repeats Finder to filter CRISPR predictions. Our collection of false-CRISPRs and their classifications can be utilized in further studies to reduce the false annotation of CRISPRs.
A significant number of false-CRISPRs (1285) still remain unclassified. We found that some repeat sequences of these unknown false-CRISPRs are extremely prevalent in their corresponding genomes, which may be caused by nucleotide composition bias. For example, false-CRISPRs found in the Conexibacter woesei DSM 14684 genome (whose GC-content is 72%) and in the extremely low GC-content genome of Candidatus Carsonella ruddii HT isolate Thao2000 (85% AT in the genome; Carsonella genomes are known to be AT-rich) are likely to belong to this case. However, the unknown false-CRISPRs remain to be further investigated.
Using available complete archaeal and bacterial genomes, we systematically studied isolated CRISPRs (and cas loci) and false-CRISPRs. We demonstrated that it is important to differentiate isolated CRISPRs from false-CRISPRs, and that our curation of false-CRISPRs could be used to reduce the false annotation of CRISPRs, which is useful for improving the annotation of CRISPR–Cas systems.
CRISPR: Clustered regularly interspaced short palindromic repeats
False-CRISPR: Genomic elements that superficially resemble CRISPRs but do not contain diverse spacers and have no companion cas genes
STAR: Staphylococcus aureus repeat (STAR-like) element
The authors thank Kenneth Bikoff for reading the manuscript.
This work has been supported by the National Science Foundation (grant number: DBI-1262588) and National Institutes of Health (grant number: 1R01AI108888).
Availability of data and materials
Repeat sequences of false-CRISPRs and annotations are shown in supporting materials, and are available at the CRISPRone website (http://omics.informatics.indiana.edu/CRISPRone). The CRISPRone website also provides online prediction of CRISPR–Cas systems.
QZ carried out the analyses of the CRISPR–Cas systems and helped to draft the manuscript. YY conceived of the study, participated in the analysis, and helped to draft the manuscript. Both authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Brussow H, Hendrix RW. Phage genomics: small is beautiful. Cell. 2002;108:13–6.View ArticlePubMedGoogle Scholar
- Labrie SJ, Samson JE, Moineau S. Bacteriophage resistance mechanisms. Nat Rev Microbiol. 2010;8(5):317–27.View ArticlePubMedGoogle Scholar
- Grissa I, Vergnaud G, Pourcel C. The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats. BMC Bioinformatics. 2007;8:172.View ArticlePubMedPubMed CentralGoogle Scholar
- Makarova KS, Haft DH, Barrangou R, Brouns SJ, Charpentier E, Horvath P, Moineau S, Mojica FJ, Wolf YI, Yakunin AF, et al. Evolution and classification of the CRISPR-Cas systems. Nat Rev Microbiol. 2011;9(6):467–77.View ArticlePubMedGoogle Scholar
- Lillestøl R, Redder P, Garrett RA, Brügger K. A putative viral defence mechanism in archaeal cells. Archaea. 2006;2:59–72.View ArticlePubMedPubMed CentralGoogle Scholar
- Jansen R, Embden JD, Gaastra W, Schouls LM. Identification of genes that are associated with DNA repeats in prokaryotes. Mol Microbiol. 2002;43(6):1565–75.View ArticlePubMedGoogle Scholar
- Deng L, Kenchappa CS, Peng X, She Q, Garrett RA. Modulation of CRISPR locus transcription by the repeat-binding protein Cbp1 in Sulfolobus. Nucleic Acids Res. 2012;40(6):2470–80.View ArticlePubMedGoogle Scholar
- Zoephel J, Randau L. RNA-Seq analyses reveal CRISPR RNA processing and regulation patterns. Biochem Soc Trans. 2013;41(6):1459–63.View ArticlePubMedGoogle Scholar
- Marraffini LA. CRISPR-Cas immunity in prokaryotes. Nature. 2015;526(7571):55–61.View ArticlePubMedGoogle Scholar
- Nunez JK, Kranzusch PJ, Noeske J, Wright AV, Davies CW, Doudna JA. Cas1-Cas2 complex formation mediates spacer acquisition during CRISPR-Cas adaptive immunity. Nat Struct Mol Biol. 2014;21(6):528–34.View ArticlePubMedPubMed CentralGoogle Scholar
- Bhaya D, Davison M, Barrangou R. CRISPR-Cas systems in bacteria and archaea: versatile small RNAs for adaptive defense and regulation. Annu Rev Genet. 2011;45:273–97.View ArticlePubMedGoogle Scholar
- Garneau JE, Dupuis ME, Villion M, Romero DA, Barrangou R, Boyaval P, Fremaux C, Horvath P, Magadan AH, Moineau S. The CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA. Nature. 2010;468(7320):67–71.View ArticlePubMedGoogle Scholar
- Barrangou R, Marraffini LA. CRISPR-Cas systems: prokaryotes upgrade to adaptive immunity. Mol Cell. 2014;54(2):234–44.View ArticlePubMedPubMed CentralGoogle Scholar
- Makarova KS, Wolf YI, Alkhnbashi OS, Costa F, Shah SA, Saunders SJ, Barrangou R, Brouns SJ, Charpentier E, Haft DH, et al. An updated evolutionary classification of CRISPR-Cas systems. Nat Rev Microbiol. 2015;13:722–36.View ArticlePubMedGoogle Scholar
- Terns RM, Terns MP. CRISPR-based technologies: prokaryotic defense weapons repurposed. Trends Genet. 2014;30(3):111–8.View ArticlePubMedPubMed CentralGoogle Scholar
- Brouns SJ, Jore MM, Lundgren M, Westra ER, Slijkhuis RJ, Snijders AP, Dickman MJ, Makarova KS, Koonin EV, van der Oost J. Small CRISPR RNAs guide antiviral defense in prokaryotes. Science. 2008;321(5891):960–4.View ArticlePubMedGoogle Scholar
- Gasiunasa G, Barrangoub R, Horvathc P, Siksnys V. Cas9–crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria. Proc Natl Acad Sci. 2012;109:39.View ArticleGoogle Scholar
- Tamulaitis G, Kazlauskiene M, Manakova E, Venclovas C, Nwokeoji AO, Dickman MJ, Horvath P, Siksnys V. Programmable RNA shredding by the type III-A CRISPR-Cas system of Streptococcus thermophilus. Mol Cell. 2014;56(4):506–17.View ArticlePubMedGoogle Scholar
- Zetsche B, Gootenberg JS, Abudayyeh OO, Slaymaker IM, Makarova KS, Essletzbichler P, Volz SE, Joung J, van der Oost J, Regev A, et al. Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR-Cas system. Cell. 2015;163(3):759–71.View ArticlePubMedPubMed CentralGoogle Scholar
- Haft DH, Selengut J, Mongodin EF, Nelson KE. A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes. PLoS Comput Biol. 2005;1(6):e60.View ArticlePubMedPubMed CentralGoogle Scholar
- Mandin P, Repoila F, Vergassola M, Geissmann T, Cossart P. Identification of new noncoding RNAs in Listeria monocytogenes and prediction of mRNA targets. Nucleic Acids Res. 2007;35(3):962–74.View ArticlePubMedPubMed CentralGoogle Scholar
- Jorth P, Whiteley M. An evolutionary link between natural transformation and CRISPR adaptive immunity. MBio. 2012;3:5.View ArticleGoogle Scholar
- Hullahalli K, Rodrigues M, Schmidt BD, Li X, Bhardwaj P, Palmer KL. Comparative analysis of the orphan CRISPR2 locus in 242 Enterococcus faecalis strains. PLoS One. 2015;10(9):e0138890.
- Stern A, Keren L, Wurtzel O, Amitai G, Sorek R. Self-targeting by CRISPR: gene regulation or autoimmunity? Trends Genet. 2010;26(8):335–40.
- Maier LK, Lange SJ, Stoll B, Haas KA, Fischer S, Fischer E, Duchardt-Ferner E, Wohnert J, Backofen R, Marchfelder A. Essential requirements for the detection and degradation of invaders by the Haloferax volcanii CRISPR/Cas system I-B. RNA Biol. 2013;10(5):865–74.
- Grissa I, Vergnaud G, Pourcel C. CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats. Nucleic Acids Res. 2007;35(Web Server issue):W52–7.
- Cramton SE, Schnell NF, Gotz F, Bruckner R. Identification of a new repetitive element in Staphylococcus aureus. Infect Immun. 2000;68(4):2344–8.
- Purves J, Blades M, Arafat Y, Malik SA, Bayliss CD, Morrissey JA. Variation in the genomic locations and sequence conservation of STAR elements among staphylococcal species provides insight into DNA repeat evolution. BMC Genomics. 2012;13:515.
- Biswas A, Fineran PC, Brown CM. Accurate computational prediction of the transcribed strand of CRISPR non-coding RNAs. Bioinformatics. 2014;30:1805–13.
- Biswas A, Staals RHJ, Morales SE, Fineran PC, Brown CM. CRISPRDetect: a flexible algorithm to define CRISPR arrays. BMC Genomics. 2016;17:356.
- Lange SJ, Alkhnbashi OS, Rose D, Will S, Backofen R. CRISPRmap: an automated classification of repeat conservation in prokaryotic adaptive immune systems. Nucleic Acids Res. 2013;41:8034–44.
- Osmundson J, Dewell S, Darst SA. RNA-Seq reveals differential gene expression in Staphylococcus aureus with single-nucleotide resolution. PLoS One. 2013;8(10):e76572.
- Rho M, Wu Y, Tang H, Doak T, Ye Y. Diverse CRISPRs evolving in human microbiomes. PLoS Genet. 2012;8(6):e1002441.
- Bland C, Ramsey TL, Sabree F, Lowe M, Brown K, Kyrpides NC, Hugenholtz P. CRISPR Recognition Tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics. 2007;8:209.
- Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658–9.
- Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011;39(Web Server issue):W29–37.
- Zhang Q, Doak TG, Ye Y. Expanding the catalog of cas genes with metagenomes. Nucleic Acids Res. 2014;42(4):2448–59.
- Rho M, Tang H, Ye Y. FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res. 2010;38(20):e191.
- Horvath P, Romero DA, Coute-Monvoisin AC, Richards M, Deveau H, Moineau S, Boyaval P, Fremaux C, Barrangou R. Diversity, activity, and evolution of CRISPR loci in Streptococcus thermophilus. J Bacteriol. 2008;190(4):1401–12.
- Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658–9.
- Raes J, Korbel JO, Lercher MJ, von Mering C, Bork P. Prediction of effective genome size in metagenomic samples. Genome Biol. 2007;8(1):R10.
- Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004;5:113.
- Price MN, Dehal PS, Arkin AP. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol Biol Evol. 2009;26(7):1641–50.
- Cai F, Axen SD, Kerfeld CA. Evidence for the widespread distribution of CRISPR-Cas system in the Phylum Cyanobacteria. RNA Biol. 2013;10(5):687–93.
- Babu M, Beloglazova N, Flick R, Graham C, Skarina T, Nocek B, Gagarinova A, Pogoutse O, Brown G, Binkowski A, et al. A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair. Mol Microbiol. 2011;79(2):484–502.
- Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science. 2007;315:1709–12.
- Deveau H, Barrangou R, Garneau JE, Labonte J, Fremaux C, Boyaval P, Romero DA, Horvath P, Moineau S. Phage response to CRISPR-encoded resistance in Streptococcus thermophilus. J Bacteriol. 2008;190(4):1390–400.
- Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27:573–80.
- Zhou K, Aertsen A, Michiels CW. The role of variable DNA tandem repeats in bacterial adaptation. FEMS Microbiol Rev. 2014;38(1):119–41.
- Rando OJ, Verstrepen KJ. Timescales of genetic and epigenetic inheritance. Cell. 2007;128(4):655–68.
- Holt DC, Holden MT, Tong SY, Castillo-Ramirez S, Clarke L, Quail MA, Currie BJ, Parkhill J, Bentley SD, Feil EJ, et al. A very early-branching Staphylococcus aureus lineage lacking the carotenoid pigment staphyloxanthin. Genome Biol Evol. 2011;3:881–95.
- Tatusova T, DiCuccio M, Badretdin A, Chetvernin V, Nawrocki EP, Zaslavsky L, Lomsadze A, Pruitt K, Borodovsky M, Ostell J. NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res. 2016;44(14):6614–24.
- Edgar RC. PILER-CR: fast and accurate identification of CRISPR repeats. BMC Bioinformatics. 2007;8:18.
- Sloan DB, Moran NA. Genome reduction and co-evolution between the primary and secondary bacterial symbionts of psyllids. Mol Biol Evol. 2012;29(12):3781–92.
- Trapnell C, Pachter L, Salzberg S. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25:1105–11.