CRISPR-Cas-based diagnostic assays provide a platform that combines the benefits of qRT-PCR and serological assays in terms of portability, specificity, and ease of use. In recent publications, both CRISPR-Cas12 and CRISPR-Cas13 have been used to tackle emerging viral threats [1, 2]. More recently, these platforms were rapidly adapted to aid in the detection of SARS-CoV-2, with assays utilizing both enzymes gaining FDA approval [6, 17]. The broader implementation of assays utilizing CRISPR-Cas12 and CRISPR-Cas13 has been significantly restricted by the tedious design process required for primers and crRNA guides [18]. The design process challenges researchers to ensure that their chosen targets are conserved in both the primer and crRNA target sequences [19]. Mismatches in the PAM sequence result in non-recognition of the target, while mismatches in the seed region decrease target recognition. Both issues lead to reduced cleavage efficiency and, therefore, lowered detection sensitivity [20, 21].
In this study, we have demonstrated the utility of PrimedSherlock for designing specific crRNA guides for use with CRISPR-Cas12 platforms in conjunction with primer lists generated by PrimedRPA. For each target of interest, we leveraged publicly available genomes to design libraries of all known strain sequences as well as libraries of potential off-target viruses. In less than twenty-four hours for each target, our Python tool was able to rapidly identify and analytically evaluate the potential specificity of crRNA targets for provided primer pairs for each respective virus or serotype of interest.
The major limiting factor for the speed of the tool is the CPU and GPU hardware present in the user’s system. Most of the underlying genomic analysis is powered by Cas-OFFinder. For each primer pair, every valid crRNA sequence is added to a list, which is then provided to Cas-OFFinder for on-target and off-target analysis. This is essentially a BLAST-like search of each on-target and off-target genome, with thousands of Cas-OFFinder queries for each crRNA. This large list of searches is divided among a user-defined number of available threads, and each individual run relies on the GPU. Users also have the option of switching to CPU-based Cas-OFFinder analysis through minor code modifications that are included in the GitHub repository. However, prior studies have determined that doing so leads to a 20-fold increase in run time [22], with analysis speed depending on thread count and GPU. During development, we used three diverse test systems to evaluate the rate at which ideal primer and crRNA combinations could be evaluated. The first was a 32-thread AMD Ryzen 9 5950X with a Founders Edition NVIDIA 3080 Ti GPU. The second was a 24-thread AMD Ryzen 9 3900XT with a Founders Edition NVIDIA 2080 Ti GPU. The third was a Dell XPS 15 equipped with an i7-8750H and a 1050 Ti GPU. All tests were conducted with multithreading, using all but one thread (31, 23, and 11 threads, respectively), and with GPU-based Cas-OFFinder analysis. For both enthusiast-level PCs, primer and crRNA design was carried out efficiently with both PrimedRPA and PrimedSherlock, with average run times of less than 8 h for most configurations. The midrange Dell XPS took significantly longer, with an average of more than two days per test run (data not shown).
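The division of crRNA queries among worker threads described above can be sketched as follows. This is a minimal illustration, not the PrimedSherlock implementation: the function names are hypothetical, and the input-file layout (genome path, search pattern, then one "query mismatches" line per crRNA) is an assumption based on the classic Cas-OFFinder input format that should be checked against the documentation of the installed version.

```python
from typing import List


def split_into_chunks(items: List[str], n_workers: int) -> List[List[str]]:
    """Distribute items round-robin over at most n_workers non-empty chunks."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    chunks = [items[i::n_workers] for i in range(n_workers)]
    return [chunk for chunk in chunks if chunk]


def build_casoffinder_input(
    genome_dir: str, pattern: str, queries: List[str], max_mismatches: int
) -> str:
    """Assemble the text of one Cas-OFFinder input file.

    Assumed layout (hypothetical here, verify before use):
      line 1: directory holding the genome FASTA files
      line 2: search pattern, e.g. a 5' TTTV PAM followed by N's for Cas12
      lines 3+: one query per line, followed by the allowed mismatch count
    """
    for query in queries:
        if len(query) != len(pattern):
            raise ValueError("each query must have the same length as the pattern")
    lines = [genome_dir, pattern]
    lines += [f"{query} {max_mismatches}" for query in queries]
    return "\n".join(lines) + "\n"
```

Each chunk would then be written to its own input file and handed to a separate Cas-OFFinder run, so that the number of concurrent runs matches the thread count chosen by the user.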
The second major factor in speed is the user’s ability to diligently curate the on-target and off-target datasets. For each viral or serotype target, pairs of highly conserved crRNAs were discovered and analytically determined to cover most strains of each pathogen listed in GenBank. However, this required manual curation of GenBank entries and elimination of rogue or mislabeled sequences. A single genome misplaced in either the on-target or off-target sequences prevented PrimedRPA or PrimedSherlock from working correctly. For both programs, poor curation led to false negatives with the DENV serotype datasets. The script is written such that if a crRNA target sequence is located within an off-target genome, it is immediately blacklisted. A misplaced genome can cause this to occur for all target sequences, making it imperative for the user to curate the datasets in advance. For curation, we recommend using a program such as Unique Sequences, which can be found within Galaxy Tools, to remove duplicate entries. We also recommend removing sequences with aberrant N (ambiguous base) counts as well as visually screening the databases for mislabeled sequences. For example, we found instances of “vaccine candidate” or “chimera” entries that were not viral sequences. Lastly, we recommend temporarily combining the on-target and off-target databases and using the program Unique Seq to determine whether sequences have been accidentally incorporated into the incorrect database. By adopting this curation strategy, we experienced no runtime errors or issues with viral targets for either PrimedRPA or PrimedSherlock.
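The curation steps recommended above can be approximated in a few lines of Python. The sketch below is illustrative only and does not replace the Galaxy tools named in the text: the function names, the 1% threshold for ambiguous bases, and the list of suspect header terms are assumptions chosen for the example.

```python
from typing import List, Set, Tuple

Record = Tuple[str, str]  # (FASTA header without '>', sequence)

SUSPECT_TERMS = ("vaccine candidate", "chimera")  # examples taken from the text
MAX_N_FRACTION = 0.01  # hypothetical cutoff for "aberrant" N content


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, uppercase sequence) records."""
    records: List[Record] = []
    header, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts).upper()))
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts).upper()))
    return records


def curate(
    records: List[Record], max_n_fraction: float = MAX_N_FRACTION
) -> Tuple[List[Record], List[str]]:
    """Drop duplicate and N-rich sequences; set suspect headers aside for review."""
    kept: List[Record] = []
    flagged: List[str] = []
    seen: Set[str] = set()
    for header, seq in records:
        if not seq or seq in seen:
            continue  # empty or duplicate sequence
        if seq.count("N") / len(seq) > max_n_fraction:
            continue  # too many ambiguous bases
        seen.add(seq)
        if any(term in header.lower() for term in SUSPECT_TERMS):
            flagged.append(header)  # needs manual inspection, not kept automatically
        else:
            kept.append((header, seq))
    return kept, flagged


def shared_sequences(on_target: List[Record], off_target: List[Record]) -> Set[str]:
    """Return sequences present in both datasets, i.e. likely misplaced genomes."""
    return {seq for _, seq in on_target} & {seq for _, seq in off_target}
```

Any sequence returned by `shared_sequences` indicates a genome that sits in both databases and would cause every crRNA it contains to be blacklisted.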
During refinement, we further explored the platform with several different thread count configurations of the 5950X and 3900XT setups using the ZIKV dataset. We discovered that thread count has a direct effect on total runtime. Without multithreading, run times were approximately three times longer than with 24× and 32× multithreading, which were the maximum available thread counts for the 3900XT and 5950X systems, respectively (Fig. 2). To our surprise, each hardware configuration reached near-minimum completion times at one-fourth of the maximum thread count: 6× for the 3900XT and 8× for the 5950X (Fig. 2, Additional file 1: Fig. S1). Additionally, the 3900XT outperformed the 5950X in both single-threaded and multithreaded configurations. We believe this could be explained by the base clock speed of each CPU, which is 3.8 GHz for the 3900XT and 3.4 GHz for the 5950X. The 3900XT may also outperform the 5950X in per-core performance. Interestingly, we found that the 12-thread configuration slightly outperformed the 24-thread configuration for the 3900XT, which could be due to minor stochastic variations in performance (Fig. 2).
PrimedSherlock was originally developed as an internal tool to speed up the development of CRISPR-Cas12 assay targets. Each virus or serotype target reported in this article was designed using multiple versions of the script. Of note, the ZIKV and WNV assays were designed with earlier renditions. Earlier bench-validation efforts on analytical datasets generated by the tool provided us with valuable feedback, which improved later renditions. Of particular importance were bug fixes to problematic code segments. One such fix addressed the constraints on which regions should be searched for crRNA targets. Our earliest rendition allowed the primer regions to be included. This resulted in one of the original crRNA targets for WNV being partially present within the primer region, which was corrected in later renditions. However, the partial presence within the primer, together with the formation of a primer dimer, was enough to elicit a false positive, indicating the importance of pre-screening. This phenomenon was resolved by changing the forward primer for WNV to the one that is included in Table 1.
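The corrected search constraint, in which crRNA targets may not overlap the primer regions, can be illustrated with a short sketch. It assumes the canonical Cas12a TTTV PAM located 5’ of a 20-nt protospacer and scans only the strand supplied; the function name and parameters are hypothetical and are not taken from the PrimedSherlock source.

```python
from typing import List, Tuple

PAM_LEN = 4  # TTTV, where V is A, C or G


def find_candidates(
    amplicon: str, fwd_primer_len: int, rev_primer_len: int, spacer_len: int = 20
) -> List[Tuple[int, str]]:
    """Return (PAM start index, spacer) for TTTV sites lying fully between the primers.

    Both the PAM and the spacer must fall outside the primer-binding regions at
    either end of the amplicon, so no candidate can overlap a primer.
    """
    amplicon = amplicon.upper()
    start = fwd_primer_len                # first base after the forward primer
    end = len(amplicon) - rev_primer_len  # exclusive end before the reverse primer
    site_len = PAM_LEN + spacer_len
    hits: List[Tuple[int, str]] = []
    for i in range(start, end - site_len + 1):
        pam = amplicon[i:i + PAM_LEN]
        if pam[:3] == "TTT" and pam[3] in "ACG":
            spacer = amplicon[i + PAM_LEN:i + site_len]
            if set(spacer) <= set("ACGT"):  # skip spacers with ambiguous bases
                hits.append((i, spacer))
    return hits
```

To scan the opposite strand under the same assumptions, the reverse complement of the amplicon can be passed in with the two primer lengths swapped.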
For any nucleic acid-based diagnostic assay, a major challenge is primer conservation. The primer-binding regions used to amplify viral template can easily become mutated, causing a reduction or complete failure in template amplification. Because we use isothermal recombinase polymerase amplification, conservation of the 5’ and 3’ ends of the forward and reverse primers is of particular interest to us. Previous studies have indicated that a lack of sequence homology in these regions significantly hampers the ability to amplify template [23]. To combat this, we set stringent conservation standards in PrimedRPA and required at least 80% primer sequence homology for both the forward and reverse primers across all viral targets. Using two diverse strains for each viral or serotype target, we did not experience any issues with detection assays using either RT-RPA or RT-PCR based cDNA amplification, nor did we experience any failed detections. As described above, we observed a significantly reduced yield of cDNA for DENV-3 Strain 2 (BEI NR-50532). We attribute this to a reduced amount of template compared with the other viral strains. For each NR-50532 spike-in, we used 2 µl of stock at 2.2 × 10^3 copies per microliter. Other stocks were considerably less dilute, which may explain the poor amplification experienced for this spike-in. However, as demonstrated in Fig. 5 and Fig. 6, detectability was still achieved in line with that of the other DENV-3 strain.
To validate our analytically generated crRNA pairs and primer sets, we used two commercially available CRISPR-Cas12 enzymes: a wildtype recombinant Acidaminococcus sp. BV3L6 (A.s) nuclease and a recombinant Lachnospiraceae bacterium ND2006 (L.b) nuclease with several modifications to improve on-target editing and temperature tolerance. The choice to include both was driven largely by the influx of commercially available enzyme types and manufacturer modifications, and by a desire to ensure usability across available CRISPR-Cas12 enzymes. For our assays, we used multiple biological and technical replicates. In each included figure, every fluorescence assay graph shows the average of two technical replicates of the detection assay. We observed consistent amplification across technical replicates. However, fluorescence varied significantly between the two versions of Cas12 tested. The apparent fluorescence reduction for the modified L.b Cas12-Ultra (Fig. 6) may be due to proprietary enzyme modification(s) that may otherwise increase on-target gene editing efficiency.
Bench validation of the highest-ranking crRNA pairs by fluorescence assay revealed positive detection of the target virus across all divergent strains. Although the L.b enzyme produced fewer fluorescence units in response to target presence, there was still a considerable difference between the target samples and the no-template controls, negative RT controls, and off-targets. For both enzymes, although more noticeably in the wildtype figure, the cleavage efficiency of the Cas12 enzyme varied by viral strain. A noticeable difference consistent with mismatch effects is seen in Fig. 5 between the two DENV-4 strains. This could be caused either by sequence variation between the crRNA target site and the guide or by the titer of each virus gDNA sample varying significantly (1.4 ng/µl for NR-50533 vs 126 ng/µl for NR-4289).
In drafting PrimedSherlock, we carefully considered published studies when determining the best scoring method for crRNAs. We reasoned that the most important factor for the toolset should be minimizing crRNA mismatches, especially within the PAM region, as mutations can severely disrupt dsDNA target cleavage [3]. After that, our design goal was to ensure that crRNA targets were specific enough that they would not function in the absence of the target amplicon. In one study, the authors demonstrated that an accumulation of mismatches led to a reduction in target-induced off-target activity [3]. Cleavage of the non-target ssDNA was reduced and ultimately diminished when fewer than 15 base pairs matched the target nucleic acid sequence. We incorporated this constraint into the design process by excluding any crRNA with matches of more than 10 base pairs to off-target genomic sequences.
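One way to read the off-target constraint is as a cap on the number of matching positions between a crRNA spacer and any same-length window of an off-target genome. The brute-force sketch below illustrates that reading only. In the tool itself this analysis is delegated to Cas-OFFinder, so the function names and the ungapped, single-strand comparison used here are assumptions made for the example.

```python
from typing import Iterable


def max_matches(spacer: str, genome: str) -> int:
    """Highest count of identical positions between the spacer and any genome window.

    Only ungapped alignments on the given strand are considered.
    """
    spacer, genome = spacer.upper(), genome.upper()
    k = len(spacer)
    best = 0
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        matches = sum(a == b for a, b in zip(spacer, window))
        best = max(best, matches)
    return best


def passes_off_target_filter(
    spacer: str, off_target_genomes: Iterable[str], max_allowed_matches: int = 10
) -> bool:
    """Keep a spacer only if no off-target genome matches it at more than the cap."""
    return all(
        max_matches(spacer, genome) <= max_allowed_matches
        for genome in off_target_genomes
    )
```

The default cap of 10 mirrors the threshold stated in the text. A production search would also cover the reverse strand and use an indexed or GPU-accelerated method rather than this quadratic scan.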
In terms of crRNA specificity for target sequences, as few as two mismatches can cause a significant reduction in detection efficiency [24,25,26]. Furthermore, mismatches in the seed region, or bases 1–6 proximal to the PAM site, can negatively impact on-target recognition [3, 26,27,28]. Our design logic accounts for these mismatches to increase crRNA efficiency. PrimedSherlock is designed to select only crRNAs that demonstrate the highest conservation across all targeted strains or genomes, while mismatched crRNAs are biased against.
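The preference for conserved crRNAs, with extra weight on seed-region mismatches, can be expressed as a simple penalty score. This is a sketch under stated assumptions rather than the scoring used by PrimedSherlock: the penalty weights and function names are hypothetical, and the seed is taken to be the first six spacer bases next to a 5’ PAM, as described in the text.

```python
from typing import Dict, List, Tuple

SEED_LEN = 6  # bases 1-6 proximal to the PAM


def mismatch_profile(spacer: str, site: str) -> Tuple[int, int]:
    """Count (seed, distal) mismatches between a spacer and one strain's target site."""
    if len(spacer) != len(site):
        raise ValueError("spacer and site must have equal length")
    seed = sum(a != b for a, b in zip(spacer[:SEED_LEN], site[:SEED_LEN]))
    distal = sum(a != b for a, b in zip(spacer[SEED_LEN:], site[SEED_LEN:]))
    return seed, distal


def conservation_score(
    spacer: str,
    strain_sites: List[str],
    seed_penalty: float = 3.0,    # hypothetical weight
    distal_penalty: float = 1.0,  # hypothetical weight
) -> float:
    """Mean mismatch penalty across strains; 0.0 means a perfect match in every strain."""
    if not strain_sites:
        raise ValueError("at least one strain site is required")
    total = 0.0
    for site in strain_sites:
        seed, distal = mismatch_profile(spacer, site)
        total += seed_penalty * seed + distal_penalty * distal
    return total / len(strain_sites)


def rank_candidates(candidates: Dict[str, List[str]]) -> List[str]:
    """Order spacers from most to least conserved (lowest penalty first)."""
    return sorted(candidates, key=lambda s: conservation_score(s, candidates[s]))
```

Here `candidates` maps each spacer to the corresponding target-site sequences extracted from every on-target strain, so a spacer with a single seed mismatch in one strain ranks below one with a single distal mismatch.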
We further strengthened PrimedSherlock by selecting two crRNA targets within each amplicon, considering the impact of seed region and minor distal mismatches. By relying on a multiplexed approach, we greatly reduce the impact of a PAM site mutation, seed region mismatch, or minor distal sequence mismatch [29]. Multiplexed approaches also reduce the need for template amplification, with larger arrays of crRNA targets diminishing the need for template amplification entirely [30]. After determining the most efficient path for crRNA design, we elected not to automate any modifications to the 3’ or 5’ ends of the identified crRNA targets, given the current debate on the effectiveness of such modifications [31, 32].
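The benefit of pairing two crRNAs per amplicon can be illustrated by choosing the pair whose combined strain coverage is largest. The sketch below assumes that the set of strains each crRNA matches has already been computed; the function name and the tie-breaking rule (first pair in sorted order) are illustrative choices, not the published selection logic.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def best_pair(coverage: Dict[str, Set[str]]) -> Tuple[Tuple[str, str], Set[str]]:
    """Pick the two crRNAs whose union of covered strains is largest.

    coverage maps a crRNA identifier to the set of strain identifiers it matches.
    """
    if len(coverage) < 2:
        raise ValueError("at least two crRNA candidates are required")
    best_names = None
    best_union: Set[str] = set()
    for a, b in combinations(sorted(coverage), 2):
        union = coverage[a] | coverage[b]
        # Strictly greater keeps the first pair in sorted order on ties
        if best_names is None or len(union) > len(best_union):
            best_names, best_union = (a, b), union
    return best_names, best_union
```

A strain carrying a PAM or seed mutation at one site remains detectable as long as the second crRNA of the pair still matches it, which is the redundancy the multiplexed design relies on.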
The prevention of off-target-induced false positives was a major consideration during the design process. In CRISPR-Cas diagnostic assays that involve sample amplification, two factors limit the potential for off-target-induced enzymatic activity: the diligent design of primers to minimize off-target amplification and the specificity of crRNAs for target sequences. For example, in assays where RPA is combined with CRISPR-Cas, either an off-target amplicon or an off-target dsDNA sequence on its own would not necessarily lead to a false-positive detection. However, in a multiplexed approach, false-positive detection in the absence of any template amplification has been documented [30]. In our opinion, combining primer and crRNA specificity is the best approach to reducing potential false positives. By performing off-target analysis of crRNA sequences, we can control for off-target enzyme activation independently of user-provided primer design. Further, the shift to diagnostic assays performed directly on samples without amplification is highly debated [33,34,35]. By maintaining this redundancy in our analysis, users should be able to adapt the toolset for CRISPR-Cas assays independent of sample amplification and account for the risks associated with multiplexed crRNA approaches.
An advantage of providing the original source code for PrimedSherlock is that it is modifiable as new evidence comes to light. For example, diverse nucleic acid targets may form secondary and tertiary structures, which themselves may affect the enzymatic properties of CRISPR-Cas12. Mutations in Cas12 may also alter the ability of the enzyme to interact with targeted dsDNA sequences. As more studies regarding the parameters that guide crRNA target acquisition and complexed enzyme activation are conducted, PrimedSherlock can be updated to further improve outcomes.
In validating PrimedSherlock, we focused our efforts on diverse mosquito-borne viruses of medical concern. Our examples were selected as a proof-of-principle demonstration that, given a wide assortment of strains, the tool could identify conserved regions enabling highly specific CRISPR-Cas detection assays. One strength of the toolset is that users may readily determine their own diagnostic goals. On one hand, target sequences can be narrowly selected to limit coverage to the most important circulating strains or newly disseminating strains. An example could be targeting strains of ZIKV linked to microcephaly by focusing the input datasets [36]. Conversely, coverage can be maximized to include all available genome sequences for a particular pathogen of interest.