Preparation of double-stranded DNA oligonucleotide ribozyme templates
To generate the ribozymes in vitro, two complementary oligonucleotides containing the hammerhead ribozyme core sequence, flanked by the sequences for the two binding arms and a T7 RNA polymerase promoter sequence, were annealed into duplex DNA in 10 mM Tris, pH 8.0, and 50 mM NaCl, by incubation at 94°C for 5 min, followed by slow cooling to room temperature. All oligonucleotides were obtained from Integrated DNA Technologies (IDT, Coralville, IA).
In vitro transcription of ribozyme and substrate target RNA
In vitro transcription of the substrate and ribozyme RNA was performed using the MEGAscript and MEGAshortscript kits (Ambion-ABI, Austin, TX), respectively, following the manufacturer's instructions. Either 2.5 μg of linearized plasmid (pTRIamp19, Ambion) containing the target ABCG2 cDNA sequence, or 1.5 μg of ribozyme DNA were used as template. After transcription, the DNA templates were digested with RQ1 RNase-free DNase. Unincorporated nucleotides were removed from the RNA transcripts by size-exclusion chromatography with a ProbeQuant G-50 Micro Column (GE-Healthcare, Piscataway, NJ) or by phenol/chloroform extraction, both of which were followed by an ethanol precipitation. The purified in vitro-transcribed RNAs and ribozymes were then quantitated spectrophotometrically, and their quality verified by gel electrophoresis (see Additional file 1). Two separate substrate RNAs were made, one from nucleotides -225 to +1011, and one from nucleotides +586 to +1708, relative to the A of the ATG start codon of the full length ABCG2 cDNA (GenBank accession no. NM_004827). Individual ribozymes are numbered consecutively in the order of occurrence of the GUC cleavage sites to which they bind, starting from nucleotide -285. Thus, for example, GUC1 refers to the first GUC triplet after nucleotide -285. A total of 15 hammerhead ribozymes targeted to GUC sites were designed and prepared (Table 1). These ribozymes were constructed with the same ribozyme core sequence, but with different sequences for binding arms that were complementary to the target sequences at the binding site (Figure 1A; also see Additional file 2). For each of these ribozymes, the 3' binding arm had 11 nucleotides, and the 5' binding arm had nine nucleotides.
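The site numbering and arm design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy target sequence are hypothetical (the sequence is not taken from ABCG2), and the sketch assumes that the 11-nucleotide 3' arm pairs with the 11 target nucleotides ending in the GU of the triplet, that the nine-nucleotide 5' arm pairs with the nine nucleotides immediately following the unpaired C, and that both arms are perfect Watson-Crick complements of their sites. The ribozyme core sequence is deliberately left out.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_guc_sites(target, h3_len=11, h1_len=9):
    """Number GUC triplets in order of occurrence and derive binding arms.

    Assumption: the 3' arm pairs with the h3_len target nucleotides that end
    with the GU of the triplet, and the 5' arm pairs with the h1_len
    nucleotides that follow the unpaired C.
    """
    sites = []
    number = 0
    g_index = target.find("GUC")
    while g_index != -1:
        number += 1  # every GUC is counted, even if too close to an end
        c_index = g_index + 2
        h3_start = c_index - h3_len
        h1_end = c_index + 1 + h1_len
        if h3_start >= 0 and h1_end <= len(target):
            h3_site = target[h3_start:c_index]      # ends with ...GU
            h1_site = target[c_index + 1:h1_end]    # starts after the C
            sites.append({
                "number": number,
                "arm_3prime": reverse_complement_rna(h3_site),
                "arm_5prime": reverse_complement_rna(h1_site),
            })
        g_index = target.find("GUC", g_index + 1)
    return sites


# Hypothetical toy target containing a single GUC triplet.
toy_target = "ACGAUUGCAGUCAUGGCAUUA"
for site in find_guc_sites(toy_target):
    print(site["number"], site["arm_5prime"], site["arm_3prime"])
```

In this sketch a GUC triplet that lies too close to either end of the target still receives a number but yields no arms, so the numbering stays tied to the order of occurrence.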
In vitro cleavage of target sequence and identification of cleavage products
The target RNA (10 pmol) and ribozyme (50 pmol) under study were mixed in 50 mM Tris, pH 8.0, and the in vitro cleavage reaction was initiated by the addition of 20 mM MgCl2. One μl of RNaseGuard was also added, and the mixture was incubated at 37°C. Aliquots of 10 μl were removed after 0, 5, 10, 15, 30, and 60 min, and the reaction in each aliquot was terminated by the addition of 50 mM EDTA. The cleavage products were then analyzed by electrophoresis in a 2% (v/v) formaldehyde/2.0% (w/v) agarose gel for 3–4 hr at 70 V. The separated products were stained with SYBR Green or ethidium bromide and photographed under UV light [40].
Quantification of residual substrate by real-time RT-PCR
Since a ribozyme irreversibly cleaves its substrate, we reasoned that the cleavage reaction could be quantified by measuring the amount of substrate remaining by real-time RT-PCR, using primer pairs that span the cleavage site. Accordingly, an aliquot of the cleavage reaction containing both the remaining, uncleaved substrate and the cleavage products was added to a one-step real-time RT-PCR reaction mix containing SYBR Green (Sigma, St. Louis, MO) according to the manufacturer's instructions, and amplification was carried out for 35–45 cycles in a LightCycler® (Roche, Indianapolis, IN), under conditions appropriate for each primer pair (see Additional file 3). Primers flanking each cleavage site were chosen such that the PCR products were between 400 and 600 bp long. The amount of uncleaved substrate present was determined from the crossing point values (CT) calculated by the LightCycler software from the amplification curve. The relative amount of template remaining at each time point (Su(t)) was then calculated by Su(t) = 2^(CT(0) - CT(t)), where CT(t) is the CT value at time t, and CT(0) is the CT value at time 0. Each time point was assayed in duplicate, and each cleavage reaction was repeated at least four times independently with different batches of substrate RNA. Selected ribozymes were also analyzed with differing ribozyme preparations. No significant activity differences were observed between separate ribozyme and/or substrate preparations. For the subsequent calculations, the relative amount of substrate cleaved at 3600 sec (1 - Su(3600)) was used as the measure of ribozyme activity. In preliminary experiments, we determined that the RT-PCR reaction was linear with the amount of substrate present (data not shown).
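As an illustration of how crossing point values translate into a relative substrate amount, the following Python sketch assumes that each PCR cycle doubles the product, so that a delay of one cycle corresponds to half as much template. The function names, the efficiency parameter, and the CT values are hypothetical and are not taken from the experiments described here.

```python
from statistics import fmean


def relative_substrate_remaining(ct_t, ct_0, efficiency=2.0):
    """Relative amount of uncleaved template at time t versus time 0.

    Assumption: the product is multiplied by `efficiency` in every cycle
    (2.0 means perfect doubling), so a later crossing point means less
    starting template.
    """
    return efficiency ** -(ct_t - ct_0)


def ribozyme_activity(ct_end_replicates, ct_0_replicates):
    """Fraction of substrate cleaved, from replicate CT values."""
    ct_end = fmean(ct_end_replicates)
    ct_0 = fmean(ct_0_replicates)
    return 1.0 - relative_substrate_remaining(ct_end, ct_0)


# Hypothetical duplicate CT readings at 0 and 3600 sec.
print(ribozyme_activity([17.1, 16.9], [15.0, 15.0]))
```

Averaging duplicate CT values before exponentiation is one possible choice; averaging the resulting relative amounts instead would give slightly different numbers.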
Prediction of mRNA secondary structure
The determination of mRNA secondary structure presents both theoretical and experimental challenges. One major impediment to the accurate prediction of mRNA structures is that a specific mRNA molecule probably does not adopt a single structure in solution, but instead exists in thermodynamic equilibrium among a population of structures [30, 31, 45]. Thus, the computational prediction of a single secondary structure by free energy minimization is not well suited to the task of providing a realistic representation of mRNA structures in vivo.
An alternative to free energy minimization for characterization of the ensemble of probable structures for a given RNA molecule has been developed [32]. In this approach, a statistically representative sample is drawn from the Boltzmann-weighted ensemble of RNA secondary structures for the RNA. Such samples of even moderate size can faithfully and reproducibly characterize structure ensembles of enormous size, so that sampling estimates of structural features are statistically reproducible from one sample to another. In particular, in comparison to free energy minimization, this method has been shown to make better structural predictions [35], to better represent the likely population of mRNA structures [34], and to yield a significant correlation between predictions and antisense inhibition data [36, 37]. A sample size of 1,000 structures has been shown to be sufficient to guarantee statistical reproducibility in typical sampling statistics and structure clustering features [32, 34]. In applications to modeling RNA target binding by (partially) complementary nucleic acids, a single-stranded block of four or five nucleotides is essential for the nucleation step of the hybridization [25, 46, 47], so the probability that such a block is single-stranded must be high. Thus, in the current and other related applications, we consider the sample size of 1,000 to be sufficient. In the case that a structural feature of small probability is of interest, a much larger sample would be required. The structure sampling method has been implemented in the Sfold software program for RNA folding and applications [33] and is used here for mRNA folding.
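The way a sampling estimate is obtained from a structure sample can be sketched as follows. The sketch assumes that each sampled structure is available as a dot-bracket string, in which "." marks an unpaired nucleotide; this input format, the function name, and the three toy structures are assumptions made for illustration and do not reproduce the Sfold output format.

```python
def block_unpaired_probability(structures, start, width=4):
    """Estimate the probability that a block of nucleotides is single-stranded.

    `structures` is a sample of dot-bracket strings of equal length. The
    estimate is the fraction of sampled structures in which every position
    of the block [start, start + width) is unpaired.
    """
    if not structures:
        raise ValueError("the structure sample is empty")
    if start < 0 or start + width > len(structures[0]):
        raise ValueError("block lies outside the sequence")
    hits = sum(
        1 for s in structures
        if all(symbol == "." for symbol in s[start:start + width])
    )
    return hits / len(structures)


# Hypothetical sample of three structures (a real sample would hold 1,000).
toy_sample = ["....((..))", "((..))....", "..((..)).."]
print(block_unpaired_probability(toy_sample, start=0, width=4))
```

Because the estimate is a simple frequency, a feature that occurs in a large fraction of structures is estimated reliably from a moderate sample, whereas a rare feature needs a much larger one.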
Prediction of ribozyme secondary structure
The core of the ribozyme is considered to exist in a mixture of conformations in solution that can interchange rapidly [48–51]. In accordance with this established dynamic view of the hammerhead structure, we also employed Boltzmann structure samples generated by Sfold for the prediction of ribozyme secondary structure. Again, a sample size of 1,000 was used for characterizing probable ribozyme structures at equilibrium.
Structural and thermodynamic parameters
The catalytic activity of a trans-cleaving ribozyme can be affected by many factors. Here, we have focused on a number of structural and thermodynamic parameters. These parameters take into account the secondary structure of the target, the secondary structure of the ribozyme, and the stability of the ribozyme-target duplex. Below, we define these terms in the current context and compute the total free energy change for modeling the hybridization process.
ΔGdisruption is the free energy cost for disruption of the secondary structure at the ribozyme binding site on the target mRNA (Figure 1B), and thus is a measure of accessibility at the target site. For the 15 designed ribozymes, each with nine nucleotides for the 5' binding arm and 11 nucleotides for the 3' binding arm, the binding site involves 20 nucleotides, excluding the unpaired C of the GUC triplet (Figure 1A). To calculate ΔGdisruption, we adopted the simplifying assumption that the binding of a ribozyme to a relatively much longer mRNA should induce a local structural alteration at the target site, but no longer-range effects on overall target secondary structure. In other words, we defined local structural alteration as the breakage of the intramolecular base pairs involving the target site to permit formation of the ribozyme-target duplex (Figure 1). Specifically, ΔGdisruption was calculated as the energy difference between ΔGbefore, the free energy of the original mRNA structure, and ΔGafter, the free energy of the new, locally altered structure (ΔGdisruption = ΔGbefore- ΔGafter). We calculated ΔGbefore from the average energy of the original 1,000 structures predicted by Sfold, and ΔGafter from the average energy of all of the 1,000 locally altered structures. Therefore, under the local disruption assumption, the calculations did not require refolding of the rest of the target sequence.
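A minimal Python sketch of this averaging step is given below. It assumes that the free energies of the sampled structures and of their locally altered counterparts have already been computed elsewhere; the function name and the energy values are hypothetical.

```python
from statistics import fmean


def disruption_energy(energies_before, energies_after):
    """Average free energy of the original structures minus that of the
    locally altered structures, both in kcal/mol.

    Each entry of `energies_after` is assumed to belong to the structure
    obtained from the corresponding sampled structure by breaking the
    intramolecular base pairs that involve the binding site.
    """
    if len(energies_before) != len(energies_after):
        raise ValueError("one altered structure is expected per sampled structure")
    return fmean(energies_before) - fmean(energies_after)


# Hypothetical energies for a sample of three structures (kcal/mol).
before = [-30.0, -32.0, -31.0]
after = [-25.0, -27.0, -26.0]
print(disruption_energy(before, after))
```

With this sign convention, breaking base pairs makes the altered structures less stable than the originals, so the computed value is zero or negative, and a larger magnitude indicates a less accessible site.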
ΔGswitch is the free energy cost for the ribozyme to switch from one conformation to the conformation that is most favorable for target binding and subsequent cleavage. Here, the starting conformation is any conformation predicted by Sfold, and the binding conformation is the one for which the ribozyme core is correctly folded and both binding arms are single-stranded (Figure 1B). Thus, ΔGswitch = ΔGs - ΔGb, where ΔGs is the free energy of the starting conformation, and ΔGb is the free energy of the binding conformation. In the case that the starting conformation is the binding conformation, ΔGswitch = 0.0 kcal/mol. We calculated ΔGs as the average free energy of the 1,000 structures predicted by Sfold for the ribozyme. ΔGb is the same for different starting conformations of a given ribozyme sequence, so there is no need to average over a structure sample.
ΔGhybrid is the energy gain due to the complete intermolecular hybridization between the ribozyme binding arms and the nucleotide sequence of the target binding site. It is calculated as the sum of base-pair stack energies for the two ribozyme arm-target duplexes, an energetic penalty ("initiation energy") for the initiation of the bimolecular interaction [52], and other penalties or energies associated with the multi-branched loop formed by the three adjacent helices. Specifically, ΔGhybrid = ΔGinitiation + ∑1≤i≤10ΔGH3_stacking(i) + ∑1≤j≤8ΔGH1_stacking(j) + ΔGmulti-loop + ΔGH3_terminal + ΔGH1_terminal + ΔGdangle, where the initiation energy ΔGinitiation = 4.1 kcal/mol [52]; ΔGH3_stacking(i) (1 ≤ i ≤ 10) is the stacking energy for the i-th base-pair stack for helix III (Figure 1A); ΔGH1_stacking(j) (1 ≤ j ≤ 8) is the stacking energy for the j-th base-pair stack for helix I; ΔGmulti-loop is a linear penalty for the multi-branched loop formed by the three helices; ΔGH3_terminal is a penalty of 0.5 kcal/mol for the terminal A-U pair for helix III, while ΔGH1_terminal applies the same penalty for a terminal A-U or G-U pair [53] for helix I (e.g., A-U for ribozyme GUC19, Figure 1A); and ΔGdangle is a sum of free energies for dangling ends (i.e., single base stacks) [52]. More specifically, for the linear multi-branched loop penalty, ΔGmulti-loop = a + b(number of unpaired bases) + c(number of helices), where a, b, and c are respectively the offset, the free base penalty and the helix penalty, and a = 3.4 kcal/mol, b = 0.0 kcal/mol, and c = 0.4 kcal/mol [53]. In our present context, there are 11 unpaired bases and three helices in the loop, so ΔGmulti-loop = 3.4 + 0.0 × 11 + 0.4 × 3 = 4.6 kcal/mol, a constant for all ribozymes studied here.
For a terminal base-pair N-N' (A-U for ribozyme GUC19, as shown in Figure 1A) for helix I, ΔGdangle = min [ΔG3(U-A,C), ΔG5(N-N',C)] + ΔG5(A-U,A)+ ΔG3(C-G,G)+ ΔG5(G-C,A)+ ΔG3(N'-N,C), where the free energies for both 5' and 3' dangling ends [53] are used, and min [ΔG3(U-A,C), ΔG5(N-N',C)] is the minimum of the two dangling energies, to take into account two possibilities of single-base stacking for the C of the GUC cleavage triplet. It is assumed that a single unpaired nucleotide between two adjacent helices for a multi-branched loop stacks onto the terminal base pair of the helix possessing the more favorable dangling energy.
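The bookkeeping behind ΔGhybrid can be sketched in Python as below. The initiation energy, the multi-branched loop coefficients, and the counts of 10 and 8 stacks are taken from the text; the function names are hypothetical, and the stacking, terminal, and dangling energies in the example call are placeholder values chosen only to exercise the code, not nearest-neighbor parameters.

```python
INITIATION = 4.1  # kcal/mol, initiation energy quoted in the text


def multiloop_penalty(unpaired_bases, helices, a=3.4, b=0.0, c=0.4):
    """Linear multi-branched loop penalty a + b*unpaired + c*helices (kcal/mol)."""
    return a + b * unpaired_bases + c * helices


def hybrid_energy(h3_stacks, h1_stacks, h3_terminal, h1_terminal, dangle):
    """Sum the terms of the hybridization energy (kcal/mol).

    h3_stacks and h1_stacks hold the stacking energies of helix III
    (10 stacks) and helix I (8 stacks); the remaining arguments are the
    terminal base-pair penalties and the summed dangling-end energy.
    """
    if len(h3_stacks) != 10 or len(h1_stacks) != 8:
        raise ValueError("expected 10 stacks for helix III and 8 for helix I")
    return (
        INITIATION
        + sum(h3_stacks)
        + sum(h1_stacks)
        + multiloop_penalty(unpaired_bases=11, helices=3)
        + h3_terminal
        + h1_terminal
        + dangle
    )


# Loop penalty for 11 unpaired bases and three helices.
print(round(multiloop_penalty(11, 3), 2))

# Placeholder energies (hypothetical, not real nearest-neighbor values).
example = hybrid_energy(
    h3_stacks=[-2.0] * 10,
    h1_stacks=[-2.0] * 8,
    h3_terminal=0.5,
    h1_terminal=0.5,
    dangle=-1.0,
)
print(round(example, 2))
```

In practice the stacking and dangling-end values would be looked up in the nearest-neighbor parameter tables cited in the text for each specific arm sequence.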
Finally, we computed ΔGtotal, the total energy change for the ribozyme-target hybridization. ΔGtotal can be calculated through consideration of the energy gain due to the complete intermolecular hybridization and the energy costs owing to structure alterations for both the target and the ribozyme. With use of the parameters introduced above, ΔGtotal = ΔGhybrid - ΔGswitch - ΔGdisruption.
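To make the signs concrete, the following Python sketch combines the three terms. The function names and all numerical values are hypothetical; the sketch only restates ΔGswitch = ΔGs - ΔGb, with ΔGs averaged over the sampled ribozyme structures, and ΔGtotal = ΔGhybrid - ΔGswitch - ΔGdisruption.

```python
from statistics import fmean


def switch_energy(sampled_energies, binding_energy):
    """Average free energy of the sampled ribozyme conformations minus the
    free energy of the binding conformation (kcal/mol)."""
    return fmean(sampled_energies) - binding_energy


def total_energy(hybrid, switch, disruption):
    """Total energy change for ribozyme-target hybridization (kcal/mol)."""
    return hybrid - switch - disruption


# Hypothetical values in kcal/mol.
dg_switch = switch_energy([-6.0, -4.0, -5.0], binding_energy=-3.0)
dg_total = total_energy(hybrid=-27.3, switch=dg_switch, disruption=-5.0)
print(dg_switch, round(dg_total, 2))
```

With these definitions the switch and disruption terms are zero or negative when the sampled structures are more stable than the binding-competent ones, so subtracting them makes the total less favorable than the hybridization energy alone.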
Statistical analyses
Standard univariate linear regression was used to predict ribozyme activity from each of the parameters listed above. The P-value measures the statistical significance of the parameter, and the R2 of the regression indicates the proportion of variability in ribozyme activity that is attributable to the parameter. Pearson's correlation coefficient between each parameter and the ribozyme activity was also computed. We note that the P-value of the correlation is the same as the P-value of the parameter from the standard univariate regression analysis. The software package R [54] was used for the statistical analyses.