Identification of regions in multiple sequence alignments thermodynamically suitable for targeting by consensus oligonucleotides: application to HIV genome

Background Computer programs for the generation of multiple sequence alignments such as "Clustal W" allow detection of regions that are most conserved among many sequence variants. However, even for regions that are equally conserved, their potential utility as hybridization targets varies. Mismatches in sequence variants are more disruptive in some duplexes than in others. Additionally, the propensity for self-interactions amongst oligonucleotides targeting conserved regions differs and the structure of target regions themselves can also influence hybridization efficiency. There is a need to develop software that will employ thermodynamic selection criteria for finding optimal hybridization targets in related sequences. Results A new scheme and new software for optimal detection of oligonucleotide hybridization targets common to families of aligned sequences is suggested and applied to aligned sequence variants of the complete HIV-1 genome. The scheme employs sequential filtering procedures with experimentally determined thermodynamic cut off points: 1) creation of a consensus sequence of RNA or DNA from aligned sequence variants with specification of the lengths of fragments to be used as oligonucleotide targets in the analyses; 2) selection of DNA oligonucleotides that have pairing potential, greater than a defined threshold, with all variants of aligned RNA sequences; 3) elimination of DNA oligonucleotides that have self-pairing potentials for intra- and inter-molecular interactions greater than defined thresholds. This scheme has been applied to the HIV-1 genome with experimentally determined thermodynamic cut off points. Theoretically optimal RNA target regions for consensus oligonucleotides were found. They can be further used for improvement of oligo-probe based HIV detection techniques. Conclusions A selection scheme with thermodynamic thresholds and software is presented in this study. 
The package can be used for any purpose where there is a need to design optimal consensus oligonucleotides capable of interacting efficiently with hybridization targets common to families of aligned RNA or DNA sequences. Our thermodynamic approach can be helpful in designing consensus oligonucleotides with consistently high affinity to target variants in evolutionary related genes or genomes.


Background
Finding optimal targets for oligonucleotides in multiple variants of related sequences is useful for a number of practical tasks. One of them is the design of oligonucleotide probes for RNA/DNA based pathogen detection assays. Besides PCR, such detection can be performed using strand displacement amplification (SD) [1,2], transcription-mediated amplification (TMA) [3], nucleic acid sequence-based amplification (NASBA) [4], hybridization protection assay [5], branched DNA signal amplification [6,7], in situ hybridization [8,9] or other techniques that are currently being developed and require oligonucleotides interacting with RNA or DNA as a basic step.
Sensitive detection of HIV RNA in plasma of infected persons is also achieved by methods that depend on binding of oligonucleotides to viral RNA sequences. Currently, RNA detection of some proportion of HIV-1 variants is not optimal, especially at low viral loads [10,11]. To develop more sensitive and far reaching detection assays, it is important to select HIV-1 RNA target regions where mutations are least disruptive for potential duplex formation with complementary oligonucleotides.
Computer programs for the generation of multiple sequence alignments such as "Clustal W" [12] allow detection of regions that are most conserved among many sequence variants. However, even for regions that are equally conserved, their potential utility as hybridization targets varies. Mismatches in sequence variants are more disruptive in some duplexes than in others. Additionally, the propensity for self-interactions amongst oligonucleotides targeting conserved regions differs and the structure of target regions themselves can also influence hybridization efficiency.
Existing methods that predict effective oligonucleotide primers for PCR from DNA templates work well for applications where relatively stringent conditions are employed. This is because PCR experimental design greatly simplifies the prediction problem: hybridization is performed at relatively low ionic strength and high temperature. Under these conditions, oligonucleotide and target secondary structures are relatively unimportant. The secondary structure of an oligonucleotide or its target RNA may not significantly influence hybridization efficiency at 65°C or higher, but it becomes a problem at temperatures close to 37°C. These temperatures are frequently used for oligo-RNA hybridization in a number of different RNA detection assays.
This work suggests thermodynamic filtering procedures to select optimal consensus oligonucleotide targets in multiple sequence variants that can be used for RNA detection assays performed at 37°C.

Results and discussion
The scheme developed for discrimination of conserved regions in multiple sequence variants is based on their potential to serve as efficient hybridization targets for oligonucleotides and involves several steps. First, creation of a consensus sequence of RNA or DNA from aligned sequence variants, with specification of the lengths of fragments to be used as oligonucleotides in the analyses. Second, selection of DNA oligonucleotides that have pairing potential greater than a defined threshold within the set or subset of the aligned RNA sequences. Third, elimination of DNA oligonucleotides that have self-pairing potentials for intra- and inter-molecular interactions greater than defined thresholds. The consensus RNA subsequences complementary to the remaining set of oligonucleotides are the preferred potential targets for hybridization.
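The first step of the scheme, enumerating every consensus fragment in a length range together with its complementary DNA oligonucleotide, can be sketched as follows. This is a minimal illustration, not the authors' software: the function names `antisense_dna`, `candidate_oligos` and `count_candidates` are hypothetical, and the default length range of 20 to 35 nt is taken from the gag analysis described in this section.

```python
from typing import Iterator, Tuple

# Watson-Crick complement of an RNA or DNA target, written as DNA.
_COMPLEMENT = str.maketrans("ACGUT", "TGCAA")


def antisense_dna(target: str) -> str:
    """Return the DNA oligonucleotide complementary to `target`, 5'->3'."""
    return target.upper().translate(_COMPLEMENT)[::-1]


def candidate_oligos(
    consensus: str, min_len: int = 20, max_len: int = 35
) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, length, oligo) for every consensus fragment.

    `start` is the 0-based offset of the target fragment in the consensus.
    """
    for length in range(min_len, max_len + 1):
        for start in range(len(consensus) - length + 1):
            yield start, length, antisense_dna(consensus[start:start + length])


def count_candidates(n: int, min_len: int = 20, max_len: int = 35) -> int:
    """Number of fragments of each allowed length in a sequence of n positions."""
    return sum(max(n - length + 1, 0) for length in range(min_len, max_len + 1))
```

As a consistency check, `count_candidates(1508)` gives 23704, the total reported for gag. The consensus length of 1508 positions is inferred from that total and is not stated in the text.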
The discrimination scheme described above was applied to the HIV genome, where the need to identify hybridization targets is obvious. Here we present the results of the analysis for the HIV-1 gag gene; data for the complete HIV genome are available at http://www.gesteland.genetics.utah.edu/HIV/37C_targets.html. For each successive fragment of the consensus sequence, varying in size from 20 to 35 nt, oligonucleotides were selected that form stable duplexes with 90% of RNA variants (free energy ∆G°37 ≤ -30 kcal/mol) and have little self-structure (∆G°37 ≥ -8 kcal/mol for inter-oligonucleotide pairing and ∆G°37 ≥ -1.1 kcal/mol for intra-molecular pairing) [13,14].
Theoretically optimal hybridization targets are shown in Figure 1. Figure 2 shows that only a sub-set of conserved target fragments in the gag gene is "optimal" for hybridization with oligonucleotides at 37°C. This histogram demonstrates that the conservation values for 30 nucleotide gag windows vary from 68% to 95%.
Figure 1. Gag consensus sequence. Last nucleotides in the theoretically optimal target regions are highlighted. The range of fragments that were analyzed was from 23 to 35-mers. The length of optimal region is shown below the highlighted nucleotide. Only numbers for shortest regions in the sets that correspond to each highlighted nucleotide are shown.

The gag consensus sequence yields a total number of 23704 complementary oligonucleotides ranging in size from 20 to 35 mers. A set of 1747 oligonucleotides that is 14 times smaller than the initial one, remains after application of the thermodynamic discrimination steps described here. Limited work has been performed on simultaneous combinations of thermodynamic or homology analyses for predicting optimal universal targets in related RNA sequences for oligonucleotide hybridization [15][16][17][18][19]. The present work employs a distinctive scheme and shows its utility. The key ingredients in the present scheme are experimentally derived thermodynamic discriminatory steps. Decisions about the suitability of a particular target region are determined by a set of thresholds, which were found after analysis of the efficiency of oligonucleotides in the experiments performed in vivo and in vitro [13,14].
Oligonucleotides that form stable duplexes with RNA (free energy ∆G°37 ≤ -30 kcal/mol) and have little self-structure are statistically much more likely to be active than molecules that form less stable oligonucleotide-RNA hybrids or more stable self-structures. To achieve optimal statistical preference, the values for self-interaction should be ∆G°37 ≥ -8 kcal/mol for inter-oligonucleotide pairing and ∆G°37 ≥ -1.1 kcal/mol for intra-molecular pairing [13,14]. Selection of oligonucleotides with these thermodynamic values in the analyzed experiments would have increased the proportion of active oligonucleotides by as much as sixfold.
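The three thresholds quoted above can be combined into a single accept/reject test, sketched here under stated assumptions. The class `Thresholds` and the function `passes_filters` are hypothetical names introduced for illustration. The free energies are taken as given inputs (in kcal/mol at 37°C), since the way they are computed is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Thresholds:
    """Cut-off values quoted in the text, in kcal/mol at 37 degrees C."""
    duplex_max: float = -30.0   # oligo-RNA duplex must be at or below this
    inter_min: float = -8.0     # inter-oligonucleotide pairing must be at or above this
    intra_min: float = -1.1     # intra-molecular pairing must be at or above this
    min_fraction: float = 0.90  # share of RNA variants that must meet duplex_max


def passes_filters(
    duplex_dgs: Sequence[float],
    inter_dg: float,
    intra_dg: float,
    t: Thresholds = Thresholds(),
) -> bool:
    """Return True if an oligonucleotide satisfies all three criteria.

    `duplex_dgs` holds one duplex free energy per aligned RNA variant.
    """
    if not duplex_dgs:
        return False
    stable = sum(1 for dg in duplex_dgs if dg <= t.duplex_max)
    covers_variants = stable / len(duplex_dgs) >= t.min_fraction
    return covers_variants and inter_dg >= t.inter_min and intra_dg >= t.intra_min
```

For example, an oligonucleotide whose duplexes reach -31 kcal/mol with nine of ten variants, with self-pairing values of -5.0 and -0.5 kcal/mol, is accepted. The same oligonucleotide with an intra-molecular value of -2.0 kcal/mol is rejected.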

Oligonucleotide length correlates with the number of theoretically optimal RNA targets obtained after conservation and thermodynamic selection procedures. More optimal targets can be detected for longer oligonucleotides (Figure 3).
The experiments from which the thermodynamic thresholds were derived were performed at 37°C. Application of these thresholds in the current work yields hybridization target regions that are optimal for the same temperature. The list of selected regions is suitable for procedures that involve oligonucleotide-RNA pairing at 37°C, such as branched DNA detection technology and, often, reverse transcription. For PCR, which requires higher temperatures, other thermodynamic thresholds need to be used.
Future improvement of the predictions requires incorporation of target RNA secondary structure considerations. RNA secondary structure based thermodynamic filtration can be added when information about optimal thresholds for discrimination becomes available from analyses of experimental databases. A further improvement needed is combination of the software elements for different steps of discrimination analysis into one package with a common input and output. This can make the suggested analysis faster and more convenient for a broad range of users.

Figure 2. Axis label: % of conservation.
Whether or not an oligo-molecule will be a good hybridization probe for HIV detection also depends on the specificity of the oligo-RNA interactions. It is possible for an oligo-probe to be in the optimal range of thermodynamic values described above, but still to cross-hybridize with unintended RNA targets (human mRNAs, for example). The BLAST analysis tool may be used to identify oligo-probes with high potential to cross-hybridize with non-specific targets [20]. We applied the BLAST program for this purpose and ranked oligos that passed all the thermodynamic filters in accordance with the frequencies of nonspecific targets in the human and HIV genomes (see ftp://ftp.ncbi.nih.gov/pub/kondrashov/HIV). To detect a maximum of sequence similarities, the word size for the BLAST search was set to the smallest allowed value (-W 7). However, BLAST is unable to generate thermodynamic values that could be linked to the experimental oligo-probe hybridization, and the BLAST score alone does not allow prediction of oligo-probe hybridization specificity and sensitivity.
The consensus oligonucleotides for targets that were selected after rounds of discrimination analysis should be prime candidates for sensitive viral detection procedures or experiments that require efficient oligonucleotide-RNA interaction for the broad range of viral variants. The set of oligonucleotides for gag that remains after homology and thermodynamic selection is 14 times smaller than the initial set of all possible oligonucleotides in this range. Statistical and thermodynamic analyses performed with experimental oligo-probe datasets [13,14] suggest that approximately 70% of the oligonucleotides from this theoretically selected set will demonstrate consistency in hybridization behavior with different HIV-1 representatives of group M viruses. It is likely that the suggested thermodynamic selection rules for finding the most efficient hybridization oligo-probes can also be applied to siRNA design. However, the thermodynamic

Figure 3. The number of theoretically optimal RNA targets obtained with each possible length of oligonucleotide, in the range from 23 to 35-mers.

Conclusions
A selection scheme with thermodynamic thresholds and new software were developed in this study. The selection scheme can be used for any purpose where there is a need to design optimal consensus oligonucleotides capable of interacting efficiently with hybridization targets common to families of aligned RNA or DNA sequences. It employs creation of a consensus sequence of RNA or DNA from aligned sequence variants and filtering procedures with experimentally determined thermodynamic cut-off thresholds. This scheme has been applied to the HIV-1 genome, and theoretically optimal RNA target regions for consensus oligonucleotides were found. They can be further used for improvement of oligo-probe based HIV detection techniques.

Consensus sequence and multiple sequence alignments
The consensus sequence for HIV-1 variants (group M) and multiple sequence alignments created by Los Alamos Laboratory staff [21] were used in this work.

Plot of conservation
The average percentage of conservation of each window of 30 consecutive nucleotides in the multiple sequence alignments (the sum of the percentage conservation of each nucleotide in the window divided by the number of nucleotides) was calculated using a program created for this study.
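The windowed average described above can be sketched as follows. This is an illustration, not the program created for the study. The text does not define per-nucleotide conservation, so the sketch assumes it is the percentage of aligned sequences that carry the most common character in a column, with gap characters counted like any other character. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def column_conservation(alignment: Sequence[str]) -> List[float]:
    """Percent of sequences sharing the most common character, per column.

    Assumes all sequences in `alignment` have the same aligned length.
    """
    n = len(alignment)
    result = []
    for column in zip(*alignment):
        top_count = Counter(column).most_common(1)[0][1]
        result.append(100.0 * top_count / n)
    return result


def window_conservation(alignment: Sequence[str], window: int = 30) -> List[float]:
    """Mean per-column conservation over each run of `window` columns."""
    per_column = column_conservation(alignment)
    return [
        sum(per_column[i:i + window]) / window
        for i in range(len(per_column) - window + 1)
    ]
```

Each value in the returned list corresponds to one window start position, so plotting the list against its index gives a conservation profile along the alignment.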

Evaluation of the potential for intra-molecular and intermolecular self-interaction of DNA oligonucleotides
Thermodynamic properties of oligonucleotides were calculated with the OligoScreen program from the RNAstructure 3.71 package [22].

Evaluation of pairing potentials among DNA oligonucleotides and target RNA variants
A computer program, AlignScan, was created to evaluate, by ∆G°37 calculations, the pairing potential of each DNA or RNA consensus fragment with divergent RNA variants. The program is available for download from http://gesteland.genetics.utah.edu/HIV/AlignScan.zip. AlignScan requires aligned sequence variants as an input file (FASTA format). A sample input file, gag.cgi, with the list of aligned sequence variants of HIV-1 gag genes is included in the download. AlignScan also requires as input the temperature, the range of oligoprobe lengths, the ∆G°T threshold value and the percentage of sequences for which this threshold has to be valid. AlignScan creates a consensus sequence from the aligned sequence variants, and ∆G°T values are calculated for all complementary duplexes between each successive fragment of the consensus sequence and the corresponding fragment in all sequence variants. The AlignScan output file is in txt format and includes all oligonucleotides of the specified lengths from the consensus sequence whose duplexes with the corresponding complementary target sequence variants have ∆G°T values below the threshold. The AlignScan output file can be used as an input file for the OligoScreen program. It can also be opened for analysis by other programs, such as Microsoft Excel.
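The workflow described for AlignScan (read an aligned FASTA file, build a consensus, score each consensus fragment against the matching fragment of every variant) can be sketched as below. This is not AlignScan's code and does not reproduce its file formats. The consensus is assumed to be a per-column majority vote, which the text does not specify. The energy model is passed in as a caller-supplied function `duplex_dg`, a hypothetical parameter. The function `toy_dg` is a placeholder that counts matching positions and is not a thermodynamic model.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}, joining wrapped lines."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def consensus(seqs: List[str]) -> str:
    """Majority character per aligned column (assumed consensus rule)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def scan(
    seqs: List[str],
    length: int,
    duplex_dg: Callable[[str, str], float],
    dg_threshold: float,
    min_fraction: float,
) -> List[Tuple[int, str]]:
    """Return (start, fragment) for consensus fragments that meet the
    threshold with at least `min_fraction` of the variants."""
    cons = consensus(seqs)
    hits = []
    for start in range(len(cons) - length + 1):
        fragment = cons[start:start + length]
        dgs = [duplex_dg(fragment, s[start:start + length]) for s in seqs]
        ok = sum(1 for dg in dgs if dg <= dg_threshold)
        if ok / len(seqs) >= min_fraction:
            hits.append((start, fragment))
    return hits


def toy_dg(fragment: str, variant: str) -> float:
    """Placeholder score: -1 per matching position. NOT real thermodynamics."""
    return -1.0 * sum(a == b for a, b in zip(fragment, variant))
```

In practice `duplex_dg` would be replaced by a nearest-neighbor free energy calculation for the duplex between the oligonucleotide and each variant at the chosen temperature. Keeping it as a parameter lets the scan logic stay independent of the energy model.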