Shared probe design and existing microarray reanalysis using PICKY
© Chou; licensee BioMed Central Ltd. 2010
Received: 13 November 2009
Accepted: 20 April 2010
Published: 20 April 2010
Large genomes contain families of highly similar genes that cannot be individually identified by microarray probes. This limitation is due to thermodynamic restrictions and cannot be resolved by any computational method. Since gene annotations are updated more frequently than microarrays, another common issue facing microarray users is that existing microarrays must be routinely reanalyzed to determine probes that are still useful with respect to the updated annotations.
PICKY 2.0 can design shared probes for sets of genes that cannot be individually identified using unique probes. PICKY 2.0 uses novel algorithms to track sharable regions among genes and to strictly distinguish them from other highly similar but nontarget regions during thermodynamic comparisons. Therefore, PICKY does not sacrifice the quality of shared probes when choosing them. The latest PICKY 2.1 includes the new capability to reanalyze existing microarray probes against updated gene sets to determine probes that are still valid to use. In addition, more precise nonlinear salt effect estimates and other improvements are added, making PICKY 2.1 more versatile to microarray users.
Shared probes allow expressed gene family members to be detected; this capability is generally more desirable than not knowing anything about these genes. Shared probes also enable the design of cross-genome microarrays, which facilitate multiple species identification in environmental samples. The new nonlinear salt effect calculation significantly increases the precision of probes at a lower buffer salt concentration, and the probe reanalysis function improves existing microarray result interpretations.
PICKY 1.0 introduced several novel approaches to oligo microarray design for large and complex genomes. It went beyond sequence comparison and used efficient thermodynamic calculations on a whole-genome scale to determine the quality of all probe candidates. It also employed a global optimization strategy to ensure that the entire microarray, not just individual probes, is optimized for sensitivity, specificity and uniformity. Since PICKY can run on all major computing platforms and is computationally efficient, it has been used by several research groups to design their oligo microarrays [2–10]. In particular, the rice microarray project sponsored by the National Science Foundation selected PICKY to design its whole genome rice microarrays and the Xanthomonas oryzae dual rice pathogen microarray [10–12]. Independent evaluations of microarray design software indicated that PICKY generates quality probes [13, 14]. The results from a recent quantitative evaluation showed that PICKY-designed microarray probes are robust and consistent across a wide range of temperatures and sample concentrations. In this article we describe some new features added to PICKY since the version 1.0 release and their algorithmic details; these new features make the latest PICKY 2.1 more versatile for microarray users.
A primary difficulty of microarray design for large genomes, like those of rice or maize, is their large gene families. Each gene family contains many highly similar genes that are thermodynamically indistinguishable to microarray probes. For example, the largest transposon gene family in rice contains over 9399 sequences that are more than 90% similar to each other over 90% of their entire length. Probes designed to detect genes in this family will likely bind to multiple targets -- the key is to rationally determine the intended targets while avoiding all unintended targets. A subset of genes that share common sequence regions may be rolled into a group to share a common probe; these common regions, however, are often highly similar to regions contained in the other genes. The conflicting needs are to target some common regions for shared probe design and to avoid the other common regions to prevent cross-hybridization. Therefore, choosing a probe that can be shared but does not cross-hybridize with the other nontarget genes in the same gene family is more difficult than single-target probe design. The "shared probe" design feature has been added to PICKY; it allows genes in a group to be studied as a whole if not individually. This new feature differs from previous methods to discover "non-unique probes" [17, 18] in three ways: the shared probes are not ranked by sequence level comparisons (e.g., their longest common substring with nontargets) but by their thermodynamic comparisons with targets and nontargets; any number of genes can share the same probe; and a gene can share different probes with different sets of genes. Generally, an algorithmic method to design probes shared by a few genes is more desirable than the absence of any probe to detect any of the genes.
Note that some genes sharing a probe may also acquire their own unique probes with different target regions; thus, the identity of genes detected by a shared probe is often resolvable when considering multiple probes. The division of gene families into groups by PICKY is entirely computational and may not necessarily reflect their evolutionary distances.
A recently published biochemistry study demonstrated that the salt effect on DNA annealing stability is generally nonlinear, in contrast to the linear salt effect correction commonly used in melting temperature estimation equations. A nonlinear salt effect suggests that the optimal microarray probes at different hybridization salt concentrations may not be the same. A more precise nonlinear salt effect calculation is added to PICKY to enhance the precision of the designed probes under a specific microarray protocol with a known salt concentration. Unlike the linear salt effect calculation, which depends only on salt concentration, the nonlinear salt effect calculation also depends on DNA context. Therefore, new code is added to the PICKY design algorithm to keep track of the DNA binding context for each probe candidate.
Another frequently occurring issue is that microarrays can seldom keep up with the rapid progress of sequence annotation updates. For example, the NSF 45K rice microarray was designed using gene models from version 3 of the Rice Genome Annotation. This microarray is still being used by many users, but three newer rice annotations have been released; the latest version 6 annotation has much improved gene models. It is impractical to keep making new microarrays each time the sequence annotation is updated. Although most probes on the existing rice microarrays should continue to work, some of the probes may no longer function as expected due to conflicts with the newer gene models. It is possible to add new probes to an oligo microarray for newly discovered gene sequences, but this incurs extra cost that should be minimized. A new feature is added to PICKY to reanalyze existing probes against new sequence information to determine probes that are still valid. Invalid probes can be ignored during subsequent data analysis although they cannot be removed from printed microarrays. Only genes, new or old, that no longer have valid probes to detect them will need new probes to be added during the next microarray print. Therefore, the reanalysis feature reduces the cost of microarray updates and maintains microarray quality. This feature may also be used to examine vendor supplied microarrays against users' gene sets to include only valid probes for data analysis, even if different gene sets have been used to design the microarrays.
Shared probe design
To design shared probes, there are three requirements: 1) to be able to efficiently determine common regions among input sequences that are long enough to be targeted by probes; 2) to be able to efficiently distinguish these common regions from other highly similar regions during probe design so they would not be considered as nontargets and prevent probe targeting; and 3) to be able to thoroughly examine the thermodynamic characteristics of probe candidates targeting these common regions to prevent them from cross-hybridizing with nontargets in the whole genome. Although it is relatively easy to achieve the first requirement, it is harder to achieve the second and third requirements because common regions among gene family members often vary in their similarity levels -- a slight difference in their mutual similarity can mean either good targets for shared probes or very detrimental nontargets for shared probes.
1. Scan the LCP array, and locate groups of suffixes whose LCP values are greater than the minimum probe size.
2. For each of the groups found, check if any of the following conditions is true:
   a. Suffix(es) from a nontarget input sequence or the reverse complement of any input sequence is in the group.
   b. Either one of the LCP values bordering the group is greater than the maximum allowable length of exact nontarget match.
3. If either condition above is true, this group is invalid and is skipped. Otherwise, record it in a lookup table indexed by the left-most sequence in the group.
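The group-finding scan above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PICKY's implementation: the suffix and LCP arrays are built naively, a "group" is taken to be a maximal run of adjacent suffixes whose LCP values exceed the minimum probe size, reverse complements are not indexed, and the "left-most sequence" is assumed to be the member with the lowest input index. The function name `find_groups` and all parameter names are hypothetical.

```python
from collections import defaultdict

def find_groups(seqs, nontarget_ids, min_probe, max_nontarget_match):
    """Return {host sequence index: [(start on host, shared length, member ids)]}."""
    # Unique separator characters keep common prefixes from crossing sequence
    # boundaries (assumes fewer than 64 DNA sequences, so separators sort below 'A').
    text = "".join(s + chr(i + 1) for i, s in enumerate(seqs))
    owner, offset = [], []
    for i, s in enumerate(seqs):
        for j in range(len(s) + 1):
            owner.append(i)
            offset.append(j)

    # Naive suffix array; a real tool would use a linear-time construction.
    sa = sorted(range(len(text)), key=lambda p: text[p:])

    def lcp(a, b):
        n = 0
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        return n

    # lcps[k] compares suffixes sa[k-1] and sa[k]; zeros pad both ends.
    lcps = [0] + [lcp(sa[k - 1], sa[k]) for k in range(1, len(sa))] + [0]

    table = defaultdict(list)
    k = 1
    while k < len(sa):
        if lcps[k] <= min_probe:
            k += 1
            continue
        start = k
        while k < len(sa) and lcps[k] > min_probe:
            k += 1
        members = sa[start - 1:k]            # suffixes in this group
        shared_len = min(lcps[start:k])      # length common to all members
        # Condition a: a nontarget sequence is in the group.
        has_nontarget = any(owner[p] in nontarget_ids for p in members)
        # Condition b: a bordering LCP value is too long.
        long_border = max(lcps[start - 1], lcps[k]) > max_nontarget_match
        if has_nontarget or long_border:
            continue
        host = min(members, key=lambda p: (owner[p], offset[p]))
        ids = sorted({owner[p] for p in members})
        table[owner[host]].append((offset[host], shared_len, ids))
    return dict(table)

groups = find_groups(["ACGTACGTTT", "GGACGTACGA", "TTGCATGCAA"],
                     nontarget_ids={2}, min_probe=5, max_nontarget_match=5)
print(groups)
```

In this toy input, sequences 0 and 1 share the regions ACGTACG and CGTACG, and both are recorded under host sequence 0. Marking sequence 1 as a nontarget removes both groups, which is the effect of condition a.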
1. Iterate through all host sequences on the lookup table.
2. Sort groups on a host sequence based on their start position, and then push the very first group onto a stack.
3. While the stack is not empty, do one of the following:
   a. If there are no remaining groups for the host sequence, pop the group on top of the stack, process it and add its probe candidates to its probe priority queue.
   b. If the start position of the next group overlaps the right end of the group currently on top of the stack with at least maximum nontarget match length, process the region of the stack-top group up to the beginning of the next group, add its probe candidates to its probe priority queue, and then push the next group onto the stack.
   c. If neither of the above is true, pop the group on top of the stack, process it, and add its probe candidates to its probe priority queue. If the stack is now empty, push the next group onto the stack.
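The stack rules above can be read literally as the following Python sketch for a single host sequence. It is an interpretation, not PICKY's code: groups are reduced to hypothetical half-open `(start, end)` intervals, "processing" just records the region that would be handed to probe design, and the per-group cursor that remembers how far a group has already been processed is an assumption added to make the rules concrete.

```python
def track_groups(groups, max_nontarget_match):
    """Return the regions processed, in order, for one host sequence."""
    pending = sorted(groups)            # sort groups by start position
    processed = []
    if not pending:
        return processed
    stack = [list(pending[0])]          # each entry is [cursor, end]
    nxt = 1                             # index of the next group to consider
    while stack:
        top = stack[-1]
        if nxt == len(pending):
            # Rule a: no remaining groups, so finish the stack-top group.
            stack.pop()
            if top[0] < top[1]:
                processed.append((top[0], top[1]))
        elif top[1] - pending[nxt][0] >= max_nontarget_match:
            # Rule b: the next group overlaps the right end of the stack-top
            # group; process up to where the next group begins, then push it.
            begin = pending[nxt][0]
            if top[0] < begin:
                processed.append((top[0], begin))
                top[0] = begin
            stack.append(list(pending[nxt]))
            nxt += 1
        else:
            # Rule c: finish the stack-top group; refill the stack if empty.
            stack.pop()
            if top[0] < top[1]:
                processed.append((top[0], top[1]))
            if not stack:
                stack.append(list(pending[nxt]))
                nxt += 1
    return processed

print(track_groups([(0, 30), (10, 25), (40, 60)], max_nontarget_match=5))
```

Every pass through the loop either consumes one pending group or shrinks the stack, so the loop runs at most twice per group.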
To "process" a group region, we mean that the corresponding region of the group on the host sequence is used to identify probe candidates; this will be discussed shortly. The time to sort groups in step 2 is negligible, because only distinct groups long enough to accommodate probes are recorded and few are expected per host sequence. The three choices in step 3 always advance some distance on a host sequence or reduce the stack size. Therefore, without counting the time to process a group region, this tracking algorithm takes linear time to run in practice.
The final requirement is to process each group region, identify all shared probe candidates and examine their thermodynamic characteristics, to make sure that only probes with the least possibility of cross-hybridizing with nontargets will be chosen. Most of this requirement has been solved in PICKY 1.0 for unique probe design. PICKY 1.0, however, considers all common regions detrimental. Longer ones, such as those encountered during shared probe design, will immediately rule out probe candidates targeting those regions. The PICKY algorithm has been modified to accommodate the shared probe design feature. Now, when processing each group region, the same algorithm for unique probe design is used but the known span value of each group prevents group members from being identified as nontargets. Therefore, only true nontargets will be screened during the shared probe design process. This modification finally achieves basic requirement 3.
Although a large gene family is expected to contain many common regions and produce many shared probe candidates, most of the candidates are not useful because they often imperfectly match some highly similar nontargets in the same gene family. For example, among the 61420 gene models used to design the NSF 45K rice microarray, 54761 common regions were identified, which produced 26519 distinct shared probe candidates. After thermodynamic screening, however, only 3214 shared probes can be chosen. This example reveals the importance of thermodynamic screening especially for shared probe design -- a sequence comparison method can identify the 26519 probe candidates but does not know that many of them may cross-hybridize with nontargets.
As seen in Figure 1, a common region may be represented by several disjoint groups and stacking groups (e.g., region 2). Groups that stack on each other may not necessarily be recorded by the same host sequence and may not all be visited during the processing of any particular host sequence (e.g., groups for regions 1, 4 and 5 stack but are on three different host sequences). Nevertheless, due to the insufficient span value of a covered group, the covered region of the group always triggers a maximum nontarget match length violation and is efficiently skipped during probe design. The covering groups with higher span values will then cover (or may have already covered) the skipped region when their host sequences are being processed. In Figure 1, when host sequence B is being processed, the group 1 region covered by groups 5, 6 and 4 is skipped because the span value of group 1 is 2 and it cannot prevent extra members in the other groups from being identified as nontargets. Thus, only in the early part of group 1, which is not covered by any other groups, can shared probes be designed only for sequences A and B using this host sequence.
Nonlinear salt effect calculation
This new equation uses two terms to correct for salt effects: a linear term which depends on both salt concentration and sequence binding context, and a quadratic term which depends only on salt concentration. The gc in the linear term is the GC content of two binding DNA molecules, and it makes the salt effect calculation context-sensitive. All other terms in the two equations above are explained in the cited literature. PICKY 2.1 incorporates this new equation and offers both equations for microarray design. If the nonlinear equation is chosen, PICKY has to dynamically maintain the GC content in its innermost loop of calculation, thus it runs about 1/3 slower than the time it takes when the linear equation is chosen.
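Since the two equations are not reproduced in this text, the following Python sketch illustrates the difference using published forms. The nonlinear correction is the one given by Owczarzy et al. (2004), which works on reciprocal temperatures in Kelvin and has a linear ln[Na+] term weighted by GC content plus a quadratic ln²[Na+] term. The linear correction shown, 16.6 log10[Na+], is an assumption about which commonly used linear form is meant. The function names are hypothetical, and whether PICKY uses exactly these coefficients is not confirmed by this text.

```python
import math

def tm_linear(tm_1m_celsius, na_molar):
    """Assumed linear correction: shift the 1 M melting temperature by 16.6*log10[Na+]."""
    return tm_1m_celsius + 16.6 * math.log10(na_molar)

def tm_nonlinear(tm_1m_celsius, na_molar, gc_fraction):
    """Owczarzy et al. (2004) correction applied to a melting temperature at 1 M Na+.

    gc_fraction is the GC content (0..1) of the binding region, which makes
    the correction depend on sequence context.
    """
    ln_na = math.log(na_molar)
    inv_tm = 1.0 / (tm_1m_celsius + 273.15) + (
        (4.29 * gc_fraction - 3.95) * ln_na + 0.940 * ln_na ** 2
    ) * 1e-5
    return 1.0 / inv_tm - 273.15

# Two hypothetical duplexes with the same 1 M melting temperature but
# different GC content separate once the salt concentration is lowered.
for gc in (0.3, 0.7):
    print(gc, round(tm_linear(70.0, 0.05), 2), round(tm_nonlinear(70.0, 0.05, gc), 2))
```

Both corrections vanish at 1 M Na+, where the logarithm is zero. Away from 1 M, the linear form shifts every duplex by the same amount, whereas the nonlinear form shifts GC-poor duplexes more than GC-rich ones, which is why probe rankings can change with salt concentration.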
Comparison of design results using the linear and nonlinear salt effect equations at different salt concentrations.
The nonlinear salt effect equation makes PICKY probe design depend on salt concentration. As seen in Figure 4b, the melting temperature differences predicted by this equation for the same 50-mer probe candidates now depend not just on target locations but also on salt concentrations. As a result, the probe sets selected by PICKY at different salt concentrations vary greatly; they are summarized under the "Nonlinear Probes" column in Table 1. At each salt concentration, the "Same Probes" and "Overlap Probes" columns in Table 1 compare the probe sets obtained using the linear and nonlinear equations; they show the number of probes between the two sets that are the same or overlap somewhat. About a quarter of the probes still differ between the two probe sets even at the salt concentration of 950 mM, which is very close to 1 M. It can be seen in Table 1 that the average and median predicted target melting temperatures of the two probe sets differ by only 1 to 2°C throughout the salt concentration range, but probes in the two sets only converge at the 1 M standard buffer salt concentration. This is because the nonlinear salt effect equation locally influences individual probe selections under different salt concentrations, although it predicts roughly the same average melting temperature as the linear equation.
Reanalysis of existing probes
PICKY 2.1 incorporates another new feature to map existing probes against any gene sets and evaluate their thermodynamic properties. There are many applications of this new feature, e.g., to evaluate third party microarray design quality, to characterize existing microarrays against newly annotated gene sets, and to determine PCR primer specificity. To efficiently map probes, PICKY computes a prefix index table during the construction of the suffix and LCP arrays of the input sequences. This table divides the suffix array into smaller regions that can be independently searched using a binary search algorithm on the suffix array. PICKY tries to first locate a suffix that contains a query probe as a prefix, and then expands from the match site to discover all such suffixes. As stated earlier in explaining the shared probe design algorithm, all suffixes sharing the same probe prefix must be collated into the same group on the suffix array, so the expansion is local and efficient. The complexity of the probe mapping algorithm is expected to be O(m + log(n/4^x)), where m is the probe length, n is the total number of sequence bases, and x is the index prefix length; an index prefix of length x divides the suffix array into up to 4^x regions.
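The mapping strategy described above can be illustrated with a small Python sketch. It is a simplified stand-in rather than PICKY's implementation: the suffix array is built by naive sorting, the prefix index table is a plain dictionary from each x-mer to its range of suffix array ranks, and the names `build_index` and `map_probe` are hypothetical. Probes shorter than the index prefix are simply reported as unmapped in this sketch.

```python
def build_index(text, x):
    """Build a suffix array and a table mapping each x-mer prefix to its rank range."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    table = {}
    for rank, pos in enumerate(sa):
        key = text[pos:pos + x]
        if len(key) == x:
            lo, _ = table.get(key, (rank, rank))
            table[key] = (lo, rank + 1)     # half-open range [lo, hi)
    return sa, table

def map_probe(text, sa, table, x, probe):
    """Return all start positions where probe occurs exactly in text."""
    if len(probe) < x or probe[:x] not in table:
        return []
    lo, hi = table[probe[:x]]               # region selected by the prefix index
    m = len(probe)
    while lo < hi:                          # binary search inside that region
        mid = (lo + hi) // 2
        cand = text[sa[mid]:sa[mid] + m]
        if cand == probe:
            # Expand locally: all matching suffixes are adjacent in the array.
            left = right = mid
            while left > 0 and text[sa[left - 1]:sa[left - 1] + m] == probe:
                left -= 1
            while right + 1 < len(sa) and text[sa[right + 1]:sa[right + 1] + m] == probe:
                right += 1
            return sorted(sa[left:right + 1])
        if cand < probe:
            lo = mid + 1
        else:
            hi = mid
    return []

text = "ACGTACGTAC"
sa, table = build_index(text, 2)
print(map_probe(text, sa, table, 2, "ACGT"))
```

The prefix lookup narrows the search to one region before any comparison is made, so the binary search only has to cover that region rather than the whole suffix array.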
Several different outcomes may result for a probe being mapped: it may not be found to target anything in the input sequences; it may be found to target only the user supplied nontarget sequences; or it may be found to target some sequences but also perfectly match some user supplied nontarget sequences or the reverse-complements of any sequences. In all of these cases, the probe is not considered useful. Once a probe is mapped onto target sequences and is not found to exactly match nontargets, its thermodynamic characteristics are evaluated by PICKY. If its target and nontarget melting temperature difference is less than a minimum value set by the users (e.g., 15°C), then the probe is not specific but may still be usable. Finally, a probe may be found to target multiple sequences, i.e., it can be a shared probe. PICKY will sort the query probes based on their classifications and present them on screen using different color coding to indicate their types. A complication is that some probes may overlap each other on their target sites so their colors might be mixed on the screen display; the textual output generated by PICKY unambiguously describes the type of each query probe regardless of whether it overlaps with the other probes.
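The outcomes listed above amount to a small decision procedure, sketched here in Python. The category labels, the function name `classify_probe`, and the order in which the checks are applied (in particular, testing specificity before deciding between shared and unique) are assumptions made for illustration; PICKY's own labels and precedence may differ.

```python
def classify_probe(n_targets, exact_nontarget_hit, revcomp_hit, tm_gap, min_tm_gap=15.0):
    """Classify one mapped probe.

    n_targets: number of target sequences the probe matches exactly.
    exact_nontarget_hit: probe perfectly matches a user supplied nontarget.
    revcomp_hit: probe perfectly matches the reverse complement of a sequence.
    tm_gap: target minus nontarget melting temperature difference, in degrees C.
    """
    if n_targets == 0 and not exact_nontarget_hit:
        return "unmapped"            # targets nothing in the input
    if n_targets == 0:
        return "nontarget-only"      # only matches nontarget sequences
    if exact_nontarget_hit or revcomp_hit:
        return "conflicting"         # has targets but also a perfect off-target match
    if tm_gap < min_tm_gap:
        return "nonspecific"         # mapped, but below the user-set margin
    return "shared" if n_targets > 1 else "unique"

print(classify_probe(n_targets=3, exact_nontarget_hit=False, revcomp_hit=False, tm_gap=22.0))
```

The first three labels correspond to the cases the text describes as not useful; "nonspecific" probes are the ones described as not specific but possibly still usable.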
PICKY execution time and result under two different design constraints.
| Gene set size (genes) | Gene set size (bp) | More sensitive design (time) | More sensitive design (probes) | More relaxed design (time) | More relaxed design (probes) |
|---|---|---|---|---|---|
| | 4 848 788 bp | 2-CPU: 0 h 9 m | | 4-CPU: 0 h 7 m | |
| 11 324 genes | 6 022 273 bp | 2-CPU: 0 h 11 m | 10 237 probes | 4-CPU: 0 h 17 m | 10 995 probes |
| | 9 081 699 bp | 2-CPU: 0 h 22 m | | 2-CPU: 0 h 35 m | |
| | 10 749 024 bp | 2-CPU: 0 h 4 m | | 4-CPU: 0 h 55 m | |
| 12 238 genes | 23 015 888 bp | 4-CPU: 0 h 22 m | | 2-CPU: 3 h 2 m | 10 749 probes |
| 18 962 genes | 32 217 720 bp | 2-CPU: 1 h 2 m | 12 611 probes | 4-CPU: 2 h 19 m | 15 686 probes |
| 26 236 genes | 32 759 147 bp | 4-CPU: 0 h 23 m | 15 931 probes | 2-CPU: 4 h 21 m | 22 509 probes |
| 30 935 genes | 34 783 951 bp | 2-CPU: 0 h 41 m | 16 887 probes | 4-CPU: 3 h 37 m | 25 602 probes |
| 28 952 genes | 36 327 482 bp | 2-CPU: 1 h 3 m | 18 297 probes | 2-CPU: 8 h 26 m | 26 608 probes |
| 58 579 genes | 39 022 169 bp | 2-CPU: 0 h 48 m | 21 993 probes | 2-CPU: 8 h 9 m | 48 620 probes |
| 35 284 genes | 68 639 601 bp | 4-CPU: 0 h 23 m | 11 450 probes | 4-CPU: 9 h 36 m | 28 435 probes |
| 28 205 genes | 72 748 721 bp | 4-CPU: 0 h 19 m | | 4-CPU: 8 h 58 m | 25 080 probes |
| 61 251 genes | 94 194 626 bp | 2-CPU: 10 h 1 m | 39 094 probes | 4-CPU: 10 h 33 m | 43 376 probes |
PICKY employs a comprehensive thermodynamics analysis to determine the similarity among gene sequences in order to design good microarray probes. This analysis helps PICKY find gene-specific probes [13, 14]. In addition, the equations PICKY uses to determine thermodynamic characteristics are deterministic. The deterministic design approach in PICKY means that the commonly used blocking agents for nonspecific bindings such as human COT-1 DNA, yeast tRNA or salmon sperm DNA are not necessary when using PICKY designed microarrays. Unless their DNA sequences are included in the nontarget gene set given to PICKY during a microarray design, the blocking agents may actually degrade microarray data quality. Although many existing microarrays are not designed by PICKY, users can use PICKY to evaluate them and determine a subset of the probes to trust. In principle, this even works for microarray experiments that have already been completed; their results may be improved by filtering the data through the probe evaluation process using PICKY.
Although the shared probe design feature is developed for large genomes as a remedy when unique probes cannot be found for certain gene families, it is also possible to combine several gene sets and ask PICKY to design shared probes among different species. These shared probes can be used in comparative genomics, metagenomics (i.e., environmental sampling) or pathogen identification. With its default settings, PICKY minimizes probe sharing by first selecting unique probes, but PICKY can also be instructed to opt for probes that are shared by more target genes. The shared probe set from PICKY can then be minimized to detect several known species or used in its entirety to detect as yet unknown species that are phylogenetically related to the species whose gene sets were used for the design. For either application, the basic requirement is that a hybridization matrix H is given: the entry H_ij is 1 if probe j can detect species i and 0 otherwise. In reality, microarray probes do not exhibit this binary behavior but vary their detection signal strength among different but related species. In this respect, the optimization of the melting temperature difference between targets and nontargets of all PICKY-designed probes enhances their binary nature in detecting species (i.e., they can detect all target species with equal certainty but none of the nontargets). A recent quantitative evaluation of PICKY-designed probes confirmed this characteristic.
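As a small illustration of the hybridization matrix H, the Python sketch below builds H from a hypothetical mapping of probes to the species they detect and then lists species pairs whose rows are identical, i.e., pairs that this probe set cannot tell apart. The species and probe names are invented for the example, and this simple row comparison is not the integer linear programming or decoding method used in the cited work.

```python
def hybridization_matrix(species, probe_targets):
    """Return (probe order, H) where H[i][j] is 1 if probe j detects species i."""
    probes = sorted(probe_targets)
    H = [[1 if s in probe_targets[p] else 0 for p in probes] for s in species]
    return probes, H

def indistinguishable_pairs(species, H):
    """Species pairs with identical rows give the same probe signature."""
    return [
        (species[i], species[k])
        for i in range(len(species))
        for k in range(i + 1, len(species))
        if H[i] == H[k]
    ]

species = ["A", "B", "C"]
probe_targets = {"p1": {"A", "B"}, "p2": {"B"}, "p3": {"C"}}
probes, H = hybridization_matrix(species, probe_targets)
print(probes, H, indistinguishable_pairs(species, H))
```

Here the shared probe p1 alone cannot separate species A from B, but the additional probe p2 gives them different rows. Dropping p2 would leave A and B with the same signature.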
Shared probe design is a versatile feature that can increase detectable genes in large gene families and allow cross-genome microarrays to be developed. Usually, some genes sharing a probe also have their own unique probes; thus, by considering a combination of unique and shared probes we can still identify genes that lack unique probes to detect them. The nonlinear salt effect calculation expands the probe design sensitivity to another dimension, the salt concentration, and precisely matches the designed probes to specific microarray protocols and hybridization conditions. The microarray reanalysis function provides no-cost improvements to microarray data quality by utilizing improved genome annotations; this is not limited to microarrays designed by PICKY. An interesting future project will be to reanalyze some completed microarray projects by filtering their existing data through the PICKY reanalysis function to see if the statistical quality of the filtered data may be improved or some alternative conclusions may be drawn from the results.
Availability and requirements
Project name: The PICKY oligo microarray design and analysis software
Project home page: http://www.complex.iastate.edu
Operating system(s): Windows XP or later, Mac OS X 10.4 or later, and most Linux distributions running on x86 compatible CPUs.
Programming language: C++
Other requirements: none
License: The PICKY project has never received public support and thus depends on commercial licensing fees to sustain its development and maintenance. Free academic licenses are provided to academic and nonprofit users after they execute the online license request and provide proof of their nonprofit status. Commercial users should contact PICKYhttp://email@example.com to obtain commercial license information.
Any restrictions to use by non-academics: Commercial licenses required.
The author thanks Dr. Owczarzy at Integrated DNA Technologies for suggesting his nonlinear salt effect equation, the NSF rice microarray project team for choosing PICKY to design their microarrays and performing quality evaluations, and a few PICKY users for providing encouragement and valuable feedback. The publication cost of this manuscript was supported by the National Science Foundation grant DBI0850195.
- Chou HH, Hsia AP, Mooney DL, Schnable PS: Picky: oligo microarray design for large genomes. Bioinformatics 2004, 20: 2893–2902. 10.1093/bioinformatics/bth347
- Ma J, Skibbe DS, Fernandes J, Walbot V: Male reproductive development: gene expression profiling of maize anther and pollen ontogeny. Genome biology 2008, 9: R181. 10.1186/gb-2008-9-12-r181
- Coblentz FE, Towle DW, Shafer TH: Expressed sequence tags from normalized cDNA libraries prepared from gill and hypodermal tissues of the blue crab, Callinectes sapidus. Comparative Biochemistry And Physiology D-Genomics & Proteomics 2006, 1: 200–208.
- Taliercio EW, Boykin D: Analysis of gene expression in cotton fiber initials. BMC Plant Biol 2007, 7: 22. 10.1186/1471-2229-7-22
- Udall JA, Flagel LE, Cheung F, Woodward AW, Hovav R, Rapp RA, Swanson JM, Lee JJ, Gingle AR, Nettleton D, et al.: Spotted cotton oligonucleotide microarrays for gene expression analysis. BMC Genomics 2007, 8: 81. 10.1186/1471-2164-8-81
- Sato M, Mitra RM, Coller J, Wang D, Spivey NW, Dewdney J, Denoux C, Glazebrook J, Katagiri F: A high-performance, small-scale microarray for expression profiling of many samples in Arabidopsis-pathogen studies. Plant J 2007, 49: 565–577. 10.1111/j.1365-313X.2006.02972.x
- Warr E, Lambrechts L, Koella JC, Bourgouin C, Dimopoulos G: Anopheles gambiae immune responses to Sephadex beads: involvement of anti-Plasmodium factors in regulating melanization. Insect Biochem Mol Biol 2006, 36: 769–778. 10.1016/j.ibmb.2006.07.006
- Dong Y, Aguilar R, Xi Z, Warr E, Mongin E, Dimopoulos G: Anopheles gambiae immune responses to human and rodent Plasmodium parasite species. PLoS Pathog 2006, 2: e52. 10.1371/journal.ppat.0020052
- Millard A, Tiwari B: Oligonucleotide microarrays for bacteriophage expression studies. In Bacteriophages: Methods and Protocols. Edited by: Clokie M, Kropinski A. Humana Press; 2008.
- Seo YS, Sriariyanun M, Wang L, Pfeiff J, Phetsom J, Lin Y, Jung KH, Chou HH, Bogdanove A, Ronald P: A two-genome microarray for the rice pathogens Xanthomonas oryzae pv. oryzae and X. oryzae pv. oryzicola and its use in the discovery of a difference in their regulation of hrp genes. BMC Microbiol 2008, 8: 99. 10.1186/1471-2180-8-99
- Jung KH, Dardick C, Bartley LE, Cao P, Phetsom J, Canlas P, Seo YS, Shultz M, Ouyang S, Yuan Q, et al.: Refinement of light-responsive transcript lists using rice oligonucleotide arrays: evaluation of gene-redundancy. PLoS ONE 2008, 3: e3337. 10.1371/journal.pone.0003337
- NSF Rice Oligonucleotide Array Project Website [http://www.ricearray.org/]
- Paredes CJ, Senger RS, Spath WS, Borden JR, Sillers R, Papoutsakis ET: A general framework for designing and validating oligomer-based DNA microarrays and its application to Clostridium acetobutylicum. Applied And Environmental Microbiology 2007, 73: 4631–4638. 10.1128/AEM.00144-07
- Lemoine S, Combes F, Le Crom S: An evaluation of custom microarray applications: the oligonucleotide design challenge. Nucleic acids research 2009, 37: 1726–1739. 10.1093/nar/gkp053
- Chou HH, Trisiriroj A, Park S, Hsing YI, Ronald PC, Schnable PS: Direct calibration of PICKY-designed microarrays. BMC bioinformatics 2009, 10: 347. 10.1186/1471-2105-10-347
- Yuan Q, Ouyang S, Wang A, Zhu W, Maiti R, Lin H, Hamilton J, Haas B, Sultana R, Cheung F, et al.: The institute for genomic research Osa1 rice genome annotation database. Plant Physiol 2005, 138: 18–26. 10.1104/pp.104.059063
- Rahmann S: Fast large scale oligonucleotide selection using the longest common factor approach. Journal of bioinformatics and computational biology 2003, 1: 343–361. 10.1142/S0219720003000125
- Rahmann S: Rapid large-scale oligonucleotide selection for microarrays. Algorithms in Bioinformatics, Proceedings 2002, 2452: 434–434.
- Owczarzy R, You Y, Moreira BG, Manthey JA, Huang L, Behlke MA, Walder JA: Effects of sodium ions on DNA duplex oligomers: improved predictions of melting temperatures. Biochemistry 2004, 43: 3537–3554. 10.1021/bi034621r
- Ouyang S, Zhu W, Hamilton J, Lin H, Campbell M, Childs K, Thibaud-Nissen F, Malek RL, Lee Y, Zheng L, et al.: The TIGR Rice Genome Annotation Resource: improvements and new features. Nucleic Acids Res 2007, 35: D883–887. 10.1093/nar/gkl976
- Rice Genome Annotation Project [http://rice.plantbiology.msu.edu/]
- Gusfield D: Algorithms on Strings, Trees and Sequences. Cambridge, United Kingdom: Cambridge University Press; 1997.
- Kasai T, Lee G, Arimura H, Arikawa S, Park K: Linear-time longest-common-prefix computation in suffix arrays and its applications. In Combinatorial Pattern Matching, 12th Annual Symposium, Jerusalem, Israel. Edited by: Amir A, Landau GM. Springer Verlag, Berlin; 2001. Lecture Notes in Computer Science
- Breslauer KJ, Frank M, Blocker H, Marky LA: Predicting DNA duplex stability from the base sequence. Proc Natl Acad Sci USA 1986, 83: 3746–3750.
- Rychlik W, Spencer WJ, Rhoads RE: Optimization of the annealing temperature for DNA amplification in vitro. Nucleic Acids Research 1990, 18: 6409–6412. 10.1093/nar/18.21.6409
- Allawi HT, SantaLucia JJ: Thermodynamics and NMR of internal G*T mismatches in DNA. Biochemistry 1997, 36: 10581–10594. 10.1021/bi962590c
- SantaLucia JJ, Allawi HT, Seneviratne PA: Improved nearest-neighbor parameters for predicting DNA duplex stability. Biochemistry 1996, 35: 3555–3562. 10.1021/bi951907q
- Bommarito S, Peyret N, SantaLucia J: Thermodynamic parameters for DNA sequences with dangling ends. Nucleic Acids Research 2000, 28: 1929–1934. 10.1093/nar/28.9.1929
- ISU Complex Computation Lab [http://www.complex.iastate.edu]
- Schneider TD, Stormo GD, Gold L, Ehrenfeucht A: Information content of binding sites on nucleotide sequences. J Mol Biol 1986, 188: 415–431. 10.1016/0022-2836(86)90165-8
- Sebat JL, Colwell FS, Crawford RL: Metagenomic profiling: microarray analysis of an environmental genomic library. Applied and environmental microbiology 2003, 69: 4927–4934. 10.1128/AEM.69.8.4927-4934.2003
- Wang D, Coscoy L, Zylberberg M, Avila PC, Boushey HA, Ganem D, DeRisi JL: Microarray-based detection and genotyping of viral pathogens. Proceedings of the National Academy of Sciences of the United States of America 2002, 99: 15687–15692. 10.1073/pnas.242579699
- Klau GW, Rahmann S, Schliep A, Vingron M, Reinert K: Optimal robust non-unique probe selection using Integer Linear Programming. Bioinformatics 2004, 20(Suppl 1): i186–193. 10.1093/bioinformatics/bth936
- Schliep A, Rahmann S: Decoding non-unique oligonucleotide hybridization experiments of targets related by a phylogenetic tree. Bioinformatics 2006, 22: e424–430. 10.1093/bioinformatics/btl254
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.