- Open Access
GENOMEMASKER package for designing unique genomic PCR primers
© Andreson et al; licensee BioMed Central Ltd. 2006
- Received: 19 September 2005
- Accepted: 27 March 2006
- Published: 27 March 2006
The design of oligonucleotides and PCR primers for studying large genomes is complicated by the redundancy of sequences. The eukaryotic genomes are particularly difficult to study due to abundant repeats. The speed of most existing primer evaluation programs is not sufficient for large-scale experiments.
In order to improve the efficiency and success rate of automatic primer/oligo design, we created a novel method which allows rapid masking of repeats in large sequence files, for example in eukaryotic genomes. It also allows the detection of all alternative binding sites of PCR primers and the prediction of PCR products. The new method was implemented in a collection of efficient programs, the GENOMEMASKER package. The performance of the programs was compared to other similar programs. We also modified the PRIMER3 program, to be able to design primers from lowercase-masked sequences.
The GENOMEMASKER package is able to mask the entire human genome for non-unique primers within 6 hours and find locations of all binding sites for 10 000 designed primer pairs within 10 minutes. Additionally, it predicts all alternative PCR products from large genomes for given primer pairs.
- Word Length
- Primer Binding Site
- Template Sequence
- Suffix Array
- Index File
Microarrays and other genomic technologies allow the testing of thousands of genomic regions from each sample. Most of these methods require PCR amplification to achieve sufficiently strong signals. Therefore, there is a growing need for automatic oligo design and PCR primer design methods. There is always a certain chance that the selected PCR primers have alternative binding sites on the template DNA. It is expected that numerous alternative binding sites of the PCR primers may cause failures in PCR. However, the exact relationship between the number of binding sites in the genome and the success rate is not known. The computational prediction of these unwanted interactions would help to increase the quality of microarrays and genomic PCR and to reduce the cost of related experiments.
A large variety of PCR primer design programs exist. Nevertheless, only few of them allow some kind of testing for primer uniqueness and/or testing for alternative products from the same template. Some programs use repeat libraries to avoid primer design from repeated regions. For example, a program called PC-RARE uses 8-mer frequency disparity at the 3' end of primers to avoid the risk of non-specific binding of primers . FAST-PCR program allows homology search against the custom list of repeated sequences . OLIGO6  uses libraries that contain all the frequent 6-mer to exclude the primers that contain highly repeated motifs. A well-known PRIMER3  can be used with the collection of repeats (repeat library) to avoid non-specific binding of primers. However, the primers are compared to a repeat library using the Smith-Waterman algorithm , which makes use of large libraries for the design of a large number of primers slow and therefore unpractical.
Some programs use suffix trees or suffix arrays to ensure the uniqueness of selected oligos. PICKY is an oligo design software that allows one to create unique hybridisation oligos . This program uses a suffix-tree based algorithm to test the uniqueness of oligo candidates in a user-given sequence set. Rahmann presents a method that uses the longest common substring as a specificity measure for candidate oligos . This algorithm is based on a suffix array with additional information that is efficient both in terms of memory usage and running time to rank all candidate oligos according to their specificity.
The masking of repeats on the template DNA is another common approach that is used to avoid non-specific binding. DUST  and TandemRepeatsFinder  are commonly used for masking simple (short) repeat motifs. RepeatMasker is a universal program that is used for masking out several kinds of repeats and therefore mostly used for this kind of sequence analysis . Similarly, BLAST  can be used to mask the non-unique regions of the genome [12, 13]. Onodera and Melcher have found that unfavored and preferred 3' end triplets existed in different viral genome sequences . They recommend a scoring system incorporating empirical frequencies of different 3' end triplets and that information may be used in masking primer candidates with poor success rate in other genomic sequences.
Another widely used method related to the success of the PCR is prediction of all PCR products from the genome sequence. This process is known as electronic PCR (e-PCR) or virtual PCR (vPCR). Electronic PCR was first developed to detect the presence of sequence tagged sites (STS) in template DNA . Virtual PCR is a web based service that predicts PCR products from different genomes . SPCR is a recently published program that can help the user to choose a PCR primer pair giving the least possible non-targeted products . Its algorithm is based on the hypothesis that the annealing of a primer to a template is an information transfer process. We will use the term 'e-PCR' for all similar computational predictions of PCR products throughout this paper. The process of e-PCR is typically performed with the help of sequence alignment software, which counts the number of identical or nearly identical matches between the primer and the template DNA. The BLAST program is most frequently used for this purpose in multiple applications [18–20]. Nevertheless, the speed of BLAST is not sufficient for e-PCR in large eukaryotic genomes with large number of primers. High-speed methods applicable to large-scale problems are becoming more important with the increasing number of genome sequences. The speed can be increased by using MEGABLAST , BLAT  or SSAHA  which are specifically designed for large scale sequence search and alignment. A primer design program called MuPlex includes BLAT software to align primer candidates against genomic DNA . These methods are relatively fast, but unfortunately all of them require specific parsers to count all primer binding sites and to find all PCR products on the template DNA. PRIMEX , on the other hand is specifically designed for testing oligos and counting primer binding sites from genomic DNA. Another recently reported program, me-PCR , is designed for the detection of locations of STS markers in the human genome and is helpful for the detection of PCR products of any type. There is a similar program called In-Situ PCR (isPCR)  created by Jim Kent that is also designed for predicting possible PCR products that two primers could produce.
In this paper, we describe a novel and efficient method, which masks large sequence files for repeats, performs a rapid prediction of all binding sites of PCR primers and predicts possible PCR products. The efficiency of our program is compared to several existing methods.
Components of the software package
The GENOMEMASKER package can be divided into two separate parts: 1. A repeat masking application (GenomeMasker with auxiliary programs), 2. An e-PCR application for predicting primer binding sites and PCR products (GenomeTester with auxiliary programs). They contain the following executables:
1. GenomeMasker application
glistmaker – creates a so called blacklist of over-represented (occurring more frequently than user-defined threshold) primer binding sites in a given genome.
gmasker – performs a binary search in the blacklist for each studied FastA sequence and masks the words present in the blacklist.
gm_primer3 – a modified PRIMER3 program that is able to use gmasker output for primer design.
2. GenomeTester application
gindexer – creates binary index files containing locations of all the predicted binding sites in a given genome.
gtester – performs a binary search in index files for each primer to locate all their binding sites in a given genome.
gt2multiplex – extracts nucleotide sequences of all PCR products from template sequence using the output information of gtester.
The GenomeMasker algorithm
The first part of the package – GenomeMasker application – contains programs required for the masking of repeated primer binding sites on the template DNA. The program glistmaker reads through the template sequence(s) and counts the number of occurrences of each word of user-defined length. After that it creates a blacklist containing only over-represented words, encoded into 32 bit integers. The encoding is done by allocating two consecutive bits for each nucleotide in a word. Thus the maximum word length in current implementation is 16 nucleotides. We define the over-represented word as the sequence that occurs in the given genome more times than a user-given cutoff (eg. 1, 2, 3, etc.). The entire list of encoded words is sorted for faster access. The input for glistmaker is one or more FastA files (either DNA template or genome/chromosome sequences).
The program gmasker uses blacklist file as a reference to mask the sequence file in FastA format. It iterates over the whole input sequence with step 1 nucleotide and checks forward word, its reverse complement or both against the blacklist. If a given word is in the blacklist, the corresponding word in the template sequence is masked. Only one nucleotide at the 3' end of the matching word is masked by default, because this should be sufficient to avoid most of the low-success PCR primers. Nevertheless, the user can define how many nucleotides will be masked by gmasker with a special option. The output of the program is a FastA file identical to input file, except that it contains masked nucleotides (uppercase letter changed to lowercase letter) in regions where primers should not be selected. Instead of uppercase-lowercase masking, any user-defined character can also be used for masking by gmasker. However, the lowercase masking is recommended because it maintains the sequence information at masked sites and allows subsequent primer design from the masked sequence.
The program gm_primer3 is a modification of PRIMER3 that is able to use the lowercase-masked sequence for primer design. The overall functionality and algorithm of the program is the same as in the original PRIMER3, but we have added a new filtering feature that rejects the primer candidates with lowercase letters in 3' end. However, gm_primer3 can be used to design primers from templates masked by other methods as well.
The GenomeTester algorithm
The second part of the package, the GenomeTester application, contains programs for predicting primer binding sites and PCR products in long template DNA sequences, e.g. eukaryotic genomes. The program gindexer is needed to create index files for gtester to work. It creates 4 different index files for each sequence file – all words starting with A, C, T and G nucleotide. For example, four index files will be created for 24 human chromosomes creating a total number of 96 files. The creation of a separate index file for each nucleotide is necessary to reduce the file size and memory usage of gtester. In an index file we keep 8 bytes of data for each occurrence of each word in the whole genomic sequence. The first 4 bytes represent the word, encoded as 32 bit integer, 2 bits per nucleotide. The last 4 bytes contain the position of the current occurrence of the word in the genomic sequence. The word length can be specified by the user within the range between 8 and 16 nucleotides. The entire file is sorted by the encoded word (by first 4 bytes) in order to allow high speed binary search with the gtester. The sequence files of chromosomes in FastA format are used as input of gindexer.
The program gtester works in four steps. During the first step, it creates an array structure of primer pairs. For each primer pair, it takes a word of given length (the same length as was used for creation of index files) from the 3' end of the primer and creates 2 words – one original and one reverse complement. In step 2, a binary search is performed with all words in the array against index files. Another list with the locations of the binding sites is created in step 3. Finally, the program finds all PCR products that are possibly synthesized by the given primer pair hybridizing on both sense and antisense chains of the DNA. PCR products generated by a single primer are also considered. For both parts of the package, primer binding sites can be modelled with a custom, user-defined word length.
Advantages of our implementation
The speed of the programs in this package is achieved by extensive pre-processing of the genomic data. Both applications, GenomeMasker and GenomeTester, require preprocessed files to work efficiently. During the creation of these files all the locations in the genome are counted, sorted and recorded in a binary format. The blacklist of GenomeMasker contains the list of all over-represented binding sites. The index files of GenomeTester contain a list of all binding sites together with their location in the genome. There are two reasons why GenomeTester and GenomeMasker are faster than most other similar applications. First, both of them use the fast binary search algorithm which runs in O(log n) time. The binary search algorithm  can be briefly explained as follows. The search itself begins by examining the value in the center of the list (index). These values in the list are sorted, so the program knows whether the value occurs before or after the center value. Then the program searches through the correct half in the same way as before. Those cycles will be repeated until the searched value is found or until there are no more values to check. The second advantage of our programs is an on-demand memory-mapping technique that allows us to achieve optimal speed for analyzing both small and large input datasets. The programs gmasker and gtester also use an on-demand memory-mapping technique. Instead of pre-reading the entire index into memory they only read the requested parts (positions in the middle of intervals). Those parts will stay in a memory cache and thus the search speed per string will increase gradually when using more search strings, as more of the index will need to be read into the memory.
Parameters for comparing different programs
All software programs were tested on assembled chromosome sequences derived from ENSEMBL database Human 19.34 (NCBI Build 34). The computational performance tests described here were obtained by running all the programs on a 2.66 GHz Intel Xeon™ processor machine with 6 GB of RAM.
RepeatMasker (version 2004/03/06) was used with different sensitivity parameters -s, -q and -qq. -s means "slow search", which is 0–5% more sensitive and 2–3 times slower than default. -q is a "quick search", 5–10% less sensitive and 2–5 times faster than default. -qq is a "rush job", about 10% less sensitive and 4–>10 times faster than default. We used RepBase Update  8.12 library (6 March 2004) of repeated motifs in human genome. DUST was used with default parameters. TandemRepeatFinder was used with alignment parameters (match, mismatch, indels) 2, 7 and 7, minimum alignment score to report repeat 50 and maximum period size 500. GenomeMasker blacklist was created with word length 12 and word length 16 and with over-represented cutoffs 1000 and 10 respectively. The masking program gmasker was used with masking letter parameter 'l' (lower-case masking) and masking type parameter 'target 500 501'.
The SSAHA indexes were created with the word length parameter (-wl) 10 and step length (-sl) 1. SSAHA searches were performed with match lengths (-ml) 16. PRIMEX indexes were created with default parameters and word length (-wl) 10. The number of mismatches allowed in each lookup table word (-m1) and the number of mismatches allowed in the entire query (-m2) were set to 0 and 1 respectively. Me-PCR was executed with parameters: word size (W) 11, number of mismatches allowed (N) 0, margin (M) 500 and default PCR size (Z) 240. The parameters as word size (-W) 12 and maximal total length of queries for a single search (-M) 16 were used for MEGABLAST. All results were parsed with word length 16 (32 bits for BLAST and MEGABLAST). The maximum size of the PCR product (-maxSize) was set to 1000 bp with stand-alone isPCR. GenomeTester indexes were created with word length 16 and the program gtester was used with default parameters.
The GenomeMasker package
Predicting the number and location of PCR primer binding sites in large genomes can pose computational challenges. Therefore we have designed a software package GENOMEMASKER that helps to achieve a fully automatic sequence masking and PCR primer testing for large genomic applications. The first part of the package – GenomeMasker – is designed to mask all repeated primer binding sites in the template sequence to avoid selection of such primers. The essence of sequence masking is in finding and marking sequence regions with specific properties, e.g. repeated regions of sequence. Masked nucleotides are typically replaced by character 'X' or converted to a lowercase character. Unlike the widely used masking program RepeatMasker, GenomeMasker masks only the 3'-terminal nucleotide of each repeated word, which is sufficient to avoid primer design from the repeated region. To be able to use that masked sequence for designing unique PCR primers, the PRIMER3 program was modified to distinguish between upper- and lowercase letters in a masked template file. If a primer candidate ends with a lower-case letter it will be rejected by PRIMER3 and unique primer will be designed from the remaining candidates. However, other primer design methods that can use masked sequence could also work with the GenomeMasker output file.
Another important test for genomic PCR applications is the prediction of the number of all possible PCR products, which a given primer pair can generate from a given genome. This can effectively be done by the second part of our package called GenomeTester. GenomeTester counts and locates all potential binding sites of the PCR primer pair in the genome and predicts the location of all PCR products that could be generated with these primers. These two main methods – GenomeMasker and GenomeTester – can be used independently.
Our software models the PCR primer binding site as 100% identical match to a continuous string (word) from primer's 3' end with a fixed length. User defined length in the range of 8 and 16 nucleotides can be used. The bindings with mismatches are not modeled for several reasons. Firstly, for each 16-mer oligonucleotide there are 48 variants with a single mismatch. Counting the number of occurences and keeping track of genomic location of all these variants would increase the computing time significantly. Secondly, it is not known whether counting mismatch containing primer binding sites helps to predict the PCR success rate better than counting full-match primer binding sites. Until the relationship between the number of mismatched binding sites and PCR success is known we prefer not to use mismatched binding sites.
Workflow of the program package
The creation of either a blacklist for GenomeMasker or the indexes for GenomeTester from the entire human genome takes approximately three hours on a Linux server with 2.4 GHz Xeon processor, at least 2 GB of RAM and SCSI disks. Once the preprocessing is done, the masking, primer design and primer testing steps are extremely fast. The algorithmic details are more thoroughly described in the chapter "Implementation and methods".
GenomeMasker: comparison of repeat masking methods
The effect of different repeat-masking methods onto PCR primer design.
PRIMER3 + No masking
PRIMER3 + DUST
PRIMER3 + Tandem
PRIMER3 + repeat library
PRIMER3 + Repeat Masker -qq
GM_PRIMER3 + Genome
Masker -wl 16
was not possible (no primers)
>10 predicted binding sites for
at least one primer of the primer pair
GenomeTester: comparison between e-PCR methods
Although the masking of template sequences with GenomeMasker avoids low-quality primer design, some primer pairs may still produce two or more alternative PCR products which should be avoided. Also, users may have existing primer pairs that they would like to evaluate against the given genome for the number of binding sites. Therefore, it may be necessary to perform a search against the entire genome for primer binding sites and detect the location of possible products. Such searches are typically done with sequence homology programs (eg. BLAST) or with dedicated e-PCR programs (eg. me-PCR). The computation time and the memory requirements of e-PCR programs are important factors, particularly when dealing with large datasets like eukaryotic genomes. To compare the speed of different methods we created five datasets, consisting of 1, 10, 100, 1000 and 10000 randomly selected primer pairs. All datasets were subjected to the e-PCR against the human genome with seven different programs and the computing time of the results was recorded.
Physical memory requirement for the e-PCR procedure on human chromosomes is approximately 1GB for SSAHA, 500 MB for GenomeTester and isPCR and ca 300 MB for other methods. Please note that all the programs shown in Figure 4 were compared in standalone mode. The performance of most of the programs can be increased by using server-client architecture (with all the genomic data stored in server RAM).
Large-scale genomic studies often require amplification of genomic DNA by PCR. Therefore, automatic PCR primer design is one of the critical steps in such studies. We have created a novel method for masking repeated regions in sequences. It allows masking of the entire template DNA before primer design to avoid the consideration of poor primer candidates. GenomeMasker is able to identify and mask repeating words that have not been included in the current repeat libraries. This, combined with a specific 3'-end masking technique, allowed us to achieve a more sensitive masking than the existing approaches. Programs like DUST and TandemRepeatsFinder are designed for masking short repeat motifs and are faster than our method, but their ability to find different types of longer repeats is lower, as demonstrated in Table 1. Masking repeats may help to increase PCR success rate, but this might not be sufficient for some applications. For example, genotyping  requires that a unique PCR product must be generated from the human genome. Therefore, additional e-PCR step is still required after candidate primer pair is selected, which in our case is done with the GenomeTester application. The method presented in this paper offers the possibility to increase specificity and speed of finding non-unique words by using word indexing and binary search algorithm compared to other repeat-masking and e-PCR methods. The package also contains an optional primer design program that can be replaced with desired software. Therefore, the main advantage of our method is speed, unique masking style and the possibility to quickly locate alternative PCR primer binding sites and products.
All programs in our test must read the genome sequence information from a hard drive. This information is typically pre-processed and saved into a database or an index in a specific binary format. Databases or indexes that need to be read from hard drive are rather large. Total size of indexes for the human chromosomes is approximately 1 GB for BLAST and MEGABLAST, 11 GB for SSAHA and 22 GB for GenomeTester. When executed, most programs initially read all database/index data into RAM, which speeds up the search, particularly for large datasets. However, for searching only a few primer pairs the entire database/index may not be necessary. The pre-processed and sorted index of GenomeTester allows identifying locations of the given primer pair by reading only a fraction of the index file.
Our package is based on finding the exact matches between sequences. Newer methods  and algorithms  allow considering mismatches when comparing oligos against large genomic sequence. Whereas it would be possible to enhance our programs so that the bindings with single mismatches would be taken into account, the current implementation would require about 45–48 times more computing time for both gtester and gmasker. The number comes from the fact that there are 3 mismatched variants per each nucleotide within a given word. For example, 16-mer oligonucleotide would have 48 different single-mismatch oligos. We could forbid the mismatch at 3'-end of oligo, but nevertheless the number of different variants to search is huge. The problem is particularly serious with the GenomeTester which keeps track of the location of each potential binding site. Thus, the memory requirement for storing all the locations would be enormous. Also, there is no good model to estimate how different mismatches should be weighted. For example, it is usually expected that mismatches near the 3' end disturb primer binding and subsequent replication matter much more than the ones near 5' end. The exact dependence between mismatch location and binding strength is not known. Additionally, the strength of the mismatched DNA duplex is somewhat weaker than the duplex with exact matches . Thus, the relative influence of mismatched binding sites to the success rate of PCR compared to fully complementary binding sites is poorly understood and their importance in predicting PCR success cannot be estimated without extensive experimental studies.
However, we have examined the correlation between the number of perfect match binding sites and the number of binding sites with single mismatch. We have taken all strings with length 12 nucleotides from the human genome. For each oligonucleotide, two things were counted 1) the number of binding sites in the genome as 100% identical match and 2) the number of binding sites in the genome containing single mismatch. These two numbers were plotted against each other and a strong linear correlation was observed (with correlation coefficient r = 0.938). As these two values are strongly correlated we would expect them to be almost equally efficient predictors of PCR success. Thus, counting of mismatched binding sites is not likely to give a significant improvement for the prediction of PCR success.
We have created a novel method for masking repeated regions in sequences, detecting all PCR primer binding sites and possible PCR products from the human genome. The GENOMEMASKER package is suitable for researchers who need to evaluate or design unique PCR primers in genomic scale. It is able to mask the entire human genome for non-unique primers within 6 hours and find locations of all binding sites for 10 000 designed primer pairs within 10 minutes.
Project name: GENOMEMASKER package
Project homepage: http://bioinfo.ebc.ee/download/
Operating system: Unix/Linux
Programming language: C/C++
Other requirements: None
License: The package is freely available to academic users
Any restrictions to use by non-academics: Licence needed
This work was supported by the Estonian Ministry of Education and Research grant no. 0182649s04 and partially by grants no. 42/2002 and EU19730 from Enterprise Estonia. The testing of the programs was performed on a dedicated computer farm supported by grant no. 03-03.01-10 from the Estonian Information Technology Foundation. The authors thank Tõnu Möls, Ulvi Gerst, Priit Palta and Tõnis Org for critical reading of the manuscript, and Katre Palm and Signe Sumerik for valuable help with English grammar.
- Chenal V, Souque P, Markovits A, Griffais R: Choosing highly specific primers for the polymerase chain reaction using the octomer frequency disparity method: application to Chlamydia trachomatis. Gene 1996, 176: 97–101. 10.1016/0378-1119(96)00226-0View ArticlePubMedGoogle Scholar
- Kalendar R, Tanskanen J, Immonen S, Nevo E, Schulman AH: Genome evolution of wild barley (Hordeum spontaneum) by BARE-1 retrotransposon dynamics in response to sharp microclimatic divergence. Proc Natl Acad Sci U S A 2000, 97: 6603–6607. 10.1073/pnas.110587497PubMed CentralView ArticlePubMedGoogle Scholar
- Rychlik W: Selection of primers for polymerase chain reaction. Mol Biotechnol 1995, 3: 129–134.View ArticlePubMedGoogle Scholar
- Rozen S, Skaletsky H: Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol 2000, 132: 365–386.PubMedGoogle Scholar
- Smith TF, Waterman MS: Identification of common molecular subsequences. J Mol Biol 1981, 147: 195–197. 10.1016/0022-2836(81)90087-5View ArticlePubMedGoogle Scholar
- Chou HH, Hsia AP, Mooney DL, Schnable PS: Picky: oligo microarray design for large genomes. Bioinformatics 2004, 20: 2893–2902. 10.1093/bioinformatics/bth347View ArticlePubMedGoogle Scholar
- Rahmann S: Algorithms for Probe Selection and DNA Microarray Design. Berlin, University of Berlin; 2004:208.Google Scholar
- DUST source code[ftp://ftp.ncbi.nlm.nih.gov/pub/tatusov/dust/]
- Benson G: Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 1999, 27: 573–580. 10.1093/nar/27.2.573PubMed CentralView ArticlePubMedGoogle Scholar
- RepeatMasker homepage[http://www.repeatmasker.org]
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389–3402. 10.1093/nar/25.17.3389PubMed CentralView ArticlePubMedGoogle Scholar
- van Hijum SA, de Jong A, Buist G, Kok J, Kuipers OP: UniFrag and GenomePrimer: selection of primers for genome-wide production of unique amplicons. Bioinformatics 2003, 19: 1580–1582. 10.1093/bioinformatics/btg203View ArticlePubMedGoogle Scholar
- Rouillard JM, Zuker M, Gulari E: OligoArray 2.0: design of oligonucleotide probes for DNA microarrays using a thermodynamic approach. Nucleic Acids Res 2003, 31: 3057–3062. 10.1093/nar/gkg426PubMed CentralView ArticlePubMedGoogle Scholar
- Onodera K, Melcher U: Selection for 3' end triplets for polymerase chain reaction primers. Mol Cell Probes 2004, 18: 369–372. 10.1016/j.mcp.2004.05.007View ArticlePubMedGoogle Scholar
- Schuler GD: Sequence mapping by electronic PCR. Genome Res 1997, 7: 541–550.PubMed CentralPubMedGoogle Scholar
- Lexa M, Horak J, Brzobohaty B: Virtual PCR. Bioinformatics 2001, 17: 192–193. 10.1093/bioinformatics/17.2.192View ArticlePubMedGoogle Scholar
- Cao Y, Wang L, Xu K, Kou C, Zhang Y, Wei G, He J, Wang Y, Zhao L: Information theory-based algorithm for in silico prediction of PCR products with whole genomic sequences as templates. BMC Bioinformatics 2005, 6: 190. 10.1186/1471-2105-6-190PubMed CentralView ArticlePubMedGoogle Scholar
- Xu D, Li G, Wu L, Zhou J, Xu Y: PRIMEGENS: robust and efficient design of gene-specific probes for microarray analysis. Bioinformatics 2002, 18: 1432–1437. 10.1093/bioinformatics/18.11.1432View ArticlePubMedGoogle Scholar
- Weckx S, De Rijk P, Van Broeckhoven C, Del-Favero J: SNPbox: a modular software package for large-scale primer design. Bioinformatics 2005, 21: 385–387. 10.1093/bioinformatics/bti006View ArticlePubMedGoogle Scholar
- Rouchka EC, Khalyfa A, Cooper NG: MPrime: efficient large scale multiple primer and oligonucleotide design for customized gene microarrays. BMC Bioinformatics 2005, 6: 175. 10.1186/1471-2105-6-175PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang Z, Schwartz S, Wagner L, Miller W: A greedy algorithm for aligning DNA sequences. J Comput Biol 2000, 7: 203–214. 10.1089/10665270050081478View ArticlePubMedGoogle Scholar
- Kent WJ: BLAT--the BLAST-like alignment tool. Genome Res 2002, 12: 656–664. 10.1101/gr.229202. Article published online before March 2002PubMed CentralView ArticlePubMedGoogle Scholar
- Ning Z, Cox AJ, Mullikin JC: SSAHA: a fast search method for large DNA databases. Genome Res 2001, 11: 1725–1729. 10.1101/gr.194201PubMed CentralView ArticlePubMedGoogle Scholar
- Rachlin J, Ding C, Cantor C, Kasif S: MuPlex: multi-objective multiplex PCR assay design. Nucleic Acids Res 2005, 33: W544–7. 10.1093/nar/gki377PubMed CentralView ArticlePubMedGoogle Scholar
- Lexa M, Valle G: PRIMEX: rapid identification of oligonucleotide matches in whole genomes. Bioinformatics 2003, 19: 2486–2488. 10.1093/bioinformatics/btg350View ArticlePubMedGoogle Scholar
- Murphy K, Raj T, Winters RS, White PS: me-PCR: a refined ultrafast algorithm for identifying sequence-defined genomic elements. Bioinformatics 2004, 20: 588–590. 10.1093/bioinformatics/btg466View ArticlePubMedGoogle Scholar
- In-Situ PCR (isPCR) source code[http://www.soe.ucsc.edu/~kent/src/unzipped/isPcr/]
- Knuth D: The Art of Computer Programming. In Sorting an Searching. Volume 3. Third edition., Addison-Wesley; 1997:409–426.Google Scholar
- Jurka J: Repbase update: a database and an electronic journal of repetitive elements. Trends Genet 2000, 16: 418–420. 10.1016/S0168-9525(00)02093-XView ArticlePubMedGoogle Scholar
- Kaplinski L, Andreson R, Puurand T, Remm M: MultiPLX: automatic grouping and evaluation of PCR primers. Bioinformatics 2005, 21: 1701–1702. 10.1093/bioinformatics/bti219View ArticlePubMedGoogle Scholar
- Syvanen AC: Accessing genetic variation: genotyping single nucleotide polymorphisms. Nat Rev Genet 2001, 2: 930–942. 10.1038/35103535View ArticlePubMedGoogle Scholar
- Zheng J, Close TJ, Jiang T, Lonardi S: Efficient selection of unique and popular oligos for large EST databases. Bioinformatics 2004, 20: 2101–2112. 10.1093/bioinformatics/bth210View ArticlePubMedGoogle Scholar
- SantaLucia JJ, Hicks D: The thermodynamics of DNA structural motifs. Annu Rev Biophys Biomol Struct 2004, 33: 415–440. 10.1146/annurev.biophys.32.110601.141800View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.