Designing multiple degenerate primers via consecutive pairwise alignments
© Najafabadi et al; licensee BioMed Central Ltd. 2008
Received: 04 April 2007
Accepted: 27 January 2008
Published: 27 January 2008
Different algorithms have been proposed to solve various versions of the degenerate primer design problem. For one of the most general cases, the multiple degenerate primer design problem, very few algorithms exist, and none of them satisfies the criterion of designing a low number of primers that cover a high number of sequences. Moreover, the existing algorithms require high computational capacity and long running times.
PAMPS, the method presented in this work, usually results in a 30% reduction in the number of degenerate primers required to cover all sequences, compared to the previous algorithms. In addition, PAMPS runs up to 3500 times faster.
Owing to its short running time, PAMPS allows degenerate primers to be designed for very large numbers of sequences. In addition, it yields fewer primers, which reduces synthesis costs and improves amplification sensitivity.
Polymerase Chain Reaction, or PCR, is a ubiquitous technique that amplifies a specific region of DNA so that enough copies of that region are available to be adequately tested, sequenced or manipulated in other ways. In order to use PCR, one must know the exact sequences that lie on either side of the DNA region of interest. These sequences are used to design two synthetic DNA oligonucleotides, or primers, one complementary to each strand of the DNA double helix and lying on opposite sides of the target region. The primers are typically 20–30 nucleotides long.
Traditionally, degenerate primers were designed manually by examining multiple alignments of the target sequences. However, several programs are now available for designing degenerate primers for aligned sequences. CODEHOP and DePiCt are programs for designing degenerate primers for aligned protein sequences in order to identify new members of protein families. For each given multiple sequence alignment, CODEHOP constructs a pair of primers. Each primer consists of a degenerate 3' core region, typically with degeneracy of at most 128, and a 5' consensus sequence that stabilizes annealing. CODEHOP works well for small sets of proteins, taking into account the codon usage of the target genome as well as the desired annealing temperature. However, it is not suited to constructing primers with high degeneracy for large sets of long genomic sequences. DePiCt clusters the sequences using a simple similarity score and then designs a pair of primers for each cluster by translating conserved blocks of amino acids into nucleotides.
Maximum Coverage Degenerate Primer Design (MC-DPD) tries to find a primer of length l and degeneracy at most d_max that covers a maximum number of strings (sequences) of a given input set, each of length l. HYDEN, an algorithm based on a heuristic approach, essentially addresses this variant of the DPD problem and was first used to design degenerate primers for a set of genomic sequences in order to find new human olfactory receptor genes [9, 10].
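The two quantities that MC-DPD trades off, degeneracy and coverage, can be illustrated with a minimal sketch. It assumes the usual conventions: a degenerate primer is written in IUPAC nucleotide codes, its degeneracy is the product of the number of bases allowed at each position, and it covers a sequence if it matches some window of that sequence. The function names `degeneracy`, `matches_at` and `covers` are chosen here for illustration only.

```python
from math import prod

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct non-degenerate sequences the primer represents."""
    return prod(len(IUPAC[code]) for code in primer)


def matches_at(primer: str, seq: str, start: int) -> bool:
    """True if every primer position allows the base found in seq at that offset."""
    return all(seq[start + i] in IUPAC[code] for i, code in enumerate(primer))


def covers(primer: str, seq: str) -> bool:
    """True if the primer matches at least one window of the sequence."""
    last_start = len(seq) - len(primer)
    return any(matches_at(primer, seq, s) for s in range(last_start + 1))


def coverage(primer: str, seqs: list) -> int:
    """Number of input sequences covered by the primer."""
    return sum(covers(primer, s) for s in seqs)
```

For example, the primer `AYG` has degeneracy 2 and covers any sequence containing `ACG` or `ATG`.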
Minimum Degeneracy Degenerate Primer Design (MD-DPD) addresses the problem of finding a primer of length l and minimum degeneracy that covers all the input strings, each of which has a length equal to or greater than l.
Minimum Primers Degenerate Primer Design (MP-DPD) applies when a set of strings of length l is given, and finds a minimum number of primers of length l and degeneracy at most d_max, so that each input string is covered by at least one primer.
MP-DPD has the constraint that all input sequences are of the same length as the primers, which is not the case in most real situations. Removing this constraint, i.e. allowing the strings to have arbitrary lengths, results in a more general problem, Multiple Degenerate Primer Design (MDPD). Given a set of n strings of various lengths (each equal to or greater than l_min), MDPD is to find a minimum number of primers of length at least l_min and degeneracy at most d_max, so that each input string is covered by at least one primer. A currently available algorithm for designing multiple degenerate primers, called PT-MIPS, has been developed in the context of SNP genotyping. It uses an iterative beam-search technique to progressively construct a set of primers until all sequences are covered.
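Stated compactly, MDPD can be written as the optimization problem below. This is a sketch under the usual assumptions: each primer position p_j is a subset of {A, C, G, T}, degeneracy is the product of the sizes of these subsets, and a primer covers a string if it matches one of its substrings. The symbols S, P and d(p) are introduced here for illustration only.

```latex
\begin{aligned}
\text{Given }      & S = \{s_1, \dots, s_n\}, \quad |s_i| \ge l_{\min} \ \text{ for all } i, \\
\text{minimize }   & |P| \\
\text{subject to } & |p| \ge l_{\min} \ \text{ and } \ d(p) = \prod_{j=1}^{|p|} |p_j| \le d_{\max}
                     \quad \text{for all } p \in P, \\
                   & \text{for every } s_i \in S \text{ there exists } p \in P \text{ such that } p \text{ covers } s_i .
\end{aligned}
```

MP-DPD is the special case in which every string and every primer has the same length l.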
In this work, we introduce a new algorithm for solving MDPD problems which consecutively uses an ad hoc pairwise alignment for multiple primer selection, hence the name PAMPS. We will show that PAMPS performs better than previous algorithms on different sets of input strings, i.e. it yields a smaller number of primers in considerably less computation time.
Results and Discussion
PT-MIPS has previously been compared with HYDEN. Although HYDEN is designed primarily to solve MC-DPD problems, it can be used iteratively to approximate MDPD problems: once a primer of length l_min and degeneracy at most d_max is found that covers the maximum number of input sequences, the sequences covered by this primer are removed from the input set and HYDEN is run again on the remaining sequences. Repeating this procedure eventually yields a set of primers that covers all sequences. Since PT-MIPS has been shown to outperform HYDEN, and PAMPS outperforms PT-MIPS, we did not compare PAMPS and HYDEN directly.
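The iterative use of a maximum-coverage solver described above amounts to a greedy covering loop, sketched below. The solver is passed in as a function; `toy_solver` is a hypothetical stand-in, not HYDEN, that simply returns the most widely shared non-degenerate k-mer, and it assumes every sequence is at least k nucleotides long.

```python
from collections import Counter
from typing import Callable, List, Set, Tuple

# Hypothetical solver interface: takes the remaining sequences and returns one
# primer together with the indices (into that list) of the sequences it covers.
Solver = Callable[[List[str]], Tuple[str, Set[int]]]


def iterative_cover(seqs: List[str], solve_mc_dpd: Solver) -> List[str]:
    """Repeatedly pick a max-coverage primer and drop the sequences it covers."""
    remaining = list(seqs)
    primers = []
    while remaining:
        primer, covered = solve_mc_dpd(remaining)
        if not covered:
            raise ValueError("solver covered no sequence; cannot make progress")
        primers.append(primer)
        remaining = [s for i, s in enumerate(remaining) if i not in covered]
    return primers


def toy_solver(seqs: List[str], k: int = 3) -> Tuple[str, Set[int]]:
    """Stand-in solver: the k-mer shared by most sequences (degeneracy 1)."""
    counts = Counter()
    for s in seqs:
        # Count each k-mer at most once per sequence.
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    # Highest count first; ties broken alphabetically for determinism.
    best = min(counts, key=lambda kmer: (-counts[kmer], kmer))
    return best, {i for i, s in enumerate(seqs) if best in s}
```

Because each round is only locally optimal, such a loop gives an approximation to MDPD and carries no guarantee that the number of primers is minimal.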
In this work we presented a new algorithm, called PAMPS, for solving MDPD problems. PAMPS exploits a modified pairwise alignment to select the subsequences that may be merged into degenerate primers. PAMPS was shown to run significantly faster than the previously developed software PT-MIPS and to give better results (i.e. smaller sets of primers), which reduces primer synthesis costs. Moreover, when fewer mixed primers are used in a PCR reaction, the concentration of each reacting primer increases, which usually improves the sensitivity of amplification. In contrast to previous algorithms, PAMPS does not restrict the output to the exact primer length that was requested; it may return primers longer than the requested length, which allows an appropriate primer to be selected in terms of annealing temperature. PAMPS can be used to design degenerate primers for amplification of genes with uncertain sequences, such as new members of gene families or libraries of antibody variable fragments. An implementation of PAMPS is provided in Additional file 1.
Merging two aligned sequences
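The alignment algorithm used by PAMPS is very similar to conventional global alignment. However, the scoring method differs in some details. Since the purpose is to achieve an alignment that results in a merged sequence with low degeneracy, we defined the score of each match/mismatch as

$$M(x, y) = 2 - \log_2 |x \cup y|.$$

Here x and y are read as the sets of nucleotides represented by the two aligned positions, so a pair of identical unambiguous nucleotides scores 2, whereas a pair whose union contains all four nucleotides scores 0.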
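The score M(x, y) and the merging of two aligned positions can be sketched as follows. This sketch assumes that each position is an IUPAC code standing for a set of nucleotides, and that the two sequences are already aligned without gaps and have equal length. It does not reproduce the alignment itself, whose gap handling is not specified here. The names `score`, `merge` and `alignment_score` are illustrative.

```python
from math import log2

# IUPAC nucleotide codes as sets of bases, plus the reverse lookup.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
SETS = {code: frozenset(bases) for code, bases in IUPAC.items()}
CODES = {bases: code for code, bases in SETS.items()}


def score(x: str, y: str) -> float:
    """M(x, y) = 2 - log2 |x U y| for two IUPAC codes."""
    return 2 - log2(len(SETS[x] | SETS[y]))


def merge(a: str, b: str) -> str:
    """Merge two aligned, equal-length sequences position by position."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(CODES[SETS[x] | SETS[y]] for x, y in zip(a, b))


def alignment_score(a: str, b: str) -> float:
    """Sum of per-position scores for an ungapped alignment."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(score(x, y) for x, y in zip(a, b))
```

With this scoring, merging `ACGT` and `ACAT` gives `ACRT`: three identical positions score 2 each and the G/A position scores 1, for a total of 7.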
Designing degenerate primers
To design degenerate primers, pairs of sequences are aligned and merged consecutively until no more sequences can be merged (i.e. merging any further pair would yield a primer either shorter than l_min or with degeneracy greater than d_max). However, sequences can be merged in different combinations, each of which may result in a different set of primers. The optimum set is the one that contains the fewest primers. PAMPS uses a procedure similar to MIPS to search for the optimum set of primers.
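The stopping condition of consecutive merging can be illustrated with a deliberately simplified sketch. It assumes that all candidates are equal-length subsequences that are already position-aligned, so the l_min constraint holds trivially, and it follows a single greedy merge order rather than searching over alternative orders as PAMPS does. Each primer is held as a tuple of nucleotide sets, and all function names are illustrative.

```python
from itertools import combinations
from math import prod


def to_sets(seq):
    """Represent a sequence as a tuple of single-nucleotide sets."""
    return tuple(frozenset(base) for base in seq)


def merge(p, q):
    """Position-wise union of two aligned primers."""
    return tuple(x | y for x, y in zip(p, q))


def degeneracy(p):
    """Product of the number of nucleotides allowed at each position."""
    return prod(len(x) for x in p)


def greedy_merge(seqs, d_max):
    """Merge the cheapest pair repeatedly until every merge exceeds d_max."""
    primers = [to_sets(s) for s in seqs]
    while True:
        best = None  # (degeneracy, i, j, merged primer)
        for i, j in combinations(range(len(primers)), 2):
            merged = merge(primers[i], primers[j])
            d = degeneracy(merged)
            if d <= d_max and (best is None or d < best[0]):
                best = (d, i, j, merged)
        if best is None:
            return primers  # no admissible merge is left
        _, i, j, merged = best
        primers = [p for k, p in enumerate(primers) if k not in (i, j)]
        primers.append(merged)
```

A different merge order can end with a different number of primers, which is why a search over merge combinations is needed to approach the optimum set.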
PAMPS uses a heuristic approach similar to that of MIPS to reduce the search space. Assume P1 is a previously found set of primers that contains m primers and covers n sequences, and P2 is a newly found set that also contains m primers and covers n sequences. P2 is expanded only if the sum of the scores of its primers (see section Alignment) exceeds that of P1 (Figure 10).
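This pruning rule can be sketched as a small bookkeeping function. The sketch assumes that, when several sets with the same m and n have already been seen, the new set is compared against the best score sum recorded so far. The function name `should_expand` and the dictionary `best_seen` are hypothetical.

```python
from typing import Dict, Tuple


def should_expand(
    num_primers: int,
    num_covered: int,
    score_sum: float,
    best_seen: Dict[Tuple[int, int], float],
) -> bool:
    """Expand a primer set only if it beats earlier sets with the same (m, n)."""
    key = (num_primers, num_covered)
    if key in best_seen and score_sum <= best_seen[key]:
        return False  # an equally sized set with the same coverage scored at least as well
    best_seen[key] = score_sum
    return True
```

A set whose score sum merely equals the earlier one is not expanded, matching the requirement that the new sum must exceed the old one.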
The authors are grateful to Manely Rashedan for her everlasting support. This project was supported by a grant from Avesina Research Institute, Tehran. HSN and NT are also supported by the University of Tehran.
1. Mullis K, Faloona F, Scharf S, Saiki R, Horn G, Erlich H: Specific enzymatic amplification of DNA in vitro: The polymerase chain reaction. Cold Spring Harbor Symp Quant Biol 1986, 51: 263–273.
2. Souvenir R, Buhler J, Stormo G, Zhang W: Selecting degenerate multiplex PCR primers. Proceedings of the 3rd Workshop on Algorithms in Bioinformatics (WABI 2003) 2003, 512–526.
3. Kwok S, Chang S, Sninsky J, Wang A: A guide to the design and use of mismatched and degenerate primers. PCR Methods Appl 1994, 3: S39-S47.
4. Fuchs T, Malecova B, Linhart C, Sharan R, Khen M, Herwig R, Shmulevich D, Elkon R, Steinfath M, O'Brien JK, Radelof U, Lehrach H, Lancet D, Shamir R: DEFOG: A Practical Scheme for Deciphering Families of Genes. Genomics 2002, 80(3):1–8. doi:10.1006/geno.2002.6830
5. Jarman SN: Amplicon: software for designing PCR primers on aligned DNA sequences. Bioinformatics 2007, 20(10):1644–1645. doi:10.1093/bioinformatics/bth121
6. Jarman SN, Deagle BE, Gales NJ: Group-specific PCR for DNA-based analysis of species diversity and identity in dietary samples. Mol Ecol 2003, 13: 1313–1322. doi:10.1111/j.1365-294X.2004.02109.x
7. Rose T, Schultz E, Henikoff J, Pietrokovski S, McCallum C, Henikoff S: Consensus-degenerate hybrid oligonucleotide primers for amplification of distantly related sequences. Nucleic Acids Res 1998, 26: 1628–1635. doi:10.1093/nar/26.7.1628
8. Wei X, Kuhn D, Narasimhan G: Degenerate primer design via clustering. Proceedings of the 2nd IEEE Computer Society Bioinformatics Conference (CSB 2003) 2003, 75–83.
9. Linhart C, Shamir R: The degenerate primer design problem. Bioinformatics 2002, 18: S172-S180.
10. Linhart C, Shamir R: The Degenerate Primer Design Problem: Theory and Applications. J Comput Biol 2005, 12: 431–456. doi:10.1089/cmb.2005.12.431
11. Breslauer KJ, Frank R, Blocker H, Marky LA: Predicting DNA duplex stability from the base sequence. Proc Natl Acad Sci USA 1986, 83: 3746–3750. doi:10.1073/pnas.83.11.3746
12. Needleman SB, Wunsch CD: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 1970, 48: 443–453. doi:10.1016/0022-2836(70)90057-4
13. Cornish-Bowden A: IUPAC-IUB symbols for nucleotide nomenclature. Nucleic Acids Res 1985, 13: 3021–3030. doi:10.1093/nar/13.9.3021
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.