MPrime: efficient large scale multiple primer and oligonucleotide design for customized gene microarrays
© Rouchka et al; licensee BioMed Central Ltd. 2005
Received: 02 February 2005
Accepted: 13 July 2005
Published: 13 July 2005
Enhancements in sequencing technology have recently yielded assemblies of large genomes including rat, mouse, human, fruit fly, and zebrafish. The availability of large-scale genomic and genic sequence data coupled with advances in microarray technology have made it possible to study the expression of large numbers of sequence products under several different conditions in days where traditional molecular biology techniques might have taken months, or even years. Therefore, to efficiently study a number of gene products associated with a disease, pathway, or other biological process, it is necessary to be able to design primer pairs or oligonucleotides en masse rather than using a time consuming and laborious gene-by-gene method.
We have developed an integrated system, MPrime, in order to efficiently calculate primer pairs or specific oligonucleotides for multiple genic regions based on a keyword, gene name, accession number, or sequence fasta format within the rat, mouse, human, fruit fly, and zebrafish genomes. A set of products created for mouse housekeeping genes from MPrime-designed primer pairs has been validated using both PCR-amplification and DNA sequencing.
These results indicate MPrime accurately incorporates standard PCR primer design characteristics to produce high scoring primer pairs for genes of interest. In addition, sequence similarity for a set of oligonucleotides constructed for the same set of genes indicates high specificity in oligo design.
Recent advances in DNA sequencing technologies have resulted in the availability of the assembly of a number of genomes at various stages. As of December 24, 2004, the Genomes OnLine Database  lists 1,245 on-going or completed genome projects. Included are the genomes of human [2, 3], mouse , Norwegian brown rat , fruit fly , fugu , chimpanzee, and zebrafish . In addition, various projects have been concerned with the curation of the genic regions of these genomes, most notably RefSeq .
Advances in microarray technology  have made it possible to simultaneously study the expression levels of thousands of genes under a given condition. Since a molecular biologist may be interested in studying a particular subset of genes in a given genome using microarray experimentation, it becomes necessary for the scientist to design complementary sequences to uniquely identify the genes of interest for either a cDNA product  or synthetic oligonucleotide [11–13] approach.
Custom microarray experimentation involves several steps, each of which must be performed in a timely manner in order to avoid becoming a bottleneck in the system. The steps involved are: determination of target genes; designing primers or oligonucleotides to create products uniquely identifying the target genes; spotting of the products on the microarray slide; detection of the presence of the gene under a given condition; and data analysis. MPrime focuses on the primer and oligonucleotide design stage in the microarray experimentation process.
Comparison to existing primer and oligo design software
Comparison of properties of MPrime with various primer and oligonucleotide design programs.
Large Scale Design
Sequence Similarity Avoidance
Proutski and Holmes (1996) 
Li, et al. (1997) 
Haas, et al. (1998) 
Rozen and Skaletsky (2000) 
Kampke et al., (2001) 
Raddatz et al. (2001) 
Varotto et al. (2001) 
Li and Stormo (2001) 
Pozhitkov and Tautz (2002) 
Xu et al. (2002) 
Nielsen et al. (2003) 
Rouillard et al. (2003) 
Wang and Seed (2003) 
Reymond, et al. (2004) 
There are a number of limitations to these approaches. Some of the algorithms find a primer or oligo from a single sequence at a time. Such an approach is time consuming and prone to human error, since the results must be transferred one at a time. Other algorithms may consider multiple sequences, yet fail to consider a sequence similarity score, resulting in products which do not uniquely identify genes. Some approaches pre-compute a set of products for a given genome, which works well if the complete genome is available, but behaves poorly in an adaptive environment. Very few of these approaches allow the user to specify whether they would like to design either primers or oligos, limiting them to one or the other.
MPrime attempts to overcome each of these limitations by allowing for efficient large scale multiple product design. MPrime approaches primer and oligo design by looking at a user-defined subset of genes of interest identified by either a GenBank  or RefSeq  accession number, by a gene name, or by the raw fasta sequence data. When the user specifies a particular genome of interest, MPrime will search through that genome to ensure that the products chosen will uniquely identify the gene the user is trying to identify.
Due to the methods employed, MPrime can be useful in large-scale primer and oligo design. Two software packages, GenomePRIDE  and GST-PRIME , have been described as offering large-scale, genome-wide design of oligomers and primer pairs for the construction of custom microarray products. MPrime provides advantages to each of these approaches.
GST-PRIME allows for an automated, genome-wide design of primer pairs in a similar fashion to MPrime. MPrime improves upon the functionality of GST-PRIME by allowing for the construction of gene-specific oligos for use with oligo-based custom microarrays. In addition, while GST-PRIME allows for the detection of primer pairs for a cDNA based chip on a whole genome, users of the system are limited to only choosing genes associated with a protein GenBank Identifier (GI) number. This inherently imposes limitations for the user. For instance, computationally predicted genes, genes supported by biological evidence (ESTs for example), or pre-published gene sequences cannot be used. MPrime avoids these complications by allowing the user to obtain the gene information directly from the GI number, or optionally from a raw fasta formatted sequence.
GenomePRIDE is the only known software package that incorporates high throughput primer pair and oligo design. MPrime distinguishes itself from GenomePRIDE by taking into account several factors, including GC content, secondary structure loop formation, and the presence of a GC clamp. While omitting these calculations makes the detection of primers and oligos much more efficient, it can also potentially lead to problems with both PCR product amplification and probe to product binding affinities. In addition, MPrime is freely available via a web interface, whereas GenomePRIDE has an associated cost of 150 euros for academic institutions.
Selection of genic regions
The first step in primer design is to decide the regions to study. The interests of our collaborating lab lie in collating sets of mammalian programmed cell death genes related to neurological diseases such as Alzheimer's, Parkinson's, and Huntington's for use in microarray analyses. Therefore, an interface has been incorporated into MPrime in order to allow the user to retrieve genic regions by either a GenBank accession, gene name, or by a keyword. Retrieval by a keyword allows the user to have a general idea of the region of interest without having to spend time searching for an exact gene name or GenBank accession. Potential candidates are presented to the user. MPrime allows the user to refine their results by deciding on which candidate regions will be included before proceeding to primer or oligo design.
Primer design calculations
Although there are potentially a number of applications for the use of MPrime, it was originally designed with the intent for the construction of primers for custom cDNA microarray chips. In order to obtain the primer products with a high throughput, multiplex PCR can be implemented as the underlying biological experimentation. To ensure that the multiplex PCR experiments function properly and the microarray experimentation produces results that can be meaningfully analyzed the primer products should adhere to the properties of uniformity, sensitivity, and specificity.
Multiplex PCR  is a technique allowing for the amplification of two or more targets in an organism of interest by incorporating multiple sets of PCR primers into a single reaction. Since each set of primers is subject to the exact same reaction conditions, they must be defined uniformly in order to react to the thermo cycling in a similar fashion. The properties which need to be relatively uniform include primer melting temperature, primer G+C concentration, and primer length.
MPrime requires the user to input a range of primer melting temperatures along with an optimal melting temperature. Any primers designed are required to fall inside of this range. There are a number of different approaches to calculating the melting temperature (T m ) of an oligonucleotide sequence, including an arbitrary method based on the formula T m = 2 * (A + T) + 4 * (G + C) ; a long probe method based on the length and GC concentration ; and a nearest neighbor method based on dinucleotide interactions [35, 36]. MPrime incorporates the nearest neighbor approach using the formula of Rychlik, et al.  where the T m is calculated as follows:
Where ΔH is the enthalpy for helix formation, ΔS is the entropy for helix formation, c is the molar concentration of the primer (set at 250 pM); M is the molar concentration of Na+ (set at 50 mM) and R is the gas constant (1.987 cal/degree * mol).
Calculation of the primer G+C concentration is a straight-forward calculation of the percentage of the primer oligo containing G's or C's. MPrime requires the user to input the minimum, maximum, and optimal G+C concentration for the primers. In addition, the user can specify a GC clamp length, or the number of G's or C's required at the 3' end of the primer. The GC clamp helps to promote primer end stability, resulting in more efficient and specific bonding of the primer to the amplicon template.
In addition, MPrime requires the user to enter information concerning the primer length. The minimum, maximum, and optimal primer lengths are required. The primer length uniformity along with the uniform G+C content helps to maintain a uniform primer melting temperature.
One problem that can occur in product design is that either the products or the corresponding probe sequences could form secondary or tertiary structures, thus causing them to fail to interact, resulting in an apparent low signal. In order to overcome these problems of unwanted annealing between primers and products, MPrime incorporates a scoring scheme for paired-end and self-end annealing to reduce primer dimers; paired annealing to reduce partial double stranded structures; and self annealing to reduce secondary structure formation of single stranded sequences within both products and individual primers. These scoring schemes are adapted from Kampke, et al.  where each G-C base pair is assigned a score of 4 and each A-T base pairing is assigned a score of 2.
It is important that each primer set results in only the product of interest, and that each product uniquely identifies the gene being studied. In order to build in these constraints, MPrime searches the primers against a RefSeq  database for the organism being studied using wublastn . The highest scoring sequences should be the genic region of interest; all other similar, suboptimal alignments will result from spurious hits. In order to assure that the product designed is specific to the gene of interest, the blast scores for all database matches are considered, and a specificity score is computed as the percent identity of the match multiplied by the percent of the product covered by the match. The blast results are then resorted based on this specificity score. The native locus should have a score of 1 (100% identity * 100% coverage). The next highest scoring segment is then taken to give an idea of the specificity of the designed product. For instance, if the product is completely unique, the second highest score will be 0. If the next highest scoring segment has an 80% identity of 60% of the sequence, the score would be 0.48. The product specificity score is weighted and calculated into the overall primer score. Note that in some instances, a gene will have duplicate entries in RefSeq. This may result in the product specificity score of 1 for all possible products. In such a case, the best scoring primer set is reported along with the product specificity score. In these cases, it is up to the user to determine whether or not such a primer set should be used.
MPrime incorporates an overall scoring scheme for each pair of primers. The basis behind this scheme is to find the best scoring primer pair with the smallest deviation from the overall optimal values. The smaller the score is, the more likely it is that the primer pair will function as desired. Each of the parameters entered is weighted evenly, with the exception that a larger weight is associated to product specificity. Similar scoring schemes have been proposed in Kampke, et al.  and are used in Primer3 .
MPrime also has the capability of calculating oligonucleotides for sequences of interest. The scoring scheme for determining the optimal oligonucleotides is very similar with the exception that there are no primer-primer interactions that can occur. With oligonucleotides, the similarity scores are weighted slightly higher, and are based on the actual length of the oligos desired. Typically, the 3' end of genes tends to be more variable and therefore more unique in their sequence. In addition, MPrime has the added feature of allowing the user to specify a region in which the oligo must be located.
Sequence input into MPrime
Sequences can be input into MPrime in one of four ways: by gene name, by GenBank accession number, by keyword, or by the raw fasta sequence data. The user can specify whether they are interested in sequences from the human, rat, mouse, fruit fly, or zebrafish genome. MPrime will then search for the sequence of interest, and return sequence records from RefSeq and the appropriate GenBank databases matching the search string. If the GenBank accession is given, MPrime first checks to see if there is a gene matching exactly.
Once the regions of interest are decided upon, the sequences are retrieved from local copies of either RefSeq or GenBank. For cDNAs, these sequences are then searched in order to retrieve primer pairs adhering to a set of guidelines. The primer length, G+C content, melting temperature, self-hybridization characteristics, and primer-primer hybridization characteristics must be taken into account in order to allow the primers to effectively be used in PCR experiments . In addition, information on the sequence product is taken in as well, including the product length, G+C content, and melting temperature. The MPrime software was developed using both Perl and C++.
Since the end goal is to create products for incorporation into gene microarray chips, it is extremely important to provide relatively uniform product lengths in order to provide equal concentrations of sequence products and to be able to distinguish between the expression levels of different genes within a single experiment. The MPrime interface allows the user to enter in a range of values along with an optimal product length.
After potential primer pairs are created, they are searched against the appropriate database of genomic sequence in order to ensure the primer pairs and subsequent products are unique. If the products on the microarray chip are to uniquely identify genes, then it is necessary to check that the resulting products do not represent domains repeated in several genes.
Validation of MPrime
Sequence products produced by primer pairs selected from MPrime were validated using two independent techniques: (i) polymerase chain reaction and (ii) sequencing using the Beckman Coulter CEQ8000 Genetic Analysis System. Oligonucleotides constructed from MPrime were checked for their specificity by a nucleotide sequence search against Mus musculus sequences.
Polymerase chain reaction (PCR)
After primer pairs are determined, the sequence products are amplified using a 96 well plate DNA Engine Tetrad Thermal Cycler (MJ Research, Inc., Boston, MA). A MultiPROBE II Liquid Handler (Packard Instruments, Boston, MA) is used to help automate the process of creating the 96 well trays used in PCR amplification. The resulting products are then spotted on customized microarrays using a BioChip Arrayer (Packard Instruments).
PCR was performed using mouse brain total RNA (Clontech, Palo Alto, CA) and reverse transcribed into cDNA. RT-PCR was performed for a set of housekeeping genes and the primers used for RT-PCR were the same as those designed by MPrime program. Briefly, 5 μl of total RNA was used reverse transcribed using SuperScript II (Invitrogen, Carlsbad, CA) as described by the manufacture protocols. PCR reaction including 10 μl of cDNA template in a 100 μl reaction volume was amplified using DNA Engine Tetrad (MJ Research, Inc., Boston, MA). The thermal cycling profile consisted of 95°C for 3 min, 94°C for 30 s, 60°C for 30 s, 72°C for 30 s and 72°C for 7 min. Cycling kinetics were performed using 30 cycles. Amplified PCR reactions were separated on 2% agarose gel in the presence of ethidium bromide for visualization.
Purified mouse housekeeping genes products were sequenced using the CEQ 8000 Genetic Analysis System (Beckman Coulter Inc., Fullerton, CA). PCR was carried out using left primer and target cDNA from mouse brain as described above for 25 cycles. Sequences were then compared using a blast search against the corresponding GenBank and/or RefSeq entry for verification.
MPrime-computed oligonucleotides were compared against sequences from the organism Mus musculus using NCBI Blast . The nucleotide-nucleotide Blast program blastn was used with default parameters. The advanced search option limiting to the Mus musculus dataset was used. Similar sequences reporting a bit score of 99.7 or higher (five mismatches in 70 bases) were reported as significant alignments for the purpose of oligonucleotide specificity.
Sample set of genes and resulting primer pairs using MPrime1.0. The top sequence in each primer pair is the forward primer, the second sequence is the reverse primer.
Computed 70-mer oligos for eleven mouse housekeeping genes using MPrime 1.3 default parameters.
70-mer oligonucleotide sequence
Blastn similarities to MPrime computed oligos. 1mRNA sequence from which the RefSeq sequence was derived.
RefSeq entry for GAPDH
RefSeq entry for Actb
mRNA for beta actin
beta actin cDNA
Mouse A-X beta actin mRNA
Chromosome X genomic sequence
RefSeq entry for Sdf4
mRNA for Sdf4
Chromosome 4 genomic sequence
RefSeq entry for Ubb
mRNA for Ubb
RefSeq entry for Ywhaz
mRNA for Ywhaz
mRNA for phosholipase A2
RefSeq entry for Hprt
mRNA for Hprt
Chromosome X genomic sequence
RefSeq entry for Tubd1
mRNA for Tubd1
Chromosome 11 genomic sequence
RefSeq entry for Tuba1
mRNA for Tuba1
Tuba1 gene, 3' end
RefSeq entry for Lgals3bp (Ppicap pseudonym)
mRNA for mama
Chromosome 11 genomic sequence
RefSeq entry for Hspca
RefSeq entry for Htf9c
Chromosome 16 genomic sequence
For GAPDH, there were not any oligonucleotides found in the 500 bases on the 3' end of the RefSeq sequence of length 70 that were unique to GAPDH. The highest scoring oligo resulted in 37 matches with a perfect bit score of 139. This is reflected in the high sequence similarity score reported by MPrime. For these 37 matches, fourteen were to predicted similar mRNAs; five were similar to GAPDH; and four were GAPDH mRNA. The remaining fourteen sequences were genomic matches to chromosome 2 (2 sequences) chromosome 3 (3), chromosome 4 (2), chromosome 7 (2), chromosome 8 (3), chromosome 10 (1), and chromosome 15 (1).
MPrime uses RefSeq as the default database for searching for gene sequences. In order to understand more fully what happens with GAPDH, RefSeq was examined more closely. A search of RefSeq with the constraint "Mus musculus similar to glyceraldehyde-3" "Mus musculus" [ORGN] results in 118 different sequences, many of which completely cover the GAPDH sequence used by MPrime. Since the GAPDH sequence is not unique within the set of RefSeq sequences, a unique 70-mer could not be detected.
In addition to the detection of primer products and single, long oligomers (on the order of 40–70 bases) that uniquely identify a gene of interest, individual researchers may want to design multiple probes for a single gene. Such an approach can help to limit the effects of cross-hybridization, the presence of multiple gene isoforms, and single nucleotide polymorphisms. An approach to design multiple short oligonucleotide probes (on the order of 25 bases long) has previously been presented . MPrime presents the user the option of reporting back more than one resulting primer product or oligonucleotide for each gene. These results are reported in order based on their optimal score, with a score of 0 having the least deviation from the optimal user-selected conditions.
Researchers may additionally be interested in looking at multiple overlapping probes for each gene to produce tiled arrays that can help to detect the presence of different isoforms or single nucleotide polymorphisms present in an organism . This option is not currently available in MPrime, but is a planned addition in a future release.
MPrime has proven to be an efficient and effective tool for both primer and oligonucleotide design. The methodology behind MPrime allows molecular biologists to construct a large number of primers and/or oligonucleotides for genes that need to have consistent properties. In addition, multiple sub-optimal results can be reported and tested. Preliminary tests on a set of housekeeping genes indicate that primer products obtained by MPrime-designed primer pairs produce uniquely identifying regions. Oligonucleotides designed by MPrime also appear to produce highly specific segments as indicated by similarity searches to a number of databases. As more organisms become sequenced and customized gene microarrays become cheaper, the availability of interactive tools such as MPrime to design sequences for custom use will become even more important.
Availability and requirements
Project name: MPrime: multiple primer design
Project home page: http://kbrin.a-bldg.louisville.edu/Tools/MPrime/
Operating system: Platform independent (developed using Perl and C++)
Other requirements: MPrime is a freely available to both academic and commercial users as a web-based form.
List of abbreviations used
- Actb :
- BLAST :
Basic Local Alignment Search Tool
- bp :
- BRG :
University of Louisville Bioinformatics Research Group
- cDNA :
- DNA :
- GAPDH :
- GB :
One billion bytes
- Hprt :
- Hspca :
Heat shock protein 1, alpha
- Htf9c :
HpaII tiny fragments locus 9C
- KB :
One thousand nucleotide bases
- NCBI :
National Center for Biotechnology Information
- oligo :
- PCR :
Polymerase chain reaction
- Ppicap :
Peptidylprolyl isomerase C-associated protein
- RT-PCR :
Reverse transcription polymerase chain reaction
- Sdf4 :
Stromal cell derived factor 4
- Tuba1 :
Tubulin, alpha 1
- Tubd1 :
Tubulin, delta 1
- Ubb :
- Ywhaz :
Tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, zeta polypeptide
The authors wish to thank members of the University of Louisville Bioinformatics Research Group (BRG) for all of their support and important feedback. Support from NIH:NCRR grant P20 RR16481 (NC) is gratefully acknowledged.
- Bernal A, Ear U, Kyrpides N: Genomes OnLine Database (GOLD): a monitor of genome projects world-wide. Nucleic Acids Res 2001, 29: 126–127. 10.1093/nar/29.1.126PubMed CentralView ArticlePubMedGoogle Scholar
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, Levine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la BM, Dedhia N, Blocker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowski J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de JP, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ: Initial sequencing and analysis of the human genome. Nature 2001, 409: 860–921. 10.1038/35057062View ArticlePubMedGoogle Scholar
- Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, bu-Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi K, Deng Z, Di FV, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins ME, Ji RR, Ke Z, Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV, Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg S, Shao W, Shue B, Sun J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R, Xiao C, Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao Q, Zheng L, Zhong F, Zhong W, Zhu S, Zhao S, Gilbert D, Baumhueter S, Spier G, Carter C, Cravchik A, Woodage T, Ali F, An H, Awe A, Baldwin D, Baden H, Barnstead M, Barrow I, Beeson K, Busam D, Carver A, Center A, Cheng ML, Curry L, Danaher S, Davenport L, Desilets R, Dietz S, Dodson K, Doup L, Ferriera S, Garg N, Gluecksmann A, Hart B, Haynes J, Haynes C, Heiner C, Hladun S, Hostin D, Houck J, Howland T, Ibegwam C, Johnson J, Kalush F, Kline L, Koduru S, Love A, Mann F, May D, McCawley S, McIntosh T, McMullen I, Moy M, Moy L, Murphy B, Nelson K, Pfannkoch C, Pratts E, Puri V, Qureshi H, Reardon M, Rodriguez R, Rogers YH, Romblad D, Ruhfel B, Scott R, Sitter C, Smallwood M, Stewart E, Strong R, Suh E, Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J, Williams S, Williams M, Windsor S, Winn-Deen E, Wolfe K, Zaveri J, Zaveri K, Abril JF, Guigo R, Campbell MJ, Sjolander KV, Karlak B, Kejariwal A, Mi H, Lazareva B, Hatton T, Narechania A, Diemer K, Muruganujan A, Guo N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz R, Walenz B, Yooseph S, Allen D, Basu A, Baxendale J, Blick L, Caminha M, Carnes-Stine J, Caulk P, Chiang YH, Coyne M, Dahlke C, Mays A, Dombroski M, Donnelly M, Ely D, Esparham S, Fosler C, Gire H, Glanowski S, Glasser K, Glodek A, Gorokhov M, Graham K, Gropman B, Harris M, Heil J, Henderson S, Hoover J, Jennings D, Jordan C, Jordan J, Kasha J, Kagan L, Kraft C, Levitsky A, Lewis M, Liu X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S, Newman M, Nguyen T, Nguyen N, Nodell M: The sequence of the human genome. Science 2001, 291: 1304–1351. 10.1126/science.1058040View ArticlePubMedGoogle Scholar
- Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S, Chiaromonte F, Chinwalla AT, Church DM, Clamp M, Clee C, Collins FS, Cook LL, Copley RR, Coulson A, Couronne O, Cuff J, Curwen V, Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitzakis ET, Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn DM, Eddy SR, Elnitski L, Emes RD, Eswara P, Eyras E, Felsenfeld A, Fewell GA, Flicek P, Foley K, Frankel WN, Fulton LA, Fulton RS, Furey TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt L, Grafham D, Graves TA, Green ED, Gregory S, Guigo R, Guyer M, Hardison RC, Haussler D, Hayashizaki Y, Hillier LW, Hinrichs A, Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A, Jackson I, Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK, Karolchik D, Kasprzyk A, Kawai J, Keibler E, Kells C, Kent WJ, Kirby A, Kolbe DL, Korf I, Kucherlapati RS, Kulbokas EJ, Kulp D, Landers T, Leger JP, Leonard S, Letunic I, Levine R, Li J, Li M, Lloyd C, Lucas S, Ma B, Maglott DR, Mardis ER, Matthews L, Mauceli E, Mayer JH, McCarthy M, McCombie WR, McLaren S, McLay K, McPherson JD, Meldrim J, Meredith B, Mesirov JP, Miller W, Miner TL, Mongin E, Montgomery KT, Morgan M, Mott R, Mullikin JC, Muzny DM, Nash WE, Nelson JO, Nhan MN, Nicol R, Ning Z, Nusbaum C, O'Connor MJ, Okazaki Y, Oliver K, Overton-Larty E, Pachter L, Parra G, Pepin KH, Peterson J, Pevzner P, Plumb R, Pohl CS, Poliakov A, Ponce TC, Ponting CP, Potter S, Quail M, Reymond A, Roe BA, Roskin KM, Rubin EM, Rust AG, Santos R, Sapojnikov V, Schultz B, Schultz J, Schwartz MS, Schwartz S, Scott C, Seaman S, Searle S, Sharpe T, Sheridan A, Shownkeen R, Sims S, Singer JB, Slater G, Smit A, Smith DR, Spencer B, Stabenau A, Stange-Thomann N, Sugnet C, Suyama M, Tesler G, Thompson J, Torrents D, Trevaskis E, Tromp J, Ucla C, Ureta-Vidal A, Vinson JP, Von Niederhausern AC, Wade CM, Wall M, Weber RJ, Weiss RB, Wendl MC, West AP, Wetterstrand K, Wheeler R, Whelan S, Wierzbowski J, Willey D, Williams S, Wilson RK, Winter E, Worley KC, Wyman D, Yang S, Yang SP, Zdobnov EM, Zody MC, Lander ES: Initial sequencing and comparative analysis of the mouse genome. Nature 2002, 420: 520–562. 10.1038/nature01262View ArticlePubMedGoogle Scholar
- Gibbs RA, Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, Scherer S, Scott G, Steffen D, Worley KC, Burch PE, Okwuonu G, Hines S, Lewis L, DeRamo C, Delgado O, Dugan-Rocha S, Miner G, Morgan M, Hawes A, Gill R, Celera, Holt RA, Adams MD, Amanatides PG, Baden-Tillson H, Barnstead M, Chin S, Evans CA, Ferriera S, Fosler C, Glodek A, Gu Z, Jennings D, Kraft CL, Nguyen T, Pfannkoch CM, Sitter C, Sutton GG, Venter JC, Woodage T, Smith D, Lee HM, Gustafson E, Cahill P, Kana A, Doucette-Stamm L, Weinstock K, Fechtel K, Weiss RB, Dunn DM, Green ED, Blakesley RW, Bouffard GG, de Jong PJ, Osoegawa K, Zhu B, Marra M, Schein J, Bosdet I, Fjell C, Jones S, Krzywinski M, Mathewson C, Siddiqui A, Wye N, McPherson J, Zhao S, Fraser CM, Shetty J, Shatsman S, Geer K, Chen Y, Abramzon S, Nierman WC, Havlak PH, Chen R, Durbin KJ, Egan A, Ren Y, Song XZ, Li B, Liu Y, Qin X, Cawley S, Worley KC, Cooney AJ, D'Souza LM, Martin K, Wu JQ, Gonzalez-Garay ML, Jackson AR, Kalafus KJ, McLeod MP, Milosavljevic A, Virk D, Volkov A, Wheeler DA, Zhang Z, Bailey JA, Eichler EE, Tuzun E, Birney E, Mongin E, Ureta-Vidal A, Woodwark C, Zdobnov E, Bork P, Suyama M, Torrents D, Alexandersson M, Trask BJ, Young JM, Huang H, Wang H, Xing H, Daniels S, Gietzen D, Schmidt J, Stevens K, Vitt U, Wingrove J, Camara F, Mar AM, Abril JF, Guigo R, Smit A, Dubchak I, Rubin EM, Couronne O, Poliakov A, Hubner N, Ganten D, Goesele C, Hummel O, Kreitler T, Lee YA, Monti J, Schulz H, Zimdahl H, Himmelbauer H, Lehrach H, Jacob HJ, Bromberg S, Gullings-Handley J, Jensen-Seaman MI, Kwitek AE, Lazar J, Pasko D, Tonellato PJ, Twigger S, Ponting CP, Duarte JM, Rice S, Goodstadt L, Beatson SA, Emes RD, Winter EE, Webber C, Brandt P, Nyakatura G, Adetobi M, Chiaromonte F, Elnitski L, Eswara P, Hardison RC, Hou M, Kolbe D, Makova K, Miller W, Nekrutenko A, Riemer C, Schwartz S, Taylor J, Yang S, Zhang Y, Lindpaintner K, Andrews TD, Caccamo M, Clamp M, Clarke L, Curwen V, Durbin R, Eyras E, Searle SM, Cooper GM, Batzoglou S, Brudno M, Sidow A, Stone EA, Venter JC, Payseur BA, Bourque G, Lopez-Otin C, Puente XS, Chakrabarti K, Chatterji S, Dewey C, Pachter L, Bray N, Yap VB, Caspi A, Tesler G, Pevzner PA, Haussler D, Roskin KM, Baertsch R, Clawson H, Furey TS, Hinrichs AS, Karolchik D, Kent WJ, Rosenbloom KR, Trumbower H, Weirauch M, Cooper DN, Stenson PD, Ma B, Brent M, Arumugam M, Shteynberg D, Copley RR, Taylor MS, Riethman H, Mudunuri U, Peterson J, Guyer M, Felsenfeld A, Old S, Mockrin S, Collins F: Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 2004, 428: 493–521. 10.1038/nature02426View ArticlePubMedGoogle Scholar
- Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, George RA, Lewis SE, Richards S, Ashburner M, Henderson SN, Sutton GG, Wortman JR, Yandell MD, Zhang Q, Chen LX, Brandon RC, Rogers YH, Blazej RG, Champe M, Pfeiffer BD, Wan KH, Doyle C, Baxter EG, Helt G, Nelson CR, Gabor GL, Abril JF, Agbayani A, An HJ, ndrews-Pfannkoch C, Baldwin D, Ballew RM, Basu A, Baxendale J, Bayraktaroglu L, Beasley EM, Beeson KY, Benos PV, Berman BP, Bhandari D, Bolshakov S, Borkova D, Botchan MR, Bouck J, Brokstein P, Brottier P, Burtis KC, Busam DA, Butler H, Cadieu E, Center A, Chandra I, Cherry JM, Cawley S, Dahlke C, Davenport LB, Davies P, de PB, Delcher A, Deng Z, Mays AD, Dew I, Dietz SM, Dodson K, Doup LE, Downes M, Dugan-Rocha S, Dunkov BC, Dunn P, Durbin KJ, Evangelista CC, Ferraz C, Ferriera S, Fleischmann W, Fosler C, Gabrielian AE, Garg NS, Gelbart WM, Glasser K, Glodek A, Gong F, Gorrell JH, Gu Z, Guan P, Harris M, Harris NL, Harvey D, Heiman TJ, Hernandez JR, Houck J, Hostin D, Houston KA, Howland TJ, Wei MH, Ibegwam C, Jalali M, Kalush F, Karpen GH, Ke Z, Kennison JA, Ketchum KA, Kimmel BE, Kodira CD, Kraft C, Kravitz S, Kulp D, Lai Z, Lasko P, Lei Y, Levitsky AA, Li J, Li Z, Liang Y, Lin X, Liu X, Mattei B, McIntosh TC, McLeod MP, McPherson D, Merkulov G, Milshina NV, Mobarry C, Morris J, Moshrefi A, Mount SM, Moy M, Murphy B, Murphy L, Muzny DM, Nelson DL, Nelson DR, Nelson KA, Nixon K, Nusskern DR, Pacleb JM, Palazzolo M, Pittman GS, Pan S, Pollard J, Puri V, Reese MG, Reinert K, Remington K, Saunders RD, Scheeler F, Shen H, Shue BC, Siden-Kiamos I, Simpson M, Skupski MP, Smith T, Spier E, Spradling AC, Stapleton M, Strong R, Sun E, Svirskas R, Tector C, Turner R, Venter E, Wang AH, Wang X, Wang ZY, Wassarman DA, Weinstock GM, Weissenbach J, Williams SM, WoodageT, Worley KC, Wu D, Yang S, Yao QA, Ye J, Yeh RF, Zaveri JS, Zhan M, Zhang G, Zhao Q, Zheng L, Zheng XH, Zhong FN, Zhong W, Zhou X, Zhu S, Zhu X, Smith HO, Gibbs RA, Myers EW, Rubin GM, Venter JC: The genome sequence of Drosophila melanogaster. Science 2000, 287: 2185–2195. 10.1126/science.287.5461.2185View ArticlePubMedGoogle Scholar
- Aparicio S, Chapman J, Stupka E, Putnam N, Chia JM, Dehal P, Christoffels A, Rash S, Hoon S, Smit A, Gelpke MD, Roach J, Oh T, Ho IY, Wong M, Detter C, Verhoef F, Predki P, Tay A, Lucas S, Richardson P, Smith SF, Clark MS, Edwards YJ, Doggett N, Zharkikh A, Tavtigian SV, Pruss D, Barnstead M, Evans C, Baden H, Powell J, Glusman G, Rowen L, Hood L, Tan YH, Elgar G, Hawkins T, Venkatesh B, Rokhsar D, Brenner S: Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 2002, 297: 1301–1310. 10.1126/science.1072104View ArticlePubMedGoogle Scholar
- UCSC Genome Browser2005. [http://genome.ucsc.edu/]
- Pruitt KD, Tatusova T, Maglott DR: NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 2005, 33: D501-D504. 10.1093/nar/gki025PubMed CentralView ArticlePubMedGoogle Scholar
- Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995, 270: 467–470.View ArticlePubMedGoogle Scholar
- Gerhold D, Rushmore T, Caskey CT: DNA chips: promising toys have become powerful tools. Trends Biochem Sci 1999, 24: 168–173. 10.1016/S0968-0004(99)01382-1View ArticlePubMedGoogle Scholar
- Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, Brown EL: Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 1996, 14: 1675–1680. 10.1038/nbt1296-1675View ArticlePubMedGoogle Scholar
- Pease AC, Solas D, Sullivan EJ, Cronin MT, Holmes CP, Fodor SP: Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proc Natl Acad Sci U S A 1994, 91: 5022–5026.PubMed CentralView ArticlePubMedGoogle Scholar
- Haas S, Vingron M, Poustka A, Wiemann S: Primer design for large scale sequencing. Nucleic Acids Res 1998, 26: 3006–3012. 10.1093/nar/26.12.3006PubMed CentralView ArticlePubMedGoogle Scholar
- Kampke T, Kieninger M, Mecklenburg M: Efficient primer design algorithms. Bioinformatics 2001, 17: 214–225. 10.1093/bioinformatics/17.3.214View ArticlePubMedGoogle Scholar
- Li LC, Dahiya R: MethPrimer: designing primers for methylation PCRs. Bioinformatics 2002, 18: 1427–1431. 10.1093/bioinformatics/18.11.1427View ArticlePubMedGoogle Scholar
- Li P, Kupfer KC, Davies CJ, Burbee D, Evans GA, Garner HR: PRIMO: A primer design program that applies base quality statistics for automated large-scale DNA sequencing. Genomics 1997, 40: 476–485. 10.1006/geno.1996.4560View ArticlePubMedGoogle Scholar
- McKay SJ, Jones SJ: AcePrimer: automation of PCR primer design based on gene structure. Bioinformatics 2002, 18: 1538–1539. 10.1093/bioinformatics/18.11.1538View ArticlePubMedGoogle Scholar
- Proutski V, Holmes EC: Primer Master: a new program for the design and analysis of PCR primers. Comput Appl Biosci 1996, 12: 253–255.PubMedGoogle Scholar
- Raddatz G, Dehio M, Meyer TF, Dehio C: PrimeArray: genome-scale primer design for DNA-microarray construction. Bioinformatics 2001, 17: 98–99. 10.1093/bioinformatics/17.1.98View ArticlePubMedGoogle Scholar
- Rozen S, Skaletsky H: Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol 2000, 132: 365–386.PubMedGoogle Scholar
- Fernandes RJ, Skiena SS: Microarray synthesis through multiple-use PCR primer design. Bioinformatics 2002, 18 Suppl 1: S128-S135.View ArticlePubMedGoogle Scholar
- Varotto C, Richly E, Salamini F, Leister D: GST-PRIME: a genome-wide primer design software for the generation of gene sequence tags. Nucleic Acids Res 2001, 29: 4373–4377. 10.1093/nar/29.21.4373PubMed CentralView ArticlePubMedGoogle Scholar
- Xu D, Li G, Wu L, Zhou J, Xu Y: PRIMEGENS: robust and efficient design of gene-specific probes for microarray analysis. Bioinformatics 2002, 18: 1432–1437. 10.1093/bioinformatics/18.11.1432View ArticlePubMedGoogle Scholar
- Li F, Stormo GD: Selection of optimal DNA oligos for gene expression arrays. Bioinformatics 2001, 17: 1067–1076. 10.1093/bioinformatics/17.11.1067View ArticlePubMedGoogle Scholar
- Nielsen HB, Wernersson R, Knudsen S: Design of oligonucleotides for microarrays and perspectives for design of multi-transcriptome arrays. Nucleic Acids Res 2003, 31: 3491–3496. 10.1093/nar/gkg622PubMed CentralView ArticlePubMedGoogle Scholar
- Pozhitkov AE, Tautz D: An algorithm and program for finding sequence specific oligonucleotide probes for species identification. BMC Bioinformatics 2002, 3: 9. 10.1186/1471-2105-3-9PubMed CentralView ArticlePubMedGoogle Scholar
- Reymond N, Charles H, Duret L, Calevro F, Beslon G, Fayard JM: ROSO: optimizing oligonucleotide probes for microarrays. Bioinformatics 2004, 20: 271–273. 10.1093/bioinformatics/btg401View ArticlePubMedGoogle Scholar
- Rouillard JM, Zuker M, Gulari E: OligoArray 2.0: design of oligonucleotide probes for DNA microarrays using a thermodynamic approach. Nucleic Acids Res 2003, 31: 3057–3062. 10.1093/nar/gkg426PubMed CentralView ArticlePubMedGoogle Scholar
- Wang X, Seed B: Selection of oligonucleotide probes for protein coding sequences. Bioinformatics 2003, 19: 796–802. 10.1093/bioinformatics/btg086View ArticlePubMedGoogle Scholar
- Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL: GenBank. Nucleic Acids Res 2005, 33: D34-D38. 10.1093/nar/gki063PubMed CentralView ArticlePubMedGoogle Scholar
- Chamberlain JS, Gibbs RA, Ranier JE, Nguyen PN, Caskey CT: Deletion screening of the Duchenne muscular dystrophy locus via multiplex DNA amplification. Nucleic Acids Res 1988, 16: 11141–11156.PubMed CentralView ArticlePubMedGoogle Scholar
- Wallace RB, Shaffer J, Murphy RF, Bonner J, Hirose T, Itakura K: Hybridization of synthetic oligodeoxyribonucleotides to phi chi 174 DNA: the effect of single base pair mismatch. Nucleic Acids Res 1979, 6: 3543–3557.PubMed CentralView ArticlePubMedGoogle Scholar
- Meinkoth J, Wahl G: Hybridization of nucleic acids immobilized on solid supports. Anal Biochem 1984, 138: 267–284. 10.1016/0003-2697(84)90808-XView ArticlePubMedGoogle Scholar
- Breslauer KJ, Frank R, Blocker H, Marky LA: Predicting DNA duplex stability from the base sequence. Proc Natl Acad Sci U S A 1986, 83: 3746–3750.PubMed CentralView ArticlePubMedGoogle Scholar
- Rychlik W, Spencer WJ, Rhoads RE: Optimization of the annealing temperature for DNA amplification in vitro. Nucleic Acids Res 1990, 18: 6409–6412.PubMed CentralView ArticlePubMedGoogle Scholar
- WU BLAST2005. [http://blast.wustl.edu/]
- NCBI BLAST2005. [http://www.ncbi.nlm.nih.gov/BLAST/]
- Rahmann S: Fast large scale oligonucleotide selection using the longest common factor approach. J Bioinform Comput Biol 2003, 1: 343–361. 10.1142/S0219720003000125View ArticlePubMedGoogle Scholar
- Shoemaker DD, Schadt EE, Armour CD, He YD, Garrett-Engele P, McDonagh PD, Loerch PM, Leonardson A, Lum PY, Cavet G, Wu LF, Altschuler SJ, Edwards S, King J, Tsang JS, Schimmack G, Schelter JM, Koch J, Ziman M, Marton MJ, Li B, Cundiff P, Ward T, Castle J, Krolewski M, Meyer MR, Mao M, Burchard J, Kidd MJ, Dai H, Phillips JW, Linsley PS, Stoughton R, Scherer S, Boguski MS: Experimental annotation of the human genome using microarray technology. Nature 2001, 409: 922–927. 10.1038/35057141View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.