Selection of long oligonucleotides for gene expression microarrays using weighted rank-sum strategy
© Hu et al; licensee BioMed Central Ltd. 2007
Received: 24 November 2006
Accepted: 19 September 2007
Published: 19 September 2007
The design of long oligonucleotides for spotted DNA microarrays requires detailed attention to ensure their optimal performance in the hybridization process. The main challenge is to select, for each genetic locus/gene in the genome, an optimal oligonucleotide element that is unique, devoid of internal structures and repetitive sequences, and has a Tm uniform with all other elements on the microarray. Currently, all of the publicly available programs for long oligonucleotide microarray selection utilize various combinations of cutoffs in which each parameter (uniqueness, Tm, and secondary structure) is evaluated and filtered individually. The use of cutoffs can, however, lead to information loss and to the selection of suboptimal oligonucleotides, especially for genomes with an extreme distribution of GC content, a large proportion of repetitive sequences or large gene families with highly homologous members.
Here we present the program OligoRankPick, which uses a weighted rank-based strategy to select microarray oligonucleotide elements via an integer weighted linear function. This approach optimizes the selection criteria (weight score) for each gene individually, accommodating variable properties of the DNA sequence along the genome. The algorithm was tested using three microbial genomes: Escherichia coli, Saccharomyces cerevisiae and the human malaria parasite species Plasmodium falciparum. In comparison to other published algorithms, OligoRankPick provides significant improvements in oligonucleotide design for all three genomes, with the most significant improvements observed in the microarray design for P. falciparum, whose genome is characterized by large fluctuations of GC content and abundant gene duplications.
OligoRankPick is an efficient tool for the design of long oligonucleotide DNA microarrays which does not rely on direct oligonucleotide exclusion by parameter cutoffs but instead optimizes all parameters in the context of each other. The weighted rank-sum strategy utilized by this algorithm provides high flexibility of oligonucleotide selection, which accommodates the extreme variability of DNA sequence properties along the genomes of many organisms.
The DNA microarray is one of the most powerful and versatile tools for post-genomic research. After the initial success with cDNA and PCR product-based microarrays, long oligonucleotides became widely used in "spotted" DNA microarray technology in the last five years [2–5]. From the beginning it was clear that the design of the oligonucleotide probes requires special attention. Under a single stringency condition, hybridization specificity and efficiency of all oligonucleotides must be globally maximized across the entire array. Thus, for the selection of the optimal oligonucleotide candidates, four major parameters are evaluated: (i) uniqueness, which analyzes other possible cross-hybridization targets in the genome; (ii) sequence complexity, which evaluates the presence of short nucleotide repeats; (iii) melting temperature (Tm) or GC content, which ensures a uniform hybridization efficiency across the microarray; and (iv) level of internal secondary structures, which helps to avoid self-binding interference with the specific target hybridization. In principle each of these properties can be calculated individually for every potential oligonucleotide candidate. The main challenge that remains, however, is to derive a selection strategy that combines these parameters and selects the optimal oligonucleotide representative for a given genetic locus/gene.
All currently available programs for long oligonucleotide microarray design utilize different parameters: the binding energy or BLAST-based score to alternative targets to evaluate uniqueness, the GC content or Tm to estimate hybridization stringency, the reverse Smith-Waterman score or free energy to evaluate levels of secondary structure, and various types of complexity coefficients to evaluate the presence of short nucleotide repeats in each oligonucleotide element [5–11]. Typically these programs select one or more oligonucleotide representatives of a gene using various systems of cutoff-based filters. For example, ArrayOligoSelector creates an intersection of oligonucleotides that pass parameter-based cutoffs for uniqueness, self-binding and sequence complexity. The intersection candidate list is then passed on to the GC filter and subsequently the final representative(s) are selected using a 3' proximity criterion. The cutoff-based algorithms provide a powerful approach to select DNA microarray oligonucleotide sets and were successfully used to design DNA microarrays for a large number of species [5, 11–13]. The use of these algorithms is, however, not completely optimal for genomes with a high abundance of repetitive sequences and large fluctuations of GC content. To accommodate such genomic sequences, the methods must relax the parameter filter settings. The wide "opening" of the cutoff filters can cause selection of suboptimal oligonucleotides for a significant number of genes, because all oligonucleotides that pass a particular filter are treated as equal by the subsequent steps, disregarding their subtle diversity within the filtered interval of the parameter (unpublished observations).
To overcome these shortcomings, new algorithms which incorporate optimization strategies of oligonucleotide parameters were developed, including OligoDesign and CommOligo. OligoDesign was developed specifically for the design of the locked nucleic acid (LNA) microarray platform, which takes advantage of the improved on-chip capture sensitivity of the LNA-substituted mixmer oligonucleotides. Design of these specialized probes requires careful optimization of the hybridization specificity and efficiency for each probe. For this purpose, OligoDesign uses an extensive fuzzification process derived from neural network approaches to ensure the optimal performance of this highly specialized microarray platform. Similar to the fuzzy logic approach, CommOligo uses a piece-wise linear function to select optimal oligonucleotides via a user-configurable iterative process. Both of these methods represented a step in the right direction, recognizing the need for parallel optimization of all used parameters and elimination of cutoffs that cause information loss. In their presently available implementations, however, both OligoDesign and CommOligo utilize complex and computer-time-consuming processes that render them unsuitable for high-throughput applications. Nevertheless, both methods have been useful for the design of focused "miniarrays", which typically contain smaller numbers of genes (e.g. 120 stress response and toxicological markers from Caenorhabditis elegans), or microarrays for relatively small genomes such as Methanococcus maripaludis with 1759 genes.
Here we present a novel program named OligoRankPick that is inspired by the aforementioned parameter optimization approaches and is suitable for the design of gene-specific long oligonucleotide probes for genomes of all sizes. The final decision-making process is based on a weighted rank-sum strategy which significantly streamlines the entire computation process. This completely eliminates all cutoff-based filters, thereby significantly improving the quality of the resulting microarray oligonucleotide design. Moreover, the weighted rank-sum approach enables us to implement an integer weighted linear function to automatically optimize the oligonucleotide parameters for each gene individually. Finally, to demonstrate the utility of OligoRankPick, we designed, assembled and verified a new version of the P. falciparum microarray, which comprises 10166 oligonucleotides representing 5363 genes.
Several previous methods have been introduced to normalize oligonucleotide parameter scores, including a piece-wise and a sigmoid function [14, 15]. In our approach the transformation of the score values calculated for each parameter into rank numbers allows us to uniformly assess and adjust the contribution of different parameters for the optimal oligonucleotide selection. For all coding sequences of the test genome (P. falciparum) there is complete agreement between the overall profiles of the scores and the converted ranks in each parameter (Spearman rank order test, P≅0, see Additional file 1 figure S1). This indicates that the rank transformation does not affect the oligonucleotide status in the oligonucleotide set, and that the rank transformation has the same power for parameter comparisons as the original scores.
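The rank transformation described above can be illustrated with a minimal Python sketch. The function name `rank_transform`, the `higher_is_better` flag and the use of average ranks for tied scores are assumptions made for this illustration; the text does not state how OligoRankPick handles ties.

```python
def rank_transform(scores, higher_is_better=False):
    """Convert raw parameter scores to 1-based ranks (rank 1 = best).

    ASSUMPTION: tied scores share the average of the ranks they span;
    the tie-handling used by OligoRankPick itself is not described in the text.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next candidate has the same score.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # mean of 1-based positions pos+1 .. end+1
        for i in order[pos:end + 1]:
            ranks[i] = avg_rank
        pos = end + 1
    return ranks


# Hypothetical BLAST bit scores to the second-best target (lower = more unique).
blast_scores = [38.2, 30.2, 30.2, 52.0]
print(rank_transform(blast_scores))  # [3.0, 1.5, 1.5, 4.0]
```

Because the ranks preserve the ordering of the scores, a Spearman rank correlation between scores and ranks is perfect by construction, which is consistent with the agreement reported above.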
Optimizations of weight sets
The challenge of deriving weight values that will select an optimal oligonucleotide in the rank-sum strategy is two-fold: (i) the weight coefficients should correspond to the relative contribution of each parameter to the oligonucleotide performance during the microarray hybridization (formula 1); (ii) the weight set optimization should provide sufficient flexibility to accommodate the variable nature of the primary structure along the genome. For this purpose we aimed to develop a strategy that optimizes the weight set for each gene individually by considering broader intervals of weight coefficients rather than a single target value (formula 2).
To derive and subsequently evaluate this strategy, we used the P. falciparum genome, which is characterized by large fluctuations of GC content, an abundance of repetitive sequences and large, highly homologous gene families. In the first step we calculated the top oligonucleotide for all Plasmodium genes using 162 different weight sets. These sets originate from all combinations of four weight intervals: wBLAST=[1,2,3,4,5,6,7,8,9], wGC=[1,2,3,4,5,6,7,8,9], wSW=[1,2], wLZ=. The broader intervals for the uniqueness and GC content weight coefficients are intended to allow a higher impact of these parameters on the final oligonucleotide selection. These adjustments are based on our previous observations, indicating that variations in GC content and uniqueness have a greater effect on the specificity and efficiency of microarray hybridization compared to the other parameters ([2, 17] and Bozdech et al., unpublished data). The implementation of the weight interval optimization strategy (formula 2) then facilitates the gene-specific optimization of the oligonucleotide selection with respect to the sequence properties of a particular gene.
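The enumeration of weight sets can be sketched in a few lines of Python. The interval for wLZ is not legible in the text above; the sketch assumes it holds a single value, which is the only way the stated total of 162 (9 × 9 × 2 × 1) can arise, and the value 1 chosen here is purely illustrative.

```python
from itertools import product

W_BLAST = range(1, 10)  # 1..9, as listed in the text
W_GC = range(1, 10)     # 1..9, as listed in the text
W_SW = (1, 2)           # as listed in the text
W_LZ = (1,)             # ASSUMPTION: one value; the actual value is not given above

# Every combination of one weight per parameter forms one weight set.
weight_sets = list(product(W_BLAST, W_GC, W_SW, W_LZ))

print(len(weight_sets))  # 162
print(weight_sets[0])    # (1, 1, 1, 1)
```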
Comparison with other programs
Table: Comparison of oligonucleotides designed by different programs. Columns: E. coli K12 (4237 CDS), S. cerevisiae (6680 CDS), P. falciparum (5363 CDS).
One of the unique features of the P. falciparum genome is the presence of several large, highly homologous gene families that have been implicated in antigenic variation, including var (76 members), rifin (164 members) and stevor (34 members) [16, 18]. Table 2 indicates the number of unique oligonucleotides designed by all four programs for these genes. OligoRankPick was capable of designing unique oligonucleotides for 234 (85.4%) of the 274 genes, which by far exceeded the performance of the three other algorithms.
Design of a gene specific DNA microarray for P. falciparum
Table 2: Oligonucleotide design for large gene families by different programs. Columns: var family (total 76), rifin family (total 164), stevor family (total 34).
To evaluate the level of uniqueness of the designed oligonucleotides we used the same quality control criteria as for the weight optimization strategy, which are consistent with previously established conditions of optimal microarray hybridization performance (see above). In total 9909 (97.5%) oligonucleotides passed the uniqueness criteria and 9795 (96.4%) oligonucleotides were found to be within 5% deviation from the GC content target value (31.4%) (figure 4). There are 9584 (94.7%) oligonucleotides meeting both criteria, while only 275 oligonucleotides (2.7%) were outside of the ± 5% GC content interval and 257 oligonucleotides (2.5%) were not unique in the genome. Manual inspection of the MEFs represented by these oligonucleotides indicated that no suitable 70 nt window exists within these DNA fragments. The 257 non-unique oligonucleotides represented 193 genes (3.6% of total CDS), of which 67 genes belong to the large multigenic gene families var, rifin and stevor. Pair-wise sequence homology analysis revealed that these genes do not contain any 70 nt window that shares less than 40% homology with any other member of the corresponding gene family, and thus no unique oligonucleotide could be selected by any conceivable strategy. Interestingly, for the remaining 185 (73.4%) members of these families a specific oligonucleotide was selected, which further demonstrates the power of OligoRankPick for microarray design.
Transcriptome analysis of the trophozoite and schizont stages of P. falciparum
Although all parameters of the oligonucleotide microarray sets designed by OligoRankPick indicate their high quality, the ultimate evidence for their functionality can be provided only by physical microarray experiments. For this purpose we synthesized all 10166 oligonucleotides for the P. falciparum genome-wide microarray and spotted these onto polylysine-coated microscopic slides as previously described. Using these microarrays we compared the global mRNA patterns between two developmental stages of the P. falciparum intraerythrocytic development, trophozoite and schizont. All experimental procedures were carried out as previously described and the complete results for three replicates of the microarray hybridizations are available in the supplementary data. The P. falciparum genome sequence reference strain 3D7 was used for this analysis. In total, 4183 genes were found to be expressed in at least one of the studied developmental stages in three replicates of microarray hybridization. Of these, 1891 and 841 mRNA transcripts exhibited at least 2-fold higher abundance in the trophozoite and the schizont stage, respectively (see Additional file 4).
Table: P. falciparum microarray data and their comparisons to existing transcriptomes. Rows: 3-fold in at least two replicates; present in the LOM-IDC transcriptome; same stage classification in the LOM-IDC transcriptome (*); present in the HDSO-Affymetrix transcriptome; same stage classification in the HDSO-Affymetrix transcriptome (**).
Taken together, these data demonstrate that the newly designed microarray for P. falciparum successfully recapitulates data from previous transcriptome analyses and has the potential to further expand on these results. Overall these data verify the improved performance of OligoRankPick in designing unique microarray elements for gene expression microarrays.
The main goal of this work was to develop a microarray design algorithm which combines the thoroughness of the parameter optimization methods (such as CommOligo) with the high computational efficiency of the earlier, cutoff-based techniques (such as ArrayOligoSelector). The newly developed algorithm, OligoRankPick, is the first method using a parameter optimization approach that is computationally fast and robust for genome-wide microarray design. The core principle of this technique consists of the rank transformation of the parameter scores and the subsequent weighted rank-sum strategy. This allowed us to eliminate all cutoff-based filters that are typically applied to the input data (by existing optimization programs) or to partial oligonucleotide lists that are generated prior to or during the decision-making step (in cutoff-based methods). Instead, the derived rank-based system maintains all the oligonucleotide candidates in their rank order throughout the entire process. This approach removes any ambiguities in the selection process, as all oligonucleotides are constantly prioritized based on their properties. Since no oligonucleotides are eliminated by arbitrary cutoffs, this method also significantly expands the genome coverage of the designed microarrays. The simplicity of the rank-based approach also allows the algorithm to perform gene-specific optimization of the weight coefficients, in which the contribution of each parameter is modified based on the sequence properties of a particular gene. This is especially useful for optimal probe design in genes with extreme parameter distributions, such as high AT content or high sequence homology to other genomic loci (low uniqueness). For example, the AT richness of some genes causes the GC content parameter to be overemphasized due to the stronger priority that is given to the GC-rich oligonucleotide windows. 
This could force a selection of less unique oligonucleotides, or oligonucleotides with complex secondary structure, from these GC-rich oligonucleotide candidates. The implementation of the gene-specific optimization is likely the most innovative approach introduced by this method because it generates a tighter distribution for each oligonucleotide parameter compared to other publicly available programs (figure 3). For general functionality we derived and validated optimal weight set intervals which could be applied to a wide range of genomes. The flexibility of the OligoRankPick package, however, allows users to tune these settings for other specialized applications.
For the development and validation of OligoRankPick we designed a new DNA microarray for the most lethal species of the human malaria parasites, P. falciparum, whose genome was completed in 2002. We chose this genome for its extreme AT/GC distribution and high level of gene duplication to demonstrate the utility of the newly designed program for its future applications. The average GC content in the P. falciparum genome is estimated at 19.4% (23.7% in coding and 13.5% in non-coding sequences). For this design, however, we wished to select oligonucleotides with higher GC content to ensure higher Tm and thus specificity and selectivity of each probe. In addition, the requirement for high GC content helps to select oligonucleotides with high sequence complexity, as AT-rich sequences in P. falciparum contain numerous short nucleotide repeats. As demonstrated in figure 3, OligoRankPick was able to design a set of oligonucleotides whose GC content is tightly distributed around 31.4%. At the same time, high levels of uniqueness and sequence complexity and a low level of secondary structures were preserved in the vast majority of the probes. This feature of OligoRankPick will be particularly useful for microarray design for many organisms with extreme fluctuations in GC content, such as Mycoplasma mycoides and other bacterial species, other "AT rich" Plasmodium spp. and Dictyostelium discoideum, or GC-rich Leishmania spp. The P. falciparum genome was found to contain a large number of duplicated genes sharing high levels of homology. The extreme examples are the three gene families (var, rifin, stevor) which are involved in parasite virulence and are presently explored as potential molecular targets for malaria intervention strategies. 
Despite the high levels of homology amongst the individual members of these gene families, OligoRankPick was capable of designing a specific oligonucleotide representative for 74.3% of these genes, which by far exceeded the performance of the three tested publicly available programs. This improved performance will render OligoRankPick useful for studies of many organisms with highly homologous, biologically significant gene families, ranging from microbial pathogens to higher eukaryotes.
OligoRankPick provides a powerful alternative for long oligonucleotide microarray design for genomes with extreme GC content fluctuations and a high abundance of highly homologous gene families. In its simplest implementation a user needs only to define the probe length and an expected GC content or Tm. However, for specialized applications, OligoRankPick provides the user with the option of setting the range of relative importance (weight) of each parameter as well as optimizing the quality control target values. Using this method we have designed and assembled a next-generation long oligonucleotide DNA microarray for the main parasitic species of human malaria, P. falciparum. Transcriptome analyses of two P. falciparum developmental stages demonstrated that the designed microarray provides the most comprehensive coverage of the P. falciparum genome presently available.
The oligonucleotide sequences and the transcriptome data are available from the supplementary file.
Genome sequences and annotations
The E. coli gene sequence file with 4237 CDSs and the genomic sequence file were downloaded from the NCBI genome database. The S. cerevisiae gene sequence file with 6680 CDSs and the whole genome sequence file were downloaded from the ENSEMBL database. The P. falciparum protein coding sequence file with 5363 coding sequences (CDSs) and the whole genomic sequence file were downloaded from PlasmoDB version 4.4.
The OligoRankPick Program
OligoRankPick is divided into two parts (two scripts). The first script (oligoblast.pl) is used to generate all possible oligonucleotides and their parsed BLAST results, including the first, second and third best hybridization target of each oligonucleotide (top three). The oligoblast.pl script can be run on different computers or on a computer cluster using parallel processing methods such as mpiBLAST, in which case the results should be parsed according to the format of the oligoblast.pl output. The second script (oligorankpick.pl) selects the optimal oligonucleotide for each sequence. There are four additional scripts which can be used to optimize the performance of the OligoRankPick package: masker.pl, used to mask repeat sequences based on the NCBI dust program; GC_dis.pl, used to plot the GC content distribution of all oligonucleotides in the dataset in order to define a suitable GC content; fragmentation.pl, used to partition long sequences to increase the oligonucleotide density in the coding sequences (see P. falciparum microarray design); and simulation_ws.pl, used to modify the weight set file (wt_pool.opt) for special genomes.
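The idea of reducing BLAST results to the top three targets per oligonucleotide can be sketched in Python as follows. This is not the oligoblast.pl code or its output format, which are not shown here; it assumes standard 12-column tab-separated BLAST tabular output with the query in column 1, the subject in column 2 and the bit score in column 12, and it assumes that "target" means a distinct subject sequence. The function name `top_hits` and the example identifiers are hypothetical.

```python
from collections import defaultdict


def top_hits(tabular_lines, n=3):
    """Return {query_id: [(subject_id, bit_score), ...]} with the n best subjects.

    ASSUMPTION: input is 12-column tab-separated BLAST tabular output
    (query, subject, ..., e-value, bit score).
    """
    best = defaultdict(dict)
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, bit_score = fields[0], fields[1], float(fields[11])
        # Keep only the highest-scoring alignment for each subject sequence.
        if bit_score > best[query].get(subject, float("-inf")):
            best[query][subject] = bit_score
    return {
        query: sorted(hits.items(), key=lambda kv: kv[1], reverse=True)[:n]
        for query, hits in best.items()
    }


# Hypothetical hits for one 70-mer; only the IDs and bit scores matter here.
rows = [
    ("oligo1", "geneA", "100.00", "70", "0", "0", "1", "70", "101", "170", "1e-35", "139.0"),
    ("oligo1", "geneB", "92.00", "25", "2", "0", "5", "29", "40", "64", "0.002", "40.1"),
    ("oligo1", "geneA", "95.00", "20", "1", "0", "3", "22", "900", "919", "0.5", "30.0"),
    ("oligo1", "geneC", "91.00", "23", "2", "0", "9", "31", "12", "34", "0.03", "36.2"),
]
hits = top_hits("\t".join(row) for row in rows)
second_best_score = hits["oligo1"][1][1]  # bit score of the second-best target
print(hits["oligo1"], second_best_score)
```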
Parameters of oligonucleotide measurements
For each input sequence OligoRankPick uses a sliding window of a given size (user setting, e.g. 70 nucleotides) to produce all possible oligonucleotides and calculates four parameters for each oligonucleotide: uniqueness (NCBI-BLAST score to its second best genomic target), GC content, secondary structure (reverse Smith-Waterman score, SW), and sequence complexity (LZ) (figure 1). (i) OligoRankPick uses the bit score of the second best match within the genome to calculate the level of oligonucleotide specificity using the NCBI-BLAST program version 2.0. The input values for the BLAST algorithm are adjusted as follows: -e 1 (E-value < 1) and -b 20 (maximum output items = 20) to limit the computer-time consumption; -m 8 (tabular output, chosen for more efficient parsing). (ii) To ensure the uniformity of the hybridization temperature of the microarray elements, strict criteria for GC content are implemented. A Perl script (GC_dis.pl) is provided to evaluate the GC distribution for all possible oligonucleotides in the input. Users can convert melting temperature (Tm) into GC content using the following formula: GC content = (Tm – 64.9)/41 + 600/(41*oligo length). OligoRankPick calculates the absolute deviation of the oligonucleotide GC content (Tm) from the desired value for the final oligonucleotide selection. (iii) OligoRankPick uses the reverse Smith-Waterman algorithm with the PAM47 DNA matrix to calculate the optimal alignment score between the candidate oligonucleotide sequence and its reverse complement sequence, to avoid any complex secondary structures which can be detrimental to hybridization performance. (iv) OligoRankPick uses the compression score, which is calculated by the Lempel-Ziv algorithm (LZ score), to avoid the presence of low-complexity sequences. These typically signal the presence of short nucleotide repeats that could result in significant non-specific cross-hybridization. 
The use of the Lempel-Ziv (LZ) compression algorithm was first introduced by Wright and Church and further explored by ArrayOligoSelector. This approach was particularly useful in the elimination of short repetitive sequences during the oligonucleotide design for the AT-rich P. falciparum genome, which contains a large number of continuous stretches of A and T nucleotides in both the non-coding and the coding regions.
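Three of the per-oligonucleotide measurements described above (window generation, GC deviation and a compression-based complexity score) can be sketched in Python. The function names are hypothetical, and the complexity score uses `zlib` (DEFLATE, which is LZ77-based) as a stand-in; it is an assumption that this behaves comparably to the LZ score computed by OligoRankPick. The uniqueness and reverse Smith-Waterman scores are omitted from this sketch.

```python
import zlib


def sliding_windows(sequence, size=70):
    """Yield (start, oligo) for every window of the given size along the sequence."""
    for start in range(len(sequence) - size + 1):
        yield start, sequence[start:start + size]


def gc_content(oligo):
    """Fraction of G and C nucleotides in the oligonucleotide."""
    oligo = oligo.upper()
    return (oligo.count("G") + oligo.count("C")) / len(oligo)


def gc_from_tm(tm, oligo_length):
    """GC fraction for a given Tm, using the conversion formula quoted in the text."""
    return (tm - 64.9) / 41 + 600 / (41 * oligo_length)


def gc_deviation(oligo, target_gc):
    """Absolute deviation of the oligonucleotide GC fraction from the target."""
    return abs(gc_content(oligo) - target_gc)


def compression_score(oligo):
    """Compressed size in bytes; smaller values indicate more repetitive sequence.

    ASSUMPTION: zlib is used as a stand-in for the Lempel-Ziv score in the text.
    """
    return len(zlib.compress(oligo.upper().encode("ascii"), 9))


# Hypothetical input sequence; 0.314 is the GC target reported for P. falciparum.
sequence = "ATGAAAAAAAAAAAATTTTTTTTGGCATCGATCGGATTACAGGCTTAACGT" * 2
for start, oligo in sliding_windows(sequence, 70):
    print(start, round(gc_deviation(oligo, 0.314), 3), compression_score(oligo))
```

Each measurement yields one score per window, so the scores can be ranked per parameter before being combined.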
Selection of optimal oligonucleotides by Rank-sum strategy
Where Wj is the weight of the j-th parameter (j = 1, 2, 3, 4) and Rjk is the rank score of the j-th parameter of the k-th oligonucleotide (k = 1, ..., n). In the first step the rank-sum function selects the oligonucleotide with the minimal rank-sum (RS) as the candidate for one given weight set.
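Written out from the symbol definitions given here and in the following paragraph, the two steps can be expressed as below. This is a reconstruction from the surrounding text, so the exact notation of formulas 1 and 2 is an assumption:

```latex
\begin{align}
RS_k &= \sum_{j=1}^{4} W_j \, R_{jk}, \qquad k = 1, \dots, n \tag{1} \\
TO &= \min_{i} \left( \frac{RS_{Ki}}{\sum w_i} \right) \tag{2}
\end{align}
```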
Where RSKi is the optimal selected oligonucleotide (K oligonucleotide) for weight set i, and ∑wi is the sum of weights for weight set i. TO (Target Oligonucleotide) is the final selected oligonucleotide. The optimization step (formula 2) is performed for all weight sets, reflecting all combinations of weight values in the input intervals. Oligonucleotides with the minimal RS from each weight set (RSKi) are transformed ("normalized") by the sum of weight values for the four parameters. The oligonucleotide with the minimum normalized RS value is the optimal local solution of the rank-sum function in the given weight set interval (figure 1). This oligonucleotide is chosen as the final candidate.
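The two-step selection described above can be sketched in Python. This is an illustration of the weighted rank-sum idea as described in the text, not the oligorankpick.pl code; the function name `select_oligo`, the tuple order (BLAST, GC, SW, LZ) and the example ranks are assumptions, and ties are resolved here simply by keeping the first minimum encountered.

```python
def select_oligo(ranks, weight_sets):
    """Pick the target oligonucleotide from per-parameter ranks.

    ranks: one tuple of four ranks per candidate, ordered (BLAST, GC, SW, LZ).
    weight_sets: iterable of weight tuples in the same order.
    Returns (index of the selected candidate, its normalized rank-sum).
    """
    best_index, best_value = None, float("inf")
    for weights in weight_sets:
        # Step 1: weighted rank-sum of every candidate under this weight set.
        rank_sums = [sum(w * r for w, r in zip(weights, oligo)) for oligo in ranks]
        k = min(range(len(ranks)), key=rank_sums.__getitem__)
        # Step 2: normalize the winning rank-sum by the sum of the weights,
        # so that winners from different weight sets are comparable.
        normalized = rank_sums[k] / sum(weights)
        if normalized < best_value:
            best_index, best_value = k, normalized
    return best_index, best_value


# Hypothetical ranks for three candidate windows of one gene.
candidate_ranks = [(1, 3, 2, 2), (2, 1, 1, 3), (3, 2, 3, 1)]
# Two illustrative weight sets: one favouring uniqueness, one favouring GC content.
index, score = select_oligo(candidate_ranks, [(9, 1, 1, 1), (1, 9, 1, 1)])
print(index, score)
```

Normalizing by the sum of weights prevents weight sets with large coefficients from being penalized merely for producing larger rank-sums.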
Microarray hybridization and quantitative real-time PCR
Microarray hybridizations were conducted as previously described. Real-time RT-PCR was performed in a total reaction volume of 20 μl which contained 1 μl cDNA template (10 ng/μl), 0.5 μl forward and reverse primer (10 μM), and 10 μl of 2 × Power SYBR Green PCR Master Mix (Applied Biosystems). The template cDNA was generated using the first strand cDNA synthesis protocol used for the microarray hybridization. For the amplification the universal thermal cycling parameters were programmed as follows: 5 min activation at 95°C, followed by 40 cycles of 20s at 95°C, 30s at 50°C, 40s at 72°C and 1 min at 60°C. Each reaction was run in triplicate. The mRNA abundance ratios were calculated using the ABI 7500 Fast Real-Time PCR System and the relative quantitation of gene expression was performed using the comparative CT method. Primers for PCR were designed using DNAMAN (Lynnon Corporation).
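The comparative CT calculation mentioned above is commonly written as 2^(-ΔΔCT). The following Python sketch shows that standard form; it assumes a reference (normalizer) gene, a calibrator sample and 100% amplification efficiency, none of which are specified in the text, and the function name and CT values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change of a target gene by the comparative CT (2^-ddCT) method.

    ASSUMPTIONS: a reference gene is measured in both samples and the
    amplification efficiency is 100% (product doubles every cycle).
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return 2 ** -(delta_sample - delta_calibrator)


# Hypothetical CT values, e.g. schizont as sample and trophozoite as calibrator.
print(relative_expression(22.0, 18.0, 24.0, 18.0))  # 4.0
```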
Availability and requirements
Project name: OligoRankPick;
Project home page: http://zblab.sbs.ntu.edu.sg/OligoRankPick;
Operating system: Linux;
Programming language: Perl, C;
Licence: GNU GPL.
This work was supported by A*STAR, Agency for Science, Technology and Research, Singapore and the Academic Research Council of the Ministry of Education, Singapore.
- Brown PO, Botstein D: Exploring the new world of the genome with DNA microarrays. Nat Genet 1999, 21(1 Suppl):33–37. doi:10.1038/4462
- Kane MD, Jatkoe TA, Stumpf CR, Lu J, Thomas JD, Madore SJ: Assessment of the sensitivity and specificity of oligonucleotide (50mer) microarrays. Nucleic Acids Res 2000, 28(22):4552–4557. doi:10.1093/nar/28.22.4552
- Hughes TR, Mao M, Jones AR, Burchard J, Marton MJ, Shannon KW, Lefkowitz SM, Ziman M, Schelter JM, Meyer MR, Kobayashi S, Davis C, Dai H, He YD, Stephaniants SB, Cavet G, Walker WL, West A, Coffey E, Shoemaker DD, Stoughton R, Blanchard AP, Friend SH, Linsley PS: Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat Biotechnol 2001, 19(4):342–347. doi:10.1038/86730
- Li F, Stormo GD: Selection of optimal DNA oligos for gene expression arrays. Bioinformatics 2001, 17(11):1067–1076. doi:10.1093/bioinformatics/17.11.1067
- Bozdech Z, Zhu J, Joachimiak MP, Cohen FE, Pulliam B, DeRisi JL: Expression profiling of the schizont and trophozoite stages of Plasmodium falciparum with a long-oligonucleotide microarray. Genome Biol 2003, 4(2):R9. doi:10.1186/gb-2003-4-2-r9
- Wright MA, Church GM: An open-source oligomicroarray standard for human and mouse. Nat Biotechnol 2002, 20(11):1082–1083. doi:10.1038/nbt1102-1082
- Rouillard JM, Zuker M, Gulari E: OligoArray 2.0: design of oligonucleotide probes for DNA microarrays using a thermodynamic approach. Nucleic Acids Res 2003, 31(12):3057–3062. doi:10.1093/nar/gkg426
- Wang X, Seed B: Selection of oligonucleotide probes for protein coding sequences. Bioinformatics 2003, 19(7):796–802. doi:10.1093/bioinformatics/btg086
- Nielsen HB, Wernersson R, Knudsen S: Design of oligonucleotides for microarrays and perspectives for design of multi-transcriptome arrays. Nucleic Acids Res 2003, 31(13):3491–3496. doi:10.1093/nar/gkg622
- Reymond N, Charles H, Duret L, Calevro F, Beslon G, Fayard JM: ROSO: optimizing oligonucleotide probes for microarrays. Bioinformatics 2004, 20(2):271–273. doi:10.1093/bioinformatics/btg401
- Nordberg EK: YODA: selecting signature oligonucleotides. Bioinformatics 2005, 21(8):1365–1370. doi:10.1093/bioinformatics/bti182
- Boyer LA, Lee TI, Cole MF, Johnstone SE, Levine SS, Zucker JP, Guenther MG, Kumar RM, Murray HL, Jenner RG, Gifford DK, Melton DA, Jaenisch R, Young RA: Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 2005, 122(6):947–956. doi:10.1016/j.cell.2005.08.020
- Carter MG, Sharov AA, VanBuren V, Dudekula DB, Carmack CE, Nelson C, Ko MS: Transcript copy number estimation using a mouse whole-genome oligonucleotide microarray. Genome Biol 2005, 6(7):R61. doi:10.1186/gb-2005-6-7-r61
- Tolstrup N, Nielsen PS, Kolberg JG, Frankel AM, Vissing H, Kauppinen S: OligoDesign: Optimal design of LNA (locked nucleic acid) oligonucleotide capture probes for gene expression profiling. Nucleic Acids Res 2003, 31(13):3758–3762. doi:10.1093/nar/gkg580
- Li X, He Z, Zhou J: Selection of optimal oligonucleotide probes for microarrays using multiple criteria, global alignment and parameter estimation. Nucleic Acids Res 2005, 33(19):6114–6123. doi:10.1093/nar/gki914
- Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B: Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 2002, 419(6906):498–511. doi:10.1038/nature01097
- He Z, Wu L, Li X, Fields MW, Zhou J: Empirical establishment of oligonucleotide probe design criteria. Appl Environ Microbiol 2005, 71(7):3753–3760. doi:10.1128/AEM.71.7.3753-3760.2005
- Kyes S, Horrocks P, Newbold C: Antigenic variation at the infected red cell surface in malaria. Annu Rev Microbiol 2001, 55:673–707. doi:10.1146/annurev.micro.55.1.673
- DeRisi JL, Iyer VR, Brown PO: Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 1997, 278(5338):680–686. doi:10.1126/science.278.5338.680
- Bozdech Z, Llinas M, Pulliam BL, Wong ED, Zhu J, DeRisi JL: The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biol 2003, 1(1):E5. doi:10.1371/journal.pbio.0000005
- Le Roch KG, Zhou Y, Blair PL, Grainger M, Moch JK, Haynes JD, De La Vega P, Holder AA, Batalov S, Carucci DJ, Winzeler EA: Discovery of gene function by expression profiling of the malaria parasite life cycle. Science 2003, 301(5639):1503–1508. doi:10.1126/science.1087025
- Westberg J, Persson A, Holmberg A, Goesmann A, Lundeberg J, Johansson KE, Pettersson B, Uhlen M: The genome sequence of Mycoplasma mycoides subsp. mycoides SC type strain PG1T, the causative agent of contagious bovine pleuropneumonia (CBPP). Genome Res 2004, 14(2):221–227. doi:10.1101/gr.1673304
- Parkhill J, Sebaihia M, Preston A, Murphy LD, Thomson N, Harris DE, Holden MT, Churcher CM, Bentley SD, Mungall KL, Cerdeno-Tarraga AM, Temple L, James K, Harris B, Quail MA, Achtman M, Atkin R, Baker S, Basham D, Bason N, Cherevach I, Chillingworth T, Collins M, Cronin A, Davis P, Doggett J, Feltwell T, Goble A, Hamlin N, Hauser H, Holroyd S, Jagels K, Leather S, Moule S, Norberczak H, O'Neil S, Ormond D, Price C, Rabbinowitsch E, Rutter S, Sanders M, Saunders D, Seeger K, Sharp S, Simmonds M, Skelton J, Squares R, Squares S, Stevens K, Unwin L, Whitehead S, Barrell BG, Maskell DJ: Comparative analysis of the genome sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica. Nat Genet 2003, 35(1):32–40. doi:10.1038/ng1227
- Carlton JM, Angiuoli SV, Suh BB, Kooij TW, Pertea M, Silva JC, Ermolaeva MD, Allen JE, Selengut JD, Koo HL, Peterson JD, Pop M, Kosack DS, Shumway MF, Bidwell SL, Shallom SJ, van Aken SE, Riedmuller SB, Feldblyum TV, Cho JK, Quackenbush J, Sedegah M, Shoaibi A, Cummings LM, Florens L, Yates JR, Raine JD, Sinden RE, Harris MA, Cunningham DA, Preiser PR, Bergman LW, Vaidya AB, van Lin LH, Janse CJ, Waters AP, Smith HO, White OR, Salzberg SL, Venter JC, Fraser CM, Hoffman SL, Gardner MJ, Carucci DJ: Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii. Nature 2002, 419(6906):512–519. doi:10.1038/nature01099
- Glockner G, Eichinger L, Szafranski K, Pachebat JA, Bankier AT, Dear PH, Lehmann R, Baumgart C, Parra G, Abril JF, Guigo R, Kumpf K, Tunggal B, Cox E, Quail MA, Platzer M, Rosenthal A, Noegel AA: Sequence and analysis of chromosome 2 of Dictyostelium discoideum. Nature 2002, 418(6893):79–85. 10.1038/nature00847View ArticlePubMedGoogle Scholar
- Ivens AC, Peacock CS, Worthey EA, Murphy L, Aggarwal G, Berriman M, Sisk E, Rajandream MA, Adlem E, Aert R, Anupama A, Apostolou Z, Attipoe P, Bason N, Bauser C, Beck A, Beverley SM, Bianchettin G, Borzym K, Bothe G, Bruschi CV, Collins M, Cadag E, Ciarloni L, Clayton C, Coulson RM, Cronin A, Cruz AK, Davies RM, De Gaudenzi J, Dobson DE, Duesterhoeft A, Fazelina G, Fosker N, Frasch AC, Fraser A, Fuchs M, Gabel C, Goble A, Goffeau A, Harris D, Hertz-Fowler C, Hilbert H, Horn D, Huang Y, Klages S, Knights A, Kube M, Larke N, Litvin L, Lord A, Louie T, Marra M, Masuy D, Matthews K, Michaeli S, Mottram JC, Muller-Auer S, Munden H, Nelson S, Norbertczak H, Oliver K, O'Neil S, Pentony M, Pohl TM, Price C, Purnelle B, Quail MA, Rabbinowitsch E, Reinhardt R, Rieger M, Rinta J, Robben J, Robertson L, Ruiz JC, Rutter S, Saunders D, Schafer M, Schein J, Schwartz DC, Seeger K, Seyler A, Sharp S, Shin H, Sivam D, Squares R, Squares S, Tosato V, Vogt C, Volckaert G, Wambutt R, Warren T, Wedler H, Woodward J, Zhou S, Zimmermann W, Smith DF, Blackwell JM, Stuart KD, Barrell B, Myler PJ: The genome of the kinetoplastid parasite, Leishmania major. Science 2005, 309(5733):436–442. 10.1126/science.1112680PubMed CentralView ArticlePubMedGoogle Scholar
- Rowe JA, Kyes SA: The role of Plasmodium falciparum var genes in malaria in pregnancy. Mol Microbiol 2004, 53(4):1011–1019. 10.1111/j.1365-2958.2004.04256.xPubMed CentralView ArticlePubMedGoogle Scholar
- Stringer JR, Keely SP: Genetics of surface antigen expression in Pneumocystis carinii. Infect Immun 2001, 69(2):627–639. 10.1128/IAI.69.2.627-639.2001PubMed CentralView ArticlePubMedGoogle Scholar
- Harrison PM, Gerstein M: Studying genomes through the aeons: protein families, pseudogenes and proteome evolution. J Mol Biol 2002, 318(5):1155–1174. 10.1016/S0022-2836(02)00109-2View ArticlePubMedGoogle Scholar
- ENSEMBL: .[http://www.ensembl.org]
- PlasmoDB: .[http://www.plasmodb.org]
- mpiBLAST: .[http://mpiblast.lanl.gov/]
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389–3402. 10.1093/nar/25.17.3389PubMed CentralView ArticlePubMedGoogle Scholar
- Schildkraut C: Dependence of the melting temperature of DNA on salt concentration. Biopolymers 1965, 3(2):195–208. 10.1002/bip.360030207View ArticlePubMedGoogle Scholar
- Smith TF, Waterman MS: Identification of common molecular subsequences. J Mol Biol 1981, 147(1):195–197. 10.1016/0022-2836(81)90087-5View ArticlePubMedGoogle Scholar
- Ziv J LA: A universal algorithm for sequential data compression. IIEEE TRANSACTIONS ON INFORMATION THEORY 1977, 23(3):3389–3402.Google Scholar
- Chou CC, Chen CH, Lee TT, Peck K: Optimization of probe length and the number of probes per gene for optimal microarray analysis of gene expression. Nucleic Acids Res 2004, 32(12):e99. 10.1093/nar/gnh099PubMed CentralView ArticlePubMedGoogle Scholar
- Letowski J, Brousseau R, Masson L: Designing better probes: effect of probe size, mismatch position and number on hybridization in DNA oligonucleotide microarrays. J Microbiol Methods 2004, 57(2):269–278. 10.1016/j.mimet.2004.02.002View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.