Background correction using dinucleotide affinities improves the performance of GCRMA
© Gharaibeh et al; licensee BioMed Central Ltd. 2008
Received: 27 March 2008
Accepted: 23 October 2008
Published: 23 October 2008
High-density short oligonucleotide microarrays are a primary research tool for assessing global gene expression. Background noise on microarrays comprises a significant portion of the measured raw data, which can have serious implications for the interpretation of the generated data if not estimated correctly.
We introduce an approach to calculate probe affinity based on sequence composition, incorporating nearest-neighbor (NN) information. Our model uses position-specific dinucleotide information, instead of the original single nucleotide approach, and adds up to 10% to the total variance explained (R2) when compared to the previously published model. We demonstrate that correcting for background noise using this approach enhances the performance of the GCRMA preprocessing algorithm when applied to control datasets, especially for detecting low intensity targets.
Modifying the previously published position-dependent affinity model to incorporate dinucleotide information significantly improves the performance of the model. The dinucleotide affinity model enhances the detection of differentially expressed genes when implemented as a background correction procedure in GeneChip preprocessing algorithms. This is conceptually consistent with physical models of binding affinity, which depend on the nearest-neighbor stacking interactions in addition to base-pairing.
Affymetrix GeneChip arrays are one of the most popular gene expression array systems used by researchers worldwide. The purpose of an expression microarray experiment is to measure the abundance of each known transcript in the sample under investigation. Abundance is inferred from the signal generated by a set of 11–20 probe pairs. Each pair is composed of a perfect match probe (PM), which exactly complements a region on the transcript, and a mismatch probe (MM), which is identical to the PM probe except at the 13th base, where the complementary nucleotide is substituted. The fluorescent signal from each probe, however, reflects not only transcript abundance but also background noise arising from non-specific binding (NSB) and autofluorescence of the chip surface. MM probes were originally introduced by Affymetrix to measure background noise. Many groups have shown that MM probes capture a significant amount of the PM signal and are therefore unreliable as estimators of background noise [3–5].
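As a minimal illustration of the PM/MM relationship described above, the following Python sketch derives an MM sequence from a PM sequence by complementing the 13th base. The function name `mismatch_probe` and the example sequence are hypothetical and are not taken from Affymetrix software.

```python
# Complement lookup for DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mismatch_probe(pm, position=13):
    """Return the MM sequence: the PM sequence with one base complemented.

    `position` is 1-based; 13 is the middle base of a 25-mer probe.
    """
    if not 1 <= position <= len(pm):
        raise ValueError("position outside probe")
    i = position - 1
    return pm[:i] + COMPLEMENT[pm[i]] + pm[i + 1:]


pm = "ACGTACGTACGTACGTACGTACGTA"  # hypothetical 25-mer PM probe
mm = mismatch_probe(pm)
```

Only the middle base differs between `pm` and `mm`; all other positions are left unchanged.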
A gene expression experiment using the Affymetrix GeneChip system usually involves a design step, a preprocessing step, an inference step and, finally, a validation step. The preprocessing step is of special importance: it transforms the raw fluorescence signals from each probe in a probeset into a composite gene expression value. The main goal of the preprocessing step is to remove non-biological variation from the raw data. Usually, the preprocessing step in Affymetrix GeneChip array analysis includes three main treatments of the raw data. A background adjustment step separates the specific signal from the non-specific signal. A probe-level normalization step then removes non-biological variation between arrays. Finally, a summarization step generates a single expression value for each gene from its corresponding probeset. The method described in this manuscript is an implicit physical model that modifies the background adjustment step.
Background noise and non-biological variation of the signal generated from each probe are common phenomena in GeneChip microarray experiments [7, 8]. The differences in the signal produced can be attributed to many sources: optical noise, cross-hybridization, dye-related contributions and probe sequence composition. Many preprocessing algorithms have been developed in an attempt to correct for these artifacts. According to Allison et al., there is no clear winner among the available preprocessing algorithms. However, GCRMA, a modification of RMA, often performs as well as or better than other algorithms [9, 12–14]. GCRMA incorporates probe sequence composition into background adjustment, following the physical model of Naef and Magnasco. The model describes a probe affinity that depends on the probe's base composition and on the position of each base along the probe, and suggests that probe sequence can significantly affect the intensity of the signal generated from that probe, independent of the concentration of its target.
Performance assessment of GCRMA has been done using both spike-in [13, 16, 17] and real datasets followed by quantitative real time PCR confirmation. A number of reports have recommended the use of GCRMA for detecting differentially expressed genes and estimating relative expression, emphasizing its outstanding performance in detecting low-intensity, differentially expressed genes [13, 17]. When comparing microarray analysis algorithms, Irizarry et al. have argued for an approach that balances accuracy and precision. They define accuracy as the ability of the algorithm to detect the relative expression of a transcript without bias with respect to its abundance (concentration), and precision as low variance, characterized by steady performance on replicates of the same sample. GCRMA is among the few preprocessing algorithms that score well in both accuracy and precision.
In this study, we modified the portion of GCRMA derived from the model of Naef and Magnasco to calculate probe affinity using position-specific dinucleotide information. The dinucleotide is a fundamental chemical unit that contributes a well-understood component to nucleic acid duplex stability and to the free energy of duplex formation during hybridization [18, 19]. We applied the new model to different datasets and achieved an improved fit to microarray data, with R2 increasing by 5–10%. We then tested the downstream effect of our modified background model on the performance of GCRMA in detecting differentially expressed genes, using two publicly available control datasets: the human genome U133 Latin Square dataset and the Golden spikein dataset. In both datasets, application of the dinucleotide model in background correction improved the detection of differentially expressed genes. Therefore, we propose that probe affinity be modeled based on the dinucleotide composition of the probe instead of the original single nucleotide approach.
Dinucleotide affinity model
Equation 1 is a simple model that has four free parameters for each probe base (100 free parameters for a 25-base probe). The values of these 100 free parameters are generated by a linear least squares fit. Given the large number of probes on each chip (about half a million for the human genome U133 chip, for example), over-fitting is not a concern.
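Equation 1 is not reproduced in this text. As a hedged sketch, a position-dependent single nucleotide affinity of the kind described here, assuming the form used in the GCRMA literature and with $b_k$ and $\mu_{j,k}$ as assumed notation, can be written as:

```latex
\alpha \;=\; \sum_{k=1}^{25} \; \sum_{j \in \{A,C,G,T\}} \mu_{j,k}\, 1_{b_k = j}
```

Here $b_k$ is the base at position $k$ of the probe and $\mu_{j,k}$ is the contribution of base $j$ at position $k$, which gives 4 × 25 = 100 free parameters, in agreement with the count stated above.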
By assuming that the affinities can be modeled as a third order polynomial function of position, the number of free parameters in the model can be reduced from 100 to 16 with little loss of predictive accuracy: the polynomial generated with 16 parameters (Fig. 1, solid lines) closely matches the 100 independently estimated parameters (Fig. 1, symbols), and the R2 values of the two models are similar (additional file 1).
Note that we do not explicitly fit the stacking energies of the NN pairs; rather, we fit the affinities of the NN pairs as a function of position along the probe sequence.
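The reduction from 100 to 16 parameters can be illustrated with a short Python sketch. It assumes that each base's positional profile is a cubic polynomial in the 1-based position k, so that the affinity is linear in 4 bases × 4 coefficients. The function name `poly_features` and the example probe are hypothetical; this is not the authors' Perl implementation.

```python
BASES = "ACGT"
DEGREE = 3  # third order polynomial in position


def poly_features(probe):
    """Design-matrix row for one probe under the 16-parameter sketch.

    Feature (base j, power l) is the sum of k**l over the 1-based positions k
    at which the probe carries base j. The modeled affinity is then the dot
    product of this row with the 16 fitted coefficients.
    """
    row = []
    for base in BASES:
        for power in range(DEGREE + 1):
            row.append(float(sum(k ** power
                                 for k, b in enumerate(probe, start=1)
                                 if b == base)))
    return row


row = poly_features("ACGTACGTACGTACGTACGTACGTA")  # hypothetical 25-mer
```

Stacking one such row per probe gives a design matrix that can be passed to any linear least squares solver, with log intensity as the response.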
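A dinucleotide analogue of the polynomial parameterization can be sketched in Python as follows. It assumes 16 dinucleotides, each with a cubic positional profile, which matches the 64 free parameters reported for the dinucleotide model; the function name `nn_features` and the indexing of a pair by the position of its first base are assumptions made for illustration, not the authors' Java code.

```python
from itertools import product

# All 16 ordered dinucleotides: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]
DEGREE = 3  # third order polynomial in position


def nn_features(probe):
    """Design-matrix row for one probe under the 64-parameter sketch."""
    row = []
    for pair in DINUCLEOTIDES:
        # 1-based positions k at which the probe reads `pair` at k, k+1.
        starts = [k + 1 for k in range(len(probe) - 1)
                  if probe[k:k + 2] == pair]
        for power in range(DEGREE + 1):
            row.append(float(sum(k ** power for k in starts)))
    return row


row = nn_features("ACGTACGTACGTACGTACGTACGTA")  # hypothetical 25-mer
```

Because AC and CA are separate columns, a fit on such rows can distinguish stacking orientation, which a single nucleotide model cannot.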
Table 1. Dinucleotide model performance on different datasets
Column groups: Single nucleotide model (eq. 1); Dinucleotide model (eq. 4)
Latin Square: 0.17 ± 0.01; 0.22 ± 0.01; 0.40 ± 0.01; 0.50 ± 0.01
Golden spikein: 0.20 ± 0.02; 0.22 ± 0.02; 0.46 ± 0.02; 0.51 ± 0.02; 0.49 ± 0.06; 0.55 ± 0.07; 0.60 ± 0.04; 0.69 ± 0.04
Etoposide response: 0.05 ± 0.04; 0.08 ± 0.06; 0.11 ± 0.06; 0.16 ± 0.08; 0.09 ± 0.04; 0.13 ± 0.04; 0.29 ± 0.05; 0.36 ± 0.06
Given that our fits contain between 195,994 and 496,468 data points (Table 1), it seems unlikely that the improvements in performance of our model could be explained by the additional free parameters (64 for our model vs. 16 for the original Naef and Magnasco model). Nonetheless, to rule out this possibility, we fitted both the single nucleotide model (N) (using the 100 free parameter and 16 free parameter versions of the Naef and Magnasco model, equations 1 and 2, respectively) and the dinucleotide model (NN) with 64 free parameters (equation 4) to the Latin Square dataset using completely random probe sequences (generated with an equal probability of A, C, G and T). We also performed the same test on shuffled probe sequences, in which the probe's base composition is unchanged but the position of each base is altered by the shuffling process. The results of this analysis are shown in additional file 1. The R2 values for the shuffled and random probe sequences are nearly identical, regardless of which model is used. The presence of additional free parameters in our model, therefore, cannot by itself explain the improved performance over the Naef and Magnasco model. This strongly supports our argument that the gain in the R2 values of the NN model comes from including dinucleotide information and does not arise trivially from the addition of free parameters.
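The two kinds of control sequences used in this test can be sketched in Python. The helper names `random_probe` and `shuffled_probe`, the seed and the example probe are hypothetical; the sketch only shows how the control sequences differ and does not reproduce the fits.

```python
import random
from collections import Counter


def random_probe(rng, length=25):
    """Random sequence with equal probability of A, C, G and T per base."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def shuffled_probe(rng, probe):
    """Permute the bases of a probe: composition kept, positions changed."""
    bases = list(probe)
    rng.shuffle(bases)
    return "".join(bases)


rng = random.Random(0)  # fixed seed so the example is repeatable
original = "ACGTACGTACGTACGTACGTACGTA"  # hypothetical 25-mer
shuffled = shuffled_probe(rng, original)
randomized = random_probe(rng)

# Shuffling preserves base composition; random generation need not.
same_composition = Counter(shuffled) == Counter(original)
```

Fitting either model to intensities paired with such control sequences indicates how much R2 the free parameters can produce without real positional sequence information.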
Background adjustment using dinucleotide affinity model
Using a more accurate estimate of background noise should improve the quality of Affymetrix GeneChip data. Given the better fits observed using the dinucleotide affinity model, we expected it to improve the analysis results to some degree when applied to control datasets. We tested the downstream effects of using this model on the quality of microarray data. We chose to implement the model within GCRMA, since it already has the single nucleotide model implemented in its background correction procedure, and therefore the two models could be directly compared.
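For orientation, the GCRMA background model as described by Wu et al. can be sketched as follows. The equation is not reproduced in this text, so the notation below is an assumption chosen to match the symbols used in the surrounding paragraphs:

```latex
PM = O_{PM} + N_{PM} + S, \qquad MM = O_{MM} + N_{MM} + \phi S
```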
where O is the optical noise, N is the background noise of non-specific binding, and S is the signal generated from specific binding between the probe and its intended target. The parameter φ reflects the fact that for some probe pairs, the MM signal may contain specific signal. The background components log(N_PM) and log(N_MM) are assumed to follow a bivariate normal distribution with means μ_PM = h(α_PM) and μ_MM = h(α_MM), where h is a smoothing function and α (probe affinity) is defined by equation 1. In this paper, we make these same assumptions, but we derive α using equation 4.
We reasoned that GCRMA with background correction using the dinucleotide model, which we will subsequently refer to as GCRMA-NN in this paper, would perform better than the native GCRMA model. It is important to clarify that GCRMA offers two options for background correction, the first of which uses a precomputed α (called reference affinity) from the authors' own non-specific binding (NSB) experiments, while the second computes α directly from the data (called local affinity). In the following figures, we compare GCRMA-NN (where α is computed directly from the data using equation 4) to GCRMA-L (GCRMA with local affinity) and GCRMA-R (GCRMA with reference affinity).
Latin square dataset
We obtained expression measures for the Human Genome U133 Latin square dataset after processing it with GCRMA-R, GCRMA-L and GCRMA-NN. The three expression measures were evaluated using two approaches. The first approach is based on AffyComp, a performance evaluation tool for preprocessing algorithms (see below). The second approach is based on the number of true positives captured across all 14 twofold (2×) comparisons of the Latin square dataset at a cutoff of four false positives, after applying the cyber t test. Cyber t is a popular variant of the t test in which a weighted standard deviation replaces the conventional standard deviation and an adjusted number of degrees of freedom is used instead of the conventional degrees of freedom.
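The second evaluation criterion can be illustrated with a small Python sketch. It assumes that probesets are ranked by a score in which larger values mean stronger evidence of differential expression, and that true positives are counted until a fifth false positive would be admitted; the function name and the toy data are hypothetical.

```python
def true_positives_at_fp_cutoff(scores, is_spikein, max_false_positives=4):
    """Count true positives ranked ahead of the point at which the number of
    false positives would exceed `max_false_positives`."""
    ranked = sorted(zip(scores, is_spikein), key=lambda pair: -pair[0])
    tp = fp = 0
    for _, spiked in ranked:
        if spiked:
            tp += 1
        else:
            fp += 1
            if fp > max_false_positives:
                break
    return tp


# Toy example: nine probesets, four of them true spikeins.
scores = [9, 8, 7, 6, 5, 4, 3, 2, 1]
labels = [True, False, True, False, False, False, True, False, True]
found = true_positives_at_fp_cutoff(scores, labels)
```

In the toy example, three of the four spikeins are recovered before the fifth false positive is reached.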
Table 2. AffyComp scores for GCRMA-L, GCRMA-R and GCRMA-NN. Metrics: null log-fc IQR; null log-fc 99.9%; signal detect slope; signal detect R2; weighted avg AUC.
Table 2 shows that the increase comes mainly from the AUC for low intensity targets (low AUC entry in Table 2). The low intensity genes make up most of the genes in a typical Affymetrix experiment and are also the hardest to detect. Algorithms that perform inference generally can detect large changes involving highly expressed genes. It is much more difficult to detect changes in the more frequently observed genes that produce low intensities on the array. GCRMA-NN enhanced the detection of low intensity targets while maintaining similar values for the medium and high intensity ones. The enhancement in detecting low intensity targets is also evident as an increase in the low detection slope (low.slope entry in Table 2).
Golden spikein dataset
Background estimation and correction are important steps in analyzing the data generated by GeneChip arrays. Improving algorithms for these steps increases the amount of true "signal" that we can detect from microarrays. Understanding background noise on GeneChip arrays, especially the part contributed by NSB signal, requires a deeper understanding of the behavior of on-chip hybridization. Given that we lack a detailed physical model of on-chip hybridization derived from first principles, an empirical model that estimates the specific and non-specific signal based on the data on the array and probe sequence is a useful tool for understanding the on-chip hybridization process.
Nucleic acid hybridization in solution is well approximated by the nearest neighbor model, which describes duplex formation as a function of the two adjacent nucleotides and their stacking orientation. This approach was used by Zhang et al. to model on-chip specific and nonspecific hybridization using the free energy of formation for adjacent nucleotides. Zhang et al. concluded that the on-chip hybridization parameters differ from those in solution. Using a different approach to background correction, Naef and Magnasco used single nucleotides to assign an overall affinity score to a probe based on its sequence, without reference to the energy contributions of the dinucleotide pairs. This approach was used to perform background correction in the GCRMA algorithm, while the Zhang et al. approach was used to create the PerfectMatch algorithm. PerfectMatch estimates the signal and the background in the same step, whereas GCRMA estimates background noise first and then proceeds to signal estimation. PerfectMatch is therefore much more computationally demanding than GCRMA, as the parameter space searched by PerfectMatch is vast and is sampled with Monte Carlo methods. Direct comparison of GCRMA and PerfectMatch has proven controversial. Such a comparison is beyond the scope of this report and can be found elsewhere [9, 13, 25].
In this report we combine some elements of GCRMA and PerfectMatch. We replace the single nucleotide model of Naef and Magnasco with a model in which the affinity of each probe is a function of its dinucleotide composition. Because we use GCRMA's approach of separating estimates of background and signal, we can use a linear model and avoid the Monte Carlo simulation approach of PerfectMatch. Our approach is therefore computationally more efficient and guarantees the best least squares fit to the data. This approach enables us to examine the contribution of different dinucleotides at different positions to the raw probe signal (Fig. 2), rather than assigning one weight function to all the dinucleotides, as is done with PerfectMatch. This allows our model to capture several important features of the background data, such as the effect of the first versus the second nucleotide on probe affinity (e.g. CA vs. CG) and the effect of the stacking orientation (AC vs. CA). In general, we find that the dinucleotide approach has more power than the single nucleotide approach over a wide range of datasets (Table 1).
The mechanism that determines why particular dinucleotides affect probe affinities the way they do is, in some cases, unclear. However, we observe that the NN model bears some similarities to the models of both Naef and Magnasco and Zhang et al. All three models emphasize the importance of the probe middle region; this is probably due to the surface attachment, as well as to the relative instability of the free end in RNA-DNA hybridization. The effect of the stacking orientation is in agreement with the findings of Zhang et al. The AN versus CN asymmetry (Fig. 2A and 2B), where N refers to any of the four nucleotides A, C, G and T (AN, for example, means AA, AC, AG and AT), is in agreement with Naef and Magnasco. When comparing these affinity curves to the original Naef and Magnasco result, it is important to recognize that the NN model considers the affinity of dinucleotides rather than single nucleotides. Therefore, we do not necessarily expect to see the same asymmetry within CN or AN, i.e. there will be no asymmetry between CA and CC (Fig. 2B), or between AA and AC (Fig. 2A). The NN model, however, does show unexpected behavior for the GN and TN dinucleotides. While both G and T show slight asymmetry in the Naef and Magnasco model, the effect of these two nucleotides is magnified in the NN model. GN contributes positively to the signal, but not when the second nucleotide is C (Fig. 2C). TN contributes negatively, but not when the second nucleotide is A (Fig. 2D). This trend is partially explained by the fact that T forms fewer hydrogen bonds than G and therefore contributes negatively, while G binds more strongly and thus contributes positively. This trend is not consistent, and appears to depend on the adjacent nucleotide. It could also be due to the biotin label present on the RNA target sequence.
When applied to two control datasets, GCRMA-NN showed improved performance (Figs. 4, 5), especially on low intensity targets (Table 2; Fig. 3). We argue that this is due to better background correction for these targets: a higher percentage of a low intensity signal is made up of background, so it is not surprising that better background correction makes more of a difference for low intensity targets. The detection of low intensity targets represents the most significant challenge to microarray analysis algorithms, which makes any enhancement in the detection of these targets significant.
Incorporating dinucleotide information into a previously described probe affinity model increases the fit of the model by 5–10%. The dinucleotide affinities highlight the importance of the stacking orientation on probe behavior. This is in agreement with the physical models that describe hybridization binding affinities.
The results presented here show that the affinity of any single nucleotide is affected by its neighbor, in addition to its location along the probe. Considering the second nucleotide offers more insight into the on-chip behavior of the four bases in relation to each other. Such insights are important for developing a better understanding of the on-chip hybridization process and therefore better analysis procedures. The model described here enhances the performance of an existing, widely used preprocessing algorithm for GeneChip data. We expect the same model to enhance the performance of preprocessing algorithms for other types of arrays, in particular those used for SNP analysis.
The U133 Latin square dataset
This dataset is composed of 14 experiments (three technical replicates for each experiment) in which 42 transcripts are spiked at a concentration range of 0.125–512 pM following a Latin square design. The dataset files were downloaded from the Affymetrix web site. For AffyComp analysis, all probesets were included. For the 14 twofold (2×) comparisons, the following probesets were excluded following Affymetrix recommendations: 209374_s_at, 205397_x_at, 208010_s_at. In addition, we excluded any probeset with a name starting with AFFX- that was not among the 42 true positive spikeins.
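The exclusion rules above can be expressed as a simple filter. The Python sketch below uses the three probeset identifiers listed in the text; the function name `keep_probeset` and the example probeset names in the usage lines are hypothetical placeholders, not real spikein identifiers.

```python
# Probesets excluded following Affymetrix recommendations (from the text).
AFFYMETRIX_EXCLUDED = {"209374_s_at", "205397_x_at", "208010_s_at"}


def keep_probeset(name, spikeins):
    """Return True if a probeset is retained for the 2x comparisons."""
    if name in AFFYMETRIX_EXCLUDED:
        return False
    # AFFX- control probesets are kept only if they are true spikeins.
    if name.startswith("AFFX-") and name not in spikeins:
        return False
    return True


# Hypothetical names used only to exercise the filter.
spikeins = {"AFFX-example-spike_at", "200000_at"}
names = ["AFFX-example-spike_at", "AFFX-example-control_at",
         "209374_s_at", "200000_at", "200001_at"]
kept = [n for n in names if keep_probeset(n, spikeins)]
```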
The Golden spikein dataset
This dataset has more spikein genes than the Latin Square dataset, but consists of only six microarrays: three C (control) and three S (spikein). The S pool contains cRNA at concentrations equal to or higher than those in the C pool. Each pool was hybridized to the Affymetrix Drosophila array (three technical replicates for each hybridization). Probesets measuring spikein transcripts were determined based on the analysis of . We considered all probesets that measure differentially expressed genes to be true positives (a total of 1353 probesets).
Several issues have been raised concerning the use of the Golden spikein dataset in validating GeneChip preprocessing algorithms [26–28]. However, the analysis of Pearson shows clearly that the Golden spikein dataset can be used to validate and compare the performance of GeneChip preprocessing algorithms.
The single nucleotide model was implemented in Perl; the dinucleotide model was implemented in Java. All the models were fitted using the least squares method. The fitted parameters for the dinucleotide model for each of the two datasets were used to generate an affinity.info matrix for that dataset. This affinity.info matrix was subsequently used in the GCRMA analysis. The affinity.info matrix was generated using a local R script following the steps found in the GCRMA source code (see http://webpages.uncc.edu/~rgharaib/nnfit). The Java code for the dinucleotide model is provided at http://webpages.uncc.edu/~rgharaib/nnfit/FitNN.zip.
Expression summaries were generated using the full model of GCRMA version 2.8.1. The commands used to generate the summaries for GCRMA-NN, GCRMA-L and GCRMA-R can be found at http://webpages.uncc.edu/~rgharaib/nnfit. The affinity.info matrix for the U133 Latin square dataset is provided as http://webpages.uncc.edu/~rgharaib/nnfit/U133NNAffinity.RData, and the Golden spikein dataset affinity.info matrix is provided as http://webpages.uncc.edu/~rgharaib/nnfit/GoldenSpikeinNNAffinity.RData.
AffyComp analysis was done using a locally installed AffyComp 1.14.0 package. All expression summaries were converted back from the log scale to the original scale and formatted as comma-delimited text files using a local Perl script. Metrics generation for the expression summaries was done using a local R script following the directions of the package maintainers. The following metrics were used to evaluate the performance of each algorithm (definitions are according to the AffyComp website): Median SD is the median standard deviation across replicates. It measures the consistency of the algorithm; the lower the median SD, the more consistent the algorithm. Null log-fc IQR and null log-fc 99.9% are the interquartile range and the 99.9th percentile of the log fold changes from probesets for genes that should not change. A perfect score is 0 for both metrics. Signal detect slope is the slope obtained from regressing expression values on nominal concentrations in the spikein data. Signal detect R2 is the R2 obtained from regressing expression values on nominal concentrations in the spikein data. Low.slope, med.slope and high.slope are defined as for signal detect slope, but for probesets targeting low, medium and high spikeins, respectively. Obs-intended-fc and Obs-(low)int-fc slopes are slopes obtained from regressing observed log fold changes against nominal log fold changes for all probesets, and for those with nominal concentration less than 2 pM, respectively. Low, med and high AUC are the areas under the ROC curve (with up to 100 false positives) for spikeins with low, medium and high intensities, respectively, standardized so that the optimum is 1. Weighted avg AUC is the weighted average of the previous three AUC values, with weights related to the amount of data in each class (low, medium and high).
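Two of the metrics defined above can be sketched in a few lines of Python. This is an illustration of the definitions only, assuming expression values and nominal concentrations are already on a common log scale; it is not the AffyComp package code, and the function names and toy data are hypothetical.

```python
from statistics import mean, median, stdev


def median_sd(replicate_rows):
    """Median over probesets of the SD across replicate expression values."""
    return median(stdev(row) for row in replicate_rows)


def slope_and_r2(x, y):
    """Ordinary least squares regression of y on x; returns (slope, R2)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


# Toy data: three probesets with three replicates each.
sd = median_sd([[1, 1, 1], [1, 2, 3], [0, 2, 4]])
# Toy data: expression rises exactly two units per unit of concentration.
slope, r2 = slope_and_r2([0, 1, 2, 3], [1, 3, 5, 7])
```

For a signal detect slope, x would hold the nominal log concentrations and y the corresponding expression values of the spikein probesets.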
ROC curve and cyber t analysis
ROC curve generation was implemented in Java and cyber t analysis was done in R. A detailed description of the implementation and the analysis can be found here .
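As a rough illustration of an ROC area restricted to a fixed number of false positives, the Python sketch below accumulates true positive counts at each false positive and divides by the best attainable area so that a perfect ranking scores 1. The function name, the normalization and the toy data are assumptions made for illustration; this is not the authors' Java implementation.

```python
def partial_auc(scores, is_true, max_fp=100):
    """Area under the ROC curve up to `max_fp` false positives, scaled so
    that a perfect ranking gives 1."""
    ranked = sorted(zip(scores, is_true), key=lambda pair: -pair[0])
    n_true = sum(1 for _, t in ranked if t)
    n_false = len(ranked) - n_true
    limit = min(max_fp, n_false)
    if n_true == 0 or limit == 0:
        raise ValueError("need at least one true and one false item")
    tp = fp = area = 0
    for _, t in ranked:
        if t:
            tp += 1
        else:
            fp += 1
            area += tp  # one column of height tp per false positive
            if fp == limit:
                break
    return area / (n_true * limit)


perfect = partial_auc([4, 3, 2, 1], [True, True, False, False], max_fp=2)
mixed = partial_auc([4, 3, 2, 1], [True, False, True, False], max_fp=2)
```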
This research was supported in part by NIH 1R01GM072619-01 (C.J.G.) and by the UNC-Charlotte GASP program (R.Z.G.).
- Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, Brown EL: Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 1996, 14(13):1675–1680. 10.1038/nbt1296-1675
- Dalma-Weiszhausz DD, Warrington J, Tanimoto EY, Miyada CG: The affymetrix GeneChip platform: an overview. Methods in enzymology 2006, 410: 3–28. 10.1016/S0076-6879(06)10001-4
- Chudin E, Walker R, Kosaka A, Wu S, Rabert D, Chang T, Kreder D: Assessment of the relationship between signal intensities and transcript concentration for Affymetrix GeneChip(R) arrays. Genome Biol 2001, 3(1):research0005.0001-research0005.0010. 10.1186/gb-2001-3-1-research0005
- Forman JE, Walton ID, Stern D, Rava RP, Trulson MO: Thermodynamics of duplex formation and mismatch discrimination on photolithographically synthesized oligonucleotide arrays. ACS Symp Ser 1998, 682: 206–228.
- Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostat 2003, 4(2):249–264. 10.1093/biostatistics/4.2.249
- Allison DB, Cui X, Page GP, Sabripour M: Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet 2006, 7(1):55–65. 10.1038/nrg1749
- Nielsen HB, Gautier L, Knudsen S: Implementation of a gene expression index calculation method based on the PDNN model. Bioinformatics 2005, 21(5):687–688. 10.1093/bioinformatics/bti078
- Li C, Wong WH: Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc Natl Acad Sci USA 2001, 98(1):31–36. 10.1073/pnas.011404098
- Irizarry RA, Wu Z, Jaffee HA: Comparison of Affymetrix GeneChip expression measures. Bioinformatics 2006, 22(7):789–794. 10.1093/bioinformatics/btk046
- Wu Z, Irizarry R, Gentleman R, Murillo FM, Spencer F: A Model-Based Background Adjustment for Oligonucleotide Expression Arrays. Journal of the American Statistical Association 2004, 99: 909–917. 10.1198/016214504000000683
- Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP: Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 2003, 31(4):e15. 10.1093/nar/gng015
- Qin LX, Beyer RP, Hudson FN, Linford NJ, Morris DE, Kerr KF: Evaluation of methods for oligonucleotide array data via quantitative real-time PCR. BMC bioinformatics 2006, 7: 23. 10.1186/1471-2105-7-23
- Wu Z, Irizarry RA: Preprocessing of oligonucleotide array data. Nat Biotechnol 2004, 22(6):656–658. author reply 658. 10.1038/nbt0604-656b
- Vardhanabhuti S, Blakemore SJ, Clark SM, Ghosh S, Stephens RJ, Rajagopalan D: A comparison of statistical tests for detecting differential expression using Affymetrix oligonucleotide microarrays. Omics 2006, 10(4):555–566. 10.1089/omi.2006.10.555
- Naef F, Magnasco MO: Solving the riddle of the bright mismatches: labeling and effective binding in oligonucleotide arrays. Phys Rev E Stat Nonlin Soft Matter Phys 2003, 68(1 Pt 1):011906.
- Choe SE, Boutros M, Michelson AM, Church GM, Halfon MS: Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome Biol 2005, 6(2):R16. 10.1186/gb-2005-6-2-r16
- Schuster E, Blanc E, Partridge L, Thornton J: Estimation and correction of non-specific binding in a large-scale spike-in experiment. Genome Biology 2007, 8(6):R126. 10.1186/gb-2007-8-6-r126
- SantaLucia J Jr: A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc Natl Acad Sci USA 1998, 95(4):1460–1465. 10.1073/pnas.95.4.1460
- SantaLucia J Jr, Hicks D: The thermodynamics of DNA structural motifs. Annu Rev Biophys Biomol Struct 2004, 33: 415–440. 10.1146/annurev.biophys.32.110601.141800
- The human genome U133 Latin Square dataset [http://www.affymetrix.com/support/technical/sample_data/datasets.affx]
- Cope L, Irizarry R, Jafee HW, Speed TP: A benchmark for Affymetrix GeneChip expression measures. Bioinformatics 2003, 1(1):1–13.
- Baldi P, Long AD: A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics 2001, 17: 509–519. 10.1093/bioinformatics/17.6.509
- Bloomfield VA, Crothers DM, Tinoco I: Nucleic acids: structures, properties, and functions. Sausalito, Calif.: University Science Books; 2000.
- Zhang L, Miles MF, Aldape KD: A model of molecular interactions on short oligonucleotide microarrays. Nat Biotechnol 2003, 21(7):818–821. 10.1038/nbt836
- Zhang L, Wu C, Carta R, Baggerly K, Coombes KR: Response to Preprocessing of oligonucleotide array data. Nat Biotechnol 2004, 22(6):658. 10.1038/nbt0604-658
- Irizarry RA, Cope LM, Wu Z: Feature-level exploration of a published Affymetrix GeneChip control dataset. Genome Biol 2006, 7(8):404. 10.1186/gb-2006-7-8-404
- Dabney A, Storey J: A reanalysis of a published Affymetrix GeneChip control dataset. Genome Biol 2006, 7: 401. 10.1186/gb-2006-7-3-401
- Gaile DP, Miecznikowski JC: Putative null distributions corresponding to tests of differential expression in the Golden Spike dataset are intensity dependent. BMC Genomics 2007, 8: 105. 10.1186/1471-2164-8-105
- Pearson RD: A comprehensive re-analysis of the Golden Spike data: towards a benchmark for differential expression methods. BMC Bioinformatics 2008, 9: 164. 10.1186/1471-2105-9-164
- Gharaibeh RZ, Fodor AA, Gibas CJ: Software note: using probe secondary structure information to enhance Affymetrix GeneChip background estimates. Comput Biol Chem 2007, 31(2):92–98. 10.1016/j.compbiolchem.2007.02.008
- R Development Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2006.
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 2004, 5: R80. 10.1186/gb-2004-5-10-r80
- Fodor AA, Tickle TL, Richardson C: Towards the uniform distribution of null P values on Affymetrix microarrays. Genome Biol 2007, 8(5):R69. 10.1186/gb-2007-8-5-r69
- Armstrong SA, Staunton JE, Silverman LB, Pieters R, den Boer ML, Minden MD, Sallan SE, Lander ES, Golub TR, Korsmeyer SJ: MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat Genet 2002, 30(1):41–47. 10.1038/ng765
- Meredith AL, Wiler SW, Miller BH, Takahashi JS, Fodor AA, Ruby NF, Aldrich RW: BK calcium-activated potassium channels regulate circadian behavioral rhythms and pacemaker output. Nat Neurosci 2006, 9(8):1041–1049. 10.1038/nn1740
- Pyott SJ, Meredith AL, Fodor AA, Vazquez AE, Yamoah EN, Aldrich RW: Cochlear function in mice lacking the BK channel alpha, beta1, or beta4 subunits. J Biol Chem 2007, 282(5):3312–3324. 10.1074/jbc.M608726200
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.