Multivariate hierarchical Bayesian model for differential gene expression analysis in microarray experiments
- Hongya Zhao^{1},
- Kwok-Leung Chan^{1},
- Lee-Ming Cheng^{1} and
- Hong Yan^{1, 2}
https://doi.org/10.1186/1471-2105-9-S1-S9
© Zhao et al; licensee BioMed Central Ltd. 2008
Published: 13 February 2008
Abstract
Background
Identification of differentially expressed genes is a typical objective when analyzing gene expression data. Recently, Bayesian hierarchical models have become increasingly popular for this type of problem. These models perform well in accommodating the noise, variability and low replication of microarray data. However, the correlation between the different fluorescent signals measured from a gene spot is ignored, which can adversely affect the data analysis. In fact, the intensities of the two signals are significantly correlated across samples, and the larger the log-transformed intensities are, the smaller the correlation is.
Results
Motivated by the complicated error relations in microarray data, we propose a multivariate hierarchical Bayesian framework for data analysis in replicated microarray experiments. Gene expression data are modelled by a multivariate normal distribution, parameterized by the corresponding mean vectors and covariance matrices with a conjugate prior distribution. Within the Bayesian framework, a generalized likelihood ratio test (GLRT) is also developed to infer the gene expression patterns. Simulation studies show that the proposed approach has better operating characteristics and a lower false discovery rate (FDR) than existing methods, especially when the correlation coefficient is large. The approach is illustrated with two examples of microarray analysis. The proposed method successfully detects significant genes closely related to the experimental states, which are verified by the biological information.
Conclusions
The multivariate Bayesian model, compatible with the dependence between mean and variance in the univariate Bayesian model, relaxes the constant coefficient of variation assumption between measurements by adding a covariance structure. This model improves the identification of differentially expressed genes significantly since the Bayesian model fits the microarray data well.
Background
DNA microarrays offer a powerful and effective technology to monitor the alterations of gene expression for thousands of genes simultaneously. This technology has been widely applied to the exploration of quantitative changes in gene expression in a variety of areas including diseases and toxicological studies [1–4]. One of the key tasks of microarray analysis is to investigate the expression patterns from the different experiment designs so that differentially expressed (DE) genes can be identified [5, 6].
In this paper, we consider the analysis of a two-color cDNA microarray experiment. Briefly, mRNA contained in each of two cell populations is extracted, reverse-transcribed into cDNA, and labelled with either Cy3 (green) or Cy5 (red) dye. The Cy3 and Cy5 preparations are combined and deposited on the microarray, where labelled molecules hybridize to the spots containing their complementary sequence. The amount of hybridization to each spot is quantified by scanning the array with a laser beam and observing the intensities of the light emitted [7]. A pair of measurements, one for each dye, is observed as x_{gi} and y_{gi} (g = 1,⋯,N; i = 1,⋯,n) for gene g on array i, where N is the number of genes represented on the microarray and n is the number of replicated arrays.
Given the microarray expression data, a common task is to determine which genes are differentially expressed under the two conditions. There has been a considerable amount of work in this area [8–26]. The simplest way to ascertain a gene's differential expression is based on a fold-change criterion, defined by the log-ratio log_{2}(x_{gi}/y_{gi}). The straightforward fold-change method widely used by biologists regards only the genes whose expression changes by more than 2-fold as differentially expressed. The 2-fold rule is too simple to deal with the issues raised by the complicated error structure of DNA microarray data [8–12].
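The 2-fold rule can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the averaging of the log-ratio over replicates, and the intensity values are hypothetical and are not taken from the paper.

```python
import math

def mean_log_ratio(x, y):
    """Average log2(x_i / y_i) over the replicates of one gene."""
    return sum(math.log2(xi / yi) for xi, yi in zip(x, y)) / len(x)

def is_de_by_fold_change(x, y, fold=2.0):
    """Flag a gene whose average change exceeds `fold` in either direction."""
    return abs(mean_log_ratio(x, y)) > math.log2(fold)

# Hypothetical raw intensities (x, y) for two genes on n = 3 arrays.
gene_a = ([1200.0, 1500.0, 1100.0], [400.0, 520.0, 380.0])   # roughly 3-fold change
gene_b = ([800.0, 950.0, 700.0], [760.0, 1000.0, 690.0])     # roughly unchanged

print(is_de_by_fold_change(*gene_a), is_de_by_fold_change(*gene_b))
```

The rule uses no variance estimate at all, which is why it cannot distinguish a stable 2-fold change from a noisy one.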
Traditional statistical methods may not produce reliable results when they are used directly to determine differentially expressed genes. Firstly, it is common to have thousands of genes on one chip with relatively few replications in microarray experiments. Thus, the variance estimates of gene expression data are often unreliable because of the small sample size. The common approach using the t- or F-statistic is not applicable since it strongly depends on the sample size and the normality of the expression data [8–10]. It is known that microarray data may not follow a normal distribution or even be symmetrical, and the sample size is generally small [12–16]. A modified t-statistic has been suggested, which adds a small constant to the gene-specific variance estimate [17]. The method shrinks the gene-specific variance estimates towards a common variance. Recently, hierarchical Bayesian models have been employed to regularize the variances by estimating moderated variances of individual genes [18–26]. The adjusted variances are calculated as weighted averages of the gene-specific sample variances and the pooled variance across all genes. With this combination of variances, the performance of these methods in identifying significant changes in gene expression is improved substantially.
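The weighted-average idea can be illustrated as follows. This is a minimal sketch of one common form of variance shrinkage, not the estimator of any particular cited method; the function name, the weight `prior_df` and the numbers are assumptions made for the example.

```python
from statistics import variance

def moderated_variance(values, pooled_var, prior_df):
    """Weighted average of the gene-specific and pooled variances.

    `prior_df` (hypothetical) is the weight given to the pooled variance;
    the sample variance is weighted by its own degrees of freedom.
    """
    df = len(values) - 1          # degrees of freedom of the sample variance
    s2 = variance(values)         # gene-specific sample variance
    return (prior_df * pooled_var + df * s2) / (prior_df + df)

gene = [1.0, 1.2, 0.8]            # hypothetical log-ratios, n = 3 replicates
shrunk = moderated_variance(gene, pooled_var=0.10, prior_df=4)
print(shrunk)
```

With few replicates the pooled term dominates, so the result lies between the gene's own sample variance and the pooled value.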
Motivated by these error relations, we propose a novel multivariate Bayesian framework for microarray analysis. The multivariate Bayesian model, compatible with the dependence between mean and variance in the univariate Bayesian model, can relax the constant coefficient of variation assumption between measurements by adding a covariance structure. Because of the computational complexity within the Bayesian framework, we apply a modified generalized likelihood ratio test (GLRT) [28], with the p-value adjustment of Benjamini and Hochberg [27], to detect gene expression patterns. When the Bayesian model is in accordance with the microarray data, the power to identify truly differentially expressed genes can be improved substantially.
In this paper, we describe the multivariate Bayesian hierarchical model for gene expression data analysis, and present the generalized likelihood ratio test (GLRT) procedure with p-value adjustment to identify differentially expressed genes. The sample size plays an important role in replicated microarray experiments. In our simulation study, we therefore first explore the effect of the number of replications on our methodology and suggest that the number of replicated chips should not be less than 4. We also compare our method with existing ones, such as the fold change, the modified t-test and the LNN model. The new methodology shows good performance in terms of operating characteristics. In the analysis of real microarray data, our method identifies more significant genes than the alternatives.
Results
Multivariate hierarchical Bayesian model and inference
Based on the LNN hierarchical model [26], we develop a multivariate model to relax the constant CV assumption between measurements by adding covariance. The model is also compatible with the complicated structure of variance in microarray data. The model is first described in this section, and then the GLRT is employed to infer the expression pattern.
and ${m}_{g}={\displaystyle \sum _{i=1}^{n}{z}_{gi}/n}=({m}_{g1},{m}_{g2})\text{'}$, the estimation of mean expression of gene g over n replications. Obviously, the posterior combines the information from the prior and the data in a sensible way.
Given these estimates ${\widehat{\mu}}_{g}^{0}$, the global parameters $\widehat{\alpha}$ can be estimated by maximizing the likelihood function in Equation (5).
It has been shown that κ_{g} approximately follows the χ^{2} distribution with one degree of freedom under the null hypothesis [28]. If κ_{g} is larger than some critical value κ of χ^{2}(1), we reject the null hypothesis H_{0} in favour of the alternative H_{1}; that is, gene g is identified as a DE gene, and otherwise as an equivalently expressed (EE) gene. However, it is essential to control the erroneous rejections and acceptances in the testing situation. In the context of microarrays, the false discovery rate (FDR) has emerged as a practical quantity to control in multiple testing [30, 31]. The FDR is defined as the expected proportion of type I errors among the rejected null hypotheses, that is, the average of the ratio of the number of false positives to the number of genes identified as DE. The scheme of Benjamini and Hochberg (BH-method) is applied to adjust the p-values in the testing of microarray data [21, 27] (see section "Multiple testing").
Simulation studies
The purpose of our simulation study is to determine the effect of sample size in our model and compare the proposed method with classical statistics for microarray data analysis. We simulate the expression data with N = 2000 genes and n = 6 replications generated using our model. Different expression patterns are simulated by adjusting the element values of ${\mu}_{g}^{0}$. For example, EE genes are generated with the same value ${\mu}_{g1}^{0}={\mu}_{g2}^{0}$; DE genes are obtained with different values uniformly sampled from different intervals to make ${\mu}_{g1}^{0}\ne {\mu}_{g2}^{0}$. The probability of differential expression is set to p = 0.05 for the binomial distribution to select the DE genes.
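The simulation design can be sketched as below. This is a simplified illustration, not the authors' generator: it draws correlated bivariate normal replicates with a fixed standard deviation instead of sampling variances from the hierarchical prior, and the intervals for the means, `sigma`, the seed and the function name are all hypothetical.

```python
import math
import random

def simulate(n_genes=2000, n_reps=6, p_de=0.05, rho=0.5, sigma=0.3, seed=0):
    """Simulate paired log-intensities (x, y) with correlation rho per gene."""
    rng = random.Random(seed)
    data, is_de = [], []
    for _ in range(n_genes):
        de = rng.random() < p_de               # Bernoulli(p) draw selects DE genes
        mu1 = rng.uniform(8.0, 12.0)           # hypothetical baseline mean
        # DE genes get a shifted second mean; EE genes share the same mean.
        mu2 = mu1 + rng.choice((-1, 1)) * rng.uniform(1.0, 2.0) if de else mu1
        reps = []
        for _ in range(n_reps):
            e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
            x = mu1 + sigma * e1
            # Mixing e1 into y gives corr(x, y) = rho for the error terms.
            y = mu2 + sigma * (rho * e1 + math.sqrt(1 - rho ** 2) * e2)
            reps.append((x, y))
        data.append(reps)
        is_de.append(de)
    return data, is_de

data, is_de = simulate()
print(len(data), len(data[0]), sum(is_de))
```

Varying `n_reps` from 2 to 12 or `rho` over 0.1, 0.5 and 0.9 reproduces the shape of the experimental grid described in this section, though not its exact data.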
Microarray data are typically "large N and small n", that is, the number of samples is much smaller than the number of genes. With the emergence of replicated microarray experiments in particular, the number of replications is frequently discussed in microarray analysis [9–11]. Multiple testing is routinely employed in microarray analysis [30, 31]. However, multiple testing is generally distorted by the curse of dimensionality, which biases the parameter estimates when the sample size is small. On the other hand, a larger number of genes appears to compensate partially for the destabilizing effect of the small sample size, especially for the estimation of the parameters common to all genes. We therefore explore the effect of the sample size on our method. We simulate the replicated measurements with the previous steps, changing only the number of replications from n = 2 to 12. Then we estimate the corresponding parameters of our hierarchical model and calculate the statistic κ_{g} (g = 1,⋯,2000) for each of the 11 data sets.
In the simulation studies, we also compare our methodology with existing methods for microarray data analysis, such as the fold-change, t-test and LNN model. The fold-change method makes use of direct comparison of intensities, in which the error structure is ignored. The two-sample t-test overcomes the limitation by assuming the normality of expression data, but it is affected by the normality assumption and sample size. The LNN Bayesian model is developed to address these shortcomings. We improve the LNN model using the multivariate Bayesian model by considering the correlation between two measurements of each spot. GLRT is applied to test the hypotheses within the multivariate Bayesian framework.
Operating characteristics in simulation study of ρ = 0.1, 0.5 and 0.9.
Corr. Coef. | Method | Sensitivity | Specificity | PPV | NPV | FDR |
---|---|---|---|---|---|---|
0.1 | GLRT | 0.742 | 0.998 | 0.960 | 0.987 | 0.040 |
0.1 | LNN | 0.667 | 0.991 | 0.826 | 0.980 | 0.174 |
0.1 | t-test | 0.557 | 0.987 | 0.692 | 0.978 | 0.308 |
0.1 | fold change | 0.309 | 0.999 | 0.968 | 0.966 | 0.032 |
0.5 | GLRT | 0.945 | 0.998 | 0.966 | 0.997 | 0.034 |
0.5 | LNN | 0.635 | 0.986 | 0.873 | 0.972 | 0.127 |
0.5 | t-test | 0.516 | 0.968 | 0.435 | 0.977 | 0.565 |
0.5 | fold change | 0.252 | 1 | 1 | 0.966 | 0 |
0.9 | GLRT | 0.980 | 0.999 | 0.990 | 0.999 | 0.010 |
0.9 | LNN | 0.568 | 0.988 | 0.877 | 0.987 | 0.123 |
0.9 | t-test | 0.406 | 0.972 | 0.436 | 0.969 | 0.564 |
0.9 | fold change | 0.139 | 1 | 1 | 0.956 | 0 |
Results from microarray experiments
Any artificial scenario is inevitably biased towards the underlying model and reflects only certain aspects of biological reality. Therefore, the proposed method is tested on two real datasets to verify its performance in real-world applications. The first dataset contains the gene expression profiles of adenocarcinoma and normal tissues [32]. The data were obtained from http://microarray.princeton.edu/oncology/carcinoma.html. In the microarray experiment, n = 18 pairs of colon adenocarcinoma and normal colon samples were studied, and N = 7457 cDNAs and ESTs are represented on the oligonucleotide array. We apply our method to the microarray data to identify the differentially expressed genes in colon adenocarcinoma. The parameters α and Θ are estimated, and the estimate of the correlation coefficient is $\widehat{\rho}$ = 0.80. The GLRT statistics κ_{g} of Equation (8) are calculated for the inference controlling the FDR at α = 0.001, and 374 DE genes are identified using our multivariate Bayesian formulation. In comparison, [32] lists 47 down-regulated and 19 up-regulated genes in adenocarcinoma whose ratios are more than 4-fold and whose associated p-values were also marginally greater than 0.001. All 47 + 19 = 66 genes in [32] are detected with high confidence using our method. Furthermore, our gene list also contains many gene products that are related to the 66 genes in [32], such as Ckshs2, MGSA, matrilysin, and diverse products related to proliferation and metabolic rate. Some genes related to guanylin and colon mucosa antigen are also identified as significant genes with our model. Therefore, our results include not only the genes that are already known to be expressed abnormally in colon cancer, but also other genes confirmed by biological experiments [32].
The proposed method is also illustrated with another example of microarray data analysis, where the objective is to identify differentially expressed genes in mouse liver after treatment with a toxic metal (cadmium). In our microarray experiment, n = 6 hybridizations are repeated for N = 1824 genes, and we obtain 6 pairs of red and green intensities for each gene, ${z}_{gi}=({z}_{gi}^{red},{z}_{gi}^{green})\text{'}$ (g = 1,⋯,1824; i = 1,⋯,6). Data normalization is essential, and we still denote the normalized data by z_{gi} [33]. As shown in Figures 1 and 2, the error structure of our microarray data depends on the means and on the correlations between the intensities measured from the different dyes. Hence we apply the multivariate Bayesian framework to our microarray data. The parameters α and Θ are estimated, and the estimate of the correlation coefficient is $\widehat{\rho}$ = 0.92. Then the GLRT statistics κ_{g} of Equation (8) are calculated for inference, and the BH-method is used to adjust the p-values, controlling the FDR at α = 0.01 in multiple testing. The critical value is calculated to be 10.85, which means that the genes in the following set are inferred to be DE genes: $J(\alpha )=\{g:{\kappa}_{g}\ge \kappa =10.85\}$.
Using this criterion, 183 genes are identified. The two-sample t-test detects 44 genes when controlling the FDR at 0.01, while the fold-change method detects only 6 genes. In fact, the 6 genes from the fold change and the 44 from the t-test are all included in J(α). Furthermore, the fold-change method does not provide an estimate of the FDR. We have also applied another commonly used approach, the significance analysis of microarrays (SAM) [11]. When we adjust its parameters, especially Δ, so that it detects 183 genes, of which 82% belong to J(α), it gives an FDR of about 2.81%, which is higher than ours. In comparison, our method provides a more powerful tool for the identification of DE genes while keeping a lower FDR.
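The reported critical value can be checked against the reported gene count. The sketch below assumes that each p-value is the upper tail of the χ^{2}(1) distribution at κ_{g}, as the approximation quoted earlier suggests; the function name is hypothetical, and the identity P(χ^{2}(1) ≥ k) = erfc(√(k/2)) is a standard fact rather than something stated in the paper.

```python
import math

def chi2_1df_pvalue(kappa):
    """Upper-tail probability P(X >= kappa) for a chi-square(1) variable."""
    return math.erfc(math.sqrt(kappa / 2.0))

N, m, alpha = 1824, 183, 0.01        # genes, rejections and FDR level from the text
bh_cutoff = m / N * alpha            # BH threshold for the m-th smallest p-value
p_at_critical = chi2_1df_pvalue(10.85)

print(bh_cutoff, p_at_critical)
```

Under these assumptions both quantities are close to 0.001, so a critical value of 10.85 is consistent with 183 rejections out of 1824 tests at an FDR level of 0.01.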
Although DNA microarray technology is very effective for understanding alterations in genome-wide patterns of gene expression, the statistical results alone may not be sufficient to determine which genes are truly differentially expressed, and further biological analysis may be required to verify the candidate genes. In our study, we therefore perform another biological test, the reverse-transcription polymerase chain reaction (RT-PCR), to confirm the DE genes. We have found that the relative expression levels of Ctsc (cathepsin C), Dnase2 (deoxyribonuclease II), Mt-1 (metallothionein-I) and A2m (alpha-2-macroglobulin) after normalization are up-regulated in triplicate analysis. Based on gene ontology (GO) analysis (http://www.geneontology.org/) [34], they are highly related to the transcriptional regulation of protease inhibitor activity (GO:0030414) and the detoxification of copper ion (GO:0010273). This implies that there is good agreement among the microarray experiment, RT-PCR and the Bayesian method we have proposed.
Discussion
DNA microarray technology has important applications in gene expression data analysis. However, the potential sources of random and systematic measurement error are a critical issue in statistical analysis. It is impossible to propose a statistical model that reflects all sources of error. Therefore, a good model should capture the most essential features of the data. Currently, Bayesian methods provide a practical and effective tool for microarray analysis. We have explored a multivariate Bayesian framework to identify DE genes in replicated microarray experiments. The proposed model accommodates more of the inherent characteristics of expression data and is flexible and adaptable to the measurements of each spot. DE genes can be inferred by the GLRT, adjusted by the BH-method to control the FDR. In comparison with other methods, the operating characteristics of our method are better than those of the intuitive fold change, the t-test and the LNN model. Furthermore, our method produces a lower FDR and a higher efficiency of identification.
As to the inference, the number of hypotheses increases rapidly with the number of conditions. For example, there are 5 hypotheses to infer under 3 conditions: equivalent expression in all conditions (one pattern), altered expression in one condition (three patterns) and distinct expression in each condition (one pattern). Thus, only the patterns of interest should be tested, with the GLRT calculated under the specific constraints.
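The count of 5 corresponds to the number of ways to partition 3 conditions into groups of equal expression. A minimal sketch that enumerates these patterns is given below; the function name and the condition labels are hypothetical.

```python
def set_partitions(items):
    """Yield every split of `items` into non-empty groups of equal expression."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing group in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give it a group of its own.
        yield [[first]] + partition

patterns = list(set_partitions(["c1", "c2", "c3"]))
for pattern in patterns:
    print(pattern)
```

The same enumeration gives 15 patterns for 4 conditions and 52 for 5, which illustrates why testing every pattern quickly becomes impractical.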
With the widespread application of Bayesian methods to microarray data analysis, more sophisticated methods are needed to address further statistical problems, such as the normality assumption, gene independence and parameter estimation. Normality and independence are regarded as devices for deriving the probability distribution function, but we believe that further improvement can be made, especially in terms of dependence.
Conclusions
We have presented a multivariate Bayesian model for differential gene expression data analysis. In addition to the gene-specific variance, this model takes into account the covariance between the pair of measurements, which relaxes the assumption of a constant correlation coefficient made in the commonly used hierarchical models. Our model provides a more realistic and flexible estimate of the variance of gene expression data under limited replication. Based on the multivariate hierarchical model, the multiple GLRT takes into account the gene-specific variance, the latent gene variance and the covariance. In our examples above, the results obtained from our model show better operating characteristics, especially when the correlation coefficient of gene expression within one spot is significant. This indicates that the power to identify differentially expressed genes can be improved if the Bayesian model is developed in accordance with the statistical properties of microarray data.
Methods
Toxic microarray experiment
Cadmium (Cd) is a ubiquitous toxic environmental pollutant with a well-established toxicity. Chronic exposure to Cd, even at low concentrations, has been shown to result in a variety of pathological disorders such as cancers, anemia, osteoporosis, and renal and hepatic dysfunction. A microarray experiment was designed in the biomedical laboratory of the department of biology, Hong Kong Baptist University. The experiment explores the genes that are differentially expressed under the toxic treatment, using dual-color (Cy3 and Cy5) DNA microarrays to compare the treatment group given CdCl_{2} with the control group given NaCl. In our microarray experiment, eight male adult mice, ICR strain, were randomly separated into four groups and denoted as C1T1 and C2T2. In each group one mouse serves as the control while the other one receives the treatment. The mice in the control and treatment groups were given a single intraperitoneal injection of 0.3 ml 0.9% NaCl or the same volume of 2 mg/kg CdCl_{2}, respectively. After 48 hours, the mice were sacrificed and the livers were collected. Total RNA was extracted from each liver using the Trizol reagent. The total RNA samples were reverse transcribed to cDNA in the presence of fluorescent Cy3 or Cy5 dye. Usually, the treatment group is labelled with Cy3 while the control group is labelled with Cy5. Probes were then hybridized onto the UCLA M07 microarrays overnight at 65°C. After two subsequent washings in 2 × SSC, 0.1% SDS and 0.2 × SSC buffer, all the hybridized chips were scanned using a ScanArray 5000 confocal laser scanner (Packard BioChip BioScience Technology), and the images were further analyzed by the QuantArray Quantitative Microarray Analysis Software. The C1T1 and C2T2 groups were each tested in three individual hybridization experiments, giving 6 replicated hybridized chips. After image analysis, one pair of red and green fluorescence intensities (after background correction) is observed from each spot, giving 6 replicated pairs for each gene.
We applied the logarithm transformation to the measurements, as is common in microarray analysis. Before statistical analysis, the microarray data have to be normalized and filtered to remove some of the variation in the fluorescence intensities that is unrelated to expression levels [33]. After data processing, we denote the n replicated pairs of observations of the gth gene as Z_{g} = (z_{g1},⋯,z_{gn})', in which the green and red log-intensities of the gth gene in the ith replication are z_{gi} = (x_{gi}, y_{gi})' (g = 1,⋯,1824; i = 1,⋯,6).
Lognormal-Normal (LNN) model
where IG denotes the inverse Gamma (IG) distribution and μ_{0}, λ_{0}, ν_{0}, ${\sigma}_{0}^{2}$ are the hyperparameters. Notice that the dependence between μ_{ g }and ${\sigma}_{g}^{2}$ is implied with the conjugate prior π(μ_{ g }, ${\sigma}_{g}^{2}$) whose posterior probability has the same functional form. All measurements x_{ gi }and y_{ gi }in this framework are assumed to arise independently and identically from the same distributional class.
Multiple testing
Numbers of errors in N multiple tests
 | # not rejected (negative) | # rejected (positive) | Total |
---|---|---|---|
# True H_{0} (EE) | TN | FP | N_{0} |
# Non-true H_{0} (DE) | FN | TP | N_{1} |
Total | R_{0} | R_{1} | N |
In microarray analysis, the FDR is defined as the expected proportion of rejected null hypotheses that are erroneously rejected, that is, the average of the ratio of the number of false positives to the number of genes identified as DE. Because of the typical large N and small n in microarray data, the number of type I errors increases when many hypotheses are tested and each test has a specified type I error probability.
In the univariate setting, it is natural to minimize the type II error rate under a prespecified type I error rate. Under multiple testing, there are different procedures. Several definitions of the type I error rate, such as the FDR, FWER or PCER, are described in [11]. Benjamini and Hochberg's p-value adjustment provides a more powerful procedure to control the FDR [9, 11, 27, 30, 31]. Based on the approximate χ^{2} distribution of κ_{g}, we can apply the method for significance testing to identify the DE genes. The algorithm of the BH-method is as follows:
Step 1: Order the p-values corresponding to the N tested hypotheses H_{g0}: p_{(1)} ≤ p_{(2)} ≤ ⋯ ≤ p_{(N)}.
Step 2: Apply the Bonferroni-type multiple-testing condition ${p}_{(g)}\le \frac{g}{N}\alpha $, where α is the level at which the FDR is controlled. Let m be the largest g that satisfies the inequality.
Step 3: Reject all H_{(g)0} (g = 1,2,⋯,m); that is, the genes indexed by (1), (2),...,(m) are identified as DE genes.
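The three steps above can be sketched as follows. This is a minimal illustration of the Benjamini-Hochberg step-up rule; the function name and the p-values are hypothetical.

```python
def bh_reject(pvalues, alpha):
    """Return the indices of the hypotheses rejected by the BH procedure."""
    n = len(pvalues)
    # Step 1: order the hypotheses by increasing p-value.
    order = sorted(range(n), key=lambda i: pvalues[i])
    # Step 2: find the largest rank g with p_(g) <= g / N * alpha.
    m = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / n * alpha:
            m = rank
    # Step 3: reject the m hypotheses with the smallest p-values.
    return sorted(order[:m])

pvals = [0.030, 0.001, 0.5, 0.035, 0.032]   # hypothetical p-values for 5 genes
rejected = bh_reject(pvals, alpha=0.05)
print(rejected)
```

Because m is the largest rank that satisfies the inequality, a hypothesis can be rejected even when its own p-value exceeds its own threshold, as long as a larger p-value further down the ordering meets its threshold.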
Besides the FDR, other statistics are used to assess the performance of multiple testing [9, 27]. In microarray data analysis, sensitivity is defined as the fraction of the true DE genes correctly identified as DE, i.e. TP/N_{1}; specificity is defined as the fraction of the true EE genes correctly identified as EE, i.e. TN/N_{0}; the positive predictive value (PPV) is the fraction of the genes identified as DE that are truly DE, i.e. TP/R_{1}; and the negative predictive value (NPV) is the fraction of the genes identified as EE that are truly EE, i.e. TN/R_{0}.
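These definitions translate directly into code. The sketch below computes the operating characteristics from true and predicted labels; the function name and the example labels are hypothetical, the "FDR" here is the empirical proportion FP/R_{1} (equal to 1 − PPV) rather than an expectation, and all denominators are assumed to be non-zero.

```python
def operating_characteristics(truth, called):
    """Compute the statistics of the error table from boolean DE labels."""
    pairs = list(zip(truth, called))
    tp = sum(1 for t, c in pairs if t and c)          # DE, called DE
    fp = sum(1 for t, c in pairs if not t and c)      # EE, called DE
    fn = sum(1 for t, c in pairs if t and not c)      # DE, called EE
    tn = sum(1 for t, c in pairs if not t and not c)  # EE, called EE
    return {
        "sensitivity": tp / (tp + fn),   # TP / N_1
        "specificity": tn / (tn + fp),   # TN / N_0
        "PPV": tp / (tp + fp),           # TP / R_1
        "NPV": tn / (tn + fn),           # TN / R_0
        "FDR": fp / (tp + fp),           # FP / R_1 = 1 - PPV
    }

truth = [True, True, True] + [False] * 7               # 3 DE and 7 EE genes
called = [True, True, False, True] + [False] * 6       # 3 genes called DE
stats = operating_characteristics(truth, called)
print(stats)
```

In the simulation table, the PPV and FDR columns sum to one in every row, which matches the relation FDR = 1 − PPV used here.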
Declarations
Acknowledgements
We would like to thank Prof. Wong, R.N.S. and Dr. Yue, P.Y.K. from the department of biology of Hong Kong Baptist University for providing us the microarray data. This work is supported by the Hong Kong Research Grant Council (Project CityU 122506).
This article has been published as part of BMC Bioinformatics Volume 9 Supplement 1, 2008: Asia Pacific Bioinformatics Network (APBioNet) Sixth International Conference on Bioinformatics (InCoB2007). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/9?issue=S1.
References
- Brown PO, Botstein D: Exploring the new world of the genome with DNA microarrays. Nat Genet 1999, 21(1 Suppl 1):33–37. 10.1038/4462
- Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene-expression patterns with a complementary-DNA microarray. Science 1995, 270: 467–470. 10.1126/science.270.5235.467
- Schena M, Heller RA, Theriault TP, Konrad K, Lachenmeier E, Davis RW: Microarrays: Biotechnology's Discovery Platform for Functional Genomics. Trends in Biotechnology 1998, 16: 301–306. 10.1016/S0167-7799(98)01219-0
- Sham P, Bader JS, Craig I, O'Donovan M, Owen M: DNA pooling: a tool for large-scale association studies. Nature Reviews Genetics 2002, 3: 862–871. 10.1038/nrg930
- Amaratunga D, Cabrera J: Exploration and Analysis of DNA Microarray and Protein Array Data. New Jersey: Wiley; 2004.
- Yang YH, Speed T: Design and analysis of comparative microarray experiments. In Statistical Analysis of Gene Expression Microarray Data. Edited by: Speed T. Boca Raton, Florida: Chapman & Hall; 2003:35–91.
- Shalon D, Smith SJ, Brown PO: A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization. Genome Res 1996, 6: 639–645. 10.1101/gr.6.7.639
- Chen Y, Dougherty E, Bittner M: Ratio-based decisions and the quantitative analysis of cDNA microarrays images. J Biomedical Optics 1997, 2: 364–367. 10.1117/12.281504
- Dudoit S, Yang YH, Callow MJ, Speed TP: Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statist Sinica 2002, 12: 111–139.
- Efron B, Tibshirani R, Storey JD, Tusher V: Empirical Bayes analysis of a microarray experiment. J American Statistical Association 2001, 96: 1152–1160.
- Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci 2001, 98: 5116–5121. 10.1073/pnas.091062498
- Yang YH, Xiao Y, Segal MR: Identifying differentially expressed genes from microarray experiments via statistic synthesis. Bioinformatics 2005, 21: 1084–93. 10.1093/bioinformatics/bti108
- Dean N, Raftery AE: Normal uniform mixture differential gene expression detection for cDNA microarrays. BMC Bioinformatics 2005, 6: 173. 10.1186/1471-2105-6-173
- Durbin BP, Hardin JS, Hawkins DM, Rocke DM: A variance-stabilizing transformation for gene-expression microarray data. Bioinformatics 2002, 18(Suppl 1):S105-S110.
- Huber W, von Heydebreck A, Sultmann H, Poustka A, Vingron M: Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 2002, 18(Suppl 1):S96-S104.
- Newton MA, Kendziorski CM, Richmond CS, Blattner FR, Tsui KW: On differential variability of expression ratios: Improving statistical inference about gene expression changes from microarray data. Journal of Computational Biology 2001, 8: 37–52. 10.1089/106652701300099074
- Baldi P, Long AD: A Bayesian framework for the analysis of microarray expression data: regularised t-test and statistical inferences of gene expression changes. Bioinformatics 2001, 17: 509–519. 10.1093/bioinformatics/17.6.509
- Newton MA, Noueiry A, Sarkar D, Ahlquist P: Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics 2004, 5: 155–176. 10.1093/biostatistics/5.2.155
- Ibrahim JG, Chen MH, Gray RJ: Bayesian models for gene expression with DNA microarray data. J Amer Statist Assoc 2002, 97: 88–99. 10.1198/016214502753479257
- Lewin A, Richardson S, Marshall C, Glazier A, Aitman T: Bayesian modelling of differential gene expression. Biometrics 2005, 62: 10–18. 10.1111/j.1541-0420.2005.00394.x
- Gottardo R: Statistical analysis of microarray data: a Bayesian approach. Biostatistics 2003, 4: 597–620. 10.1093/biostatistics/4.4.597
- Broet P, Richardson S, Radvanyi F: Bayesian hierarchical model for identifying changes in gene expression from microarray experiments. Journal of Computational Biology 2002, 9: 671–683. 10.1089/106652702760277381
- Lo K, Gottardo R: Flexible empirical Bayes models for differential gene expression. Bioinformatics 2006, 23: 328–335. 10.1093/bioinformatics/btl612
- Sartor MA, Tomlinson CR, Wesselkamper SC, Sivaganesan S, Leikauf GD, Medvedovic M: Intensity-based hierarchical Bayes method improves testing for differentially expressed genes in microarray experiments. BMC Bioinformatics 2006, 7: 538. 10.1186/1471-2105-7-538
- Manda SOM, Walls RE, Gilthorpe MS: A full Bayesian hierarchical mixture model for the variance of gene differential expression. BMC Bioinformatics 2007, 8: 124. 10.1186/1471-2105-8-124
- Kendziorski CM, Newton MA, Lan H, Gould MN: On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine 2003, 22: 3899–3914. 10.1002/sim.1548
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society 1995, 57: 289–300.
- Wang S, Ethier S: A generalized likelihood ratio test to identify differentially expressed genes from microarray data. Bioinformatics 2004, 20: 100–104. 10.1093/bioinformatics/btg384
- Lee PM: Bayesian Statistics: an introduction. Arnold: London and Wiley: New York; 1997.
- Delongchamp R, Bowyer J, Chen J, Kodell R: Multiple-testing strategy for analyzing cDNA array data on gene expression. Biometrics 2004, 60: 774–782. 10.1111/j.0006-341X.2004.00228.x
- Storey JD: The positive false discovery rate: a Bayesian interpretation and the q-value. Ann Statist 2003, 31: 2013–2035. 10.1214/aos/1074290335
- Notterman DA, Alon U, Sierk AJ, Levine AJ: Transcriptional gene expression profiles of colorectal adenoma, adenocarcinoma, and normal tissues examined by oligonucleotide arrays. Cancer Research 2001, 61: 3124–3130.
- Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucl Acids Res 2002, 30: e15. 10.1093/nar/30.4.e15
- Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, Richter J, Rubin GM, Blake JA, Bult C, Dolan M, Drabkin H, Eppig JT, Hill DP, Ni L, Ringwald M, Balakrishnan R, Cherry JM, Christie KR, Costanzo MC, Dwight SS, Engel S, Fisk DG, Hirschman JE, Hong EL, Nash RS, Sethuraman A, Theesfeld CL, Botstein D, Dolinski K, Feierbach B, Berardini T, Mundodi S, Rhee SY, Apweiler R, Barrell D, Camon E, Dimmer E, Lee V, Chisholm R, Gaudet P, Kibbe W, Kishore R, Schwarz EM, Sternberg P, Gwinn M, Hannick L, Wortman J, Berriman M, Wood V, de la CN, Tonellato P, Jaiswal P, Seigfried T, White R: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 2004, 32: D258-D261. 10.1093/nar/gkh066
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.