Volume 9 Supplement 9
Testing for treatment effects on gene ontology
© Lee et al; licensee BioMed Central Ltd. 2008
Published: 12 August 2008
In studies that use DNA arrays to assess changes in gene expression, it is preferable to measure the significance of treatment effects on a group of genes from a pathway or functional category such as gene ontology terms (GO terms, http://www.geneontology.org) because this facilitates the interpretation of effects and may markedly increase significance. A modified meta-analysis method to combine p-values was developed to measure the significance of an overall treatment effect on such functionally-defined groups of genes, taking into account the correlation structure among genes. For hypothesis testing that allows gene expression to change in both directions, p-values are calculated under the null distribution generated by a Monte Carlo method.
As a test of this procedure, we attempted to distinguish altered pathways in microarray studies performed with Mitochips, oligonucleotide microarrays specific to mitochondrial DNA-encoded transcripts. We found that our analytic method improves the specificity of selection for altered pathways, due to incorporation of the inter-gene correlation structure in each pathway. It is thus a practical method to measure treatment effects on GO groups. In many actual applications, microarray experiments measure treatment effects under complicated design structures and with small sample sizes. For such applications to real data of limited statistical power, and also in computer simulations, we demonstrate that our method gives reasonable test results.
The advent of DNA microarray technology has revolutionized genomic research and medicine because of its ability to simultaneously determine the expression levels of thousands of genes. However, the interpretation of large amounts of microarray gene expression data, and the ability to derive biologically meaningful conclusions from such data, have always been daunting tasks for statisticians. Because of the high volume and complex characteristics of microarray data, much of the initial work on their analysis has focused on development of data mining or data reduction methods to identify differentially expressed genes. Typically, the p-value of a test statistic is calculated for each gene, the genes are ranked according to these p-values, and a pre-specified significance criterion, such as the false discovery rate, is used to determine a cut-off which creates a category of differentially expressed genes [1–3].
Attempts to interpret individual genes in a list of significant genes are demanding and laborious. Therefore, recent efforts have focused on discovery of biological pathways rather than on individual gene function. Gene ontology terms (GO terms, http://www.geneontology.org) reflect gene groupings based on molecular function, biological process, or cellular structure/organelle. The interpretation of differentially expressed GO groups is generally simpler than the presentation of a list of statistically significant genes, and more resistant to erroneous conclusions that can arise from microarray artefacts.
Several statistical methods that combine the analysis of differential gene expression with biological databases have been proposed and implemented in computer packages for a more rapid interpretation of genome-wide expression data . However, most such methods are based on a series of univariate statistical tests and do not properly account for the complex structure of gene interactions. The statistical significance of a GO group is commonly assessed by comparing the number of statistically significant genes in the group to the number expected by chance using Fisher's exact test, which is based on the hypergeometric distribution . Fisher's exact test is used to compare these proportions to assess overrepresentation of significant genes in functional categories. This approach is not amenable to correction for correlations among p-values, since the test inherently assumes exchangeability among genes, an assumption which is not valid under arbitrary or actual correlation structures [6, 7].
Hotelling's T2 Statistic and permutation methods address the correlation structure among genes. Hotelling's T2 statistic is not applicable when the sample size is smaller than the number of genes in a GO term . Permutation methods, although quite valuable under appropriate conditions, are severely compromised by limited numbers of permutable sample pairs. In many cases, the design of microarray studies has a rather complicated structure intended to manage technical variation associated with differences among probes, dyes, and reagent batches by creating treatment blocks within these sources of variation . Such cases are not suited to permutation methods.
A modified meta-analysis method was developed by Delongchamp et al.  to combine p-values, and thus to measure the significance of an overall treatment effect on a group of genes, while taking into account the inter-genic correlation structure. The method is based on the fact that p-values follow a uniform distribution under the null hypothesis. Inverse-normal transformed p-values have a normal distribution and their sum over a set of genes also would follow a normal distribution, provided that the component p-values are independent. The test we have developed to measure the significance of overall treatment effect on genes within a GO category is based on a modification of this statistic, to reflect the actual correlation structure among genes sharing a GO term.
In this paper, we extend the method from a simple one-class t-test with the null hypothesis H0 : μ = 0 to a test for pair-wise contrast in a fixed-effects linear model. In the following sections, we describe in detail the extension of the methodology, with validation through computer simulations and application to two toxicogenomics studies designed to evaluate treatment effects on the levels of mRNA transcripts involved in mitochondrial function. We thus demonstrate that this methodology provides a practical approach to testing the significance of the treatment effects on gene classes defined by GO terms, and by extension on any other prior categorization of genes into functional subsets. Because many microarray experiments measure treatment effects under complicated design structures and with small sample sizes , we used a simulation study to determine whether the method gives reasonable results under these conditions.
Specific applications to toxicogenomics studies showed that the methodology has improved specificity in choosing significantly altered pathways or functional categories, and may thus assist in the understanding of molecular mechanisms of mitochondrial toxicity in the liver induced by anti-HIV drugs [11, 12] and in assessing effects on mitochondrial function of weight-reducing dietary supplements, such as usnic acid .
Measurement of a treatment effect for each gene
In many toxicogenomic studies, the significance of a treatment effect is tested under the null hypothesis H0 : cβ = 0, where cβ is a pair-wise contrast among treatments. Under the null hypothesis, has a t-distribution with n - r degrees of freedom, and the p-value to assess the significance of a treatment is calculated from this statistic.
Test for a gene group
A modified meta-analysis method of combining p-values was developed to measure the significance of an overall treatment effect on any group of genes by a one-class t-test . The p-value calculated from the null hypothesis is a random variable with uniform distribution, which can be transformed to a suitable probability distribution. Inverse-normal transformed p-values, z k = Φ-1 (1 - p k ) ~N(0, 1), k = 1, ⋯, m have a normal distribution and their sum, , is also normally distributed when p-values are independent. Here, p k represents a p-value for a gene in a GO group comprising m genes. The p-value for the sum of inverse-transformed p-values, , gives the overall significance of the treatment effect on the GO group. We refer to this as the naïve estimate because it assumes independence among p-values.
where rs,tis the s-th row and t-th column element of R. Therefore, is the appropriate p-value which corrects the naïve p-value, .
It follows that , where is the average of off-diagonal elements of R. The correction depends only on the average correlation, , and the correction tends to give a reduced significance when > 0. When R is unknown, we estimated the covariance, to provide an average correlation coefficient for the correction.
The correlation structure of p-values is different for a two-sided t-test, which allows gene expression changes in both directions, than for a one-sided situation. A two-tailed test, in which p = 2(1 - Φ(|z|)), requires a different correction method, since the correlation among |z k |, k = 1, ⋯, m differs from a one-sided test in which p = 1 - Φ(z). The null distribution of the summary statistics 1'|z| can be generated through Monte Carlo sampling from the null distribution of z, MVN(0, cov(z)). When z1, ⋯, z n are random samples from MVN(0, cov(z)), the p-value for the observed value, Ψ = 1'|z|, is computed as , where I(A) is an indicator function which gives 1 if A is true, or 0 otherwise. Here, cov(z) has to be estimated from the data. The estimated correlation, and its variation, , which has for off-diagonal elements, are used to estimate cov(z).
The derivation of the method presented in the previous section is based on the known correlation matrix of the vector of dependent variable Y. When the correlation matrix is not known, we use an estimate of the correlation matrix. In reality, the correlation matrix is always unknown. The proposed method produces an approximately correct p-value for a group of genes. To demonstrate that the method gives not perfect but acceptably correct p-values, we present simulation results in this section. The validation is done by checking the cumulative distribution of p-values from the proposed methods under the null distribution. The p-values must have a uniform distribution, which should form a diagonal line in the following figures if p-values are correctly calculated.
The simulation is conducted under a fairly common set of conditions for microarray studies, comprising three treatments with three samples (arrays) per treatment. Samples are generated from N(μ i , Σ) for each treatment, i = 1, 2 3,. In this simulation, samples for two treatment have the same average values, μ1 = μ2 = 1, and samples for the other treatment have twice that average value μ3 = 2. P-values for the pair-wise contrast are calculated under the null hypothesis, H0 : μ1 = μ2. A GO term is composed of m = 20 genes which have correlation structure generated randomly between 0.36 and 0.55. The standard deviations, σ i , i = 1, ⋯,m for the genes, which are the diagonal elements of Σ, are generated randomly between 0.01 and 0.25. We iterated this procedure at least 10,000 times to observe the distribution of calculated p-values.
We present two real-world examples based on data from Mitochip, a mitochondria-specific mouse oligonucleotide microarray which was developed by Dr. Varsha Desai at the National Center for Toxicological Research . Mitochip measures the levels of mRNA for 542 mitochondrial and nuclear genes associated with mitochondrial structure and function. Each Mitochip includes 9 housekeeping genes and 9 Arabidopsis plant genes to serve as positive and negative control genes, respectively. We considered 317 relevant GO groups related to mitochondrial functions, based on a database from Mouse Genome Informatics (MGI, http://www.informatics.jax.org).
Experimental design for the AZT and 3TC effects on mouse-liver gene expression.
AZT 240 mg/kg/d
AZT+3TC 160+100 mg/kg/d
AZT+3TC 160+100 mg/kg/d
Oxidative phosphorylation is a key mitochondrial function that requires the electron transport assembly of four protein complexes (I, II, III, IV) to catalyze sequential oxidation/reduction reactions, and complex V to generate ATP. Several clinical and animal studies have investigated the effect of nucleoside reverse transcriptase inhibitors (NRTIs), analogs such as AZT, on mitochondrial respiratory chain complexes. These studies suggest that there is a deficit in one of the components of complexes I and IV in skeletal muscle of children perinatally exposed to antiretroviral nucleoside analogues .
Effects of AZT and 3TC on oxidative phosphorylation and apoptosis.
Usnic acid is a lichen metabolite used as a weight-loss dietary supplement due to its uncoupling action on mitochondria. However, its use has been associated with severe liver disorders in many individuals. Animal studies conducted thus far have evaluated effect of usnic acid on mitochondria, primarily by measuring the rate of oxygen consumption and/or ATP generation. Generation of ATP requires tight coupling of electron transport with oxidative phosphorylation, maintained through a proton gradient across the inner mitochondrial membrane. An important finding of the study is a lack of usnic acid effect on complex V, despite a significant up-regulation of all four complexes of the electron transport chain. Usnic acid is a known uncoupler that is highly lipophilic in both neutral and anionic forms due to its numerous carbonyl groups that absorb the negative charge of the anion by resonance stabilization. This lipophilicity of usnic acid and the usniate anion allows easy passage of both entities through the mitochondrial membranes by passive diffusion into the matrix where it is ionized, releasing a proton into the matrix. The resulting usniate anion can then diffuse back into the inter-membrane space where it binds to the proton on the acidic side of the inner membrane to re-form usnic acid which can then diffuse back into the matrix. The resulting cycle causes proton leakage that eventually can dissipate the proton gradient across the inner membrane, disrupting the tight coupling between electron transport and ATP synthesis. This model would explain the absence of gene-expression changes associated with complex V in usnic acid-treated mice, despite the increased electron transport by complexes I – IV. It may also explain the decline in ATP level in spite of increased oxygen consumption.
Effects of usnic acid on phosphorylation and apoptosis.
In many studies that use microarray data, the number of samples is small as in the first example shown above. While the number of samples in the simulations is only 3 for each group, the distribution of corrected p-values approximates a uniform distribution. The estimation for the correlation, , might not be very close to the true correlation, R. However, the correction methods that depend only on the average correlation, , are more robust because the estimation of is more robust.
For the one-sided test, the correction for the correlation depends only on , the average of off-diagonal elements of the correlation matrix. The corrections using or are equivalent for the one-sided test method. Important points in choosing a correlation estimate for the two-sided test are the following; 1) The correlation estimate should be robust for small sample sizes, and 2) The correlation estimate should preserve . satisfies these two conditions.
When the direction of gene expression change is pre-specified, the one-sided test is a good choice since it is easy and fast to calculate p-values. However, the two-sided test is the one we have to use in most cases, because it is usually not possible to pre-specify how individual genes will respond to treatment in the exploratory context. When we have a small number of samples to estimate the correlation, the method gives a robust result. Since misrepresents the true correlation, and gives biased p-values, the method works better for larger sample sizes. This is seen in Figure 3, where the method is better than the method. We hesitate to present a specific threshold sample size as sufficient for a converged correlation estimate, since it varies with respect to several conditions, such as the number of genes in a group, the variation of the elements of the correlation matrix, etc. The simulation result in Figure 3 shows that both methods give quite accurate p-values compared to the p-values from the true correlation matrix, R. The Hotelling T2 test is the best choice whenever it is applicable.
In Table 1, the distribution of animals from different treatment groups (A-E) in three batches (1–3) gives no permutable pairs. In this case, randomization methods are not applicable. Even though randomization methods inherently take into account the correlation structure among genes, they may not be practical when the design of the experiment is complicated and the number of samples per group is small, reducing the numbers of permutable pairs.
Randomization methods that permute class labels can adjust p-values for the correlation structure among genes. Randomization methods choose a summary statistic (e.g. enrichment score (ES) in , average z-score in ), which reflects the degree to which a set of genes is enriched. When the significance of the summary statistic is measured by permuting class labels, the method preserves gene-gene correlations and when applicable, would give similar result to the presented method. Randomization methods that permute gene labels, such as Fisher's exact test, do not preserve the correlation structure and misrepresent the group's significance.
We have presented a method to test the significance of expression changes within a group of genes, while considering the correlation structure among genes in each group. This method will enable the rapid detection of microarray evidence indicating altered cell functions or pathways, and will facilitate the interpretation of microarray outcomes. Application of the method to real data shows that it is an improved, practical method to evaluate the effects of treatments on functional classes of genes such as those based on Gene Ontology descriptors.
TL was supported by an Oak Ridge Institute of Science and Education (ORISE) fellowship at NCTR.
Disclaimer: The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the FDA.
This article has been published as part of BMC Bioinformatics Volume 9 Supplement 9, 2008: Proceedings of the Fifth Annual MCBIOS Conference. Systems Biology: Bridging the Omics. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/9?issue=S9
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and power-ful approach to multiple testing. Journal of the Royal Statistical Society Series B 1995, 57: 289–300.Google Scholar
- Storey JD, Taylor JE, Siegmund D: Strong control, conservative point estimation, and simultaneous conservative consistency of false discovery rates: A unified approach. Journal of the Royal Statistical Society, Series B 2004, 66: 187–205. 10.1111/j.1467-9868.2004.00439.xView ArticleGoogle Scholar
- Allison DB, Gadbury GL, Heo M, Fernandez JR, Lee C-K, Prolla TA, Weindruch R: A mixture model approach for the analysis of microarray gene expression data. Computational Statistics and Data Analysis 2002, 39: 1–20. 10.1016/S0167-9473(01)00046-9View ArticleGoogle Scholar
- Khatri P, Draghici S: Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 2005, 21: 3587–3595. 10.1093/bioinformatics/bti565PubMed CentralView ArticlePubMedGoogle Scholar
- Draghci S, Khatri P, Martins RP, Ostermeier GC, Krawetz SA: Global functional profiling of gene expression. Genomics 2003, 81: 98–104. 10.1016/S0888-7543(02)00021-6View ArticleGoogle Scholar
- Tian L, Greenberg SA, Kong SW, Altschuler J, Kohane IS, Park PJ: Discovering statistically significant pathways in expression profiling studies. PNAS 2005, 102: 13544–13549. 10.1073/pnas.0506577102PubMed CentralView ArticlePubMedGoogle Scholar
- Pavlidis P, Qin J, Arango V, Mann JJ, Sibille E: Using the gene ontology for microarray data mining: a comparison of methods and application to age effects in human prefrontal cortex. Neurochemical Research 2004, 29: 1213–1222. 10.1023/B:NERE.0000023608.29741.45View ArticlePubMedGoogle Scholar
- Kong SW, Pu WT, Park PJ: A multivariate approach for integrating genome-wide expression data and biological knowledge. Bioinformatics 2006, 22: 2373–80. 10.1093/bioinformatics/btl401PubMed CentralView ArticlePubMedGoogle Scholar
- Delongchamp RR, Velasco C, Desai VG, Lee T, Fuscoe JC: Designing toxicogenomics studies that use DNA array technology. Bioinformatics and Biology Insights 2007, in press.Google Scholar
- Delongchamp RR, Lee T, Velasco C: A method for computing the overall statistical significance of a treatment effect among a group of genes. BMC Bioinformatics 7(Suppl 2):S11. 2006 Sep 6 2006 Sep 6 10.1186/1471-2105-7-S2-S11Google Scholar
- Desai VG, Lee T, Delongchamp RR, Moland CL, Branham WS, Fuscoe JC, Leakey JEA: Development of mitochondria-specific mouse oligonucleotide microarray and validation of data by real-time PCR. Mitochondrion 2007,7(5):322–9. 10.1016/j.mito.2007.02.004View ArticlePubMedGoogle Scholar
- Desai VG, Lee T, Delongchamp RR, Leakey JEA, Lewis SM, Lee F, Moland CL, Branham WS, Fuscoe JC: NRTI-induced expression profile of mitochondrial genes in the mouse liver. Mitochondrion 2008,8(2):181–195. 10.1016/j.mito.2008.01.002View ArticlePubMedGoogle Scholar
- Joseph A, Lee T, Moland CL, Branham WS, Fuscoe JC, Leakey JEA, Allaben W, Lewis SM, Ali AA, Desai VG: Effect of usnic acid on mitochondrial functions as measured by mitochondria-specific oligonucleotide microarray in liver of B6C3F 1 mice. Biochemical Pharmacology 2008. submitted submittedGoogle Scholar
- Cherry CL, Wesselingh SL: Nucleoside analogues and HIV: the combined cost to mitochondria. J Antimicrob Chemother 2003, 51: 1091–1093. 10.1093/jac/dkg203View ArticlePubMedGoogle Scholar
- Barret B, Tardieu M, Rustin P, Lacroix C, Chabrol B, et al.: For the French perinatal cohort study group. Persistent mitochondrial dysfunction in HIV-1-exposed but uninfected infants: clinical screening in a large prospective cohort. AIDS 2003, 17: 1769–1785. 10.1097/00002030-200308150-00006View ArticlePubMedGoogle Scholar
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS 2005, 102: 15545–15550. 10.1073/pnas.0506580102PubMed CentralView ArticlePubMedGoogle Scholar
- Efron B, Tibshirani R: On testing the significance of sets of genes. Tech report 2006. [http://www-stat.stanford.edu/~tibs/GSA/]Google Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.