Reporting FDR analogous confidence intervals for the log fold change of differentially expressed genes
© Jung et al; licensee BioMed Central Ltd. 2011
Received: 24 February 2011
Accepted: 15 July 2011
Published: 15 July 2011
Gene expression experiments are common in molecular biology, for example to identify genes that play a certain role in a specified biological framework. For that purpose, expression levels of several thousand genes are measured simultaneously using DNA microarrays. When two distinct groups of tissue samples are compared to detect differentially expressed genes, one statistical test per gene is performed, and the resulting p-values are adjusted to control the false discovery rate. In addition, the expression change of each gene is quantified by some effect measure, typically the log fold change. In certain cases, however, a gene with a significant p-value can have a rather small fold change, while in other cases a non-significant gene can have a rather large fold change. The biological relevance of a change in gene expression can be judged more intuitively by a fold change than merely by a p-value. Therefore, confidence intervals for the log fold change which accompany the adjusted p-values are desirable.
In a new approach, we employ an existing algorithm for adjusting confidence intervals in the case of high-dimensional data and apply it to a widely used linear model for microarray data. Furthermore, we adopt a concept of different relevance categories for effects in clinical trials to assess the biological relevance of genes in microarray experiments. A brief simulation study shows that the properties of the adjusting algorithm are maintained when it is combined with the linear model for microarray data. In two cancer data sets, the adjusted confidence intervals can indicate the significance of large fold changes and distinguish them from other large but non-significant fold changes. Adjusting the confidence intervals also corrects the assessment of biological relevance.
Our new combination approach and the categorization of fold changes facilitate the selection of genes in microarray experiments and help to interpret their biological relevance.
When simultaneously testing a large number of hypotheses, a high number of false positive test results is expected. This applies particularly in the case of high-dimensional data, where the number m of features is much larger than the available sample size n. Therefore, raw p-values are adjusted in order to control a false positive rate, for example the false discovery rate (FDR). The FDR was introduced by Benjamini and Hochberg as the expected proportion of false positives among all positive test decisions. A prime example of high-dimensional data are gene expression levels from DNA microarray experiments. A frequent microarray study design is the comparison of gene expression levels between two distinct groups of tissue samples, for example from wild types and mutated subjects, resulting in several thousand p-values, one per gene. These raw p-values are then increased by an adjusting algorithm to reduce the number of false positive detections. Additionally, expression changes between the two groups are quantified by some difference statistic such as the log fold change. The reason for using a difference statistic instead of a ratio statistic is that gene expression data are usually log-transformed in one of several data preprocessing steps. Published microarray experiments in which the log fold change is accompanied by a confidence interval are rare, and the interval limits are usually not adjusted for multiplicity. These unadjusted confidence intervals are not comparable to FDR-adjusted p-values. One reason for that might be that FDR-adjustment procedures are based on ordered p-values (so-called step-up or step-down procedures), while confidence intervals cannot be ordered according to their level of significance. Therefore, an algorithm presented by Benjamini and Yekutieli for adjusting confidence limits analogously to the p-values is also based on the order of the corresponding p-values. A similar algorithm was introduced by Jung et al., who studied adjusted confidence intervals for the fold change of protein expression levels. Although the latter algorithm produces adjusted confidence intervals which match their related adjusted p-values in the sense that they lead to the same test decision, it has the drawback of gene-specific confidence levels. In contrast, the algorithm of Benjamini and Yekutieli uses the same adjusted confidence level for all genes.
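The step-up logic behind FDR adjustment can be illustrated with a minimal R sketch. The p-values below are hypothetical and chosen only for illustration; `p.adjust` is the base R function for multiplicity adjustment, and the manual rule shown next to it is the usual Benjamini-Hochberg formulation.

```r
# Hypothetical raw p-values for m = 8 genes (illustrative values only)
p_raw <- c(0.0004, 0.003, 0.012, 0.021, 0.04, 0.20, 0.55, 0.91)
m     <- length(p_raw)
alpha <- 0.05

# Benjamini-Hochberg step-up adjustment as implemented in base R
p_bh <- p.adjust(p_raw, method = "BH")

# Equivalent manual rule: largest rank k with p_(k) <= k * alpha / m
p_sorted <- sort(p_raw)
k <- max(c(0, which(p_sorted <= seq_len(m) * alpha / m)))

# Genes declared differentially expressed at FDR level alpha
selected <- p_bh <= alpha
```

The number of selected genes, `sum(selected)`, coincides with `k`; this count is what an order-based adjustment of confidence intervals can build on.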
In order to evaluate the performance of their algorithm for adjusting confidence intervals, Benjamini and Yekutieli introduced the false coverage-statement rate (FCR), which, in the context of DNA microarray experiments, is defined as the expected proportion of true log fold changes not covered by their confidence interval among all genes that have been detected as differentially expressed. They use this rate instead of the conditional coverage probability (CCP), i.e. the proportion of true log fold changes covered by their confidence interval among all genes detected as differentially expressed. Both FCR and CCP are defined to be zero if no gene is detected as differentially expressed. It can be shown that the CCP depends on the size of the fold changes, while the FCR does not. Thus, studying the non-coverage of confidence intervals is more reasonable than studying their coverage.
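As a minimal sketch of these two definitions, the following R function computes the observed non-coverage and coverage proportions among the selected genes for a single data set; averaging them over many simulated data sets would estimate the FCR and the CCP. The function name `coverage_rates` and its arguments are hypothetical and not part of any package.

```r
# Proportion of selected genes whose interval misses (fcp) or covers (ccp)
# the true log fold change; both are defined as 0 if nothing is selected.
coverage_rates <- function(truth, lower, upper, selected) {
  n_sel <- sum(selected)
  if (n_sel == 0) {
    return(c(fcp = 0, ccp = 0))
  }
  covered <- lower <= truth & truth <= upper
  c(fcp = sum(selected & !covered) / n_sel,
    ccp = sum(selected &  covered) / n_sel)
}

# Toy example with four genes, three of which were selected
rates <- coverage_rates(
  truth    = c(0, 1, 2, 0),
  lower    = c(-1, 1.5, 1, 0.2),
  upper    = c(1, 2.5, 3, 1),
  selected = c(FALSE, TRUE, TRUE, TRUE)
)
```

In the toy example two of the three selected intervals miss the true value, so the observed non-coverage proportion is 2/3.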
Before building confidence intervals, genes have to be selected by statistical tests. A popular framework for detecting differentially expressed genes between two distinct groups of samples is given by the linear models proposed by Smyth. These models pick up the ideas of Lönnstedt and Speed, who recommended using a moderated t-statistic for testing the differential expression of each gene. They argue that, when thousands of genes are tested simultaneously, some genes are expected to show a very small variance even though the difference in group means is negligible for these particular genes. As a consequence, the classical t-statistic will become unreasonably large for these genes. Therefore, Smyth employed an empirical Bayes approach in which a prior distribution for the variance of genes is assumed and the observed standard errors of the estimated model coefficients are shrunk towards these prior values. In the case of two groups, the coefficient of the related linear model can be taken as an estimate of the log fold change. As a new approach, we use these estimates as well as their shrunken standard errors to construct confidence intervals for the fold change, and employ the algorithm of Benjamini and Yekutieli to adjust these intervals to control the FCR. With this, we account for a problem mentioned by Efron and Ghosh, namely that the confidence intervals proposed by Benjamini and Yekutieli tend to be too wide. Combining an adjusting algorithm with a multiple testing procedure was also proposed in the paper of Jung et al. That former proposal had, however, the above-mentioned drawback of non-uniform confidence levels and used only simple confidence intervals based on the assumption of a normal distribution. Thus, we now present an extended methodology.
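The effect of variance shrinkage on the test statistic can be sketched in a few lines of R. The posterior variance below follows the weighted-average form given by Smyth; the prior degrees of freedom `d0` and prior variance `s02` are fixed to hypothetical values here, whereas in the empirical Bayes approach they are estimated from all genes.

```r
d0   <- 4                      # prior degrees of freedom (hypothetical)
s02  <- 0.05                   # prior variance (hypothetical)
d    <- 10                     # residual degrees of freedom, two groups of 6
v    <- 1/6 + 1/6              # unscaled variance of the group difference
s2   <- c(0.001, 0.05, 0.4)    # observed gene-wise variances (hypothetical)
beta <- c(0.1, 0.5, 1.0)       # estimated log fold changes (hypothetical)

# Posterior variance: observed variance shrunk towards the prior value
s2_post <- (d0 * s02 + d * s2) / (d0 + d)

# Classical versus moderated t-statistic
t_classic <- beta / sqrt(s2 * v)
t_mod     <- beta / sqrt(s2_post * v)
```

For the first gene, whose observed variance is very small, the classical statistic is about 5.5 while the moderated one is about 1.4, which illustrates why shrinkage guards against spuriously large statistics.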
Building confidence intervals for the log fold change adjusted by this new combination method is a helpful step for assessing genes which might be biologically relevant. In order to further categorize genes according to their potential biological relevance, we adopt a concept developed to assess the clinical relevance of observed effects in clinical trials. According to this concept, genes are classified individually into one of four relevance categories, based on the location of their confidence intervals relative to the zero log fold change and a relevance threshold.
The outline of this article is as follows. In the methods section we describe the linear model of Smyth for the case of a two-group comparison. Next, the construction of unadjusted and adjusted confidence intervals is detailed, followed by a short description of their implementation in the free software R. Afterwards, the concept of relevance categories is explained. In the results section, we present the results of a brief simulation study with which we evaluate the behavior of the CCP and the FCR in a two-group comparison. The benefit of incorporating adjusted confidence intervals for the log fold change and of categorizing genes by their potential biological relevance is further illustrated by two examples of microarray data, featuring gene expression levels observed in lung and rectal carcinomas, respectively. Finally, we close with a discussion and some conclusions.
Selection of differentially expressed genes
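For each gene j, differential expression is tested with the moderated t-statistic $\tilde{t}_j = \hat{\beta}_j / \widetilde{SE}_j$, where $\hat{\beta}_j$ is the estimated model coefficient (the log fold change) and $\widetilde{SE}_j$ are the shrunken standard errors as mentioned in the introduction. The moderated t-statistic can be shown to follow a t-distribution with augmented degrees of freedom f*. Estimation of $\beta_j$ and the determination of f* are explicitly detailed in Smyth.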
Consider the object fit to be the output of the model fit for a two-group comparison using the limma package for the free software R. In the R environment, the estimated coefficients, $\hat{\beta}_j$, can be obtained by
> beta = fit$coefficients[,2]
The vector of standard errors can be created by the following line
> se = sqrt(fit$s2.post) * sqrt(fit$cov.coefficients[2,2])
At last, the degrees of freedom f* are obtained by
> dof = fit$df.prior + fit$df.residual
With these data vectors, one can easily implement the unadjusted and adjusted confidence intervals detailed in the subsection below. Example R code is also provided in additional file 1.
A detailed overview of other adjusting methods can be found in Dudoit et al.
Construction and adjustment of confidence intervals
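The unadjusted (1 - α) confidence interval for the log fold change of gene j is $[\hat{\beta}_j - t_{1-\alpha/2}\,\widetilde{SE}_j,\ \hat{\beta}_j + t_{1-\alpha/2}\,\widetilde{SE}_j]$, where $\hat{\beta}_j$ is the estimated log fold change, $\widetilde{SE}_j$ its shrunken standard error, and $t_{1-\alpha/2}$ denotes the (1 - α/2)-quantile of the t-distribution with f* degrees of freedom.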
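The adjusted confidence interval for a selected gene j is $[\hat{\beta}_j - t_{1-k\alpha/(2m)}\,\widetilde{SE}_j,\ \hat{\beta}_j + t_{1-k\alpha/(2m)}\,\widetilde{SE}_j]$, where $\hat{\beta}_j$ is the estimated log fold change, $\widetilde{SE}_j$ its shrunken standard error, and k denotes the largest index i such that $p_{(i)} \le i\alpha/m$, with $p_{(1)} \le \dots \le p_{(m)}$ being the ordered raw p-values.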
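A minimal sketch of how unadjusted and adjusted intervals could be computed in R is given below. It follows the general recipe of Benjamini and Yekutieli, in which intervals for the k genes selected by the Benjamini-Hochberg procedure are built at level 1 - kα/m. The vectors `beta`, `se` and the scalar `dof` are simulated stand-ins for the quantities extracted from a limma fit; all numerical settings are hypothetical, and this is not the code of additional file 1.

```r
set.seed(1)
m     <- 1000                        # number of genes (hypothetical)
alpha <- 0.05
dof   <- 20                          # stands in for f* (hypothetical)
se    <- runif(m, 0.2, 0.5)          # stands in for shrunken standard errors
truth <- c(rep(1.5, 100), rep(0, m - 100))
beta  <- rnorm(m, mean = truth, sd = se)

# Two-sided raw p-values from the t-distribution with dof degrees of freedom
p_raw <- 2 * pt(-abs(beta / se), df = dof)

# Benjamini-Hochberg selection: k is the number of selected genes
p_sorted <- sort(p_raw)
k        <- max(c(0, which(p_sorted <= seq_len(m) * alpha / m)))
selected <- p.adjust(p_raw, method = "BH") <= alpha

# Unadjusted intervals at level 1 - alpha
q_unadj  <- qt(1 - alpha / 2, df = dof)
ci_unadj <- cbind(lower = beta - q_unadj * se, upper = beta + q_unadj * se)

# Adjusted intervals at level 1 - k * alpha / m
# (if k were 0, the quantile would be infinite, but no gene would be reported)
q_adj  <- qt(1 - k * alpha / (2 * m), df = dof)
ci_adj <- cbind(lower = beta - q_adj * se, upper = beta + q_adj * se)
```

With this construction the adjusted intervals of the selected genes exclude zero, while those of the non-selected genes cover it, so the intervals lead to the same decisions as the FDR-adjusted p-values.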
Assessment of biological relevance
Category A. Log fold change statistically significant but not biologically relevant: The confidence interval lies completely between zero and ρ.
Category B. Log fold change statistically significant but probably not biologically relevant: The lower confidence limit is larger than zero, the log fold change lies between zero and ρ, and the upper confidence limit exceeds ρ.
Category C. Log fold change statistically significant and probably biologically relevant: The lower confidence limit is larger than zero, and both the log fold change and the upper confidence limit exceed ρ.
Category D. Log fold change statistically significant and biologically relevant: The whole confidence interval exceeds the threshold ρ.
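The four categories, labelled A to D in the order listed, can be assigned with a short R function. This is a sketch under two assumptions that the text does not specify: down-regulated genes are handled by mirroring their interval at zero, and values lying exactly on the threshold ρ are counted as not exceeding it. The function name `relevance_category` is hypothetical.

```r
# Assign relevance categories A-D from estimate, confidence limits and rho.
# Genes whose interval covers zero are not significant and receive NA.
relevance_category <- function(est, lower, upper, rho) {
  # Mirror negative effects so that one set of rules covers both directions
  flip <- est < 0
  lo   <- ifelse(flip, -upper, lower)
  up   <- ifelse(flip, -lower, upper)
  es   <- abs(est)

  out <- rep(NA_character_, length(est))
  sig <- lo > 0
  out[sig & up <= rho]              <- "A"  # whole interval within (0, rho]
  out[sig & es <= rho & up > rho]   <- "B"  # estimate below, upper limit above
  out[sig & lo <= rho & es > rho]   <- "C"  # estimate above, lower limit below
  out[sig & lo > rho]               <- "D"  # whole interval above rho
  out
}

# Toy example with threshold rho = 1
cats <- relevance_category(
  est   = c(0.5, 0.8, 1.5, 2.5,  0.3, -2.5),
  lower = c(0.2, 0.4, 0.5, 1.5, -0.2, -3.5),
  upper = c(0.8, 1.2, 2.5, 3.5,  0.8, -1.5),
  rho   = 1
)
```

In the toy example the first four genes fall into categories A, B, C and D, the fifth is not significant, and the last, strongly down-regulated gene is assigned to category D.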
It was also pointed out by Victor that knowledge about the location of a confidence interval relative to a threshold allows a more differentiated interpretation of test results. In the context of selecting genes in a two-group microarray experiment, potentially biologically interesting genes could be missed if only those genes which are significantly larger than a threshold (i.e., those of category D) are selected. To illustrate this, we will perform a pathway analysis subsequent to categorization. More concretely, we study the association between the genes in the different categories and biological pathways defined by Gene Ontology (GO) terms. A GO term covers information about cellular components, biological processes and molecular functions. Using, for example, Fisher's exact test, GO analysis studies whether a certain biological function is associated with more genes among the selected ones than would be expected. We will perform GO analysis using the R package topGO.
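The idea behind testing a single GO term with Fisher's exact test can be sketched with base R. The counts below are hypothetical and serve only to show the layout of the 2 × 2 table; the topGO package offers further scoring methods that are not reproduced here.

```r
# Hypothetical counts for one GO term and one relevance category
tab <- matrix(
  c(12,   88,    # genes in the category: annotated / not annotated to the term
    50, 4850),   # remaining genes:       annotated / not annotated to the term
  nrow = 2, byrow = TRUE,
  dimnames = list(category = c("in", "out"), go_term = c("yes", "no"))
)

# One-sided test: is the term over-represented among the category's genes?
res <- fisher.test(tab, alternative = "greater")
res$p.value
```

Under independence, only about 1.2 of the 100 genes in the category would be expected to carry the annotation, so observing 12 yields a small p-value.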
where ⊗ denotes the Kronecker product and $J_{m/5}$ is an (m/5 × m/5) matrix with all elements equal to 1. In order to include different variances for the genes and to obtain the covariance matrix ∑, the diagonal elements of the (m × m) matrix were then replaced by a vector of length m with elements increasing evenly from 1 to 2. In each simulation run, the FCR and the CCP were determined.
In contrast, when studying the behavior of the FCR (Figure 1, right), it can be observed that this rate does not seriously depend on the size of the fold change. Furthermore, the rate essentially maintains a level of 5%. Of course, the rate clearly falls below this level when using the more conservative BY-method. Thus, it can also be seen that coverage rate and non-coverage rate are not equivalent when they are considered for genes that are detected as differentially expressed. In summary, the behavior of the FCR shows that among all genes detected as differentially expressed, only a small proportion of confidence intervals does not cover the true fold change.
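A block-structured covariance matrix of this kind can be built in R with `kronecker`. The sketch below assumes five equally sized blocks of correlated genes with a common off-diagonal value `r`; the value of `r` and the small number of genes are hypothetical choices for illustration and are not taken from the simulation settings.

```r
m <- 20      # number of genes, divisible by 5 (kept small for illustration)
r <- 0.5     # hypothetical within-block covariance

# Five blocks on the diagonal, each an (m/5 x m/5) block with entries r
J     <- matrix(1, m / 5, m / 5)
Sigma <- r * kronecker(diag(5), J)

# Replace the diagonal by variances increasing evenly from 1 to 2
diag(Sigma) <- seq(1, 2, length.out = m)

# Draw one multivariate normal vector with covariance Sigma via Cholesky
set.seed(1)
x <- drop(rnorm(m) %*% chol(Sigma))
```

Since every diagonal entry exceeds `r`, the resulting matrix is positive definite and the Cholesky factorization exists.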
Example data set on rectal cancer
Example data set on lung adenocarcinoma
The misleading character of fold changes without confidence intervals can be observed in another example. Beer et al. studied gene expression levels in samples of 86 patients with lung adenocarcinoma.
Assessment of biological relevance
For 7858 genes of this experiment, pathway annotation in terms of Gene Ontologies was available. For each relevance category we tested whether significantly more genes were associated with the different GO terms than would be expected in comparison with the other categories. GO analysis identified 130 GO terms which were significantly associated with genes in category D, 64 with genes in category C, 10 with genes in category B, and 23 with genes in category A. That is, the number of significantly associated GO terms was clearly larger in the two more relevant categories, C and D, than in categories A and B.
In many microarray experiments the relevance threshold for the log fold change is only loosely established and not fixed before the experiment takes place. Experimenters can employ plots like those shown in Figure 4 to assess how many genes fall into which relevance category given a certain threshold. This might help to find an adequate threshold, which can then be used in related future experiments.
In practice, gene selection in DNA microarray experiments is based on several aspects. It is common that such a group comparison yields several hundred significant genes, even with FDR-adjusted p-values. This result of a multiple testing procedure can, however, only be seen as a first screening. In particular, gene selection is usually not based on purely statistical results alone, i.e. on whether a gene is significantly differentially expressed between two distinct groups of samples. From this bulk of significant genes, those known to be related to molecular pathways associated with the studied biological system are selected for further laboratory validation. In addition, the strength of the expression change, in terms of the log fold change, is considered for selecting relevant features. In this context, the potentially divergent statements of p-value and log fold change can be confusing for laboratory decision makers. The examples studied in this article have shown that genes with a large fold change can lack significance, and vice versa.
One way to interpret test results and fold changes together is the volcano plot, in which log p-values are plotted versus log fold changes (see for example Cui and Churchill). Using volcano plots, one can easily see which genes are both significant and highly regulated. However, this information gets lost in a tabular representation of test results. Furthermore, a volcano plot cannot be read easily when points overlap. Therefore, we decided to employ confidence intervals, which express by their length the high variation behind some genes with a large fold change and can thus help to correct the misleading impression of high relevance.
The four categories of relevance, as proposed by Jones, may further assist experimenters in rating the selected genes. In this context, one could replace the term 'biologically relevant' by 'biologically interesting effect size', because larger fold changes might not necessarily have a large impact within a molecular pathway. By performing a GO analysis on the rectal cancer data subsequent to differential testing and relevance categorization, we could assign pathway information to the different relevance categories. If just volcano plots were employed to assess the biological relevance of significant genes, one could perhaps miss interesting pathway information associated with genes of class C ('probably biologically relevant'). With a volcano plot, only pathways associated with genes in class D ('biologically relevant') would be taken into account. Thus, the categorization principle opens up insight into gene-pathway associations for those genes which are not highly significant but which may not be completely unimportant either. Regarding the description of the four categories it should, however, be mentioned that the word 'probably' is only appropriate if the distribution of the log fold change estimates is symmetric. Symmetry is needed to ensure that the estimate exceeds the true value 50% of the time.
In summary, our improved combination approach is more adequate for microarray data than a similar approach described previously. Together with the proposed categorization of fold changes, it facilitates the selection of genes in microarray experiments and helps to interpret their biological relevance. Although some mathematical shortcomings of using the FCR have been discussed in several published comments (in the same volume as the article of Benjamini and Yekutieli), the practical use of the adjusting algorithm becomes more evident in our examples.
Acknowledgements and Funding
The authors thank Gordon Smyth for discussions and help with the 'limma'-package and the reviewers for additional references. This work was supported by the Deutsche Forschungsgemeinschaft (KFO 179) and by the German Federal Ministry of Education and Research program Medical Systems Biology (BreastSys).
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: A practical and powerful approach to multiple testing. J Roy Stat Soc B 1995, 57: 289–300.
- Gaedcke J, Grade M, Jung K, Camps J, Jo P, Emons G, Gehoff A, Sax U, Schirmer M, Becker H, Beissbarth T, Ried T, Ghadimi BM: Mutated KRAS Results in Overexpression of DUSP4, a MAPKinase Phosphatase, and SMYD3, a Histone Methyltransferase, in Rectal Carcinomas. Genes Chromosomes Cancer 2010, 49: 1024–1034. 10.1002/gcc.20811
- Benjamini Y, Yekutieli D: False Discovery Rate-Adjusted Multiple Confidence Intervals for Selected Parameters. J Am Stat Assoc 2005, 100: 71–81. 10.1198/016214504000001907
- Jung K, Poschmann G, Podwojski K, Eisenacher M, Kohl M, Pfeiffer K, Meyer HE, Stühler K, Stephan C: Adjusted Confidence Intervals for the Expression Change of Proteins Observed in 2-Dimensional Difference Gel Electrophoresis. J Proteomics Bioinform 2009, 2: 78–87. 10.4172/jpb.1000064
- Smyth G: Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments. Stat Appl Genet Mol Biol 2004, 3: Article 3.
- Lönnstedt I, Speed TP: Replicated Microarray Data. Stat Sinica 2002, 12: 31–46.
- Efron B: Prediction and Effect Size Estimation. In Large-Scale Inference. New York: Cambridge University Press; 2010:211–241.
- Ghosh D: Empirical Bayes Method for Estimation and Confidence Intervals in High Dimensional Problems. Stat Sinica 2009, 19: 125–143.
- Jones PW: Interpreting thresholds for a clinically significant change in health status in asthma and COPD. Eur Respir J 2002, 19: 396–404.
- R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria; 2010. [http://www.R-project.org]
- Benjamini Y, Yekutieli D: The control of the false discovery rate in multiple testing under dependency. Ann Stat 2001, 29: 1165–1188. 10.1214/aos/1013699998
- Dudoit S, Shaffer JP, Boldrick JC: Multiple Hypothesis Testing in Microarray Experiments. Stat Sci 2003, 18: 71–103. 10.1214/ss/1056397487
- Kieser M, Hauschke D: Assessment of clinical relevance by considering point estimates and associated confidence intervals. Pharm Stat 2005, 4: 101–107. 10.1002/pst.161
- Victor N: On Clinically Relevant Differences and Shifted Nullhypotheses. Method Inform Med 1987, 26: 109–116.
- Alexa A, Rahnenfuehrer J, Lengauer T: Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 2006, 22: 1600–1607. 10.1093/bioinformatics/btl140
- Lips EH, van Eijk R, de Graaf EJR, Oosting J, de Miranda NFCC, van de Velde CJ, Eilers PHC, Tollenaar RAEM, van Wezel T, Morreau H: Integrating chromosomal aberrations and gene expression profiles to dissect rectal tumorigenesis. BMC Cancer 2008, 28: 314.
- Beer DG, Kardia SLR, Huang CH, Giordano TJ, Levin AM, Misek DE, Lin L, Chen G, Gharib TG, Thomas DG, Lizyness ML, Kuick R, Haysaka S, Taylor JMG, Iannettoni MD, Orringer MB: Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med 2002, 8: 816–824.
- Cui X, Churchill GA: Statistical tests for differential expression in cDNA microarray experiments. Genome Biology 2003, 4: 210. 10.1186/gb-2003-4-4-210
- Lewin A, Richardson S, Marshall C, Glazier A, Aitman T: Bayesian Modeling of Differential Gene Expression. Biometrics 2006, 62: 1–9.
- Bochkina N, Richardson S: Tail Posterior Probability for Inference in Pairwise and Multiclass Gene Expression Data. Biometrics 2007, 63: 1117–1125. 10.1111/j.1541-0420.2007.00807.x
- van de Wiel MA, Kyung KI: Estimating the False Discovery Rate Using Nonparametric Deconvolution. Biometrics 2007, 63: 806–815. 10.1111/j.1541-0420.2006.00736.x
- McCarthy DJ, Smyth G: Testing significance relative to a fold-change threshold. Bioinformatics 2009, 25: 765–771. 10.1093/bioinformatics/btp053
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.