A statistical method for the conservative adjustment of false discovery rate (qvalue)
 Yinglei Lai^{1}Email author
https://doi.org/10.1186/s1285901714746
© The Author(s) 2017
Published: 14 March 2017
Abstract
Background
qvalue is a widely used statistical method for estimating false discovery rate (FDR), which is a conventional significance measure in the analysis of genomewide expression data. qvalue is a random variable and it may underestimate FDR in practice. An underestimated FDR can lead to unexpected false discoveries in the followup validation experiments. This issue has not been well addressed in literature, especially in the situation when the permutation procedure is necessary for pvalue calculation.
Results
We proposed a statistical method for the conservative adjustment of qvalue. In practice, it is usually necessary to calculate pvalue by a permutation procedure. This was also considered in our adjustment method. We used simulation data as well as experimental microarray or sequencing data to illustrate the usefulness of our method.
Conclusions
The conservativeness of our approach has been mathematically confirmed in this study. We have demonstrated the importance of conservative adjustment of qvalue, particularly in the situation that the proportion of differentially expressed genes is small or the overall differential expression signal is weak.
Keywords
Background
Microarray and sequencing technologies have been widely used in genomewide expression experimental for biological and medical studies [1–5]. After screening a large number of genes simultaneously, we expect to achieve new biological discoveries. In these situations, an important statistical concept is multiple hypothesis testing, in which many statistical tests are conducted at the same time. Then, a detection of gene with relatively small pvalue may be actually a false discovery. Since the introduction of microarray technology, the concept of false discovery rate (FDR) and its related statistical methods have been well developed [6, 7]. qvalue is a statistical method for the estimation of FDRs [8]. It has been widely used in the analysis of microarray and sequencing data.
Since qvalue is an estimation method, it is possible that it underestimates true FDRs. Then, an undesirable consequence is that many genes detected with low qvalue cannot be validated in a followup experiment. Therefore, it is important to avoid underestimations of FDRs. However, there is a lack of statistical method to address this important issue. Furthermore, in many situations, qvalues are calculated based on pvalues that are evaluated based on a permutation procedure (due to unknown data distributions or nontraditional test statistics). Then, these pvalues are also estimated. This increases the complexity of FDR underestimations. It is necessary to adjust the underestimation of FDRs in a comprehensive approach.
In this study, we proposed a statistical method for the conservative adjustment of qvalue, which is one of the most frequently used procedure for estimating FDRs. We first reviewed the concepts of multiple hypothesis testing, FDR and qvalue. Then, we introduced a concept of conservative adjustment. Based on the theory of rankbased quantiles, we described a nonparametric approach and we conducted simulation and application studies to illustrate its usefulness.
Methods
Multiple hypothesis testing and false discovery rate
A summary in the situation of multiple hypothesis testing
True null  False null  Total  

Negative  U  W  m−R 
Positive  V  S  R 
Total  m _{0}  m−m _{0}  m 
The traditional familywise error rate (FWER) provides a strong control on V [9]. Since FWER is too conservative (for example, requiring extremely small pvalue threshold), it is usually difficult to claim statistical significance. The false discovery rate (FDR) has been proposed to empirically evaluate the proportion of false positives among the claimed positives: V/R [6]. The concept of FDR and its related estimations have been widely used in the analysis of genomewide expression data collected by microarray or RNA sequencing technologies. Particularly, qvalue [8] is one of the most widely used method for FDR estimation.
qValue
Conservative adjustment of qvalue
When the permutation method [10] is used to evaluate pvalues, it is possible to obtain underestimated results. We have proposed a conservative adjustment of permutation pvalues to address this issue [11].

\(\hat {\pi }_{0}\) for π _{0},

\(\hat {\alpha }\) for α,

(#{T≥t}) for m Pr(T≥t).
From above, notice that m Pr(T≥t) must still be empirically estimated even when α can be theoretically determined. To address the underestimation of FDR (from qvalue), our solution is to consider a conservative adjustment of q(t).
If (1−a _{0})(1−a _{1})(1−a _{2})≥1−a, then we claim that \(\hat {\pi }_{0c} \hat {\alpha }_{c}(t) / \hat {\gamma }_{c}(t)\) satisfies the requirement for \(\hat {q}_{c}(t)\), given π _{0},α(t)>0 and \(\mathbf {Pr}[\hat {\gamma }_{c}(t)>0]=1\). The mathematical proof is given as an Appendix 1.
Remark
\(\hat {q}_{c}(t)\) is actually an upper confidence limit of \(\tilde {q}(t)\). However, due to the difficulty in the accurate estimation of π _{0} (π _{0} is usually conservatively estimated), it is difficult to calculate a lower confidence limit of \(\tilde {q}(t)\) in practice.
Conservative estimation of π _{0}
When \(\hat {\pi }_{0c}\) is the convest method (or other similar methods). Then, we discuss \(\hat {\alpha }_{c}(t)\) and \(\hat {\gamma }_{c}(t)\), which are closely related to some rankbased quantiles.
Conservative adjustment of rankbased quantiles
Note that, if the observed test scores are not included in the pool of permuted test scores, then the permutation pvalue of T=t will be \(\hat {\alpha }(T^{0}_{(k)})=(rk+1)/r\), where \(T^{0}_{(k)} \ge t\) is the closest order statistic to t. Therefore, the permutation pvalue of T=t can also be considered as a rankbased quantile estimator.
Since \(\hat {\gamma }\) and \(\hat {\alpha }\) are both random variables, it is possible for them to underestimate their target parameters. The above discussion shows that, in practice, \(\hat {\gamma }\) and \(\hat {\alpha }\) are actually both rankbased quantile estimators. Based on the theory of order statistics [15], we have proposed a conservative adjustment for this type of estimator [11]. Such an adjustment requires no parametric assumption on the distribution of test statistic and the solution can be expressed by a normalised incomplete beta function. Therefore, based on this adjustment, we can otain \(\hat {\alpha }_{c}(t)\) and \(\hat {\gamma }_{c}(t)\) such that \(\mathbf{Pr}[\hat {\alpha }_{c}(t) \ge \alpha (t)] \ge 1a_{1}\) and \(\mathbf{Pr}[\hat {\gamma }_{c}(t) \le \gamma (t)] \ge 1a_{2}\).
Results
A simulation study
To understand how likely the qvalue method underestimates the true false discovery rate, we conducted a simulation study. We choose the normal distributions for simulating expression data and the Student’s t for differential expression. In this way, we could evaluate the true false discovery rate theoretically.
Ten thousand genes were simulated for two sample groups with sample size 5 for each group (10 for total sample size). For nondifferentially expressed genes, the expression data were simulated from N(0,1) for both groups. Then, we considered four scenarios. For the first simulation scenario, the proportion of differentially expressed genes was 10%; the expression data for differentially expressed genes were simulated from N(0,1) and N(1,1) for the first and the second sample groups, respectively (Δ=1). For the second simulation scenario, the proportion of differentially expressed genes was 10%; the expression data for differentially expressed genes were simulated from N(0,1) and N(2,1) for the first and the second sample groups, respectively (Δ=2). For the third simulation scenario, the proportion of differentially expressed genes was 20%; the expression data for differentially expressed genes were simulated from N(0,1) and N(1,1) for the first and the second sample groups, respectively (Δ=1). For the last simulation scenario, the proportion of differentially expressed genes was 20%; the expression data for differentially expressed genes were simulated from N(0,1) and N(2,1) for the first and the second sample groups, respectively (Δ=2).
For all different scenarios, the Student’s ttest was used for differential expression analysis. To evaluate pvalues, we performed all possible (126) permutations for each simulated gene and pooled all 1,260,000 permuted test scores together as the empirical null distribution. (In practice, the underlying data distributions are unknown and the permutation procedure is widely used.) For each scenario, we conducted 100 repetitions to understand the variations in the simulation results.
An artificial illustration
A conservative adjustment of false discovery rate (FDR) can be useful in practice, especially before the experimental validation of genes identified from a genomewide expression study. For example, based on a microarray or RNAseq study, one may want to validate a few genes with qvalue less than 10%. However, it may be surprising that much less genes can be confirmed after the RTPCR validation. (The validation result is beyond our expectation based on 5% estimated FDR.) This hypothetical situation would be an example of underestimation of FDR.
Three applications
We applied our method to two genomewide expression data sets. The first one was a microarray data set and it was collected for a diabetes study. It is wellknown that differential expression signals are usually weak in diabetes studies. When the sample size is not relatively large, it is usually difficult to detect true differentially expressed genes. (Due to the inflated false positive rates from the multiple hypothesis testing for a large number of genes, genes with seemingly small FDRs are likely noise genes). It is interesting to understand the adjustment effects from our method for this scenario. The second one was a RNA sequencing (RNAseq) data set and it was collected for a prostate cancer study. It is also wellknown that differential expression signals are usually strong in cancer studies. Genomewide expression data for different types of cancer have been increasingly collected in The Cancer Genome Atlas project [3]. The current sample sizes in many TCGA cancer studies are relatively large. Then, it is also interesting to understand the adjustment effects from our method for such as scenario.
In practice, it is not feasible to calculate the theoretical true FDRs. The curve of estimated FDR vs. number of identified genes is widely used to for a summary of differential expression analysis. (In this curve, the yvalue is a specific FDR and the xvalue is the related number of genes with the specific FDR). Our application results can also be summarized in term of this type of curve.
For the second RNAseq genomewide expression data set for a prostate cancer study [3], there were 52 normal subjects and 445 tumor subjects (at the time of data download for this analysis). There are 20,531 genes. Since the sample size was large and the RNAseq expression profiles were counttype data, we used the nonparametric Wilcoxon rank sum test with its theoretical pvalues calculation. Therefore, there was no need for an adjustment of pvalues (i.e. a _{1}=0). Then, we set a _{0}=0 (since π _{0} is already conservatively estimated by the convest method [14]) and a _{2}=1−0.95 (then a=0.05). Figure 4 b shows that the qvalues (estimated FDRs) can be extremely low and lots of genes can be detected. After our conservative adjustment, Fig. 4 b shows that the curve of adjusted qvalues almost overlaps with the curve of original qvalues. This comparison implies that many genes were truly differentially expressed genes and the detection of these genes by low qvalues could be highly confident. After checking literature for top ranked genes, many of them have been studied to be either directly or indirectly related to prostate cancer or general cancer diseases (details not shown).
For the third microarray genomewide expression data set for a pancreaticislet study [17], there were 7 normal patients and 5 type 2 diabetic patients. There are 44,928 genes and ESTs. Based on all possible permutations and the related Student’s ttest calculations. We set a _{0}=0 (since π _{0} is already conservatively estimated by the convest method [14]), \(a_{1}=1\sqrt {0.95}\) and \(a_{2}=1\sqrt {0.95}\) (then a=0.05). Figure 4 c shows that the qvalues (estimated FDRs) can only be as low as approximately 0.4 (about 20 genes). However, after our conservative adjustment, Fig. 4 c shows that all the conservatively adjusted qvalues are greater than 0.5. This comparison implies that more top ranked genes were likely noise genes. The only two genes with qvalue less than 0.4 are mRNAs for ARNT and APCDD1. Although ARNT has been widely studied for its association with diabetes diseases, no literature was found to support the association between APCDD1 and diabetes diseases.
Discussion
For our method, there are three components that can be adjusted separately. The first component is on π _{0} estimation. Since the estimators for π _{0} are usually conservative (especially for the convest method [14]), we do not suggest any adjustment for this component according to our experience. The last component is on the number of identified genes. It can be adjusted based on the theory of order statistic. The second component is on pvalues. When theoretical pvalue can be obtained, it is not necessary to adjust this component. For permutation pvalues, an adjustment can be also performed based on the theory of order statistic. Notice that the number of permutations is also important. In practice, we need to determine it before data analysis. When the sample size is relatively small, we can enumerate all possible permutations. When the sample size is relatively large, we can set the number of permutations as large as possible according to the computing power.
A clear advantage of our approach is that there are rigorous mathematical theories to support it. Furthermore, no distribution assumptions are required for our conservative adjustment. However, the adjustment may be overconservative. If we could further improve the control of upper/lower bounds (as shown in our mathematical proof), then less conservative adjustment of qvalue could be achieved. Furthermore, an independence assumption is required. It is wellknown that genes work with each other during molecular and cellular processes. It would be an interesting future research topic to address the dependence among genes. Therefore, it will be our future research topics to investigate possible better upper/lower bounds for the conservative adjustment of qvalue, as well as the impact of dependence on the conservative adjustment of qvalue.
Conclusions
In this study, we proposed a statistical method for the conservative adjustment of qvalue, which is widely used to estimate false discovery rate (FDR) in practice. We provided a mathematical proof to confirm the conservativeness of our approach. We conducted simulation studies to understand how likely the qvalue method would underestimate FDRs. From our simulation results, both the proportion of differentially expressed genes and the overall differential expression signal were two important factors. When both of them were relatively small/weak, it was likely to identify genes with underestimated FDRs. Our first application was based on a microarray diabetes study data set with relatively small sample size (and weak differential expression signals). Our third application was based on a microarray pancreatic islet study data set with relatively small sample size (and also weak differential expression signals). The results were consistent with the conclusion from our simulation studies. Our second application was based on a RNAseq prostate cancer study data set with relatively large sample size (and strong differential expression signals). According to the results, the conservatively adjusted qvalues were close to the originally unadjusted qvalues.
Appendix 1
Remark
For any random variable X, Y and a constant c, Pr(X Y≥cX≥1)≥Pr(Y≥c) since Y≥c⇒X Y≥c given X≥1 ({Y≥c} is a subset of {X Y≥c} when X≥1).
Appendix 2
Notes
Declarations
Acknowledgements
Not applicable.
Declarations
This article has been published as part of BMC Bioinformatics Volume 18 Supplement 3, 2017. Selected articles from the 15th Asia Pacific Bioinformatics Conference (APBC 2017): bioinformatics. The full contents of the supplement are available online https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume18supplement3.
Funding
This work was partially supported by the NIH grant GM092963 (Y.Lai). The publication costs were funded by the Department of Statistics at The George Washington University.
Availability of data and material
Rfunctions for calculating conservatively adjusted qvalues are provided in an Appendix 1.
Authors’ contributions
YL conceived of the study, developed the methods, performed the statistical analysis, and drafted the manuscript.
Competing interests
The author declares that they have no competing interests.
Consent for publication
The author agrees the consent for publication.
Ethics approval and consent to participate
Not applicable.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
 Schena M, Shalon D. Davis RW and Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995; 270:467–70.View ArticlePubMedGoogle Scholar
 Lockhart D, Dong H, Byrne M, Follettie M, Gallo M, Chee M, Mittmann M, Wang C, Kobayashi M. Horton H, and Brown E. Expression monitoring by hybridization to highdensity oligonuleotide arrays. Nat Biotechnol. 1996; 14:1675–80.View ArticlePubMedGoogle Scholar
 The Cancer Genome Atlas Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008; 455:1061–8.View ArticleGoogle Scholar
 Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 1344; 320.Google Scholar
 Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood V, Goodhead I, Penkett CJ, Rogers J, Bahler J. Dynamic repertoire of a eukaryotic transcriptome surveyed at singlenucleotide resolution. Nature. 2008; 453:1239–43.View ArticlePubMedGoogle Scholar
 Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc Series B. 1995; 57:289–300.Google Scholar
 Ghosh D. Incorporating the empirical null hypothesis into the BenjaminiHochberg procedure. Stat Appl Genet Mol Biol. 2012; 11(Online):1544–6115.Google Scholar
 Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A. 2003; 100:9440–5.View ArticlePubMedPubMed CentralGoogle Scholar
 Wright SP. Adjusted Pvalues for simultaneous inference. Biometrics. 1992; 48:1005–13.View ArticleGoogle Scholar
 Dudoit S, Shaffer JP, Boldrick JC. Multiple hypothesis testing in microarray experiments. Stat Sci. 2003; 18:71–103.View ArticleGoogle Scholar
 Lai Y. Conservative adjustment of permutation pvalues when the number of permutations is limited. Int J Bioinforma Res Appl. 2007; 3:536–46.View ArticleGoogle Scholar
 Markitsis A, Lai Y. A censored beta mixture model for the estimation of the proportion of nondifferentially expressed genes. Bioinformatics. 2010; 26:640–6.View ArticlePubMedPubMed CentralGoogle Scholar
 Cheng Y, Gao D, Tong T. Bias and variance reduction in estimating the proportion of truenull hypotheses. Biostatistics. 2015; 16:189–204.View ArticlePubMedGoogle Scholar
 Langaas M, Ferkingstad E, Lindqvist B. Estimating the proportion of true null hypotheses, with application to DNA microarray data. J R Stat Soc Series B. 2005; 67:555–72.View ArticleGoogle Scholar
 Balakrishnan N, Cohen AC. Order statistics and inference. San Diego: Academic Press, INC.; 1991.Google Scholar
 Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop L. PGC1 αresponse genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 2003; 34:267–73.View ArticlePubMedGoogle Scholar
 Gunton JE, Kulkarni RN, Yim SH, Okada T, Hawthorne WJ, Tseng YH, Roberson RS, Ricordi C, O’Connell PJ, Gonzales FJ, Kahn CR. Loss of ARNT/HIF1 β mediates altered gene expression and pancreaticislet dysfunction in human type 2 diabetes. Cell. 2005; 122:337–49.View ArticlePubMedGoogle Scholar