Meta-analysis based on weighted ordered P-values for genomic data with heterogeneity
 Yihan Li^{1} and
 Debashis Ghosh^{1}
DOI: 10.1186/1471-2105-15-226
© Li and Ghosh; licensee BioMed Central Ltd. 2014
Received: 17 April 2014
Accepted: 11 June 2014
Published: 28 June 2014
Abstract
Background
Meta-analysis has become increasingly popular in recent years, especially in genomic data analysis, due to the fast growth of available data and studies that target the same questions. Many methods have been developed, including classical ones such as Fisher’s combined probability test and Stouffer’s Z-test. However, not all meta-analyses have the same goal in mind. Some aim at combining information to find signals in at least one of the studies, while others hope to find more consistent signals across the studies. While many classical meta-analysis methods were developed with the former goal in mind, the latter goal is often more practical for genomic data analysis.
Results
In this paper, we propose a class of meta-analysis methods based on summaries of weighted ordered p-values (WOP) that aim at detecting significance in a majority of studies. We consider weighted versions of classical procedures such as Fisher’s method and Stouffer’s method where the weight for each p-value is based on its order among the studies. In particular, we consider weights based on the binomial distribution, where the median of the p-values is weighted highest and the outlying p-values are down-weighted. We investigate the properties of our methods and demonstrate their strengths through simulation studies, comparing them to existing procedures. In addition, we illustrate application of the proposed methodology through several meta-analyses of gene expression data.
Conclusions
Our proposed weighted ordered p-value (WOP) methods displayed better performance than existing methods for testing the hypothesis that there is signal in the majority of studies. They also appeared to be much more robust in applications than the rth ordered p-value (rOP) method (Song and Tseng, Ann. Appl. Stat. 2014, 8(2):777–800). With the flexibility of incorporating different p-value combination methods and different weighting schemes, the WOP methods have great potential for detecting consistent signal in meta-analysis with heterogeneity.
Keywords
Fisher’s combined probability test; Meta-analysis; Ordered p-values; Weighted order statistic
Background
Meta-analysis has long been used to integrate data and/or results from multiple studies targeting the same questions. It is commonly used in many areas of statistical application, such as clinical studies and psychology experiments. In recent years, meta-analysis has been frequently adopted in genomic data analysis, due to the fast development of high-throughput technology and the vast amounts of data available in public databases.
Many meta-analysis methods have been developed throughout the years. Roughly speaking, there are two main approaches (Song and Tseng [1]). The first directly combines the p-values from the studies, while the second attempts to model the data or the effect sizes from the combined studies. The former includes methods such as Fisher’s combined probability test [2] and Stouffer’s Z-test [3], as well as weighted variations of these classical tests. The latter includes a variety of fixed-effects and random-effects models, such as GeneMeta [4]. Each approach has its advantages and disadvantages. The p-value combining methods are relatively flexible in that they require minimal information and assumptions from the studies. In this paper we will mainly focus on methods that directly combine the p-values.
Most of the traditional meta-analysis methods (e.g. Fisher’s and Stouffer’s methods) aim at testing the alternative hypothesis that at least one of the studies is non-null. While this aligns with the earlier goal of meta-analysis, to gain power in detecting signals by combining multiple studies, it is frequently not the case with genomic data. In meta-analysis of genomic studies, for example, the goal is often to identify genes that are differentially expressed in a consistent pattern across multiple studies. The extreme version of this would be to test for the alternative that the null can be rejected in all the studies. A solution to this extreme alternative dates back to the maxP method of Wilkinson [5]. However, the maxP method is often considered too conservative. A recent approach by Phillips and Ghosh [6] improves the power for this setting, where rejection of all the p-values associated with a gene is required. In practical meta-analyses, the goal of rejecting all studies may be considered too extreme. Ideally, we would want to detect consistent signals across studies while avoiding being overly exclusive. This issue has gained attention in recent years, and a number of authors have tried to address it. Benjamini and Heller [7] discussed a framework for testing partial conjunction hypotheses, where they test for the alternative that at least u out of n null hypotheses are false against the partial conjunction null that at least n−u+1 of the null hypotheses are true. Song and Tseng [1] proposed the rth ordered p-value (rOP) method, which tests the alternative hypothesis that there is signal in at least a given percentage of studies. Other methods address this problem from different angles, such as RankProd by Hong et al. [8], which looks for consistently highly ranked genes, and a weighted approach by Li and Ghosh [9] that weights genes by their expression consistency across studies.
We consider the problem of detecting signals in the majority of studies. Our approach shares one aspect of the rOP method (Song and Tseng [1]) in that we also consider ordered p-values. But instead of using a single rth ordered p-value as the statistic (as rOP does), we combine all or a subset of the ordered p-values while weighting them based on their order (weighted ordered p-values, WOP). P-values closer to the median are weighted highly and the smallest/largest p-values are down-weighted. The idea is that among the collection of p-values, the median p-values are likely to be a better reflection of the behavior of the majority of studies than the smallest or largest p-values. Olkin and Saner [10] discussed a trimmed Fisher’s procedure that leaves out a number of the smallest and/or largest p-values from the calculation of Fisher’s statistic to remove the effect of possible aberrant extremes. In our approach, we still keep the smallest/largest p-values because they do carry certain information, but we down-weight them because they may be relatively less relevant when considering the “majority” of studies. To reflect the up-weighting of the medians and down-weighting of the extremes, we calculate our weights based on the binomial distribution. To summarize the weighted ordered p-values, we mainly considered Fisher’s statistic and Stouffer’s Z-test. We also explored some other statistics, such as the generalized Fisher’s test of Lancaster [11]. In general, other summary statistics can be used under this framework as well.
While many weighted variations of Fisher’s statistic and Stouffer’s statistic have been developed throughout the years, most of them distribute weights according to the sample sizes and/or effect sizes of the studies, or other similar considerations (e.g. Mosteller and Bush [12], Won et al. [13], Makambi [14]). Li and Tseng [15] proposed an interesting adaptively weighted statistic, where the weights are chosen to maximize the significance of the summary statistic. However, to the best of our knowledge, none of these weighting schemes are based on the ordered p-values, which is what makes our method unique. Xie et al. [16] discussed a meta-analysis approach using confidence distributions, where they incorporated the use of medians and kernel functions, but their approach is under a completely different framework from ours.
By incorporating more than one order statistic into our summary statistic, our method can be considered an expansion of the rth ordered p-value (rOP) method (Song and Tseng [1]). In general, both the rOP method and the original summary statistic we use (e.g. Fisher’s or Stouffer’s method) are special cases under the WOP framework: one places all the weight on a single ordered p-value and the other distributes the weights evenly. However, our WOP methods have respective advantages over both the traditional summary statistics and the rOP method. Compared to the traditional summary statistics, the WOP methods better focus on detecting signal in a majority of studies. On the other hand, the WOP methods appear to be more robust than the rOP method. These observations are based upon results from simulation studies as well as data applications.
Methods
Hypothesis settings for meta-analysis
Let HS_{r} denote the hypothesis setting that tests the alternative that at least r of the K studies have non-zero effect size against the conjunction of nulls. When r=1, HS_{1} is the classical setting of testing for non-zero effect size in at least one study; it is the hypothesis setting that Fisher’s method, Stouffer’s Z-test and many other traditional methods test for. When r=K, HS_{K} tests for the alternative that all the studies have non-zero effect size. For instance, the maxP method [5] tests for HS_{K}. When 1<r<K, HS_{r} provides a compromise between the two aforementioned hypothesis settings, and tests for at least a pre-specified number of non-zero effects. For a given r, the rOP (rth ordered p-value) method (Song and Tseng [1]) is used to test for HS_{r}.
In this paper, we test for non-zero effect sizes in a majority of studies against the null that the effect sizes are zero in all studies. Thus we are testing against the conjunction of nulls while trying to focus on a certain subset of the non-null space. In general, the hypothesis for our meta-analysis approach can be considered to fall under the HS_{r} setting. Song and Tseng [1] suggested a few data-driven methods for selecting r. To prevent any potential issues with post hoc choices of r, we choose to fix r before any analysis is conducted. While it is hard to specify exactly what the term “majority” means, as a general rule we choose to test the hypothesis setting HS_{m}, where m=⌈K/2⌉, ⌈x⌉ being the smallest integer no less than x. Essentially we are targeting the alternative that at least half of the studies have non-zero effect sizes. Under our weighted ordered p-values (WOP) framework, we are also able to develop methods for other hypothesis settings with r ranging from m+1 to K. But in general we hope to provide a simple-to-use method that does not require much effort in selecting a particular r, and therefore we will focus mostly on testing HS_{m}. We will later show through data applications that using our WOP methods for testing HS_{m} provides more robust results than using the rOP method with different choices of r.
A framework for weighted ordered P-values (WOP) methods
As mentioned by previous authors, many traditional p-value combination methods can be expressed in the general form ${T}^{\prime}={\sum}_{i=1}^{K}{w}_{i}^{\prime}H({p}_{i})$ (for example, see Zaykin [17]). For instance, Stouffer’s Z-test takes H(·) to be the inverse normal function, while Fisher’s method has H(p_{ i })=−2 log(p_{ i }). In the WOP framework, we instead consider statistics of the form ${T}={\sum}_{i=1}^{K}{w}_{i}H({p}_{(i)})$, where the weight w_{ i } is associated with the ith ordered p-value p_{(i)}. The difference between T^{′} and T, albeit subtle in notation, is the essence of the WOP framework: the ranking of a p-value in relation to the p-values from the other studies determines its weight. In traditional weighted p-value combining methods, by contrast, the weight assigned to a p-value is associated with the characteristics of that particular study, be it the sample size, the effect size, or other features.
Allowing the weights to depend on the ordering of the p-values opens up a whole new area of considerations when combining multiple p-values. We can consider giving more weight to the p-values that are closer to the center of the distribution of the p-values, since they might hold more credibility, and at the same time down-weight the outlying p-values. For instance, if the entirety of the weight is placed on the rth ordered p-value, the statistic reduces to the rOP method. However, using the WOP framework, if we highly weight the rth ordered p-value but still distribute some weight to the other p-values, the method can be viewed as a more robust version of the rOP method for testing HS_{r}.
In this paper, we shall develop a few specific methods under the WOP framework. We consider weights based on the binomial distribution, which will be described in more detail in the next section. As for the p-value combining methods, we shall focus on Fisher’s method, with H_{ F }(p_{(i)})=−2 log(p_{(i)}), and Stouffer’s method, with H_{ Z }(p_{(i)})=Φ^{−1}(1−p_{(i)}), Φ(·) being the standard normal distribution function.
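As a concrete illustration, the WOP statistic for a single gene could be computed as follows. This is a minimal Python sketch; the function name and interface are ours, not from the paper.

```python
import numpy as np
from scipy import stats

def wop_statistic(pvals, weights, method="fisher"):
    """Weighted ordered p-value (WOP) statistic for one gene:
    T = sum_i w_i * H(p_(i)), with the weight w_i attached to the
    ith ordered p-value rather than to a particular study."""
    p = np.sort(np.asarray(pvals, dtype=float))   # p_(1) <= ... <= p_(K)
    w = np.asarray(weights, dtype=float)
    if method == "fisher":
        h = -2.0 * np.log(p)                      # H_F(p) = -2 log p
    elif method == "stouffer":
        h = stats.norm.ppf(1.0 - p)               # H_Z(p) = Phi^{-1}(1 - p)
    else:
        raise ValueError("method must be 'fisher' or 'stouffer'")
    return float(np.sum(w * h))
```

With equal weights w_i = 1/K this reduces (up to scaling) to the classical Fisher or Stouffer statistic, while placing all the weight on a single order statistic recovers the rOP statistic.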
Binomial weights and half-binomial weights
In this section, we discuss two possible weighting schemes for the WOP framework. We mainly consider testing the alternative hypothesis that the effect sizes are non-zero in the majority of studies, which is the hypothesis setting HS_{m}. We will also briefly discuss extending the weighting schemes to testing other hypothesis settings HS_{r}, for m<r≤K.
Inspired by the rOP method, which uses the rth ordered p-value for testing the hypothesis setting HS_{r}, we consider placing the highest weight on the median p-values for testing HS_{m}. This makes intuitive sense, since if a consensus does exist among the studies, we have reason to believe that the behavior of the majority of studies should be best captured by the p-values that are closer to the center of the distribution. Since we do not insist on non-zero effect sizes for every single study, we down-weight the largest p-values among the studies. On the other hand, p-value combining methods such as Fisher’s method are known to be very sensitive to single extremely small p-values, so the smallest p-values should also be down-weighted, to avoid a small number of extremely small p-values biasing the results away from the majority of studies. In summary, we would like our weighting scheme w_{ i }, as a function of i, to have a unimodal shape, with the highest weights being w_{ m } (when K is odd) or w_{m−1} and w_{ m } (when K is even), and such that w_{ i } decreases as i goes to 1 or K.
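One way to realize such a unimodal scheme is sketched below, under the assumption that the weights are proportional to a symmetric binomial pmf; the paper’s exact binomial and half-binomial constructions may differ in detail.

```python
import numpy as np
from scipy.stats import binom

def binomial_weights(K):
    """Unimodal weights for K ordered p-values, peaked at the median:
    proportional to the Binomial(K - 1, 1/2) pmf over i = 0, ..., K - 1,
    normalized to sum to one. The middle order statistics receive the
    most weight and the extreme order statistics are down-weighted."""
    w = binom.pmf(np.arange(K), K - 1, 0.5)
    return w / w.sum()
```

For K=7 this gives weights proportional to (1, 6, 15, 20, 15, 6, 1)/64, with the peak on the 4th (median) ordered p-value; a half-binomial variant could, for example, zero out one tail and renormalize, though we do not reproduce the paper’s exact formula here.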
We will discuss the effects of these two different weighting schemes further through simulation studies in later sections.
Implementation of the WOP methods
Under the null hypothesis, the p_{ k }’s are assumed to follow a uniform (0,1) distribution. For Stouffer’s Z-test, H_{ Z }(p_{ i })=Φ^{−1}(1−p_{ i }) follows a standard normal distribution. Therefore the traditional weighted Z-test in the form of ${\sum}_{i=1}^{K}{w}_{i}^{\prime}{H}_{Z}({p}_{i})$ still follows a normal distribution. For Fisher’s method, H_{ F }(p_{ i })=−2 log(p_{ i }) has a chi-square distribution with two degrees of freedom. Therefore the distribution of the traditional weighted Fisher’s test ${\sum}_{i=1}^{K}{w}_{i}^{\prime}{H}_{F}({p}_{i})$ is essentially a weighted sum of exponential variables. The distribution of weighted sums of exponential variables is not as straightforward, though many authors have studied both its exact form and approximations to it; a summary can be found in Olkin and Saner [10].
When we consider weighted ordered p-values in the form of ${\sum}_{i=1}^{K}{w}_{i}H({p}_{(i)})$, however, the problem becomes much more complicated. Even for Stouffer’s method, the distribution of the sum of weighted ordered normal variables is not readily available. As for Fisher’s method, Olkin and Saner [10] studied the distribution of the trimmed Fisher’s statistic, which is ${\sum}_{i=1}^{K}-2{w}_{i}\log({p}_{(i)})$ for the special case of w_{ i }=0 for i=1,⋯,s_{1} and i=s_{2},⋯,K. They transformed the distribution of the sum of ordered chi-squared variables back to a weighted sum of unordered exponential variables. But as discussed in their paper, the exact distribution of weighted sums of exponential variables is generally impractical for practitioners, and we would therefore have to use approximation methods.
Considering the complexity of the exact distributions of weighted sums of ordered variables, as well as the fact that the uniformity of the original p-values is not always guaranteed in practice, we recommend two methods for obtaining the p-values for the WOP statistics: (1) permutation analysis, in the case that the original data from all the studies are available; and (2) comparison to a numerical distribution, in the case that only the p-values for each gene and each study are known.
We first explain the steps of obtaining the WOP p-values through permutation analysis:

1. Let ${T}_{g}={\sum}_{i=1}^{K}{w}_{i}H({p}_{g(i)})$ denote the WOP statistic for gene g, where p_{g(i)} is the ith ordered p-value of p_{g1},⋯,p_{ g K }.

2. Permute the group labels in each study B times, and recalculate the p-values for the permuted data, ${p}_{\mathit{\text{gk}}}^{(b)}$, for 1≤k≤K, 1≤g≤G, 1≤b≤B.

3. Calculate the WOP statistics ${T}_{g}^{(b)}$ for the permuted p-values, for 1≤g≤G, 1≤b≤B.

4. The p-value for the WOP statistic T_{ g } (1≤g≤G) is then computed as ${p}_{g}^{T}=\frac{{\sum}_{b=1}^{B}{\sum}_{{g}^{\prime}=1}^{G}I\{{T}_{{g}^{\prime}}^{(b)}\ge {T}_{g}\}}{B\cdot G}.$
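Given the observed statistics and the permuted ones, the pooled p-value calculation in the last step can be sketched as follows. The function name is ours, and we assume larger WOP statistics are more significant, which holds for Fisher- and Stouffer-type statistics with non-negative weights.

```python
import numpy as np

def pooled_permutation_pvalues(T_obs, T_perm):
    """Pooled permutation p-values for WOP statistics.

    T_obs  : length-G array of observed statistics T_g, one per gene.
    T_perm : (B, G) array of statistics from the B permuted datasets.

    The p-value for gene g is the fraction of the pooled B*G
    permutation statistics that are at least as large as T_g.
    """
    null = np.sort(np.asarray(T_perm, dtype=float).ravel())
    # number of null statistics >= T_g, found by binary search
    n_ge = null.size - np.searchsorted(null, np.asarray(T_obs), side="left")
    return n_ge / null.size
```

Pooling across genes gives a fine-grained null at the cost of assuming exchangeability of the permuted statistics across genes.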
Once the p-values for the WOP statistics for each gene are obtained, we may apply the Benjamini-Hochberg (BH) method [18] to the ${p}_{g}^{T}$’s (1≤g≤G) to account for multiple testing across the genes and control the false discovery rate (FDR).
If the original data are not available, we can simulate the distribution of the WOP statistics numerically by simulating U(0,1) random variables, the distribution of the p-values under the null. The WOP statistics calculated from the data can then be compared to this numerical distribution to obtain the WOP p-values. We simulated numerical distributions of the WOP statistics for testing HS_{m} based on the Fisher’s and Stouffer’s combination methods with binomial and half-binomial weighting schemes respectively, for study numbers ranging from 4 to 23. We conducted simulation studies to compare the WOP p-values obtained either through comparison with the numerical distribution or by performing permutation analysis. Results show that the two methods provide perfectly correlated p-values and that the numbers of rejections obtained from the two methods after applying the Benjamini-Hochberg adjustment are very similar (data not shown). Therefore both methods are reliable choices for obtaining the WOP p-values in practice. The numerical distribution provides an option for when the original data are not available and is also more time-efficient. The permutation analysis can be used if the uniformity of the original p-values is questionable but the original data are available.
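When only the study-level p-values are available, the null distribution can be simulated as described above. A minimal sketch (function name ours), assuming Fisher- or Stouffer-type combination with order-based weights:

```python
import numpy as np
from scipy.stats import norm

def simulate_wop_null(K, weights, method="fisher", n_sim=100_000, seed=0):
    """Simulate the null distribution of a WOP statistic by drawing
    K independent Uniform(0,1) p-values per replicate, sorting them,
    and combining them with the order-based weights."""
    rng = np.random.default_rng(seed)
    U = np.sort(rng.uniform(size=(n_sim, K)), axis=1)   # ordered null p-values
    w = np.asarray(weights, dtype=float)
    if method == "fisher":
        T = np.sum(w * (-2.0 * np.log(U)), axis=1)
    else:  # stouffer
        T = np.sum(w * norm.ppf(1.0 - U), axis=1)
    return np.sort(T)   # sorted null reference distribution
```

An observed statistic’s p-value is then the fraction of this reference distribution at least as large as the statistic.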
Considerations of one- or two-sided tests
Previously we used two-sided alternatives as an example when setting up the hypotheses. The hypothesis setting HS_{r} can be similarly developed for one-sided alternatives. In fact, the interpretation of the meta-analysis results is easier for one-sided tests, since we do not need to worry about the concordance of the directions of the effect sizes as we do for two-sided tests. Since the WOP methods directly combine the p-values, the direction of the effect sizes is not taken into account for two-sided tests. Thus a significant result from a WOP meta-analysis of two-sided tests indicates that there are at least r studies with non-zero effect size, but without any implication about the concordance or discordance of the directions of the effects. This may not be an issue when the direction of the effect is not of great importance. However, in genomic data analysis, it is often desirable to distinguish between up- and down-regulated genes, and a result stating that a gene is differentially expressed across many studies but with possibly opposite directions of expression change may be confusing. In these cases, it might be problematic to directly apply the WOP methods to the two-sided p-values. On the other hand, since both up- and down-regulated genes may be of interest at the same time, we cannot pre-specify one particular one-sided test for all genes. For such scenarios we recommend using the test of Pearson [19] in combination with the WOP methods. To do so, for each gene we conduct two WOP meta-analyses on one-sided p-values: one on the left-tailed p-values for all studies, and the other on the right-tailed p-values for all studies. Let ${p}_{\mathit{\text{WOP}}}^{L}$ and ${p}_{\mathit{\text{WOP}}}^{R}$ be the WOP meta-analysis p-values for the left-tailed and right-tailed tests respectively.
We then adopt the idea of Pearson’s test and define ${p}_{\mathit{\text{WOP}}}^{C}=\min\{1,2\min({p}_{\mathit{\text{WOP}}}^{L},{p}_{\mathit{\text{WOP}}}^{R})\}$, where the superscript “C” stands for “concordant”. As discussed in Owen [20], this equation provides a conservative p-value for Pearson’s test. By adopting Pearson’s test, the results are more interpretable: a significant result now indicates that the gene is consistently up- or down-regulated in at least r studies.
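The concordant combination is a one-line computation; a sketch (function name ours):

```python
def concordant_pvalue(p_left, p_right):
    """Pearson-style concordant p-value from the left- and right-tailed
    WOP meta-analysis p-values: p^C = min{1, 2 * min(p^L, p^R)}.
    Conservative, following the discussion in Owen [20]."""
    return min(1.0, 2.0 * min(p_left, p_right))
```

The factor of 2 accounts for looking at both tails, capped at 1 so the result remains a valid p-value.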
Results and discussion
A simulation study
We conducted a simulation study to compare the performance of the WOP methods with the original Fisher’s and Stouffer’s methods, as well as with the rOP method (Song and Tseng [1]). We shall also demonstrate the differences between the binomial and half-binomial weighting schemes through the simulation.
We simulate the setting of a meta-analysis of differential expression studies, with 2000 genes and 7 studies. Out of the 2000 genes, 1650 genes are assumed to be not differentially expressed in any study, while for each d=1,2,⋯,7, 50 genes are assumed to be differentially expressed in exactly d studies. The sample sizes for the treatment and control groups are randomly generated for each study, varying from 5 to 20. Gene expression values are randomly generated from normal distributions. Control samples are generated from a N(0,1) distribution, as are treatment samples that are not differentially expressed. Treatment samples that are differentially expressed are generated from a N(1,1) distribution. Two-sample t-tests are used to obtain the p-values p_{ g k } for each gene and each study. Our WOP methods aim at testing the hypothesis that the gene is differentially expressed in the majority of studies. In this case, m=4, corresponding to the hypothesis setting HS_{4}. The rOP statistic (Song and Tseng [1]) for testing HS_{4} is the 4th ordered p-value. Note that the original Fisher’s and Stouffer’s methods test for HS_{1}. We used permutation analysis to obtain the WOP p-values for the binomial and half-binomial weighted Fisher’s and Stouffer’s statistics. P-values for the rOP method were also computed by permutation analysis, as recommended in Song and Tseng [1]. P-values for the original Fisher’s and Stouffer’s methods were computed directly via their respective distributions. To obtain a list of significant genes, the Benjamini-Hochberg procedure is applied to the p-values with the FDR controlled at the 0.05 level. Results are averaged over 100 replications.
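The per-gene p-value generation in this simulation can be sketched as follows. This is a simplified version with a fixed per-group sample size n (an assumption within the paper’s 5-20 range); the function name is ours.

```python
import numpy as np
from scipy import stats

def gene_study_pvalues(n_de_studies, K=7, n=10, effect=1.0, seed=None):
    """Two-sided two-sample t-test p-values for one gene across K studies,
    with the gene differentially expressed (mean shift `effect`) in the
    first `n_de_studies` studies and null in the rest."""
    rng = np.random.default_rng(seed)
    pvals = []
    for k in range(K):
        ctrl = rng.normal(0.0, 1.0, size=n)        # N(0,1) controls
        mu = effect if k < n_de_studies else 0.0
        trt = rng.normal(mu, 1.0, size=n)          # N(1,1) when DE
        pvals.append(stats.ttest_ind(trt, ctrl).pvalue)
    return np.array(pvals)
```

Repeating this over the 2000 genes, with the appropriate number of differentially expressed studies per gene, reproduces the p-value matrix fed into the WOP and rOP procedures.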
In summary, the binomial weighted WOP methods show improvement over the original Fisher’s or Stouffer’s method for testing differential expression in a majority of studies, with lower rejection rates for genes that are differentially expressed in a small number of studies, and just as much power for genes that are differentially expressed in almost all studies. On the other hand, the half-binomial weighted WOP methods are more robust versions of the rOP method, having similar properties to the rOP method, but even lower rejection rates for categories 1 to 3 and higher power for categories 6 and 7. In practice, the binomial weighting scheme is recommended if the user wishes to have a larger pool of significant genes, and when the control of false positives is relatively less important. For better false positive control, the half-binomial weighting scheme is recommended. We note that because of our hypothesis setup, false positives are not the same as type I errors. A type I error would be rejecting a gene that is not differentially expressed in any study. A false positive, on the other hand, would be rejecting a gene that is differentially expressed in fewer than r studies.
An application to meta-analysis of a set of stem cell studies
As an application of the proposed methodology, we conduct a meta-analysis on a set of microarray data studies from four stem cell papers: Chin et al. [21], Guenther et al. [22], Newman and Cooper [23] and Chin et al. [24]. We wish to find probesets that are differentially expressed between human induced pluripotent stem (hiPS) cells and human embryonic stem (hES) cells in the majority of studies. Some of the studies contain other samples such as human fibroblasts, but we only used samples from hiPS cells and hES cells. We included studies that had at least two samples for each group (hiPS and hES), giving us a total of 9 studies. All the studies used the Affymetrix Human Genome U133 Plus 2.0 Array platform, which contains 54675 probesets. We directly used the data preprocessed by the original contributors and did not perform any additional normalization, except for taking the log of data that were not already on the log2 scale. We performed a two-sample t-test between the hiPS cell samples and the hES cell samples to obtain the original p-values for differential expression for each probeset and each study. The hypothesis setting for the meta-analysis is HS_{5}, i.e. we aim at testing the alternative that the probeset is differentially expressed in at least 5 out of the 9 studies. We applied the proposed WOP methods, in particular the binomial and half-binomial weighted Fisher’s and Stouffer’s statistics, to the p-values. The p-values for the WOP statistics are obtained by comparing the statistics to the corresponding numerical distributions. We also applied the original Fisher’s and Stouffer’s methods, and the rOP method (in this case the 5th ordered p-value). The p-values for the rOP method are obtained via its theoretical distribution. To adjust for multiple testing, we applied the Benjamini-Hochberg procedure [18] afterwards, controlling the false discovery rate at the 0.05 level.
Number of significant probesets for the meta-analysis of stem cell studies by different methods

Method                         Fisher's   Stouffer's   rOP
Original (unweighted)          16508      11969        6330
WOP: binomial weighted         10309      8927         N/A
WOP: half-binomial weighted    6170       5805         N/A
Further applications and comparisons with the rOP method
To compare the performance of the WOP methods and the rOP method (Song and Tseng [1]) in real data applications, we applied our WOP methods to the three microarray meta-analysis applications in Song and Tseng [1]. The first application consists of comparisons of two subtypes of brain tumors, anaplastic astrocytoma (AA) and glioblastoma multiforme (GBM), from 7 studies. The second application combines 9 studies comparing post-mortem brain tissues between major depressive disorder (MDD) patients and control samples. In the third application, 16 diabetes microarray studies covering different organisms and tissues were combined. See Song and Tseng [1] for more details on the contexts of these three meta-analysis applications.
To ensure that the results are directly comparable, for each meta-analysis we directly used the two-sided p-values for each gene and each individual study calculated in Song and Tseng [1]. There, permutation analysis is used for the brain cancer studies and the MDD studies, while theoretical distributions are used to obtain results for the diabetes studies. We follow this practice and also directly use the two-sided p-values for the permuted datasets provided by Song and Tseng [1] for the brain cancer studies and the MDD studies. See Song and Tseng [1] for more details on the preprocessing of the data and the calculation of the original p-values.
Number of significant genes for the three meta-analyses by different methods

               WOP Fisher   WOP Fisher      WOP Stouffer   WOP Stouffer    rOP    rOP
               binomial     half-binomial   binomial       half-binomial   r=m    selected r*
Brain Cancer   2477         1887            2261           1805            1921   1469
MDD            1070         930             1152           969             565    617
Diabetes       1333         1016            1277           1004            912    636
For comparison, to see how the results would differ for different choices of r for the WOP methods, we applied the WOP method to the MDD studies again, this time testing HS_{7}. We used the half-binomial versions of the three weighting schemes for m<r≤K that were discussed earlier, w^{hb1}, w^{hb2} and w^{hb3}, with Fisher’s summary statistic. The numbers of genes found using the three weighting schemes are 760, 770 and 774 respectively. The three weighting schemes are fairly consistent with each other: for each weighting scheme, more than 90% of the genes found were also found by the other two weighting schemes. Compared to the number of genes found by the half-binomial weighted Fisher’s method for testing HS_{5}, which is 930, the corresponding WOP methods for testing HS_{7} yield smaller numbers, conforming to our expectations. As discussed earlier, the ideal result would be that the genes detected for HS_{7} be a subset of the genes detected for HS_{5}. In reality, over 70% of the genes detected by the WOP methods for HS_{7} were also detected for HS_{5} (the percentages being 72.6%, 71.4% and 74.2% for the three weighting schemes respectively). Recall that for the rOP method, only 43.6% of the genes detected for r=7 were also detected for r=5. Even though the WOP methods are not perfect, we can still see the great improvement in robustness compared to the rOP method.
In summary, our observations confirm that the results of the rOP method are indeed heavily dependent on the choice of r, whereas our WOP methods are much more robust. In particular, the WOP methods for testing HS_{m} have shown superior robustness by being able to cover most of the genes detected by the rOP method using different r. Since in practice it is often unclear which particular HS_{r} should be tested, we believe the WOP methods for testing HS_{m} are a better choice when the goal is to detect signal in the majority of studies.
Conclusions
Meta-analysis is a useful tool for integrating data from different sources to test a particular hypothesis. While this paper mainly discussed the application of meta-analysis to microarray differential expression studies, other areas of genomic research have increasingly relied on meta-analysis, such as genome-wide association studies (GWAS); some seminal studies in this area include Scott et al. [25] and Willer et al. [26]. Meta-analysis is also frequently used in clinical studies, psychological studies and statistical applications in other social sciences. More and more meta-analyses nowadays aim at detecting consistent findings across a number of studies. While most traditional meta-analysis methods test for significance in at least one of the studies, it is important to develop new meta-analysis methods that focus on testing for significance in the majority of studies.
The weighted ordered p-value (WOP) method provides such a framework. It is unique in its use of weights that are based on the order of the p-values. The rOP method (Song and Tseng [1]), which is also based on ordered p-values, can be considered a very special case under the WOP framework, where all the weight is placed on one single ordered p-value. The WOP methods do not require pre-specification of r and are less sensitive to the choice of its value. The half-binomial weighted WOP methods have been shown to be more robust and to have better receiver operating characteristics than the corresponding rOP method.
As pointed out by a reviewer, one of our previously published meta-analysis methods [9] also considers heterogeneity and utilizes weights. However, the two methods are quite different in concept. The method in [9] applies weights to the genes, essentially re-ranking them by adding in information about heterogeneity, whereas the WOP method applies weights to the multiple p-values for each gene. Conceptually, the two can be combined: one could first apply the WOP method to each gene and then use the method in [9] on top of that to re-rank the genes.
One advantage of the WOP framework is its flexibility: it allows for different weighting schemes and summary statistics. Although this paper focused on two weighting schemes based on the binomial distribution and two summary statistics (Fisher's and Stouffer's), in general other summary statistics and weighting schemes can be used. Future research could optimize the weighting scheme to suit specific meta-analysis purposes.
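To make this flexibility concrete, the sketch below shows how alternative combiners and weighting schemes slot into the same framework. The half-binomial truncation shown (zeroing and renormalizing the weights on the order statistics below the median) is our assumed reading of a majority-oriented scheme, not the paper's exact definition; the Stouffer z-transform uses the standard library's `NormalDist.inv_cdf`.

```python
import numpy as np
from math import comb
from statistics import NormalDist

def binomial_weights(n, half=False):
    """Binomial(n-1, 1/2) weights over the n ordered p-values.

    With half=True, weights on the order statistics below the median
    are zeroed and the rest renormalized -- an assumed reading of the
    'half-binomial' scheme aimed at signal in the majority of studies.
    """
    w = np.array([comb(n - 1, k) for k in range(n)], dtype=float) / 2 ** (n - 1)
    if half:
        w[: n // 2] = 0.0
    return w / w.sum()

def wop_stat(pvals, weights, combiner="fisher"):
    """Weighted ordered p-value summary with a pluggable combiner."""
    p = np.sort(np.asarray(pvals, dtype=float))
    if combiner == "fisher":
        return -2.0 * np.sum(weights * np.log(p))  # weighted Fisher sum
    # Weighted Stouffer statistic, standardized by the weight norm
    z = np.array([NormalDist().inv_cdf(1.0 - x) for x in p])
    return np.sum(weights * z) / np.sqrt(np.sum(weights ** 2))
```

Swapping the weight vector or the combiner changes the hypothesis being emphasized without altering the surrounding machinery, which is the sense in which the framework is flexible.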
Declarations
Acknowledgements
We acknowledge the support of the NSF grant ABI-1262538.
References
1. Song C, Tseng GC: Hypothesis setting and order statistic for robust genomic meta-analysis. Ann Appl Stat. 2014, 8: 777-800.
2. Fisher RA: Statistical methods for research workers. 1925, Edinburgh: Oliver and Boyd
3. Stouffer SA, Suchman EA, Devinney LC, Star SA, Williams RM: The American soldier: adjustment during army life. 1949, Princeton, NJ: Princeton University Press
4. Choi JK, Yu U, Kim S, Yoo OJ: Combining multiple microarray studies and modeling interstudy variation. Bioinformatics. 2003, 19: 84-90.
5. Wilkinson B: A statistical consideration in psychological research. Psychol Bull. 1951, 48: 156-158.
6. Phillips D, Ghosh D: Testing the disjunction hypothesis using Voronoi diagrams with applications to genetics. Ann Appl Stat. 2014, 8: 801-823.
7. Benjamini Y, Heller R: Screening for partial conjunction hypotheses. Biometrics. 2008, 64: 1215-1222.
8. Hong F, Breitling R, McEntee CW, Wittner BS, Nemhauser JL, Chory J: RankProd: a bioconductor package for detecting differentially expressed genes in meta-analysis. Bioinformatics. 2006, 22: 2825-2827.
9. Li Y, Ghosh D: Assumption weighting for incorporating heterogeneity into meta-analysis of genomic data. Bioinformatics. 2012, 28: 807-814.
10. Olkin I, Saner H: Approximations for trimmed Fisher procedures in research synthesis. Stat Methods Med Res. 2001, 10: 267-276.
11. Lancaster H: The combination of probabilities: an application of orthonormal functions. Aust J Stat. 1961, 3: 20-33.
12. Mosteller F, Bush RR: Selected quantitative techniques. Handbook of Social Psychology. 1954, Cambridge, MA: Addison-Wesley
13. Won S, Morris N, Lu Q, Elston R: Choosing an optimal method to combine P-values. Stat Med. 2009, 28: 1537-1553.
14. Makambi KH: Weighted inverse chi-square method for correlated significance tests. J Appl Stat. 2003, 30: 225-234.
15. Li J, Tseng GC: An adaptively weighted statistic for detecting differential gene expression when combining multiple transcriptomic studies. Ann Appl Stat. 2011, 5: 994-1019.
16. Xie M, Singh K, Strawderman WE: Confidence distributions and a unifying framework for meta-analysis. J Am Stat Assoc. 2011, 106: 320-333.
17. Zaykin DV: Optimally weighted Z-test is a powerful method for combining probabilities in meta-analysis. J Evol Biol. 2011, 24: 1836-1841.
18. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B. 1995, 57: 289-300.
19. Pearson K: On a new method of determining “goodness of fit”. Biometrika. 1934, 26: 425-442.
20. Owen AB: Karl Pearson’s meta-analysis revisited. Ann Stat. 2009, 37: 3867-3892.
21. Chin MH, Mason MJ, Xie W, Volinia S, Singer M, Peterson C, Ambartsumyan G, et al: Induced pluripotent stem cells and embryonic stem cells are distinguished by gene expression signatures. Cell Stem Cell. 2009, 5: 111-123.
22. Guenther MG, Frampton GM, Soldner F, Hockemeyer D, Mitalipova M, Jaenisch R, Young RA: Chromatin structure and gene expression programs of human embryonic and induced pluripotent stem cells. Cell Stem Cell. 2010, 7: 249-257.
23. Newman AM, Cooper JB: Lab-specific gene expression signatures in pluripotent stem cells. Cell Stem Cell. 2010, 7: 258-262.
24. Chin MH, Pellegrini M, Plath K, Lowry WE: Molecular analyses of human induced pluripotent stem cells and embryonic stem cells. Cell Stem Cell. 2010, 7: 263-269.
25. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, Erdos MR, Stringham HM, Chines PS, Jackson AU, Prokunina-Olsson L, Ding CJ, Swift AJ, Narisu N, Hu T, Pruim R, Xiao R, Li XY, Conneely KN, Riebow NL, Sprau AG, Tong M, White PP, Hetrick KN, Barnhart MW, Bark CW, Goldstein JL, Watkins L, Xiang F, Saramies J, et al: A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science. 2007, 316: 1341-1345.
26. Willer CJ, Sanna S, Jackson AU, Scuteri A, Bonnycastle LL, Clarke R, Heath SC, Timpson NJ, Najjar SS, Stringham HM, Strait J, Duren WL, Maschio A, Busonero F, Mulas A, Albai G, Swift AJ, Morken MA, Narisu N, Bennett D, Parish S, Shen H, Galan P, Meneton P, Hercberg S, Zelenika D, Chen WM, Li Y, Scott LJ, Scheet PA, et al: Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat Genet. 2008, 40: 161-169.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.