Comparative evaluation of gene-set analysis methods
 Qi Liu†^{1},
 Irina Dinu†^{1},
 Adeniyi J Adewale^{1},
 John D Potter^{2} and
 Yutaka Yasui^{1}
DOI: 10.1186/1471-2105-8-431
© Liu et al; licensee BioMed Central Ltd. 2007
Received: 05 September 2007
Accepted: 07 November 2007
Published: 07 November 2007
Abstract
Background
Multiple data-analytic methods have been proposed for evaluating gene-expression levels in specific biological pathways, assessing differential expression associated with a binary phenotype. Following Goeman and Bühlmann's recent review, we compared the statistical performance of three methods, namely Global Test, ANCOVA Global Test, and SAM-GS, that test "self-contained null hypotheses" via subject sampling. The three methods were compared based on a simulation experiment and analyses of three real-world microarray datasets.
Results
In the simulation experiment, we found that the use of the asymptotic distribution in the two Global Tests leads to a statistical test with an incorrect size. Specifically, p-values calculated by the scaled χ^{2} distribution of Global Test and the asymptotic distribution of ANCOVA Global Test are too liberal, while the asymptotic distribution with a quadratic form of the Global Test results in p-values that are too conservative. The two Global Tests with permutation-based inference, however, gave a correct size. While the three methods showed similar power using permutation inference after a proper standardization of gene expression data, SAM-GS showed slightly higher power than the Global Tests. In the analysis of a real-world microarray dataset, the two Global Tests gave markedly different results, compared to SAM-GS, in identifying pathways whose gene expressions are associated with p53 mutation in cancer cell lines. A proper standardization of gene expression variances is necessary for the two Global Tests in order to produce biologically sensible results. After the standardization, the three methods gave very similar, biologically sensible results, with slightly higher statistical significance given by SAM-GS. The three methods gave similar patterns of results in the analysis of the other two microarray datasets.
Conclusion
An appropriate standardization makes the performance of all three methods similar, given the use of permutation-based inference. SAM-GS tends to have slightly higher power in the lower α-level region (i.e., for gene sets that are of the greatest interest). Global Test and ANCOVA Global Test have the important advantage of being able to analyze continuous and survival phenotypes and to adjust for covariates. A free Microsoft Excel Add-In to perform SAM-GS is available from http://www.ualberta.ca/~yyasui/homepage.html.
Background
Some microarray-based gene expression analyses such as Significance Analysis of Microarray (SAM) [1] aim to discover individual genes whose expression levels are associated with a phenotype of interest. Such individual-gene analyses can be enhanced by utilizing existing knowledge of biological pathways, or sets of individual genes (hereafter referred to as "gene sets"), that are linked via related biological functions. Gene-set analyses aim to discover gene sets whose expression is associated with a phenotype of interest.
Many gene-set analysis methods have been proposed previously. For example, Mootha et al. [2] proposed Gene Set Enrichment Analysis (GSEA), which uses the Kolmogorov-Smirnov statistic to measure the degree of differential gene expression in a gene set by a binary phenotype (see also [3]). Goeman et al. [4] presented Global Test, modeling differential gene expression by means of random-effects logistic regression models. Goeman et al. [5] also extended their methods to continuous and survival outcomes. Mansmann and Meister [6] proposed ANCOVA Global Test, which is similar to Global Test but with the roles of phenotype and genes exchanged in the regression models. Mansmann and Meister [6] pointed out that their ANCOVA Global Test outperformed Global Test, especially in cases where the asymptotic distribution of Global Test cannot be used. Dinu et al. [7] discussed some critical problems of GSEA as a method for gene-set analysis and proposed an alternative method called SAM-GS, an extension of SAM to gene-set analysis. Goeman and Bühlmann [8] provided an excellent review of the methods, discussing important methodological questions of gene-set analysis, and summarized the methodological principles behind the existing methods. An important contribution of their review was the distinction between testing "self-contained null hypotheses" via subject sampling and testing "competitive null hypotheses" via gene sampling. They argue, and we agree, that the framework of competitive hypothesis testing via gene sampling is subject to serious errors in calculating and interpreting the statistical significance of gene sets, because of its implicit or explicit untenable assumption of probabilistic independence across genes.
Although Global Test, ANCOVA Global Test, and SAM-GS each test a self-contained hypothesis on the association of expression patterns across a gene set with a phenotype of interest in a statistically appropriate manner, it is unclear how the three methods compare in their performance in detecting underlying associations. In this paper, we compare the performance of the three methods via simulation and real-world microarray data analyses, both statistically and biologically.
Results
Simulation experiment
The z-score standardization of gene expression used in the comparisons is
$z_{jk} = (x_{jk} - \overline{x}_{j})/s_{j}$ (1)
where x_{jk} is the gene expression for gene j in sample k, and $\overline{x}_{j}$ and s_{j} are the sample mean and standard deviation of gene j expression using all samples, respectively. All simulation analyses compared the mean expression of a gene set of interest between two groups, each with a sample of 10 observations.
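A minimal sketch of this gene-wise standardization is given below. The function name `standardize_genes` and the genes-by-samples nested-list layout are illustrative assumptions, not part of the original analysis code; a gene with zero variance would need separate handling.

```python
import statistics

def standardize_genes(expr):
    """Gene-wise z-score; expr[j][k] is the expression of gene j in sample k.

    Uses all samples of a gene (both groups pooled) for the mean and the
    sample standard deviation. A constant gene (SD = 0) raises ZeroDivisionError.
    """
    standardized = []
    for row in expr:
        mean = statistics.fmean(row)
        sd = statistics.stdev(row)  # sample SD with n - 1 in the denominator
        standardized.append([(x - mean) / sd for x in row])
    return standardized

# Two hypothetical genes on very different scales end up on the same scale.
expr = [[1.0, 2.0, 3.0], [10.0, 30.0, 50.0]]
z = standardize_genes(expr)
```

After the transformation, every gene has mean 0 and standard deviation 1 across samples, which is what makes the variances comparable across genes.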
First, we checked the size of the three tests, before and after the standardization, according to the following three scenarios of no differential expression between two groups: (1) randomly generate expression of 100 genes for the two groups from a multivariate normal distribution (MVN) with a mean vector μ and a diagonal variance-covariance matrix Σ, where the 100 elements of μ and the 100 diagonal elements of Σ were randomly generated as 100 independently and identically distributed (i.i.d.) uniform random variables in (0, 10) and 100 i.i.d. uniform random variables in (0.1, 10), respectively (i.e., no gene was differentially expressed between the two groups and expression was uncorrelated among the 100 genes); (2) exactly the same as (1), except that the variance-covariance matrix Σ of the MVN was changed to have a correlation of 0.5 between all pairs of the first 20 genes and also between all pairs of the second 20 genes; (3) exactly the same as (2), with the correlation value changed from 0.5 to 0.9.
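The null scenarios above can be sketched as follows. This is only an illustration under stated assumptions: the equicorrelated blocks are produced with a shared-factor construction (one common normal factor per block), which yields the stated pairwise correlation but is not necessarily how the original simulation drew its multivariate normal samples; `simulate_group` and `pearson` are hypothetical helper names.

```python
import math
import random

def simulate_group(mu, var, rho, n_samples, rng, block_size=20, n_blocks=2):
    """Draw n_samples expression vectors (samples x genes).

    Genes in each of the first n_blocks blocks share pairwise correlation rho;
    all remaining genes are independent. var holds the per-gene variances.
    """
    samples = []
    for _ in range(n_samples):
        factors = [rng.gauss(0, 1) for _ in range(n_blocks)]
        x = []
        for j in range(len(mu)):
            noise = rng.gauss(0, 1)
            block = j // block_size
            if block < n_blocks:
                # sqrt(rho)*f + sqrt(1-rho)*e has unit variance and
                # correlation rho with other genes in the same block
                z = math.sqrt(rho) * factors[block] + math.sqrt(1 - rho) * noise
            else:
                z = noise
            x.append(mu[j] + math.sqrt(var[j]) * z)
        samples.append(x)
    return samples

def pearson(a, b):
    """Sample Pearson correlation, used only to check the simulated structure."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(2007)
mu = [rng.uniform(0, 10) for _ in range(100)]     # gene means
var = [rng.uniform(0.1, 10) for _ in range(100)]  # gene variances
group1 = simulate_group(mu, var, 0.5, 10, rng)    # scenario (2), group 1
group2 = simulate_group(mu, var, 0.5, 10, rng)    # same means: no differential expression
```

Setting `rho` to 0 reproduces scenario (1), and 0.9 reproduces scenario (3).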
Second, we estimated the power of the three tests, before and after the standardization, by randomly generating a gene set of size 100, using exactly the same simulation setup as in size evaluation (2) above, but allowing the first 40 genes to be differentially expressed. The mean expression of the 40 differentially expressed genes was randomly generated from Uniform(0, 10) as in size evaluation (2), but was subsequently modified by an addition and a subtraction of a constant γ, as in Mansmann and Meister [6], such that the mean vectors μ_{i} for the two groups (i = 1, 2) differ by 2γ, $\mu_{1j} - \mu_{2j} = (-1)^{I_{j>20}} 2\gamma$, for j = 1,..., 40. We considered a range of γ from 0 to 2 with an increment of 0.1. The 40 differentially expressed genes were set to have a correlation of 0.5, as in size evaluation (2), but no correlation and a correlation of 0.9 were also considered.
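The construction of the two mean vectors can be sketched as below; `shifted_means` is a hypothetical helper, and the code uses 0-based gene indices, so genes 0 to 19 correspond to j = 1,..., 20 in the text.

```python
def shifted_means(mu, gamma, n_de=40, half=20):
    """Return (mu1, mu2) such that mu1[j] - mu2[j] = +2*gamma for the first
    `half` genes, -2*gamma for the next n_de - half genes, and 0 otherwise."""
    mu1, mu2 = list(mu), list(mu)
    for j in range(n_de):
        sign = 1 if j < half else -1
        mu1[j] += sign * gamma
        mu2[j] -= sign * gamma
    return mu1, mu2

base = [5.0] * 100                   # placeholder baseline means
mu1, mu2 = shifted_means(base, 0.5)  # gamma = 0.5 gives a difference of 1.0
```

With γ = 0 the two vectors coincide, which recovers the null setup used for the size evaluation.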
In the comparison of size across the three tests, the size was estimated by the observed proportion of replications with a p-value smaller than the correct size α. By definition, under the null hypothesis, a proportion α of the replications of an experiment is expected to yield a p-value smaller than α. In order to assess the size, we ran 5000 replications and used α = 0.05. For each permutation-based p-value, 1000 random permutations were carried out.
In the comparison of power across the three tests, the power was estimated by the observed proportion of the replications of an experiment in which the null hypothesis was correctly rejected. Given the fixed numbers of samples and genes with the fixed correlation structure in the simulation experiment, a larger effect size γ leads to higher power for a given α-level. In estimating the power, we ran 1000 replications of an experiment for each γ value. We considered α at 0.05, 0.01, 0.005, 0.0025, and 0.001. For obtaining a permutation-based p-value, 1000 random permutations were carried out.
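The estimation of size and power by repeated simulation can be sketched as follows. Everything here is a scaled-down illustration under assumptions: the gene-set statistic `sum_sq_mean_diff` is a simple placeholder rather than any of the three tests, the data are independent unit-variance normals, the permutation p-value uses the common (hits + 1)/(permutations + 1) convention with rejection at p ≤ α, and the replication and permutation counts are far smaller than those used in the paper.

```python
import random

def sum_sq_mean_diff(g1, g2):
    """Placeholder gene-set statistic: sum over genes of squared mean differences."""
    total = 0.0
    for j in range(len(g1[0])):
        d = sum(s[j] for s in g1) / len(g1) - sum(s[j] for s in g2) / len(g2)
        total += d * d
    return total

def permutation_pvalue(g1, g2, stat, n_perm, rng):
    """Permutation p-value obtained by shuffling sample labels."""
    observed = stat(g1, g2)
    pooled = g1 + g2
    n1 = len(g1)
    hits = 0
    for _ in range(n_perm):
        perm = pooled[:]
        rng.shuffle(perm)
        if stat(perm[:n1], perm[n1:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def rejection_rate(shift, n_reps, alpha, rng, n=5, m=4, n_perm=99):
    """Proportion of replications rejected at level alpha.

    shift = 0 estimates the size; shift > 0 estimates the power for a
    mean difference of 2*shift in every gene.
    """
    rejections = 0
    for _ in range(n_reps):
        g1 = [[rng.gauss(shift, 1) for _ in range(m)] for _ in range(n)]
        g2 = [[rng.gauss(-shift, 1) for _ in range(m)] for _ in range(n)]
        if permutation_pvalue(g1, g2, sum_sq_mean_diff, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_reps

rng = random.Random(0)
size = rejection_rate(0.0, n_reps=200, alpha=0.05, rng=rng)
power = rejection_rate(1.0, n_reps=200, alpha=0.05, rng=rng)
```

With only 200 replications the size estimate has a standard error of roughly 0.015, which is why the paper's 5000 replications are needed to distinguish, say, 0.05 from 0.07.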
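Table 1. Assessment of type I error probabilities (column headings 0, 0.5 and 0.9 are the gene correlations of the three simulation scenarios)

Standardization | Method | Type of inference | 10 vs. 10 samples: 0 | 10 vs. 10: 0.5 | 10 vs. 10: 0.9 | 25 vs. 25 samples: 0 | 25 vs. 25: 0.5 | 25 vs. 25: 0.9
--- | --- | --- | --- | --- | --- | --- | --- | ---
Before | Global Test | Scaled χ^{2} | 0.0982 | 0.0778 | 0.0722 | 0.0696 | 0.0700 | 0.0686
Before | Global Test | Asymptotic | 0.0006 | 0.0128 | 0.0298 | 0.0090 | 0.0328 | 0.0442
Before | Global Test | Permutation | 0.0496 | 0.0434 | 0.0464 | 0.0534 | 0.0554 | 0.0556
Before | ANCOVA Global Test | Asymptotic | 0.0692 | 0.1034 | 0.0898 | 0.0576 | 0.0840 | 0.0736
Before | ANCOVA Global Test | Permutation | 0.0482 | 0.0462 | 0.0458 | 0.0526 | 0.0552 | 0.0562
Before | SAM-GS | Permutation | 0.0498 | 0.0462 | 0.0478 | 0.0514 | 0.0518 | 0.0556
After | Global Test | Scaled χ^{2} | 0.1090 | 0.0844 | 0.0736 | 0.0734 | 0.0702 | 0.0698
After | Global Test | Asymptotic | <0.0001 | 0.0094 | 0.0276 | 0.0036 | 0.0320 | 0.0424
After | Global Test | Permutation | 0.0524 | 0.0464 | 0.0458 | 0.0524 | 0.0528 | 0.0530
After | ANCOVA Global Test | Asymptotic | 0.0372 | 0.0848 | 0.0792 | 0.0474 | 0.0838 | 0.0730
After | ANCOVA Global Test | Permutation | 0.0532 | 0.0462 | 0.0466 | 0.0544 | 0.0542 | 0.0544
After | SAM-GS | Permutation | 0.0522 | 0.0468 | 0.0470 | 0.0526 | 0.0540 | 0.0542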
Real-world data analyses
Our next evaluation of the performance of the three methods used a priori, biologically defined gene sets and three microarray datasets considered in Subramanian et al. [3], downloaded from the GSEA webpage [9]: 17 p53 wild-type vs. 33 p53 mutant cancer cell lines; 15 male vs. 17 female lymphoblastoid cells; 24 acute lymphoid leukemia (ALL) vs. 24 acute myeloid leukemia (AML) cells. For pathways/gene sets, we used Subramanian et al.'s gene-set subcatalogs C1 and C2 from the "Molecular Signature Database" at the same web address. The C1 catalog includes gene sets corresponding to human chromosomes and cytogenetic bands, while the C2 catalog includes gene sets that are involved in specific metabolic and signaling pathways [3]. In Subramanian et al., Catalog C1 included 24 sets, one for each of the 24 human chromosomes, and 295 sets corresponding to the cytogenetic bands; Catalog C2 consisted of 472 sets containing gene sets reported in manually curated databases and 50 sets containing genes reported in various experimental papers. Following Subramanian et al. [3], we restricted the set size to be between 15 and 500, resulting in 308 pathways to be examined.
We compared the performance of the three methods before and after the standardization by listing the gene sets that had a p-value ≤ 0.001 by any of the three methods.
Table 2. Gene sets in the p53 dataset with p-value ≤ 0.001 by any of the three methods

Gene set | Before standardization: Global | Before: ANCOVA | Before: SAM-GS | After standardization: Global | After: ANCOVA | After: SAM-GS | VSN: Global | VSN: ANCOVA | VSN: SAM-GS
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
ATM Pathway* | <0.001 | <0.001 | <0.001 | <0.001 | 0.002 | <0.001 | 0.001 | 0.001 | <0.001
BAD Pathway** | <0.001 | 0.007 | <0.001 | <0.001 | <0.001 | <0.001 | 0.004 | 0.004 | <0.001
Calcineurin Pathway$ | 0.068 | 0.084 | <0.001 | 0.007 | 0.002 | <0.001 | 0.004 | 0.005 | 0.011
Cell cycle regulator† | 0.021 | 0.017 | <0.001 | 0.002 | 0.001 | <0.001 | 0.002 | <0.001 | 0.003
Hsp27 Pathway** | 0.047 | 0.044 | <0.001 | <0.001 | 0.001 | <0.001 | 0.011 | 0.005 | <0.001
Mitochondria pathway** | 0.002 | 0.002 | <0.001 | 0.007 | 0.007 | <0.001 | 0.013 | 0.006 | <0.001
p53 signaling pathway* | 0.112 | 0.101 | <0.001 | 0.003 | 0.003 | 0.001 | 0.006 | 0.005 | 0.006
p53_UP* | 0.003 | 0.004 | <0.001 | <0.001 | <0.001 | <0.001 | 0.019 | 0.018 | <0.001
p53 hypoxia Pathway* | 0.626 | 0.622 | <0.001 | <0.001 | <0.001 | <0.001 | 0.044 | 0.041 | <0.001
p53 Pathway* | 0.142 | 0.150 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | 0.001 | <0.001
Raccycd Pathway† | 0.177 | 0.181 | <0.001 | 0.001 | <0.001 | <0.001 | 0.004 | 0.009 | 0.006
Radiation_sensitivity* | 0.119 | 0.135 | <0.001 | <0.001 | <0.001 | <0.001 | 0.014 | 0.020 | <0.001
SA_TRKA_RECEPTOR‡ | 0.254 | 0.252 | <0.001 | 0.001 | <0.001 | <0.001 | 0.004 | 0.001 | 0.006
bcl2family & reg. network** | 0.102 | 0.100 | 0.001 | 0.001 | 0.005 | <0.001 | 0.010 | 0.014 | 0.001
Cell cycle arrest† | 0.099 | 0.099 | 0.001 | 0.027 | 0.018 | 0.005 | 0.003 | 0.005 | 0.007
Ceramide Pathway** | 0.002 | 0.006 | 0.001 | 0.004 | 0.004 | <0.001 | 0.001 | 0.001 | <0.001
CR_DEATH* | 0.001 | 0.004 | 0.008 | 0.029 | 0.017 | 0.004 | 0.143 | 0.108 | 0.005
The same pattern was found in the analyses of the male-vs.-female lymphoblastoid dataset and the ALL-vs.-AML dataset (see Figures S1, S2, S3 and S4 in Additional file 1, comparing the results from the three methods). Before the standardization, p-values from Global Test and ANCOVA Global Test differed greatly from p-values from SAM-GS. The p-values of Global Test and ANCOVA Global Test changed markedly after the standardization and were very close to those of SAM-GS. After the standardization, in the male-vs.-female analysis, 21 gene sets had a p-value < 0.15 by one or more of the methods; 17 of these had a SAM-GS p-value smaller than, or equal to, those of Global Test and ANCOVA Global Test. In the ALL-vs.-AML analysis, all gene sets were statistically significant, with p-values < 0.001 by all three tests, a result that is unlikely to be of any biological significance.
Discussion
From the simulation results, we suggest that, when Global Test and ANCOVA Global Test are used for the analysis of microarray data, permutations should always be used for the calculation of statistical significance. In the documentation included with the Global Test R package, Goeman et al. noted that the asymptotic distribution with a quadratic form is the recommended method for large sample sizes and that it can be slightly conservative for small samples. In our simulation study, we used 10 and 25 samples for each of the two groups. In each situation, the asymptotic method with a quadratic form gave conservative p-values, although the difference between asymptotic and permutation-based methods did decrease when the sample size increased. Goeman et al. also noted that the scaled χ^{2} method can be slightly anti-conservative, especially for large gene sets. Our simulation study showed that the scaled χ^{2} method can be markedly anti-conservative. This is in accord with the manual of Global Test, which recommends against using the scaled χ^{2} approximation.
We found that the performance of the two Global Tests changed greatly before and after standardization, but SAM-GS performance remained unchanged. This can be explained by: (1) the invariance of t-test statistics under shifting and rescaling of data, which is relevant to SAM-GS; (2) ANCOVA's explicit assumption that all genes in the set have an equal variance, a violation of which would clearly affect the performance of ANCOVA Global Test; and (3) Global Test's assumption that the regression coefficients come from the same normal distribution, an assumption that is met by the standardization of gene expression. Therefore, some sort of standardization that makes the variances of gene expression similar across genes is needed before using Global Test and ANCOVA Global Test. SAM-GS employs a constant in the denominator of its t-like test statistic to address the small variability in some of the gene expression measurements and, thus, effectively standardizes expression across genes; neither Global Test nor ANCOVA Global Test addresses this characteristic of microarray data. Both Goeman et al. [4] and Mansmann and Meister [6] have stated that an appropriate normalization is important. Note that not many normalization methods would standardize the expression across genes. It is only after applying the z-score standardization (1) or the VSN normalization that the results of the three methods became congruent. The similarity between Global Test and ANCOVA Global Test has already been commented upon in [6]. The similarity between SAM-GS and Global Test may be inferred from the construction of the latter as a weighted sum of squared transformed t-statistics [12], which is similar to the SAM-GS test statistic.
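The invariance in point (1) is easy to check numerically. The sketch below uses an ordinary pooled-variance two-sample t-statistic and made-up expression values; it does not include the additional constant that SAM-GS places in the denominator.

```python
import math

def t_statistic(a, b):
    """Two-sample t-statistic with pooled variance."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss = sum((x - m1) ** 2 for x in a) + sum((x - m2) ** 2 for x in b)
    pooled_var = ss / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

group1 = [5.1, 6.3, 5.8, 6.9]  # hypothetical expression values
group2 = [4.2, 4.9, 5.0, 4.4]

t_raw = t_statistic(group1, group2)
# Shift by 5 and rescale by 3: numerator and denominator both scale by 3.
t_scaled = t_statistic([3 * x + 5 for x in group1], [3 * x + 5 for x in group2])
```

Both calls return the same value up to floating-point rounding, because the shift cancels in the mean difference and the scale factor cancels between numerator and denominator.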
It should be noted that Global Test allows four different types of phenotype variables: binary; multi-class; continuous; and survival. ANCOVA Global Test allows binary, multi-class, and continuous phenotypes. The ability to handle different classes of phenotypes is a very important advantage of Global Test and ANCOVA Global Test over SAM-GS. It is also possible to use Global Test and ANCOVA Global Test while adjusting for covariates (e.g., potential confounders). If covariates are incorporated, the two tests assess whether the gene-expression profile has an independent association with the phenotype that is above and beyond what is explained by the covariates. The ability to adjust for covariates is another important advantage of Global Test and ANCOVA Global Test over SAM-GS.
We focused on p-values in this paper because we were comparing the three methods that test "self-contained null hypotheses" via subject sampling. To account for multiple comparisons when multiple gene sets are tested, one might consider the False Discovery Rate (FDR) instead of the Type I error probability. For example, SAM uses a q-value, an upper limit of the FDR, for each gene, which could be extended here to each gene set using the method of Storey [13]. The q-values of the 17 gene sets listed in Table 2 are displayed in Additional file 2.
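As a rough illustration of an FDR-type adjustment across gene sets, the sketch below computes Benjamini-Hochberg adjusted p-values. This is a deliberately simpler stand-in, not Storey's procedure [13]: Storey's q-value additionally estimates the proportion of true null hypotheses, and fixing that proportion at 1, as done here, gives the more conservative Benjamini-Hochberg values. The function name and the example p-values are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

gene_set_pvalues = [0.01, 0.04, 0.03, 0.5]  # made-up p-values for four gene sets
adjusted = bh_adjust(gene_set_pvalues)
```

A gene set would then be reported when its adjusted value falls below the chosen FDR threshold.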
We have considered, but did not report detailed comparison results for, two other methods, those of Tian et al. [14] and Tomfohr et al. [15], that test self-contained hypotheses via subject sampling, in addition to the three methods we highlighted above. Tian et al. [14] test the significance of a gene set by taking the mean of the t-values of genes in the gene set as a test statistic and evaluating its significance by a permutation test. Tomfohr et al. [15] reduce the gene set's expression into a single summary value by taking the first principal component of the expressions of genes in the gene set and perform a permutation-based t-test of the single summary. The two methods gave appreciably different results when compared to Global Test, ANCOVA Global Test, and SAM-GS. Of the 17 gene sets in Table 2 for the p53 analysis, for instance, Tian et al. and Tomfohr et al. identified only eight and one gene sets, respectively, with p-value < 0.10: the ATM pathway, for example, was identified by Global Test, ANCOVA Global Test, and SAM-GS with p-value ≤ 0.001, while the methods of Tian et al. and Tomfohr et al. gave p-value = 0.61 and 0.99, respectively. The main reasons for their large discrepancies from the results of the three highlighted methods are as follows. Tian et al. sum the t-values for all the genes in a gene set, which results in cancellation of large positive t-values and large negative t-values. Among the 11 upregulated and 8 downregulated genes in the ATM pathway, for example, two upregulated genes had large positive t-values (about 2 or greater) and three downregulated genes had large negative t-values (about −2 or smaller): these large positive and negative t-values cancel each other when summing up all t-values in the Tian et al. test statistic, leading to reduced power for detecting gene sets that contain both significantly upregulated genes and significantly downregulated genes. The method of Tomfohr et al. summarizes the |S|-dimensional gene-expression vector of genes in the gene set S by the first principal component without considering the phenotype: if the direction of the first principal component does not correspond to the direction that separates the two phenotypes, their method does not capture the differential expressions even when they exist, leading to markedly reduced power.
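The cancellation effect can be seen in a small numerical example. The t-values below are made up for illustration and are not taken from the ATM pathway; the second summary is an unweighted sum of squared t-values, used here only as a sign-insensitive contrast to the mean.

```python
# Hypothetical t-values for a five-gene set with both up- and down-regulation.
t_values = [2.5, 2.1, -2.3, -2.6, 0.1]

# Mean of t-values: positive and negative values offset one another.
mean_t = sum(t_values) / len(t_values)

# Sum of squared t-values: the sign is discarded, so strong effects in
# either direction accumulate.
sum_sq_t = sum(t * t for t in t_values)
```

Here the mean is close to zero (about −0.04) even though four of the five genes have |t| > 2, whereas the sum of squares (about 22.7) reflects all four.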
Although we focused on the comparison of the "self-contained null hypothesis" approaches, it is also of methodological interest to see how "competitive null hypothesis" approaches compare. We, therefore, applied three "competitive null hypothesis" approaches to the analysis of the p53 dataset: Gene Set Enrichment Analysis (GSEA) [2]; the Significance Analysis of Function and Expression (SAFE) [16]; and Fisher's exact test [17]. The results are shown in Additional file 3. The results from the three "competitive null hypothesis" approaches were greatly different from those of SAM-GS and the Global Tests. Most of the gene sets identified as being significantly associated with the p53 mutation by SAM-GS and the Global Tests were not identified as such by the three "competitive null hypothesis" approaches. The only gene set additionally identified as being significantly associated with the p53 mutation (with p < 0.001) was HUMAN_CD34_ENRICHED_TF_JP: for this gene set, the Fisher's exact test p-value was < 0.001, but all the other five methods gave p-values > 0.37. Known biological functions of p53 are clearly more consistent with the results of the "self-contained null hypothesis" approaches. The differences observed between "self-contained null hypothesis" and "competitive null hypothesis" approaches can be attributed, at least partly, to the fact that the significance of a gene set depends only on the genes in the set under "self-contained null hypothesis" testing, while, under "competitive null hypothesis" testing, the significance of a gene set depends not only on the genes in the set but also on all the other genes in the array.
In summary, the primary advantage of SAM-GS may be the slightly higher power in the low α-level region that is of highest scientific interest, whereas, despite the need for appropriate standardization, Global Test and ANCOVA Global Test can be used for a variety of phenotypes and can incorporate covariates in the analysis.
Conclusion
In conclusion, Global Test and ANCOVA Global Test require appropriate standardization of gene expression measurements across genes for proper performance. Standardization for these two methods and the use of permutation inference make the performance of all three methods similar, with a slight power advantage for SAM-GS. Global Test and ANCOVA Global Test can be used for a variety of phenotypes and can incorporate covariates in the analysis.
Methods
In this section, we describe the three gene-set analysis methods. The phenotype of interest is assumed to be binary.
1) Global Test
The Global Test statistic [4] is
$Q = (Y - \mu)' R (Y - \mu)/\mu_{2}$
where R = (1/m)XX', μ = h^{-1}(α), and μ_{2} is the second central moment of Y under the null hypothesis. It can be shown that Q is asymptotically normally distributed (a quadratic form which is nonnegative). However, when the sample size is small, a better approximation to the distribution of Q is a scaled χ^{2} distribution. The p-value can, therefore, be calculated based on an approximate distribution of the test statistic, i.e., the asymptotic distribution with a non-chi-squared distributed quadratic form or the scaled χ^{2} distribution, or based on permutations of samples.
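For a binary phenotype, the quadratic form Q = (Y − μ)'R(Y − μ)/μ_{2} with R = (1/m)XX' can be sketched as below. This is a bare illustration under assumptions, not the Global Test R package: μ is estimated by the sample proportion of Y (an intercept-only null model), μ_{2} by μ(1 − μ), X is taken as a samples-by-genes matrix, and `global_test_q` is a hypothetical name. The p-value would still have to come from permutations or one of the approximations described above.

```python
def global_test_q(X, y):
    """Q = (y - mu)' R (y - mu) / mu2 with R = (1/m) X X'.

    X[i][j] is the expression of gene j in sample i; y[i] is 0 or 1.
    Computed as (1/m) * sum_j (sum_i X[i][j] * (y[i] - mu))**2 / mu2,
    which avoids forming the n x n matrix R explicitly.
    """
    n, m = len(X), len(X[0])
    mu = sum(y) / n            # assumed null mean: sample proportion
    mu2 = mu * (1 - mu)        # second central moment of a binary Y
    resid = [yi - mu for yi in y]
    q = 0.0
    for j in range(m):
        proj = sum(X[i][j] * resid[i] for i in range(n))
        q += proj * proj
    return q / (m * mu2)

# One gene that tracks the phenotype perfectly in four samples.
X = [[1.0], [0.0], [1.0], [0.0]]
y = [1, 0, 1, 0]
q = global_test_q(X, y)
```

In this toy case the residuals are ±0.5, the single projection equals 1, and dividing by μ_{2} = 0.25 gives Q = 4.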
2) ANCOVA Global Test
The null hypothesis of the Global Test is in the form of P(Y | X) = P(Y). The ANCOVA Global Test changes the roles of gene expression pattern X and phenotype Y, and the null hypothesis becomes P(X | Y = 1) = P(X | Y = 2), or, for each gene j in a gene set of interest, μ_{1j} = μ_{2j}, where μ_{ij} is the mean expression of gene j in phenotype group i, i = 1, 2. A linear model of the form μ_{ij} = μ + α_{i} + β_{j} + γ_{ij}, with group effects α, gene effects β, and the gene-group interaction γ, is then used to test the null hypothesis. The conditions Σα_{i} = Σβ_{j} = Σ_{i}γ_{ij} = Σ_{j}γ_{ij} = 0 ensure identifiability of the parameters. The null hypothesis under the parameterization of the linear model is H_{0}: α_{i} = γ_{ij} = 0. The test statistic is the F-test statistic for linear models: $F = \{(SSR_{H_0} - SSR_{H_1})/(df_{H_1} - df_{H_0})\}/\{SSR_{H_1}/df_{H_1}\}$, where SSR_{H} and df_{H} denote the sum of squares and degrees of freedom, respectively, under the hypothesis H. The p-value can be calculated by a permutation distribution of the F statistic or an asymptotic distribution of the test statistic.
3) SAM-GS
A permutation distribution of the SAM-GS statistic is used to calculate the p-value. We note that, although s_{0} should in principle be recalculated for each permutation, the practical impact is small, and neither the SAM nor the SAM-GS Excel add-in recalculates s_{0}.
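A minimal sketch of a SAM-GS-style permutation test follows. It assumes the form of the statistic described for SAM [1] and SAM-GS [7], namely the sum over genes of squared SAM statistics d_j = (mean difference)/(s_j + s_0), with s_j the pooled standard error of gene j. The constant s_0 is passed in as a fixed number and is not recalculated per permutation; SAM's own data-driven choice of s_0 is not implemented. Function names and data are hypothetical.

```python
import math
import random

def samgs_statistic(g1, g2, s0):
    """Sum over genes of d_j**2, d_j = (mean1 - mean2) / (s_j + s0).

    g1 and g2 are lists of samples, each sample a list of gene expressions.
    """
    n1, n2 = len(g1), len(g2)
    total = 0.0
    for j in range(len(g1[0])):
        a = [s[j] for s in g1]
        b = [s[j] for s in g2]
        m1, m2 = sum(a) / n1, sum(b) / n2
        ss = sum((x - m1) ** 2 for x in a) + sum((x - m2) ** 2 for x in b)
        s_j = math.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
        d_j = (m1 - m2) / (s_j + s0)
        total += d_j ** 2
    return total

def samgs_pvalue(g1, g2, s0, n_perm, rng):
    """Permutation p-value from shuffling sample labels, with s0 held fixed."""
    observed = samgs_statistic(g1, g2, s0)
    pooled = g1 + g2
    n1 = len(g1)
    hits = 0
    for _ in range(n_perm):
        perm = pooled[:]
        rng.shuffle(perm)
        # small tolerance guards against rounding when a permutation
        # reproduces the observed split in a different sample order
        if samgs_statistic(perm[:n1], perm[n1:], s0) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

rng = random.Random(42)
group1 = [[5 + 0.1 * i, -5 - 0.1 * i] for i in range(5)]  # two genes, clearly shifted
group2 = [[0.1 * i, 0.1 * i] for i in range(5)]
p = samgs_pvalue(group1, group2, s0=0.1, n_perm=199, rng=rng)
```

Because each gene contributes a squared term, up-regulated and down-regulated genes in the same set both increase the statistic.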
Each of the three methods provides a statistically valid test of the null hypothesis of no differential gene expression across a binary phenotype.
For the purpose of methodological comparisons, we also applied three "competitive null hypothesis" approaches to the analysis of the p53 dataset: Gene Set Enrichment Analysis (GSEA) [2]; the Significance Analysis of Function and Expression (SAFE) [16]; and Fisher's exact test [17]. Both GSEA and SAFE employ a two-stage approach to assess the significance of a gene set. First, gene-specific measures are calculated that capture the association between expression and the phenotype of interest. Then a test statistic is constructed as a function of the gene-specific measures used in the first step. The significance of the test statistic is assessed by permutation of the response values. For GSEA, the Pearson correlation is used in the first step, according to Mootha et al. [2], and the Enrichment Score is used in the second step. For SAFE, the Student t-statistic is used in the first step and the Wilcoxon rank-sum test is used in the second step, both of these being the default options. For the Fisher's exact test, the list of significant genes is obtained from SAM [1]. An FDR cutoff of 0.3 assigned significance to 5% of the genes in the entire gene list.
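The Fisher's exact test step can be illustrated with a one-sided over-representation p-value computed from the hypergeometric distribution. Whether the original analysis used a one-sided or two-sided version is not stated, so the one-sided form is an assumption here, as are the function name and the toy counts.

```python
from math import comb

def fisher_enrichment_p(k, set_size, n_significant, n_genes):
    """P(X >= k) for X ~ Hypergeometric.

    n_genes: genes on the array; set_size: genes in the gene set;
    n_significant: genes called significant; k: significant genes in the set.
    """
    upper = min(set_size, n_significant)
    total = comb(n_genes, n_significant)
    tail = sum(
        comb(set_size, i) * comb(n_genes - set_size, n_significant - i)
        for i in range(k, upper + 1)
    )
    return tail / total

# Toy example: 10 genes on the array, a gene set of 3, 3 significant genes,
# all 3 of which fall in the set.
p_enriched = fisher_enrichment_p(3, 3, 3, 10)
```

The p-value depends on the genes outside the set through `n_genes` and `n_significant`, which is what makes this a "competitive null hypothesis" test.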
Availability and requirements
Project name: Comparison of statistical methods for gene set analysis based on testing self-contained hypotheses via subject sampling.
Project home page: http://www.ualberta.ca/~yyasui/homepage.html
Operating system(s): Microsoft Windows XP
Programming language: R 2.4.x and Microsoft Excel 2003 or 2007
Notes
Abbreviations
SAM-GS: Significance Analysis of Microarray for Gene Sets
Declarations
Acknowledgements
ID, AJA, and YY are supported by the Alberta Heritage Foundation for Medical Research and YY is supported by the Canada Research Chair Program and the Canadian Institutes of Health Research.
References
1. Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 2001, 98: 5116–5121. 10.1073/pnas.091062498
2. Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC: PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 2003, 34: 267–273. 10.1038/ng1180
3. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 2005, 102: 15545–15550. 10.1073/pnas.0506580102
4. Goeman JJ, van de Geer SA, de Kort F, van Houwelingen HC: A global test for groups of genes: testing association with a clinical outcome. Bioinformatics 2004, 20: 93–99. 10.1093/bioinformatics/btg382
5. Goeman JJ, Oosting J, Cleton-Jansen AM, Anninga JK, van Houwelingen HC: Testing association of a pathway with survival using gene expression data. Bioinformatics 2005, 21: 1950–1957. 10.1093/bioinformatics/bti267
6. Mansmann U, Meister R: Testing differential gene expression in functional groups. Goeman's global test versus an ANCOVA approach. Methods Inf Med 2005, 44: 449–453.
7. Dinu I, Potter JD, Mueller T, Liu Q, Adewale AJ, Jhangri GS, Einecke G, Famulski KS, Halloran P, Yasui Y: Improving GSEA for Analysis of Biologic Pathways for Differential Gene Expression across a Binary Phenotype. BMC Bioinformatics 2007, 8: 242. 10.1186/1471-2105-8-242
8. Goeman JJ, Bühlmann P: Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics 2007, 23: 980–987. 10.1093/bioinformatics/btm051
9. Gene Set Enrichment Analysis [http://www.broad.mit.edu/gsea]
10. Huber W, von Heydebreck A, Sultmann H, Poustka A, Vingron M: Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 2002, 18(Suppl 1): S96–104.
11. Huber W, von Heydebreck A, Sueltmann H, Poustka A, Vingron M: Parameter estimation for the calibration and variance stabilization of microarray data. Stat Appl Genet Mol Biol 2003, 2: Article 3.
12. Goeman JJ, Van de Geer SA, van Houwelingen HC: Testing against a high dimensional alternative. J R Statist Soc B 2006, 68: 477–493. 10.1111/j.1467-9868.2006.00551.x
13. Storey JD: A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2002, 64: 479–498. 10.1111/1467-9868.00346
14. Tian L, Greenberg SA, Kong SW, Altschuler J, Kohane IS, Park PJ: Discovering statistically significant pathways in expression profiling studies. Proc Natl Acad Sci USA 2005, 102: 13544–13549. 10.1073/pnas.0506577102
15. Tomfohr J, Lu J, Kepler TB: Pathway level analysis of gene expression using singular value decomposition. BMC Bioinformatics 2005, 6: 225. 10.1186/1471-2105-6-225
16. Barry WT, Nobel AB, Wright FA: Significance analysis of functional categories in gene expression studies: a structured permutation approach. Bioinformatics 2005, 21: 1943–1949. 10.1093/bioinformatics/bti260
17. Draghici S, Khatri P, Martins RP, Ostermeier GC, Krawetz SA: Global functional profiling of gene expression. Genomics 2003, 81: 98–104. 10.1016/S0888-7543(02)00021-6
18. le Cessie S, van Houwelingen HC: Testing the fit of a regression model via score tests in random effects models. Biometrics 1995, 51: 600–614. 10.2307/2532948
19. Houwing-Duistermaat JJ, Derkx BH, Rosendaal FR, van Houwelingen HC: Testing familial aggregation. Biometrics 1995, 51: 1292–1301. 10.2307/2533260
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.