Gene set analysis using variance component tests

Huang, Yen-Tsung; Lin, Xihong

doi:10.1186/1471-2105-14-210

Methodology article
Open access
Published: 28 June 2013

BMC Bioinformatics volume 14, Article number: 210 (2013)

Abstract

Background

Gene set analyses have become increasingly important in genomic research, as many complex diseases arise from the joint contribution of alterations in numerous genes. Genes often work together as a functional unit, e.g., a biological pathway or network, and are highly correlated. However, most existing gene set analysis methods do not fully account for the correlation among genes. Here we propose to exploit this important feature of a gene set to improve statistical power in gene set analyses.

Results

We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects that assumes a common distribution for the regression coefficients in the multivariate linear regression model, and calculate the p-values using permutation and a scaled chi-square approximation. Using simulations, we show that the type I error is controlled under different choices of working covariance matrices and that power improves as the working covariance approaches the true covariance. The global test is a special case of TEGS when the correlation among genes in a gene set is ignored. Using both simulated data and a published diabetes dataset, we show that our test outperforms two commonly used approaches, the global test and gene set enrichment analysis (GSEA).

Conclusion

We develop a gene set analysis method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and the global test, in both simulations and a diabetes microarray dataset.

Background

Genome-wide analysis using microarray data, including RNA expression, DNA copy number and epigenetic DNA methylation, has become a popular tool in genomic research. Single gene/marker analysis provides a quick and convenient way to identify top genes that might be associated with a phenotypic trait. However, it is prone to many false positives because of the large number of comparisons, and it does not fully take into account that some genes have similar biological functions and work together.

Microarray gene expressions or genetic markers usually have natural groupings based on biological knowledge. For example, multiple genes belong to the same biological pathway or network; or contiguous copy number-detecting probes belong to the same gene or cytoband. Incorporating the prior knowledge or annotation about the grouping underlying the genome-wide data can make the results more interpretable. Note that the grouping may not necessarily come from biology. It can also be a cluster of genes identified using clustering methods. In this paper, these natural or statistical groupings are loosely called a gene set, which refers to a set of genes, or a set of markers or simply a set of probes.

Numerous approaches for gene set analyses have been proposed [1], including the overrepresentation analysis [2], the univariate tests [3], the multivariate tests [4, 5], the global test [6], and gene set enrichment analysis (GSEA) [7, 8] and its variant [9]. The overrepresentation-type analysis has been found to suffer from methodological problems, which may lead to confusing results [10]. The global test and GSEA improve over the overrepresentation-type analysis. The global test regresses the phenotype on gene expressions in a gene set and tests for regression coefficients. GSEA performs a modified Kolmogorov-Smirnov test by comparing a gene set with the rest of the genes in the genome. However, the test statistics used in both methods ignore the correlation among the genes in a gene set and hence are subject to loss of statistical power, as genes in a gene set are often correlated and function together. The univariate tests do not account for the correlation either and, compared with the multivariate tests, lose power when the interdependence within the gene set is high [11].

We propose in this paper to test for the effect of a gene set using a variance component test in a multivariate regression model, where the correlation among genes in a gene set is explicitly taken into account. We term this test TEGS (Test for the Effect of a Gene Set). Specifically, we regress the gene expressions in a gene set on an independent variable, such as an exposure or biological state variable, e.g., smoking (yes/no) or lung cancer status (yes/no), using multivariate regression, where the correlation among genes in a gene set is modeled using a working covariance matrix. As the number of genes in a gene set might be large, we develop a variance component score test for the effect of the exposure/biological state on the overall gene set profile by assuming that the regression coefficients follow a common distribution.

We show that TEGS includes the global test of Goeman, et al (2004) as a special case when the correlation among the genes in a gene set is ignored. We conduct simulation studies to evaluate the finite sample performance of TEGS and compare it with the global test and GSEA. We apply the proposed method to the analysis of the Type II Diabetes data set [7].

Methods

The model

Suppose that there are n subjects and subject i has p continuous outcomes Y_i1,Y_i2,…,Y_ip. In gene set analysis, the p outcomes indicate the expression values of p genes in a gene set, and x_i is an independent variable, e.g., exposure/biological state variable, such as mutation status: 1 if mutant and 0 if wild-type; or disease status (yes/no) for subject i. We consider the multivariate linear model

Y_{ij} = α_{j} + x_{i} β_{j} + ε_{ij}, i = 1, 2, ..., n and j = 1, 2, ..., p

(1)

where the errors, ε_i = (ε_i1,ε_i2,...,ε_ip)^T, are assumed to be independent across subjects and to follow a multivariate normal distribution with mean 0 and true covariance Σ, which is often unknown, and α_j is the average expression value of gene j for subjects with x = 0. Covariates can be incorporated in model (1) by expanding α_j to $\sum_{k = 1}^{K} α_{jk} z_{ik}$, where K is the number of covariates plus one (i.e., the intercept), z_ik is covariate k of subject i, z_i1 = 1, and α_jk is the regression coefficient of covariate k for gene j. However, because the data we deal with have small n and large p, ridge regression would be needed to estimate α_jk. If x_i is binary, e.g., disease status, β_j is the mean difference in the expression level of gene j between the two disease groups. Model (1) can be written in matrix notation by stacking the data of the n subjects and p gene expressions as

\begin{array}{c} Y = J α + X β + ε, \end{array}

(2)

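where $Y = (Y_{1}^{T}, \dots, Y_{n}^{T})^{T}$ is an np × 1 vector with $Y_{i} = (Y_{i1}, Y_{i2}, \dots, Y_{ip})^{T}$, $ε = (ε_{1}^{T}, \dots, ε_{n}^{T})^{T}$ is defined analogously, $J = (I_{p}, \dots, I_{p})^{T}$ and $X = (x_{1} I_{p}, \dots, x_{n} I_{p})^{T}$ are np × p matrices, and $α = (α_{1}, α_{2}, \dots, α_{p})^{T}$, $β = (β_{1}, β_{2}, \dots, β_{p})^{T}$.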

Gene set analysis using TEGS: A variance component score test

The null hypothesis H₀ : β = 0 indicates that x_i has no effect on the mean of the gene expression profile Y_i in a gene set. A traditional multivariate test for H₀ [4] is based on a test with p degrees of freedom and hence has limited power when the gene set size p is large, especially in the presence of many null genes. To overcome this problem and improve power, we assume that the regression coefficients β_j follow an arbitrary common distribution with mean 0 and variance τ. Model (2) then becomes a linear mixed model [12], and the null hypothesis H₀ : β = 0 is equivalent to the variance component null hypothesis H₀ : τ = 0, which can be tested using a variance component score test [13].

Specifically, following Lin (1997), simple calculations show that the score for the variance component τ under the induced linear mixed model is

{(Y - J α)}^{T} Σ_{n}^{- 1} {XX}^{T} Σ_{n}^{- 1} (Y - J α) - tr (Σ_{n}^{- 1} {XX}^{T}),

(3)

where Σ_n = diag(Σ,⋯,Σ) is an np × np block diagonal matrix. As the second term does not depend on the data, we use the first term to construct the test statistic

Q_{T} = {(Y - J \hat{α})}^{T} Σ_{n}^{- 1} {XX}^{T} Σ_{n}^{- 1} (Y - J \hat{α}),

(4)

where $\hat{α}$ is the maximum likelihood estimator of α under H₀. One can easily show that under H₀, $\hat{α} = {(J^{T} Σ_{n}^{- 1} J)}^{- 1} J^{T} Σ_{n}^{- 1} Y = \bar{Y}$ , where $\bar{Y} = n^{- 1} \sum_{i = 1}^{n} Y_{i}$ is simply the sample mean. Hence Equation (4) can be written as
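As a numerical sanity check of the claim that the GLS/ML estimator of α under H₀ reduces to the sample mean, the sketch below builds the stacked matrices J and Σ_n of model (2) with Kronecker products and compares the two. It is a minimal NumPy illustration under the model assumptions above; the helper names `stacked_design` and `gls_alpha_hat` are hypothetical and not code from the paper.

```python
import numpy as np


def stacked_design(x, p):
    """Return J and X of model (2), both np x p, by stacking n blocks of I_p."""
    x = np.asarray(x, dtype=float)
    n = x.size
    J = np.kron(np.ones((n, 1)), np.eye(p))   # J = (I_p, ..., I_p)^T
    X = np.kron(x.reshape(n, 1), np.eye(p))   # X = (x_1 I_p, ..., x_n I_p)^T
    return J, X


def gls_alpha_hat(Y, Sigma):
    """(J^T Sigma_n^{-1} J)^{-1} J^T Sigma_n^{-1} Y for an n x p data matrix Y."""
    n, p = Y.shape
    J, _ = stacked_design(np.zeros(n), p)
    Sigma_n_inv = np.kron(np.eye(n), np.linalg.inv(Sigma))  # diag(Sigma^{-1}, ..., Sigma^{-1})
    y = Y.reshape(-1)                         # stacks Y_1^T, ..., Y_n^T into an np-vector
    return np.linalg.solve(J.T @ Sigma_n_inv @ J, J.T @ Sigma_n_inv @ y)


rng = np.random.default_rng(0)
n, p = 8, 3
Sigma = 0.5 * np.eye(p) + 0.5                 # unit variances, correlation 0.5
Y = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
alpha_hat = gls_alpha_hat(Y, Sigma)
print(np.allclose(alpha_hat, Y.mean(axis=0)))  # True: the GLS estimate is the sample mean
```

The agreement holds for any Σ because J^T Σ_n^{-1} J = nΣ^{-1} and J^T Σ_n^{-1} Y = Σ^{-1} Σ_i Y_i, so Σ^{-1} cancels.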

Q_{T} = Y^{T} (I - H) Σ_{n}^{- 1} {XX}^{T} Σ_{n}^{- 1} (I - H) Y,

where H = n⁻¹JJ^T. Since Q_T is a quadratic form in Y, some calculations show that under H₀ Q_T follows a mixture of chi-square distributions $\sum_{j} λ_{j} χ_{1, j}^{2}$, where the weights λ_j are the eigenvalues of the matrix $X^{T} (I - H) Σ_{n}^{- 1} (I - H) X$.
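The mixture weights can be checked numerically. Because H and Σ_n share a Kronecker structure (H = (n⁻¹11ᵀ) ⊗ I_p and Σ_n = I_n ⊗ Σ), the matrix X^T(I − H)Σ_n^{-1}(I − H)X reduces to S_xx Σ^{-1} with S_xx = Σ_i (x_i − x̄)², so the λ_j are S_xx times the eigenvalues of Σ^{-1}. This shortcut is our own rearrangement, not a statement from the paper; the sketch below, with a hypothetical helper `chi2_mixture_weights`, compares it with the brute-force eigenvalues.

```python
import numpy as np


def chi2_mixture_weights(x, Sigma):
    """Eigenvalues of X^T (I - H) Sigma_n^{-1} (I - H) X, the weights lambda_j of Q_T."""
    x = np.asarray(x, dtype=float)
    n, p = x.size, Sigma.shape[0]
    J = np.kron(np.ones((n, 1)), np.eye(p))
    X = np.kron(x.reshape(n, 1), np.eye(p))
    H = J @ J.T / n                                   # H = n^{-1} J J^T
    I = np.eye(n * p)
    Sigma_n_inv = np.kron(np.eye(n), np.linalg.inv(Sigma))
    M = X.T @ (I - H) @ Sigma_n_inv @ (I - H) @ X     # p x p and symmetric
    return np.sort(np.linalg.eigvalsh(M))


x = np.array([1, 1, 1, 0, 0, 0, 0], dtype=float)
Sigma = np.array([[1.0, 0.5, 0.25],
                  [0.5, 1.0, 0.5],
                  [0.25, 0.5, 1.0]])                  # AR(1) with rho = 0.5
lam = chi2_mixture_weights(x, Sigma)

# Kronecker shortcut: lambda_j = S_xx * eigenvalues of Sigma^{-1}
S_xx = np.sum((x - x.mean()) ** 2)
print(np.allclose(lam, np.sort(S_xx * np.linalg.eigvalsh(np.linalg.inv(Sigma)))))  # True
```

The shortcut also shows that, with the true covariance, the weights depend on the design only through S_xx.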

The test statistic Q_T depends on the true covariance matrix Σ of Y_i, which is often unknown in practice and whose estimation requires a large number of parameters. Although the sample covariance can be used to estimate Σ, it is unstable when the gene set size p is moderate or large and the sample size is small. We hence propose to use a working covariance V for ε_i in (1) [14], which has a simpler structure and depends on a small number of parameters. We derive a variance component test for H₀ : τ = 0 assuming ε_i has covariance V, which may misspecify the true covariance Σ. Under this working model, similar calculations show that the variance component score statistic for H₀ : τ = 0 is

Q = Y^{T} (I - H) V_{n}^{- 1} {XX}^{T} V_{n}^{- 1} (I - H) Y,

(5)

where V_n = diag(V,⋯,V). We call the variance component test based on Q the Test for the Effect of a Gene Set (TEGS).
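For computation there is no need to form the np × np matrices: since X^T V_n^{-1}(I − H)Y = V^{-1} Σ_i x_i (Y_i − Ȳ), the statistic (5) is the squared norm of that p-vector. The identity is our own rearrangement of (5); the sketch below (hypothetical function names, NumPy assumed) evaluates Q both ways on random data to confirm it.

```python
import numpy as np


def tegs_Q(Y, x, V):
    """TEGS statistic (5) via Q = || V^{-1} sum_i x_i (Y_i - Ybar) ||^2."""
    Y = np.asarray(Y, dtype=float)
    s = (Y - Y.mean(axis=0)).T @ np.asarray(x, dtype=float)  # sum_i x_i (Y_i - Ybar)
    w = np.linalg.solve(V, s)                                 # V^{-1} s without inverting V
    return float(w @ w)


def tegs_Q_bruteforce(Y, x, V):
    """Direct evaluation of (5) with the stacked np x np matrices (small n, p only)."""
    n, p = Y.shape
    J = np.kron(np.ones((n, 1)), np.eye(p))
    X = np.kron(np.asarray(x, dtype=float).reshape(n, 1), np.eye(p))
    R = np.eye(n * p) - J @ J.T / n                           # I - H
    Vn_inv = np.kron(np.eye(n), np.linalg.inv(V))
    y = Y.reshape(-1)
    return float(y @ R @ Vn_inv @ X @ X.T @ Vn_inv @ R @ y)


rng = np.random.default_rng(1)
n, p = 10, 4
x = rng.integers(0, 2, size=n)
Y = rng.normal(size=(n, p))
V = 0.7 * np.eye(p) + 0.3                                     # a compound-symmetry working covariance
print(np.isclose(tegs_Q(Y, x, V), tegs_Q_bruteforce(Y, x, V)))  # True
```

The short form costs one p × p solve, so it scales to gene sets where the stacked matrices would not fit in memory.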

Examples of working covariance V include working independence (Indpt), which assumes that the genes in a gene set are independent; a factor analysis covariance with two factors (F-2); an adaptive factor analysis covariance whose number of factors is chosen to explain up to 80% of the variability (F-adpt); compound symmetry (CpSym), which assumes the same pairwise correlation among genes; and the unstructured sample covariance (Unstr).

The unstructured sample covariance is estimated using the residuals ${\hat{ε}}_{ij}$ obtained by performing a separate simple linear regression of each gene expression Y_ij on x_i in (1). When x_i is binary, e.g., disease = yes/no, ${\hat{ε}}_{ij}$ is simply the jth outcome centered at its group-specific mean. When the number of genes in a gene set is large and the sample size is small, the standard empirical unstructured covariance estimator is unstable. We hence stabilize it using a ridge estimator that adds the 5th percentile of the sample variances to the diagonal of the empirical covariance estimator. The compound symmetry and factor analysis covariances are both estimated from this ridge unstructured covariance estimator. Specifically, for the compound symmetry covariance, the common pairwise covariance is estimated as the mean of the off-diagonal elements of the ridge unstructured covariance estimator, and the two-factor and adaptive-factor covariances are estimated by singular value decomposition of the ridge unstructured covariance estimator.
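A minimal sketch of these working covariance estimators follows. The per-gene residual regressions and the ridge term (5th percentile of the sample variances) follow the text; the remaining details are our assumptions, not the paper's specification: CpSym uses the mean diagonal of the ridge estimator as its common variance, the factor covariances keep the leading singular components and then restore the diagonal of the ridge estimator, and F-adpt takes the smallest number of components whose share of the total singular values reaches 80%.

```python
import numpy as np


def residuals_given_x(Y, x):
    """Residuals of separate simple linear regressions of each gene on x (with intercept)."""
    x = np.asarray(x, dtype=float)
    Z = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return Y - Z @ coef


def working_covariances(Y, x, var_explained=0.8):
    """Indpt, ridge Unstr, CpSym, F-2 and F-adpt working covariances (sketch)."""
    S = np.cov(residuals_given_x(Y, x), rowvar=False)             # empirical unstructured covariance
    unstr = S + np.percentile(np.diag(S), 5) * np.eye(S.shape[0])  # ridge: add 5th pct of variances
    p = unstr.shape[0]

    # CpSym: common covariance = mean off-diagonal of Unstr;
    # common variance = mean diagonal of Unstr (assumption).
    cpsym = np.full((p, p), unstr[~np.eye(p, dtype=bool)].mean())
    np.fill_diagonal(cpsym, np.diag(unstr).mean())

    # Factor covariances from the SVD of Unstr (symmetric PD, so U holds its eigenvectors).
    U, d, _ = np.linalg.svd(unstr)

    def factor_cov(k):
        L = U[:, :k] * np.sqrt(d[:k])                              # p x k loading matrix
        common = L @ L.T
        return common + np.diag(np.diag(unstr) - np.diag(common))  # restore diagonal (assumption)

    # F-adpt: smallest k whose leading singular values reach var_explained of the total.
    k_adpt = int(np.searchsorted(np.cumsum(d) / d.sum(), var_explained) + 1)

    return {"Indpt": np.eye(p), "Unstr": unstr, "CpSym": cpsym,
            "F-2": factor_cov(2), "F-adpt": factor_cov(k_adpt)}


rng = np.random.default_rng(2)
x = np.repeat([0.0, 1.0], 10)
Y = rng.normal(size=(20, 6))
covs = working_covariances(Y, x)
print({name: V.shape for name, V in covs.items()})
```

Restoring the diagonal keeps each factor covariance positive definite, since the discarded components contribute a non-negative amount to every variance.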

We discuss estimation of the p-value of the TEGS statistic Q in Section Null distribution of TEGS. We performed simulation studies to investigate the size and power of TEGS using different working covariances in a wide range of scenarios, and compared TEGS with the test based on Q_T, which uses the true covariance matrix of Y_i and is the optimal test statistic within the TEGS family, but cannot be calculated in practice because the true covariance of Y_i is unknown.

Special case of two group comparison and relationship of TEGS with the global test

Consider testing for the effect of a binary exposure/disease status on the expressions in a gene set, i.e., x_i is binary (0/1). Some calculations show that the TEGS statistic Q in (5) simplifies to

\begin{aligned} Q &= \left(\frac{n_{1} n_{2}}{n}\right)^{2} [\bar{Y}(1) - \bar{Y}(2)]^{T} V^{-1} V^{-1} [\bar{Y}(1) - \bar{Y}(2)] \\ &= \left(\frac{n_{1} n_{2}}{n}\right)^{2} \sum_{j = 1}^{p} \left\{\sum_{k = 1}^{p} v^{jk} [\bar{Y}_{k}(1) - \bar{Y}_{k}(2)]\right\}^{2}, \end{aligned}

(6)

where $\bar{Y}(1)$ and $\bar{Y}(2)$ are the sample mean vectors of the outcomes in groups 1 and 2, ${\bar{Y}}_{k}(1)$ and ${\bar{Y}}_{k}(2)$ (k = 1,⋯,p) are their components, n_1 and n_2 are the numbers of subjects in the two groups (n = n_1 + n_2), and v^jk is the (j,k)th element of V⁻¹. Hence the TEGS statistic Q compares the two groups through V⁻¹-weighted combinations of the gene-specific mean differences of the expression profiles.
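Equation (6) can be checked against the general form (5). The sketch below computes Q from (5) through the p-vector form V^{-1} Σ_i x_i (Y_i − Ȳ) and from the two-group closed form, taking group 1 to be the subjects with x_i = 1; setting V = I gives the working independence case. The helper names are hypothetical.

```python
import numpy as np


def tegs_Q(Y, x, V):
    """Equation (5) via Q = || V^{-1} sum_i x_i (Y_i - Ybar) ||^2."""
    s = (Y - Y.mean(axis=0)).T @ np.asarray(x, dtype=float)
    w = np.linalg.solve(V, s)
    return float(w @ w)


def tegs_Q_two_group(Y, x, V):
    """Equation (6), with group 1 = {i : x_i = 1} and group 2 = {i : x_i = 0}."""
    g1 = np.asarray(x) == 1
    n1, n2 = g1.sum(), (~g1).sum()
    d = Y[g1].mean(axis=0) - Y[~g1].mean(axis=0)   # Ybar(1) - Ybar(2)
    w = np.linalg.solve(V, d)                      # w_j = sum_k v^{jk} d_k
    return float((n1 * n2 / (n1 + n2)) ** 2 * (w @ w))


rng = np.random.default_rng(4)
x = np.array([1] * 6 + [0] * 9)
Y = rng.normal(size=(15, 4))
V = 0.6 * np.eye(4) + 0.4                          # a compound-symmetry working covariance
print(np.isclose(tegs_Q(Y, x, V), tegs_Q_two_group(Y, x, V)))                  # True
print(np.isclose(tegs_Q(Y, x, np.eye(4)), tegs_Q_two_group(Y, x, np.eye(4))))  # True
```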

If one assumes working independence V = I, the TEGS test statistic Q in (6) becomes

\begin{aligned} Q_{ind} &= \left(\frac{n_{1} n_{2}}{n}\right)^{2} [\bar{Y}(1) - \bar{Y}(2)]^{T} [\bar{Y}(1) - \bar{Y}(2)] \\ &= \left(\frac{n_{1} n_{2}}{n}\right)^{2} \sum_{k = 1}^{p} [\bar{Y}_{k}(1) - \bar{Y}_{k}(2)]^{2} . \end{aligned}

(7)

It can be easily shown that the TEGS statistic using the working independence covariance (Q_ind) is the same as the global test of Goeman, et al (2004). Although the global test is equivalent to TEGS with working independence, TEGS is still derived under model (1), in which the true covariance Σ need not be diagonal.

Specifically, the global test is derived as the variance component test under the logistic regression model of the binary disease status x_i on the p gene expressions

logit (π_{i}) = γ_{0} + γ_{1} Y_{i 1} + \dots + γ_{p} Y_{ip} .

(8)

where π_i = Pr(x_i = 1 | Y_i1,⋯,Y_ip) is the probability of disease given the gene expression profile of a gene set. Under the logistic model (8), to test the null hypothesis of no gene set effect on disease status, H₀ : γ₁ = ⋯ = γ_p = 0, Goeman, et al (2004) assumed that the coefficients γ_j are independent and follow an arbitrary distribution with mean 0 and variance τ. The logistic model (8) hence becomes a logistic mixed model [15], and the null hypothesis H₀ : γ₁ = ⋯ = γ_p = 0 is equivalent to H₀ : τ = 0. Goeman, et al (2004) derived the variance component score test for H₀ : τ = 0 and termed it the global test. One can easily show that the global test is identical to Q_ind in (7), apart from a term that does not depend on Y.
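To see the connection numerically: up to a normalizing factor that depends only on x, the global test score of Goeman et al. (2004) has the form (x − x̄)ᵀ Y_c Y_cᵀ (x − x̄), where Y_c is the column-centred n × p expression matrix. We state this form as our reading of their statistic, not as a quotation of it. The sketch checks that this quadratic form equals Q_ind in (7) on random data.

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n0, p = 5, 7, 5
x = np.array([1.0] * n1 + [0.0] * n0)   # binary disease status
Y = rng.normal(size=(n1 + n0, p))        # n x p expression matrix of one gene set

# Core of the global-test score, up to a factor that depends only on x:
# (x - xbar)^T Yc Yc^T (x - xbar), with Yc the column-centred expression matrix.
Yc = Y - Y.mean(axis=0)
r = x - x.mean()
global_core = float(r @ Yc @ Yc.T @ r)

# Q_ind from equation (7).
n = n1 + n0
d = Y[x == 1].mean(axis=0) - Y[x == 0].mean(axis=0)
q_ind = (n1 * n0 / n) ** 2 * float(d @ d)

print(np.isclose(global_core, q_ind))   # True
```

Both quantities equal the squared norm of Σ_i x_i (Y_i − Ȳ), which is why the roles of response and covariate can be swapped between models (1) and (8).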

A comparison of (6) and (7) shows that TEGS has the flexibility to account for different correlations among gene expressions in a gene set by comparing the weighted differences of the means of gene expressions between two groups, while the global test, which is the same as the TEGS assuming working independence among gene expressions, ignores correlation among gene expressions. One hence would expect that TEGS that accounts for within gene set correlation is likely to be more powerful than the global test.

Another testing procedure that is closely related to TEGS is the Sequence Kernel Association Test (SKAT), a method developed to analyze SNP (single nucleotide polymorphism) or sequence data in genome-wide association studies [16]. It has been shown that the global test is equivalent to the SKAT with linear kernel [16, 17]. Thus, the TEGS with working independence is equivalent to the SKAT with linear kernel. However, TEGS with other working correlations and SKAT with other kernels do not have an obvious correspondence.

Null distribution of TEGS

As the TEGS statistic Q in (5) is a quadratic form in Y, under H₀ it follows a mixture of chi-square distributions whose weights depend on the true covariance Σ and the working covariance V. We propose two methods to estimate the p-value of TEGS, and a further refinement for small p-values.

Permutation

One approach to calculating the p-value for the TEGS statistic Q is based on permutation: we permute the x_i's, calculate Q for each permuted dataset, and compare the observed value of Q with the values calculated from the permuted samples. Note that for each permutation, V needs to be re-estimated under the assumed structure, e.g., independence, unstructured or exchangeable, because V is the covariance conditional on x. If the sample size is large (e.g., n > 100), one may use the Monte Carlo approach proposed by Lin [18].
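A minimal permutation sketch follows. It re-estimates V for every permuted x, as the text requires, here using a ridge unstructured estimator as one possible structure. The add-one correction in the p-value, the function names and the defaults are our own choices, not taken from the paper.

```python
import numpy as np


def tegs_Q(Y, x, V):
    """TEGS statistic (5) via Q = || V^{-1} sum_i x_i (Y_i - Ybar) ||^2."""
    s = (Y - Y.mean(axis=0)).T @ np.asarray(x, dtype=float)
    w = np.linalg.solve(V, s)
    return float(w @ w)


def ridge_unstructured(Y, x):
    """Example estimator of V given x: ridge-stabilised covariance of per-gene residuals."""
    Z = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    S = np.cov(Y - Z @ coef, rowvar=False)
    return S + np.percentile(np.diag(S), 5) * np.eye(S.shape[0])


def permutation_pvalue(Y, x, estimate_V=ridge_unstructured, B=1000, seed=0):
    """Permutation p-value of TEGS; V is re-estimated for every permuted x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    q_obs = tegs_Q(Y, x, estimate_V(Y, x))
    q_perm = np.empty(B)
    for b in range(B):
        xb = rng.permutation(x)
        q_perm[b] = tegs_Q(Y, xb, estimate_V(Y, xb))
    # Add-one correction (our choice) so the estimate is never exactly zero.
    p_value = (1 + np.count_nonzero(q_perm >= q_obs)) / (B + 1)
    return p_value, q_obs, q_perm


rng = np.random.default_rng(5)
x = np.repeat([0.0, 1.0], 10)
Y = rng.normal(size=(20, 5)) + 2.0 * x[:, None]   # strong shift in every gene when x = 1
p_value, q_obs, q_perm = permutation_pvalue(Y, x, B=200)
print(p_value)
```

The permuted values q_perm are also what the moment-matching approximation needs as null draws of Q.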

Scaled χ² approximation

The second approach is to use the Satterthwaite method [19] to approximate the null distribution of Q, which is a mixture of χ² distributions, by a scaled χ² distribution $κ χ_{ν}^{2}$, where κ is the scale parameter and ν is the degrees of freedom. The values of κ and ν are estimated by matching the first two moments of Q under H₀ with those of the scaled χ² distribution:

κ = \frac{Var (Q)}{2 E (Q)}, ν = \frac{2 {[E (Q)]}^{2}}{Var (Q)} .

We estimate the mean and variance of Q under the null using permutation and denote the p-value estimated using this approach as $p_{κ χ_{ν}^{2}}$ . Using the Satterthwaite approximation, we are able to calculate small p-values based on a smaller number of permutations than the first method.
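The moment matching can be sketched with SciPy's `chi2` distribution, using null draws of Q (for example, permutation values of Q) to estimate E(Q) and Var(Q). The helper name is hypothetical; `scipy.stats.chi2.sf` accepts non-integer degrees of freedom, which ν generally is.

```python
import numpy as np
from scipy.stats import chi2


def satterthwaite_pvalue(q_obs, q_null):
    """Approximate P(Q >= q_obs) by kappa * chi2_nu with the first two moments matched
    to null draws of Q: kappa = Var(Q) / (2 E(Q)), nu = 2 E(Q)^2 / Var(Q)."""
    q_null = np.asarray(q_null, dtype=float)
    mean, var = q_null.mean(), q_null.var(ddof=1)
    kappa = var / (2.0 * mean)
    nu = 2.0 * mean ** 2 / var
    return float(chi2.sf(q_obs / kappa, df=nu)), kappa, nu


# Usage: null draws that are exactly 3 * chi2_4 should give kappa near 3 and nu near 4.
rng = np.random.default_rng(6)
q_null = 3.0 * rng.chisquare(4, size=200_000)
p, kappa, nu = satterthwaite_pvalue(3.0 * chi2.isf(0.01, 4), q_null)
print(round(kappa, 2), round(nu, 2), round(p, 4))
```

Because only two moments are estimated, a few hundred permutations usually pin down κ and ν well enough to report p-values far below 1/B.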

Normal mixture approximation

To achieve better precision for small p-values, we further propose a method using the normal mixture approximation [20]. Specifically, we fit a two-component normal mixture $π_{1} N(μ_{1}, σ_{1}^{2}) + π_{2} N(μ_{2}, σ_{2}^{2})$ to the values $Φ^{-1}(p_{κ χ_{ν}^{2}}^{(b)})$, b = 1, ..., B, where $p_{κ χ_{ν}^{2}}^{(b)}$ is the scaled χ² approximated p-value for the statistic Q^(b) obtained at permutation b, B is the number of permutations, Φ is the cumulative distribution function of the standard normal, and π_a, μ_a and $σ_{a}^{2}$ are the proportion, mean and variance of normal component a (a = 1, 2). The p-value is then estimated as the tail probability of $Φ^{-1}(p_{κ χ_{ν}^{2}})$ under the fitted mixture ${\hat{π}}_{1} N({\hat{μ}}_{1}, {\hat{σ}}_{1}^{2}) + {\hat{π}}_{2} N({\hat{μ}}_{2}, {\hat{σ}}_{2}^{2})$, where ${\hat{π}}_{a}$, ${\hat{μ}}_{a}$ and ${\hat{σ}}_{a}^{2}$ are the maximum likelihood estimates of π_a, μ_a and $σ_{a}^{2}$.
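A sketch of the normal mixture refinement, assuming the permutation p-values p^(b) have already been computed with the scaled χ² approximation. The two-component mixture is fitted by a plain EM algorithm written for this illustration, and the p-value is taken as the lower-tail probability of Φ^{-1}(p) under the fitted mixture, since small p-values map to large negative normal scores. This reading of "tail probability", the initialisation and the clipping constants are our assumptions.

```python
import numpy as np
from scipy.stats import norm


def fit_two_normal_mixture(z, n_iter=500, tol=1e-8):
    """Maximum likelihood fit of pi_1 N(mu_1, s_1^2) + pi_2 N(mu_2, s_2^2) by EM."""
    z = np.asarray(z, dtype=float)
    pi = np.array([0.5, 0.5])
    mu = np.quantile(z, [0.25, 0.75])               # spread the starting means apart
    sd = np.full(2, z.std())
    prev_ll = -np.inf
    for _ in range(n_iter):
        dens = pi * norm.pdf(z[:, None], mu, sd)    # n x 2 weighted component densities
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total                         # E-step: responsibilities
        nk = resp.sum(axis=0)                       # M-step: update pi, mu, sd
        pi = nk / z.size
        mu = (resp * z[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (z[:, None] - mu) ** 2).sum(axis=0) / nk)
        sd = np.maximum(sd, 1e-8)                   # guard against a collapsing component
        ll = np.log(total).sum()                    # log-likelihood before this update
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mu, sd


def normal_mixture_pvalue(p_obs, p_perm, eps=1e-300):
    """Lower-tail probability of Phi^{-1}(p_obs) under the mixture fitted to Phi^{-1}(p_perm)."""
    z_perm = norm.ppf(np.clip(p_perm, eps, 1 - 1e-12))
    pi, mu, sd = fit_two_normal_mixture(z_perm)
    z_obs = norm.ppf(np.clip(p_obs, eps, 1 - 1e-12))
    return float(np.sum(pi * norm.cdf((z_obs - mu) / sd)))


rng = np.random.default_rng(7)
p_perm = rng.uniform(size=5000)                     # stand-in for null scaled-chi2 p-values
print(normal_mixture_pvalue(0.01, p_perm))          # should be near 0.01 for uniform null p-values
```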

Power calculations

To design a new study using gene set analysis, one needs to perform power calculations. We discuss in this section power calculations using TEGS. Under the alternative hypothesis, Q follows a mixture of non-central chi-square distributions. We approximate this distribution using a scaled non-central chi-square distribution $κ χ_{ν}^{2}(δ)$. Specifically, we first compute κ and ν under H₀ as $κ = {Var}_{H_{0}}(Q) / [2 E_{H_{0}}(Q)]$ and $ν = 2 [E_{H_{0}}(Q)]^{2} / {Var}_{H_{0}}(Q)$, where $E_{H_{0}}(Q)$ and ${Var}_{H_{0}}(Q)$ are the theoretical mean and variance of Q under the null:

\begin{align} E_{H_{0}}(Q) &= tr[(I - H) V_{n}^{-1} X X^{T} V_{n}^{-1} (I - H) Σ_{n}] \\ {Var}_{H_{0}}(Q) &= 2\, tr[(I - H) V_{n}^{-1} X X^{T} V_{n}^{-1} (I - H) Σ_{n} (I - H) V_{n}^{-1} X X^{T} V_{n}^{-1} (I - H) Σ_{n}] . \end{align}

For power calculations, to estimate the non-centrality parameter δ, we use the theoretical mean of Q under the alternative,

\begin{align} E_{H_{A}}(Q) &= tr[(I - H) V_{n}^{-1} X X^{T} V_{n}^{-1} (I - H) Σ_{n}] \\ &\quad + β^{T} X^{T} (I - H) V_{n}^{-1} X X^{T} V_{n}^{-1} (I - H) X β . \end{align}

Setting $E_{H_{A}}(Q) = (ν + δ) κ$, which is the mean of $κ χ_{ν}^{2}(δ)$, one can solve for δ and calculate the power of the test as $Pr(χ_{ν}^{2}(δ) > χ_{ν, α}^{2})$, where α is the size of the test and $χ_{ν, α}^{2}$ is the upper α quantile of $χ_{ν}^{2}$. The true covariance Σ and the working covariance V can be estimated from pilot data. One can perform the calculations by varying the sample size and the effects β.
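The power calculation can be sketched directly from these formulas by forming the stacked matrices, which is practical for the small n and p of a pilot study. The function below is a hypothetical illustration using SciPy's `chi2` and `ncx2`; Σ and V are passed in as p × p matrices and expanded to Σ_n and V_n with Kronecker products.

```python
import numpy as np
from scipy.stats import chi2, ncx2


def tegs_power(x, beta, Sigma, V, alpha=0.05):
    """Approximate power of TEGS via a scaled non-central chi-square kappa * chi2_nu(delta).

    x: length-n design (e.g. 0/1 group labels); beta: length-p effects;
    Sigma: true p x p covariance; V: working p x p covariance.
    """
    x = np.asarray(x, dtype=float)
    beta = np.asarray(beta, dtype=float)
    n, p = x.size, beta.size
    J = np.kron(np.ones((n, 1)), np.eye(p))
    X = np.kron(x.reshape(n, 1), np.eye(p))
    R = np.eye(n * p) - J @ J.T / n                    # I - H
    Vn_inv = np.kron(np.eye(n), np.linalg.inv(V))     # V_n^{-1}
    Sigma_n = np.kron(np.eye(n), Sigma)               # Sigma_n
    A = R @ Vn_inv @ X @ X.T @ Vn_inv @ R             # Q = Y^T A Y
    AS = A @ Sigma_n
    e0 = np.trace(AS)                                 # E_H0(Q)
    v0 = 2.0 * np.trace(AS @ AS)                      # Var_H0(Q)
    kappa, nu = v0 / (2.0 * e0), 2.0 * e0 ** 2 / v0
    mean_shift = X @ beta                             # X beta, the mean change under H_A
    eA = e0 + mean_shift @ A @ mean_shift             # E_HA(Q)
    delta = eA / kappa - nu                           # solves E_HA(Q) = (nu + delta) kappa
    crit = chi2.isf(alpha, nu)                        # upper-alpha critical value of chi2_nu
    if delta <= 1e-12:                                # no signal: power equals the size
        return float(chi2.sf(crit, nu))
    return float(ncx2.sf(crit, nu, delta))


x = np.repeat([0.0, 1.0], 10)
Sigma = 0.5 * np.eye(5) + 0.5
for b in (0.0, 0.3, 0.6):
    print(b, round(tegs_power(x, np.full(5, b), Sigma, Sigma), 3))
```

Passing a working V different from Σ shows how much power is lost by misspecifying the covariance at the design stage.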

Simulation study

Single gene set

We simulated the data using model (1). Two combinations of n and p were considered: (n, p) = (50, 10) and (20, 40). Four true covariances Σ of ε_i were investigated: (1) compound symmetry (CS), with diagonal elements equal to 1 and off-diagonal elements equal to 0.1 or 0.5; (2) first-order autoregressive (AR1), with diagonal elements equal to 1 and (j,k)th off-diagonal elements $ρ^{|j-k|}$ with ρ = 0.5 or 0.8; (3) two-factor covariance (F2): $Σ = P_{1} P_{1}^{T} + P_{2} P_{2}^{T} + diag(u)$, where the p elements of the two factors P₁ and P₂ were generated independently from two Gaussian distributions, and u was chosen to make the diagonal elements of Σ equal to 1; (4) unstructured covariance (UNS), taken to be the sample covariance of MAP00240_Pyrimidine_metabolism (p = 40) in the Type II Diabetes data of Mootha et al. (2003). This sample covariance was calculated from 17 subjects with normal glucose tolerance and 17 Type II Diabetes patients, conditioning on disease status. To avoid singularity, the 5th percentile of its diagonal elements was added to the diagonal to construct the true covariance matrix used in the simulations.
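The true covariance structures (1) to (3) and data generation from model (1) can be sketched as follows. The CS and AR1 matrices follow the description above; for F2, the loading distribution N(0, 0.5²) and the rescaling that keeps every u_j positive are our assumptions, since the text does not give the parameters of the Gaussian distributions. The sign pattern of β in the usage lines is illustrative only.

```python
import numpy as np


def cs_cov(p, rho):
    """Compound symmetry: unit variances, common correlation rho."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))


def ar1_cov(p, rho):
    """First-order autoregressive: correlation rho^|j - k|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def f2_cov(p, rng, loading_sd=0.5, max_common=0.9):
    """Two-factor covariance P1 P1^T + P2 P2^T + diag(u) with unit diagonal.
    Assumed: loadings ~ N(0, loading_sd^2), rescaled so the common part of any
    variance is at most max_common, which keeps every u_j >= 1 - max_common > 0."""
    P = rng.normal(0.0, loading_sd, size=(p, 2))   # columns are P1 and P2
    common = np.sum(P ** 2, axis=1)                 # diagonal of P1 P1^T + P2 P2^T
    if common.max() > max_common:
        P *= np.sqrt(max_common / common.max())
    C = P @ P.T
    return C + np.diag(1.0 - np.diag(C))            # u chosen so that diag(Sigma) = 1


def simulate_model1(x, alpha, beta, Sigma, rng):
    """Draw Y_ij = alpha_j + x_i beta_j + eps_ij with eps_i ~ N(0, Sigma)."""
    x = np.asarray(x, dtype=float)
    eps = rng.multivariate_normal(np.zeros(len(alpha)), Sigma, size=x.size)
    return np.asarray(alpha) + np.outer(x, beta) + eps


rng = np.random.default_rng(8)
n, p = 50, 10
x = np.repeat([0.0, 1.0], n // 2)
beta = np.zeros(p)
beta[:4] = [0.5, 0.5, -0.5, 0.5]                    # 40% non-zero, illustrative signs
Y = simulate_model1(x, np.zeros(p), beta, ar1_cov(p, 0.5), rng)
print(Y.shape)  # (50, 10)
```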

The regression coefficients β were set by varying the proportion of non-zero β's and their magnitudes. For n = 50 and p = 10, 0%, 40% and 80% of the β's were non-zero, with non-zero values β = ±0.25 or ±0.5. For n = 20 and p = 40, 0%, 25%, 50% and 75% of the β's were non-zero, with non-zero values ±0.5 or ±1.0; the numbers of (−0.5, 0.5) (or (−1.0, 1.0)) coefficients were (2,8), (5,5), (5,15), (10,10), (5,25), (10,20) and (15,15). In the power plots given in Figures 1, 2, 3 and 4, the effect size is summarized by the index $\sum_{j = 1}^{p} β_{j} / \bar{σ^{2}}$, where $\bar{σ^{2}}$ is the average variance of the p gene expressions in the gene set.

For each simulation and each true covariance configuration, we compared the performance of TEGS assuming six different covariance matrices: the true covariance, unstructured covariance, independence, two-factor analysis covariance, adaptive factor analysis covariance and compound symmetry. Note that TEGS assuming working independence corresponds to the global test (Goeman et al. 2004). The p-values were calculated as the tail probability of Q under its null distribution, which was approximated by the methods described in Section Null distribution of TEGS using a total of 1000 permutations. A total of 5000 and 1000 simulations were run under the null hypothesis (i.e., β = 0) and the alternative hypothesis, respectively, to calculate sizes and powers. Type I error was calculated at the α = 0.05 level, and statistical power was calculated as the percentage of p-values less than 0.05 among the 1000 simulations.

Multiple gene sets

Gene set enrichment analysis (GSEA) is a widely used approach to study the enrichment of a gene set in a large number of genes, which often consists of multiple gene sets. The null hypothesis hence corresponds to the competitive null hypothesis [10]. To compare the performance of our proposed method TEGS with GSEA, we set up a simulation study involving multiple gene sets. The configuration is as follows:

Setting 1: We set n = 20 and the number of gene sets to 20. Ten gene sets have p = 10 genes each (gene sets #1-10); six of them are under the null and four under the alternative. The other ten gene sets have p = 40 genes each (gene sets #11-20); again six are under the null and four under the alternative. Among the gene sets under the alternative, we allowed some genes to have no effect (i.e., β_j = 0) and varied the number of signal genes (i.e., β_j ≠ 0). The number and magnitude of the non-zero β's for each gene set under the alternative hypothesis are given at the top of Table 1. This setting has a total of 500 genes, with 104 signal genes spread across 8 gene sets and 396 null genes. We assumed in this setting that the 20 gene sets were uncorrelated. Within each gene set, we assumed the genes were correlated with the two-factor covariance matrix $v^{*} = P_{1} P_{1}^{T} + P_{2} P_{2}^{T} + diag(u)$.
Setting 2: This setting is identical to Setting 1 except that we allowed correlation among gene sets: gene sets #1-3, #4-6, #7-9, #11-13, #14-16 and #17-19 are correlated. The correlation structures were estimated by two-factor covariances from the sample covariances of gene sets with p = 30 and p = 120 in the diabetes dataset. Correlated gene sets are marked in Table 1.
Setting 3: This setting is identical to Setting 2 except that we added 4500 additional null genes to the 500 genes in the 20 gene sets, which mimics more realistic gene expression studies. This gives a total of 5000 genes, with 104 signal genes spread across 8 gene sets and 4896 null genes. As before, 8 of the 20 gene sets are under the alternative and 12 are null.

Table 1 The simulation results comparing size and power of TEGS and GSEA


For each setting, we applied TEGS and GSEA to each of the 20 gene sets to compare size and power.

Application: reanalysis of Type II Diabetes data

We applied the proposed method to the Type II Diabetes gene expression data, which were previously analyzed by Mootha et al. (2003) using GSEA to study pathway effects. The original data have three patient groups: normal glucose tolerance, impaired glucose tolerance and Type II Diabetes. To illustrate our method and compare it with GSEA, we restricted our analysis to two groups: 17 patients with normal glucose tolerance and 17 patients with Type II Diabetes. A total of 124 of the 149 gene sets were analyzed; we excluded 25 small gene sets with fewer than four probes.

We performed TEGS assuming five different working covariances: independence, unstructured covariance, factor analysis covariance with two factors, factor analysis covariance with the number of factors that explains up to 80% of the variability, and compound symmetry. We calculated the p-values using permutation and the Satterthwaite method described in Section Null distribution of TEGS, with 2000 permutations for each gene set. TEGS with working independence corresponds to the global test [6]. We compared the performance of TEGS with GSEA. The q-value, an index measuring the false discovery rate (FDR) [21, 22], was used to adjust for multiple comparisons.
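For readers who want to reproduce the multiplicity adjustment, here is a sketch of Storey-style q-values with a single tuning value λ for estimating the proportion π₀ of true null hypotheses. It is a generic q-value procedure chosen for illustration; the exact estimator used in the analysis (references [21, 22]) may differ. With λ = 0 the estimate is π₀ = 1 and the q-values reduce to Benjamini-Hochberg adjusted p-values; if no p-value exceeds λ the estimate is π₀ = 0, which a practical implementation would guard against.

```python
import numpy as np


def storey_qvalues(pvals, lam=0.5):
    """Storey-style q-values with a single lambda for pi0 (illustrative sketch)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    pi0 = min(1.0, np.mean(p > lam) / (1.0 - lam))      # estimated proportion of true nulls
    order = np.argsort(p)
    ranked = pi0 * m * p[order] / np.arange(1, m + 1)   # pi0 * m * p_(i) / i
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity in p
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(storey_qvalues(pvals))
```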