Open Access

Multiple testing for gene sets from microarray experiments

  • Insuk Sohn1,
  • Kouros Owzar2,
  • Johan Lim3,
  • Stephen L George2,
  • Stephanie Mackey Cushman4 and
  • Sin-Ho Jung2Email author
BMC Bioinformatics201112:209

DOI: 10.1186/1471-2105-12-209

Received: 13 December 2010

Accepted: 26 May 2011

Published: 26 May 2011

Abstract

Background

A key objective in many microarray association studies is the identification of individual genes associated with clinical outcome. It is often of additional interest to identify sets of genes, known a priori to have similar biologic function, associated with the outcome.

Results

In this paper, we propose a general permutation-based framework for gene set testing that controls the false discovery rate (FDR) while accounting for the dependency among the genes within and across each gene set. The application of the proposed method is demonstrated using three public microarray data sets. The performance of our proposed method is contrasted to two other existing Gene Set Enrichment Analysis (GSEA) and Gene Set Analysis (GSA) methods.

Conclusions

Our simulations show that the proposed method controls the FDR at the desired level. Through simulations and case studies, we observe that our method performs better than GSEA and GSA, especially when the number of prognostic gene sets is large.

Background

One of the primary objectives in microarray association studies is the identification of individual genes that are associated with clinical endpoints such as disease type, toxicity or time to death. It is also of interest to examine the association between known biological categories or pathways and outcome. To this end, gene sets a priori believed to have similar biological functions from databases including KEGG [1] and Gene Ontology [2] are used. In recent years, a number of statistical methods have been proposed for the identification of significant genesets based on microarray experiments. Ackerman and Strimmer [3] list 36 methods, including [413], while outlining a general framework for formulating the hypothesis and analysis method for gene set inference.

In this paper, we propose a gene set analysis framework that utilizes classical theory for estimating equations to assess the association between each gene set and the outcome of interest. One of the statistical challenges in this setting is that there is dependency within each gene set, by virtue of coregulated genes belonging to the same gene set, as well as dependency across the gene sets since gene sets are not mutually exclusive. Our method will account for both intra-gene set and inter-gene set dependencies. Furthermore, given the large number of gene sets, one has to address the issue of multiple testing. The sampling distribution of our proposed procedure is approximated using permutation resampling to simultaneously address the dependency and multiple testing issues by controlling the false discovery rate (FDR; [14]). In the framework described by Ackerman and Strimmer [3], gene set analysis methods are broadly categorized as univariate or as global and multivariate procedures. Generally speaking, our method belongs to the latter category. The novelty of our proposed approach is that it leverages the flexibility of estimating equations to conduct inference for a variety of endpoints including binary, continuous, censored or longitudinal outcomes.

After presenting the theoretical and computational details for the proposed method, we summarize the results from a simulation study evaluating its statistical properties. We then apply the proposed method to analyze a number of microarray data sets. Finally, we provide a brief discussion to compare the performance of our method to those of two other methods: GSEA [6] and GSA [7]. For notational brevity, we will refer to transcripts on microarrays as genes, even though this may not be technically correct.

All analyses are carried out using the R statistical environment [15]. The code is available from http://www.duke.edu/~is29/GeneSet. Generalized inverses are computed using the pinv function from the maanova[16] extension package. The inverse of linear shrinkage covariance matrix is computed using invcov.shrink function in corpcor [17] extension package. The R extension packages R-GSEA[6] and GSA[18] are used to implement the GSEA and GSA methods respectively. The qvalue[19] extension package is used for calculating FDR adjusted P-values. For gene set and probe set annotation, Bioconductor [20] annotation packages (e.g., hu6800.db[21]) and Molecular Signature Database (MSigD; http://www.broad.mit.edu/gsea) annotation files are used.

Methods

In these discussions, we will assume that RNA expression levels for m genes have been measured for n patients. Let us denote the set of genes on the microarray by G = {G1, ..., G m }. For patient i(= 1, ..., n), let y i denote the clinical outcome and z ij denote the measured gene expression level for G j . Let G j Y denote that expression of gene j is not associated with outcome. For each gene the marginal inference of interest will be canonically presented as testing H j : G j Y versus https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq1_HTML.gif .

Suppose that for gene j, the hypotheses of independence can be quantified using a parameter say θ j . We assume that θ j = 0 indicates that G j and the outcome are independent. Thus, the hypotheses of interest can be expressed as testing H j : θ j ≠ 0 against https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq2_HTML.gif . We consider testing these marginal hypotheses within the context of general estimating functions, which for large n, are expressible in the form
https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_Equa_HTML.gif

where U ij (θ j ) is a function of the data for subject i only so that U1j, ...,U nj are independent. The corresponding test statistic for H j will be U j (0). Let μ ij (θ j ) = E(U ij ) and https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq3_HTML.gif . If E{U j (θ)} is a smooth function and E{U j (θ)} = 0 has a unique solution, then the solution https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq4_HTML.gif to U j (θ) = 0 is a consistent estimator of θ j . The family of score statistics [22] is a special case of this type of estimating equation.

A gene set is defined as a subset of G. We will assume that there are K pre-specified gene sets say https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq5_HTML.gif based on a given annotation database such as KEGG or GO. Note that https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq6_HTML.gif is usually a proper subset of G as not all genes are annotated. Let m k (k = 1, ..., K) denote the number of genes in gene set https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq7_HTML.gif . We consider a gene set to be associated with the outcome of interest if at least one of its member genes is associated with the outcome. Let https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq8_HTML.gif denote that gene set https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq7_HTML.gif is not associated with the outcome Y. The hypotheses of interest from gene set k can then be denoted as testing https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq9_HTML.gif versus https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq10_HTML.gif .

For notational convenience, for the remainder of this section we will focus on the first gene set https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq11_HTML.gif and assume that it consists of the first m1 genes, https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq12_HTML.gif . Then the hypotheses of interest can be presented as testing https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq13_HTML.gif against https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq14_HTML.gif . For testing this hypothesis, consider the vector https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq15_HTML.gif , of the first m1 marginal statistics, which is approximately normal with marginal means μ j (θ j ) and co-variances σ jj' (j, j' = 1, ..., m1). These quantities can be consistently estimated by https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq16_HTML.gif and
https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_Equb_HTML.gif

respectively, where https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq17_HTML.gif . Let https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq18_HTML.gif and https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq19_HTML.gif .

In the marginal testing setting, we have μ j (0) = 0 under H j , so we reject H j in favor of https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq20_HTML.gif if the realized value of https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq21_HTML.gif is large. The test statistic https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq21_HTML.gif has an asymptotic χ2 distribution with 1 degree of freedom under the null distribution. For gene set https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq11_HTML.gif we will consider the test statistic
https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_Equc_HTML.gif

where https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq22_HTML.gif . If n is large and m1 < n, the distribution of W1 under https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq23_HTML.gif is approximately χ2 with m1 degrees of freedom. Similarly, we can compute U k , V k and W k for any gene set https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq7_HTML.gif .

In many cases, the sample size for a microarray study may not be large enough for the null sampling distribution to be well approximated by the theoretical limiting distribution. To address this issue, we propose calculating the P-values by approximating the exact null sampling distribution using permutation resampling. Note that the permutation distribution is generated under the hypothesis https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq24_HTML.gif . That is, none of the K gene sets are associated with the outcome. This hypothesis is equivalent to the hypothesis ∩ j H j (i.e., none of the genes are associated with outcome). Note that the latter intersection is restricted to G*, the set of annotated genes. A permutation replicate sample is obtained by randomly shuffling the the clinical outcomes {y1, ..., y n } while holding the gene expression matrix in place. This ensures that the intra-gene dependency structure is preserved while breaking the association between the genes and the outcome.

If m k is large, V k many not be reliably inverted numerically and, in the case where it exceeds n, is not invertible. For these cases, we consider the Moore-Penrose (MP) generalized inverse or the inverse of the linear shrinkage covariance matrix estimate VLW[13, 2325]. Here, we remark that the Hotelling's tests with the MP generalized inverse (g-inverse) and that with the inverse of VLW have been previously studied [13, 23]. The MP g-inverse (of the sample covariance matrix) uses https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq25_HTML.gif to derive a test statistic https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq26_HTML.gif , where P k is the eigen matrix and https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq27_HTML.gif , ν1, ..., ν d are the d positive eigenvalues of V k . The asymptotic distribution of W k when m k is larger than n has been investigated extensively (e.g., [2628]). The linear shrinkage estimate (LW) of V is VLW = λV + (1 - λ)E, where E is a well conditioned target matrix and λ is the tuning parameter. The tuning parameter λ is chosen to minimize the Frobenius risk along with several candidates of target matrices [24, 25].

Two-Sample Tests

Suppose that there are two groups with n g subjects in group g(= 1, 2), n = n1 + n2. Let z gi = (zgi 1, ..., z gim )T denote the gene expression measurements from subject i(= 1, ..., n g ) in group g(= 1, 2), and https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq28_HTML.gif the vector of sample means. Kong et al. [10] consider the Hotelling's T2 statistic
https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_Equd_HTML.gif

where https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq29_HTML.gif is the pooled variance-covariance matrix. For θ j = E(z1ij) - E(z2ij) and μ ij (θ j ) = θ j , T2 asymptotically has a https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq30_HTML.gif distribution under H0.

For https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq31_HTML.gif , our method gives
https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_Eque_HTML.gif

where https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq32_HTML.gif and https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq33_HTML.gif . Since T2 is asymptotically equivalent to W = U T V-1U under H0, we use the more popular Hotelling's T2 statistic in this paper.

As a rank test alternative to the t-test, it is easy to show that the Wilcoxon rank sum test can be expressed as T2 with z gij the rank of the gene j expression level for subject i in the pooled data {z gij , 1 ≤ in g , g = 1, 2}. In this case, θ j = P(z1ijz2ij) - 1/2 and μ ij (θ j ) = θ j .

Linear Regression Case

Suppose that we want to relate the gene expression z ij for gene j with a continuous outcome y i through a linear regression
https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_Equf_HTML.gif
No association between y and the expression of gene j implies that θ j = 0. In this case, we use https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq34_HTML.gif , the least square estimator of the slope θ j ,
https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_Equg_HTML.gif
and
https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_Equh_HTML.gif

where https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq35_HTML.gif , https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq36_HTML.gif and https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq37_HTML.gif .

Cox Regression Case

For right-censored time to event data, the outcome data are pairs of the form y i = (t i , δ i ), where t i is the minimum of survival and censoring times, and δ i is the event indicator. Let λ i (t) denote the hazard function of patient i. Then the Cox proportional hazards model relates the expression of gene j, z ij , with the survival time of patient i using the model λ ij (t) = λ0j(t) exp(θ j z ij ), where λ0j(t) is an unknown baseline hazard function. We propose using the partial score statistic [29]U j = U j (0), where
https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_Equi_HTML.gif
and https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq38_HTML.gif , Y i (t) = I(t i t) and N i (t) = δ I I(t i t). Let https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq4_HTML.gif denote the partial MLE of θ j solving the partial score equation U j (θ) = 0. In this case, we have
https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_Equj_HTML.gif
where
https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_Equk_HTML.gif

The resulting variance estimator is equivalent to the robust estimator under the possible violation of the proportional hazards model proposed by Lin and Wei [30].

Results

Simulation Study

We investigate the performance of our proposed method with respect to FDR control through a simulation study. Let z ijk denote the expression level of gene j(= 1, ..., m k ) from subject i (= 1, ..., n1 + n2) in the group g(= 1, 2) for gene set https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq7_HTML.gif . We consider the following model:
https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_Equl_HTML.gif

where s i = 0 if subject i belongs to group 1 and s i = 1 otherwise, and ρ3 = 1 - ρ1 - ρ2. For gene j, δ j is the treatment effect, D is the number of prognostic genes, K1 is the number of prognostic gene sets, a k is the gene set effect, b is the array effect, ρ1 and ρ2 are the within gene sets and within arrays correlation coefficients resepctively, and ε ijk is the error term. The gene set effect a k , the array effect b, and the error term ε ijk are generated from independently and identically distributed N(0, 1) random variate.

At first, we investigate the performance of the test statistic using the MP inverse generalized inverse. We consider m = 1, 000 genes and n = 100 samples, each with non-overlapping K = 50 or 20 gene sets of m k = 20 or 50 genes, respectively, (ρ1, ρ2) = (0, 0), (0.2, 0.2) or (0.4, 0.4), D/m k = 0.2, 0.5 or 0.8, δ = 0.4 and K1 = 1 or 5. We conduct N = 1, 000 simulations under each setting, and approximate the null distribution of the test statistic using B = 10, 000 random permutations for each simulation. The q-values [31] are obtained from the resulting unadjusted permutation P-values by setting λ = 0.5. Results are presented in Table 1 where https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq39_HTML.gif denotes the empirical FDR and https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq40_HTML.gif denotes the mean number of true rejections, i.e. the mean number of prognostic gene sets that are discovered by testing. These results illustrate that the proposed method accurately controls the FDR at the desired level q*. The observed true rejection rate is high when the proportion of prognostic genes within each gene set is large (i.e., D/m k = 0.8).
Table 1

Empirical FDR and mean true rejections

    

q* = 0.01

q* = 0.05

q* = 0.1

q* = 0.2

( m k , K )

K 1

D / m

( ρ 1 , ρ 2 )

https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq41_HTML.gif

https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq42_HTML.gif

https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq41_HTML.gif

https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq42_HTML.gif

https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq41_HTML.gif

https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq42_HTML.gif

https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq41_HTML.gif

https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq42_HTML.gif

(50,20)

1

0.2

(0, 0)

0.010

0.10

0.063

0.25

0.105

0.35

0.189

0.46

   

(0.2, 0.2)

0.015

0.08

0.053

0.28

0.128

0.38

0.231

0.45

   

(0.4, 0.4)

0.013

0.12

0.045

0.25

0.087

0.36

0.215

0.47

  

0.5

(0, 0)

0.011

0.72

0.060

0.88

0.107

0.95

0.220

0.98

   

(0.2, 0.2)

0.015

0.71

0.051

0.89

0.122

0.94

0.212

0.97

   

(0.4, 0.4)

0.005

0.74

0.071

0.89

0.115

0.94

0.266

0.95

  

0.8

(0, 0)

0.013

0.97

0.057

1.00

0.106

1.00

0.215

1.00

   

(0.2, 0.2)

0.013

0.98

0.065

1.00

0.138

1.00

0.235

1.00

   

(0.4, 0.4)

0.018

0.97

0.078

1.00

0.124

1.00

0.253

1.00

 

5

0.2

(0, 0)

0.025

0.72

0.067

1.70

0.136

2.51

0.239

3.56

   

(0.2, 0.2)

0.009

0.68

0.056

1.79

0.127

2.63

0.235

3.59

   

(0.4, 0.4)

0.011

0.74

0.058

1.82

0.118

2.46

0.232

3.55

  

0.5

(0, 0)

0.007

4.44

0.058

4.94

0.124

4.97

0.227

5.00

   

(0.2, 0.2)

0.013

4.46

0.056

4.90

0.118

4.97

0.214

4.99

   

(0.4, 0.4)

0.012

4.44

0.074

4.87

0.137

4.96

0.239

5.00

  

0.8

(0, 0)

0.012

4.99

0.066

5.00

0.120

5.00

0.235

5.00

   

(0.2, 0.2)

0.011

5.00

0.056

5.00

0.105

5.00

0.200

5.00

   

(0.4, 0.4)

0.011

4.98

0.053

5.00

0.101

5.00

0.218

5.00

(20,50)

1

0.2

(0, 0)

0.008

0.04

0.055

0.13

0.092

0.21

0.185

0.25

   

(0.2, 0.2)

0.015

0.04

0.051

0.10

0.128

0.15

0.219

0.21

   

(0.4, 0.4)

0.005

0.06

0.050

0.10

0.110

0.14

0.198

0.22

  

0.5

(0, 0)

0.013

0.45

0.048

0.63

0.118

0.72

0.240

0.82

   

(0.2, 0.2)

0.010

0.43

0.077

0.64

0.122

0.72

0.223

0.80

   

(0.4, 0.4)

0.010

0.49

0.072

0.68

0.112

0.76

0.214

0.83

  

0.8

(0, 0)

0.010

0.86

0.043

0.97

0.113

0.99

0.222

0.99

   

(0.2, 0.2)

0.015

0.89

0.048

0.98

0.115

0.98

0.208

0.98

   

(0.4, 0.4)

0.013

0.86

0.052

0.96

0.102

0.99

0.201

1.00

 

5

0.2

(0, 0)

0.013

0.31

0.054

0.76

0.121

1.08

0.210

1.64

   

(0.2, 0.2)

0.010

0.28

0.039

0.57

0.102

0.89

0.224

1.56

   

(0.4, 0.4)

0.015

0.18

0.062

0.57

0.103

0.94

0.195

1.59

  

0.5

(0, 0)

0.011

3.03

0.055

4.03

0.107

4.43

0.201

4.75

   

(0.2, 0.2)

0.008

3.01

0.054

4.11

0.104

4.44

0.218

4.77

   

(0.4, 0.4)

0.016

3.22

0.058

4.16

0.103

4.50

0.203

4.72

  

0.8

(0, 0)

0.010

4.74

0.054

4.91

0.112

4.95

0.224

4.99

   

(0.2, 0.2)

0.011

4.76

0.054

4.94

0.111

4.97

0.212

4.98

   

(0.4, 0.4)

0.012

4.73

0.054

4.93

0.110

4.96

0.201

4.99

We proceed by investigating the case with small n but large m k . We set the sample size n = 20, and consider K = 20 and m k = 50. All other parameters are identical to those used in the simulation study reported in Table 1. We conduct N = 500 simulations and apply the test using both MP and LW generalized inverses. The results reported in Table 2 show that both tests control the FDR at the desired level q*. Similar to the results presented in Table 1, for both tests the observed true-rejection rate ( https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq40_HTML.gif ) increases in the proportion of prognostic genes within each gene set (D/m k ). However, the test with the LW inverse has generally higher true-rejection rate than that with the MP generalized inverse.
Table 2

Empirical FDR and mean true rejections on simulation data with small n large p values.

    

q* = 0.01

q* = 0.05

q* = 0.1

q* = 0.2

K 1

D / m k

( ρ 1 , ρ 2 )

method

https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq41_HTML.gif

https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq42_HTML.gif

https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq41_HTML.gif

https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq42_HTML.gif

https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq41_HTML.gif

https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq42_HTML.gif

https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq41_HTML.gif

https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_IEq42_HTML.gif

1

0.2

(0, 0)

MP

0.0160

0.004

0.0633

0.018

0.1102

0.036

0.2328

0.058

   

LW

0.0060

0.018

0.523

0.040

0.1022

0.060

0.2294

0.098

  

(0.2 0.2)

MP

0.0100

0.006

0.0593

0.010

0.1145

0.022

0.2325

0.042

   

LW

0.0180

0.006

0.0563

0.020

0.1058

0.036

0.2249

0.084

  

(0.4 0.4)

MP

0.0180

0.008

0.0590

0.020

0.1133

0.032

0.2182

0.054

   

LW

0.0100

0.010

0.0503

0.032

0.1087

0.048

0.2387

0.094

 

0.5

(0, 0)

MP

0.0120

0.024

0.0650

0.068

0.1138

0.100

0.2016

0.162

   

LW

0.0210

0.078

0.0652

0.160

0.1189

0.230

0.2303

0.330

  

(0.2 0.2)

MP

0.0300

0.024

0.0720

0.072

0.1269

0.106

0.2608

0.170

   

LW

0.0200

0.084

0.0823

0.154

0.1467

0.222

0.2658

0.316

  

(0.4 0.4)

MP

0.0290

0.038

0.0920

0.072

0.1359

0.112

0.2161

0.184

   

LW

0.0150

0.070

0.0747

0.168

0.1300

0.228

0.2444

0.338

 

0.8

(0, 0)

MP

0.0150

0.084

0.0667

0.136

0.1261

0.178

0.2534

0.258

   

LW

0.0070

0.240

0.0520

0.404

0.1199

0.508

0.2179

0.618

  

(0.2 0.2)

MP

0.0260

0.068

0.0563

0.142

0.1147

0.198

0.2341

0.276

   

LW

0.0220

0.234

0.0603

0.420

0.1054

0.516

0.2095

0.622

  

(0.4 0.4)

MP

0.0153

0.068

0.0738

0.138

0.1299

0.176

0.2548

0.258

   

LW

0.0270

0.274

0.0607

0.434

0.1129

0.512

0.2118

0.636

5

0.2

(0, 0)

MP

0.0080

0.022

0.0503

0.094

0.0924

0.194

0.1817

0.386

   

LW

0.0120

0.066

0.0443

0.188

0.0933

0.344

0.1883

0.726

  

(0.2 0.2)

MP

0.0180

0.024

0.0683

0.084

0.1030

0.138

0.2140

0.364

   

LW

0.0170

0.054

0.0642

0.144

0.1028

0.288

0.2125

0.646

  

(0.4 0.4)

MP

0.0060

0.040

0.0350

0.078

0.0848

0.158

0.1969

0.304

   

LW

0.0150

0.072

0.0446

0.192

0.0762

0.328

0.1761

0.668

 

0.5

(0, 0)

MP

0.0143

0.158

0.0479

0.402

0.0990

0.608

0.2144

1.062

   

LW

0.0140

0.442

0.0626

1.148

0.1221

1.734

0.2131

2.586

  

(0.2 0.2)

MP

0.0157

0.148

0.0586

0.418

0.0974

0.662

0.2090

1.128

   

LW

0.0127

0.452

0.0646

1.162

0.1076

1.796

0.2165

2.644

  

(0.4 0.4)

MP

0.0160

0.150

0.0602

0.418

0.0978

0.678

0.2062

1.142

   

LW

0.0108

0.498

0.0572

1.154

0.1134

1.680

0.2226

2.548

 

0.8

(0, 0)

MP

0.0193

0.468

0.0662

1.006

0.1194

1.452

0.2320

2.054

   

LW

0.0216

1.660

0.0609

3.044

0.1220

3.702

0.2385

4.286

  

(0.2 0.2)

MP

0.0127

0.478

0.0567

1.036

0.1130

1.440

0.2257

2.156

   

LW

0.0145

1.654

0.0506

2.970

0.1157

3.670

0.2269

4.278

  

(0.4 0.4)

MP

0.0200

0.510

0.0587

1.020

0.1080

1.468

0.2163

2.164

   

LW

0.0201

1.772

0.0729

3.066

0.1303

3.662

0.2350

4.234

Here, n = 20, m k = 50, and K = 20.

We compare the performance of our method to GSEA and GSA within the simulation framework described above. We choose, for GSEA, the weighted Kolmogorov Smirnov-like statistic as enrichment correlation-based weighting, while for GSA we choose the maxmean statistic along with restandardization. The technical details are provided in [6] and in [7] respectively.

We generate m = 1, 000 genes and n = 100 samples, each with non-overlapping K = 50 gene sets of m k = 20 genes, (ρ1, ρ2) = (0, 0), D/m k = 1, and δ = 0.4 as in [7]. The first (n1 = 50) and second (n2 = 50) samples will constitute the control and treatment groups respectively. Next, we will discuss two scenarios similar to those considered by [7]:

  • One-sided shifts: The mean expression level for the m k = 20 genes in each of the K1 prognostic gene sets is δ = 0.4 units higher in the treatment group.

  • Two-sided shifts: The mean expression level for the first 10 genes in each of the K1 prognostic gene sets is δ = 0.4 units higher, while the mean expression level for the next 10 genes is δ = 0.4 units lower.

Each scenario is simulated 100 times using 1000 permutation replicates. The P-values for the first gene set is shown against the number of prognostic gene sets in Figure 1. Overall, our method gives lower mean P-values under both scenarios. In the one-sided shift case, the three methods are comparable when the number of prognostic gene sets is at most thirteen. For the cases with a large number of prognostic gene sets or a two-sided shift, our method is consistently better.
https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_Fig1_HTML.jpg
Figure 1

Mean P -value against the number of prognostic gene sets.

Case Studies

Two-Sample Case

We analyze two microarray data sets available from the GSEA website http://www.broad.mit.edu/gsea. The first data set, called the Gender data set, consists of profiles of m = 15, 056 genes from male (n1 = 15) and female (n2 = 17) lymphoblastoid cell lines. The second data set consists of transcriptional pro-les of m = 10, 100 genes from p53 positive (n1 = 17) and p53 mutant (n2 = 33) cancer cell lines. The pathways from MSigDB are currently organized into five catalogs. We use the Positional gene sets (C1, 319 gene sets), which correspond to each human chromosome and each cytogentic band, for the Gender data set and the Curated gene sets (C2, 522 gene sets), which are derived from online pathway databases and publications, for the p53 data set. For the analyses, we limit our attention to gene sets which consist of a minimum of 15 and a maximum of 500 genes. Each analysis is based on 10,000 permutation replicates. The performance of our method is compared to those of GSEA and GSA. We compare the number of prognostic gene sets identified by each method. The analysis results for the two data sets are shown in Figure 2. For both data sets, our method consistently identifies more prognostic gene sets than GSEA and GSA for any q-value threshold.
https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_Fig2_HTML.jpg
Figure 2

The number of prognostic gene sets, at a given q -value threshold, identified by all three methods are shown for the Gender and p53 data sets.

Cox Regression Case

We carry out gene set analysis of the lung cancer microarray data set [32] using the KEGG pathway (175 gene sets) provided by the hu6800.db Bioconductor package. The data set consists of gene expressions of m = 4, 966 genes from n = 86 stage I or III lung cancer patients. As in the analyses for the previous data sets, we include gene sets consisting of 15 to 500 genes each in the analysis, and use 10,000 permutations to derive the null distribution of the test statistics. For this analysis, we will compare our method to GSA only since the R-GSEA extension package does not provide the functionality for analyzing right censored data. The results are shown in Figure 3 suggest that our method generally identifies a larger number of prognostic gene sets compared to GSA.
https://static-content.springer.com/image/art%3A10.1186%2F1471-2105-12-209/MediaObjects/12859_2010_Article_4618_Fig3_HTML.jpg
Figure 3

The number of prognostic gene sets, at a given q -value threshold, identified by our and the GSA method are shown for the Beer Lung Cancer data set.

Discussion

For the Gender data set, at the FDR level of q* = 0.2, our method identifies 8 gene sets compared to only 4 for the other two methods [see Additional file 1]. There are 4 prognostic gene sets identified in common among the three methods, consisting of gene sets found on ChrY, ChrYp11, ChrYq11, and ChrXp22. Our method identifies 4 other gene sets not identified by the other two methods, which include gene sets for ChrX, ChrXp11, Chr3q25, and Chr6q25. Genes expressed on the Y chromosome are expected to be differentially expressed between genders, while gene expression from the X chromosome is more similar between genders due to X chromosome inactivation in females [33, 34]. However, ChrXp22 and ChrXp11 gene sets have been previously been shown to be overrepresented in females likely caused by escape of X inactivation [35]. Furthermore, several genes within the Chr3q25 and Chr6q25 gene sets have also been shown to be differentially expressed between males and females, including ACAT2 [36], MAP3K4 [37], NOX, PTX3 [38], SGEF, and SOD2 [39]. Thus, our method for identifying overrepresented genes in gene set lists can provide biologically relevant and important information that may be overlooked by other common methods such as GSA and GSEA.

For the p53 data set, at the same FDR level, our method identifies 87 prognostic gene sets while GSA and GSEA identify 5 and 9 prognostic gene sets, respectively [see Additional file 1]. There are 5 prognostic gene sets common among the three methods, including the p53 pathway, hsp27 pathway, radiation sensitivity pathway, ceramide pathway, and the ras pathway. However, our method identifies 78 gene sets not identified by the other two methods. Additional file 1 also provides a list of gene sets that are identified only by our method. p53 is a tumor suppressor protein that is activated in response to DNA damage. p53 can induce growth arrest by halting the cell cycle at the G1/S phase transition to allow DNA repair or it can induce apoptosis if the DNA damage cannot be repaired. p53 acts as a transcription factor regulating the expression of many genes involved in its functions [40]. Thus many of the gene sets identified by our method can be directly linked to p53 functions, such as cell cycle arrest, ATM pathway, tumor suppressor, bcl2 family and network, death pathway, etc [40]. Additionally, several cytokine and growth factor signaling pathways are represented in our list of gene sets differentially expressed between p53 positive and mutant cell lines, including the IL-4 [41], EGF [42], NGF [43], CXCR4 [44], IL-7 [45], and PDGF [46] pathways, which have all shown roles for p53 in their regulation and signaling. The method that we describe here for identifying prognostic gene sets can provide a more inclusive list of gene sets that provide further insight into the biology of two sample case studies from microarray experiments.

Conclusion

In this paper, we have presented a multiple testing procedure to identify prognostic gene sets from a microarray experiment correlated with common types of binary, continuous and time to event clinical outcomes. We calculate the marginal P-values using a permutation method accounting for dependency among the genes within and across each gene set, and account for multiple testing by controlling the FDR. Our simulations show that our proposed method controls the FDR at the desired level. Through extensive simulations and real case studies, we observe that our method performs better than GSEA and GSA, especially when the number of prognostic gene sets is large.

Declarations

Acknowledgements

Partial support for this research was provided by a grant from the National Cancer Institute (CA142538).

Authors’ Affiliations

(1)
Biostatistics and Bioinformatics Center, Samsung Cancer Research Institute, Samsung Medical Center
(2)
Department of Biostatistics and Bioinformatics, Duke University Medical Center
(3)
Department of Statistics, Seoul National University
(4)
Department of Medicine, Division of Medical Oncology, Duke University

References

  1. Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 1999, 27: 29–34. 10.1093/nar/27.1.29PubMed CentralView ArticlePubMedGoogle Scholar
  2. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene Ontology: tool for the unification of biology. Nat Genet 2000, 25: 25–29. 10.1038/75556PubMed CentralView ArticlePubMedGoogle Scholar
  3. Ackermann M, Strimmer K: A general modular framework for gene set enrichment analysis. BMC Bioinformatics 2009, 88: 365–411.Google Scholar
  4. Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC: PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 2003, 34: 267–273. 10.1038/ng1180View ArticlePubMedGoogle Scholar
  5. Goeman JJ, van de Geer SA, de Kort F, van Houwelingen HC: A global test for groups of genes: testing association with a clinical outcome. Bioinformatics 2004, 20: 93–99. 10.1093/bioinformatics/btg382View ArticlePubMedGoogle Scholar
  6. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP: Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 2005, 102: 15545–15550. 10.1073/pnas.0506580102PubMed CentralView ArticlePubMedGoogle Scholar
  7. Efron B, Tibshirani R: On testing the significance of sets of genes. Annals of Applied Statistics 2007, 1: 107–129. 10.1214/07-AOAS101View ArticleGoogle Scholar
  8. Mansmann U, Meister R: Testing differential gene expression in functional groups. Goeman's global test versus an ANCOVA approach. Methods of Inf Med 2005, 44: 449–453.Google Scholar
  9. Barry WT, Nobel AB, Wright F: Significance analysis of functional categories in gene expression studies: a structured permutation approach. Bioinformatics 2005, 21: 1943–1949. 10.1093/bioinformatics/bti260View ArticlePubMedGoogle Scholar
  10. Kong SW, Pu WT, Park PJ: A multivariate approach for integrating genome-wide expression data and biological knowledge. Bioinformatics 2006, 22: 2373–2380. 10.1093/bioinformatics/btl401PubMed CentralView ArticlePubMedGoogle Scholar
  11. Nettleton D, Recknor J, Reecy JM: Identification of differentially expressed gene categories in microarray studies using non-parametric multivariate analysis. Bioinformatics 2008, 24: 192–201. 10.1093/bioinformatics/btm583View ArticlePubMedGoogle Scholar
  12. Barry WT, Nobel AB, Wright F: A statistical framework for testing functional categories in microarray data. Annals of Applied Statistics 2008, 2: 286–315.View ArticleGoogle Scholar
  13. Tsai C-A, Chen JJ: Multivariate analysis of variance test for gene set analysis. Bioinformatics 2009, 25: 897–903. 10.1093/bioinformatics/btp098View ArticlePubMedGoogle Scholar
  14. Benjamini Y, Hochber Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B 1995, 57: 289–300.Google Scholar
  15. Development Core Team: R: A Language and Environment for Statistical Computing. 2009. ISBN 3-900051-07-0Google Scholar
  16. Wu H, Yang H, Sheppard K, Churchill G, Kerr K, Cui X: maanova: Tools for analyzing Micro Array experiments. R package version 1.20.0 2010.Google Scholar
  17. Schaefer J, Opgen-Rhein R, Strimmer K: corpcor: Efficient Estimation of Covariance and (Partial) Correlation. R package version 1.5.7 2010.Google Scholar
  18. Efron B, Tibshirani R: GSA: Gene set analysis. R package version 1.03 2010.Google Scholar
  19. Dabney A, Storey JD, Warnes GR: qvalue: Q-value estimation for false discovery rate control. R package version 1.24.0 2010.Google Scholar
  20. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J: Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology 2004, 5: R80. 10.1186/gb-2004-5-10-r80PubMed CentralView ArticlePubMedGoogle Scholar
  21. Carlson M, Falcon S, Pages H, Li N: hu6800.db: Affymetrix HuGeneFL Genome Array annotation data (chip hu6800). R package version 2.4.5 2010.Google Scholar
  22. Cox DR, Hinkley DV: Theoretical Statistics. Chapman and Hall: London; 1974.View ArticleGoogle Scholar
  23. Warton DI: Penalized normal likelihood and ridge regularization of correlation and covariance matrices. Journal of the American Statistical Association 2009, 103: 340–349.View ArticleGoogle Scholar
  24. Schafer J, Strimmer K: A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology 2005, 4: 32.View ArticleGoogle Scholar
  25. Ledoit O, Wolf M: A Well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis 2004, 88: 365–411. 10.1016/S0047-259X(03)00096-4View ArticleGoogle Scholar
  26. Box GEP: Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification. The Annals of Mathematical Statistics 1995, 25: 290–302.View ArticleGoogle Scholar
  27. Brunner E: Asymptotic and approximate analysis of repeated measuresd esigns under heteroscedasticity. In mathematical statistics with applications in biometrys. Edited by: Kunert J, Trenkler G. Josef Eul Verlag, Lohmar; 2001.Google Scholar
  28. Chen SX, Qin YL: A two sample test for high dimensional data with application to gene-set testing. The Annals of Statistics 2010, 38: 808–835. 10.1214/09-AOS716View ArticleGoogle Scholar
  29. Cox DR: Regression models and life-tables. Journal of the Royal Statistical Society. Series B 1972, 34: 187–220.Google Scholar
  30. Lin DY, Wei LJ: The robust inference for the Cox proportinal havards model. Journal of the American Statistical Association 1989, 84: 1074–1078. 10.2307/2290085View ArticleGoogle Scholar
  31. Storey JD: A direct approach to false discovery rates. Journal of the Royal Statistical Society. Series B 2002, 64: 479–498. 10.1111/1467-9868.00346View ArticleGoogle Scholar
  32. Beer DG, Kardia SL, Huang CC, Giordano TJ, Levin AM, Misek DE, Lin L, Chen G, Gharib TG, Thomas DG, Lizyness ML, Kuick R, Hayasaka S, Taylor JM, Iannettoni MD, Orringer MB, Hanash S: Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med 2002, 25: 25–29.Google Scholar
  33. Barakata TS, Jonkers I, Monkhorst K, Gribnau J: X-changing information on X inactivation. Exp Cell Res 2010, 316: 679–687. 10.1016/j.yexcr.2010.01.015View ArticleGoogle Scholar
  34. Prothero KE, Stahl JM, Carrel L: Dosage compensation and gene expression on the mammalian X chromosome: one plus one does not always equal two. Chromosome Res 2009, 17: 637–648. 10.1007/s10577-009-9063-9View ArticlePubMedGoogle Scholar
  35. Zhang W, Huang RS, Duan S, Dolan ME: Gene set enrichment analyses revealed differences in gene expression patterns between males and females. In Silico Biol 2009, 9: 55–63.PubMed CentralPubMedGoogle Scholar
  36. Parini P, Jiang ZY, Einarsson C, Eggertsen G, Zhang SD, Rudel LL, Han TQ, Eriksson M: ACAT2 and human hepatic cholesterol metabolism: identification of important gender-related differences in nor-molipidemic, non-obese Chinese patients. Atherosclerosis 2009, 207: 266–271. 10.1016/j.atherosclerosis.2009.04.010PubMed CentralView ArticlePubMedGoogle Scholar
  37. Bogani D, Siggers P, Brixey R, Warr N, Beddow S, Edwards J, Williams D, Wilhelm D, Koopman P, Flavell RA, Chi H, Ostrer H, Wells S, Cheeseman M, Greenfield A: Loss of mitogen-activated protein kinase kinase kinase 4 (MAP3K4) reveals a requirement for MAPK signalling in mouse sex determination. PLoS Biol 2009, 7: e1000196. 10.1371/journal.pbio.1000196PubMed CentralView ArticlePubMedGoogle Scholar
  38. Yamasaki K, Kurimura M, Kasai T, Sagara M, Kodama T, Inoue K: Determination of physiological plasma pentraxin 3 (PTX3) levels in healthy populations. Clin Chem Lab Med 2009, 47: 471–477. 10.1515/CCLM.2009.110View ArticlePubMedGoogle Scholar
  39. Khymenets O, Covas MI, Farre M, Langohr K, Fito M, de la Torre R: Role of sex and time of blood sampling in SOD1 and SOD2 expression variability. Clin Biochem 2008, 41: 1348–1354. 10.1016/j.clinbiochem.2008.08.064View ArticlePubMedGoogle Scholar
  40. Tomasini R, Mark TW, Melino G: The impact of p53 and p73 on aneuploidy and cancer. Trends Cell Biol 2008, 18: 244–252. 10.1016/j.tcb.2008.03.003View ArticlePubMedGoogle Scholar
  41. Pesch J, Brehm U, Staib C, Grummt F: Repression of interleukin-2 and interleukin- 4 promoters by tumor suppressor protein p53. J Interferon Cytokine Res 1996, 16: 595–600. 10.1089/jir.1996.16.595View ArticlePubMedGoogle Scholar
  42. Sheikh MS, Carrier F, Johnson AC, Ogdon SE, Fornace AJ Jr: Identification of an additional p53-responsive site in the human epidermal growth factor receptor gene promotor. Oncogene 1997, 15: 1095–1101. 10.1038/sj.onc.1201264View ArticlePubMedGoogle Scholar
  43. Brynczka C, Labhart P, Merrick BA: NGF-mediated transcriptional targets of p53 in PC12 neuronal differentiation. BMC Genomics 2007, 8: 139. 10.1186/1471-2164-8-139PubMed CentralView ArticlePubMedGoogle Scholar
  44. Mehta SA, Christopherson KW, Bhat-Nakshatri P, Goulet RJ Jr, Broxmeyer HE, Kopelovich L, Nakshatri H: Negative regulation of chemokine receptor CXCR4 by tumor suppressor p53 in breast cancer cells: implications of p53 mutation or isoform expression on breast cancer cell invasion. Oncogene 2007, 26: 3329–3337. 10.1038/sj.onc.1210120View ArticlePubMedGoogle Scholar
  45. Costello PS, Cleverley SC, Galandrini R, Henning SW, Cantrell DA: The GTPase rho controls a p53-dependent survival check-point during thymopoiesis. J Exp Med 2000, 192: 77–85. 10.1084/jem.192.1.77PubMed CentralView ArticlePubMedGoogle Scholar
  46. Yang W, Wetterskog D, Matsumoto Y, Funa K: Kinetics of repression by modified p53 on the PDGF beta-receptor promoter. Int J Cancer 2008, 123: 2020–2030. 10.1002/ijc.23735View ArticlePubMedGoogle Scholar

Copyright

© Sohn et al; licensee BioMed Central Ltd. 2011

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.