Molecular biology, which is targeted at studying biological systems at a molecular level, has provided rich information of individual cellular components and their contributions to biological functions over the last 50 years. Our understanding of genes and their functions has been accelerated in the last decade by microarray experiments, which identify genes that are induced or repressed in a specific biomedical condition [1–3]. The multiplicity and heterogeneity of these gene expression profiles revealed that even a simple biological process or a molecular function in a cell requires co-operations of hundreds or even thousands of genes. Nonetheless, decoding this kind of gene interaction and networking in a biological process is hampered by the complexity of biological systems.

Instead of looking at individual genes, researchers started to interpret biological phenomena in terms of groups of genes, or gene sets. For example, Segal *et al*. (2004) mined a large number of cancer expression profiles and deduced 456 cancer-related modules (gene sets) which are selected by combining with the knowledge of transcriptional pathways and gene ontology [4]. The development of new statistical tools enables us to test whether a gene set is activated in the microarray dataset of interest. An important contribution is made by Subramanian *et al*. (2005) who proposed the gene set enrichment analysis (GSEA) to assess the significance of a set of genes. Their idea is that the genes that cooperate in a biological function have similar patterns in transcriptional levels such that the statistical power of assessing a gene set is higher than that of individual genes [5].

GSEA relies on permutation tests to identify the significant gene sets that have distinct gene expression between treatment groups. It works in three steps. First, all genes are ranked according to their statistics for the treatment effect. For example, a t-statistic can be used to compare two classes of samples. A score is assigned to each gene set using a weighted Kolmogorov-Smirnov-like statistic that sums up the ranks of the genes. Secondly, the class labels of the samples are permuted for a number of times, and gene set scores are calculated for each new label assignment. The permutation of sample labels preserves the inherent correlation between genes. Because the permutation is conducted under the null hypothesis of no treatment differences, the P value of each observed score can be determined empirically by the null score distribution. Thirdly, if more than one gene set is tested, the P values should be adjusted for multiple tests. GSEA is often applied for hundreds of gene sets, for which the false discovery rate (FDR) is recommended.

Ever since GSEA was introduced, it has drawn a wide attention from the biomedical and biostatistical communities. A number of alternative and extended versions of gene set analysis method (GSA) have been proposed in the last few years that use a variety of score systems and randomization procedures to resample data [6, 7]. For instance, Efron et al. proposed a GSA method, which is based on a more powerful statistic *maxmean* to score gene sets [8]. In the case of two sample classes, *maxmean* is the maximum absolute value between the average of the positive t-statistics, and also the average of the negative t-statistics. Before permutation test, the *maxmean* score should be restandardized by centering and scaling its mean and standard deviation using randomized gene sets.

Despite their enormous success, all these aforementioned GSA methods have limited applications in microarray samples with dependence. A permutation test has to rely on the assumption of sample independence. This assumption presents a barrier to extend GSA to the fast-growing area of longitudinal microarray experiments, which repeatedly profiles the gene expression of a same object over time. Longitudinal microarray experiments allow researchers to investigate dynamic behavior of biological processes, such as cell cycles, cell proliferation, oncogenosis, and apoptosis. The temporal component is an inherent part of the study. Such time course experiments pose novel challenges for statistical analyses because effective methods have to take into account both a large number of genes and within-gene correlations. Most of the analyses in literature carry out repeated measures analysis of individual genes followed by FDR control [9–12].

It is desirable to apply repeated measures analysis methods, such as a linear mixed effects model (LME) or generalized estimating equations (GEE), to gene sets. Tsai and Qu (2008) assessed subsets of genes by applying a non-parametric time-varying coefficient model [13]. The within-gene correlation was taken into account by the quadratic inference function (QIF) that is derived from GEE. Both LME and GEE achieve their asymptotic distributions when the number of replications goes to ∞. However, the large sample size assumption is usually not applicable due to the high cost of microarray experiments. Rather, there is often a relatively large number of genes in a gene set compared to the sample size, a curse of dimensionality problem. An effective GSA method should also be robust against deviation from the normal distribution because gene expression data may be largely skewed, and the normal or log-normal distribution does not provide a close fit to the data [14, 15]. Furthermore, to allow variability between genes, heteroscedastic correlation structures should be assumed for different genes.

In this paper we propose a GSA method for assessing the expression patterns of gene sets from longitudinal microarray data. The method employs a couple of novel nonparametric statistics that work for small sample size as long as we maintain a relative large number of genes in a set (large p, small n). The method is robust with respect to non-normality and heteroscedastic correlation structures. To ensure extensive application, unbalanced designs are allowed in our model. For example, unbalanced data may occur when the data are pooled from different versions or manufacturers of arrays.

The genes in a signal transduction pathway are often highly correlated in that the expression of one gene is regulated by the other gene in this pathway. To ensure an unbiased analysis, we need to take into account the correlation among genes. Permutation method has been widely used in GSA to provide a robust test that preserves between-genes correlations. For example, Tsai and Chen (2009) used permutation test with the Wilks' Λ statistic for their multivariate analysis of GSA [16]. To take into account the correlations among genes within a gene set, we also present a permutation-based test for our proposed statistics.

The outline of this paper is as follows. Our main results are presented in section Results and Discussion. In subsection Model and Hypotheses, we describe the model and assumptions. In the subsection of Simulation study, we present the simulation results of type I error estimates and power analysis for our proposed methods. In subsection Results on real data, we describe an application of our method to a recent longitudinal microarray study in which the gene expression profiles of murine T cells in the presence or absence of interleukin-2 (IL-2) were repeatedly collected. A number of functional gene sets were tested to investigate IL-2 signaling over time. The test statistics and their asymptotic results for a large number of genes but small replications are provided in subsection Test statistics of section Methods. Subsection Permutation tests described the permutation-based test with our proposed nonparametric statistics. Finally, we provide mathematical proof for the asymptotic results of our test statistics in Appendix.