Time course microarray and RNA-seq experiments are increasingly used to study biological phenomena that evolve in a temporal fashion. Unlike the static experiment which captures only a snapshot of the gene expression, the time course experiment monitors the gene expression levels over several time points in a biological process, allowing investigators to study dynamic behaviors of the genes. One goal of such experiments is to identify genes associated with a biological process of interest or a particular stimulus, like a therapeutic treatment or virus infection. The differentially expressed genes can be defined as genes with expressions changed significantly with respect to time or across multiple conditions.
The time course gene expression data typically exhibit features such as high dimensionality, short time course, few or no replicates, missing values, large measurement errors, correlations between observations over time, etc. Many of the multivariate techniques for analyzing such data, for example, SAM [1, 2], ANOVA [3, 4] and empirical Bayes , suffer from serious limitations when facing missing data or non-uniformly sampled data. They also fail to account for correlations between measurements from the same gene and do not facilitate the removal of noise from the measured data. In addition, the timing information of when the measurements are taken is not utilized and the inherent temporal structure of the time course data is ignored. A hidden Markov model has been proposed by  and , where the observed gene profiles are considered to be influenced by an underlying Markov process. The computation of this model involves a large number of parameters, which can be difficult if there are no replicated data, and it can not be applied if the observation time points are distinct for different experimental groups.
Functional data analysis approaches view the expression profile of each gene as a smooth function of time, and the time course measurements are collected as discrete observations from the function that are contaminated by noisy signals [8, 9]. A key step is to create an estimate of the gene expression curve from the noisy functional data and this usually involves representing the expression curve as a linear combination of a finite number of basis functions, such as polynomials  and B-splines [11–13]. Another popular approach for representing functional curves is the Functional Principal Component Analysis (FPCA) [14, 15]. In the FPCA model, the basis functions are estimated from the observed data and the data-adaptive basis has the favorable property to flexibly characterize the major modes of variation in the data. So fewer number of basis functions are needed to capture the shape of gene expression patterns than that using the pre-specified basis.
Many of the existing functional methods are designed for time course data with longitudinal replicates.  used a functional hierarchical model and empirical Bayes techniques to determine differentially expressed genes, but the estimates of the model parameters can be very unstable if the number of replicates is small.  proposed a functional ANOVA model and  adopted the FPCA model for identifying differentially expressed genes across two conditions. In both methods, the model is fitted for one gene at a time to estimate the gene-specific group mean and the covariance structure, which is computationally intensive and may result in overfitted models for a small sample size.  adopted a similar approach as  to impose a mixture distribution on the gene-specific variation and employed an indicator to reflect whether a gene is differentially expressed. All of these methods require longitudinal replicates in data and only apply to two group comparisons.
Time course data with longitudinal replicates are costly and rather rare in reality. Many of the published time course data have no replicates or only a small number of independent replicates [17–19]. The EDGE method proposed by  is a comprehensive approach that is suitable for data with or without replicate, and for both single group and multiple group tests. It represented gene expression trajectories using natural cubic splines and then compared the goodness-of-fit of the model under the null hypothesis to that under the alternative hypothesis. The null distribution of test statistics was approximated by bootstrap.  recently extended this method under a permutation-based multiple testing framework. Specifically for time course data without replicate,  developed a statistics to measure the signal-to-noise ratio by comparing the energies of the smoothing convolution and differential convolution of the expression profiles. A common problem to these existing methods is that the test statistics are constructed for each gene separately. So they may not be powerful enough to identify differentially expressed genes for short time series data and data without replicates.
In this work, we propose a unified approach to model the gene profiles using the techniques of FPCA, and to identify differentially expressed genes in both single group test and multiple group test. Our methodology is motivated by the gene expression data without replicate, so we will focus on this case in this paper, although our method can also be easily adapted to accommodate data with replicates. We argue that our method can improve the power in identifying differentially expressed genes compared to existing methods. First, using the eigen-basis enables a parsimonious modeling of the gene expression curves, so we have more degrees of freedom for the inference than that using a pre-specified basis. Moreover, we propose to estimate the expression curve of a gene by borrowing strength across all the genes, which leads to a more powerful inference than that using the information of one gene only.
The remainder of the paper is organized as follows. We first describe the FPCA model for representing the gene expression curves and then elaborate a hypothesis testing method based on random permutations to identify differentially expressed genes in both single group and multiple group scenarios. The proposed method is compared with several existing methods via the analysis of the Saccharomyces cerevisiae cell cycle data from  and simulation studies. Lastly, we summarize the proposed method and discuss possible extensions.