Quadratic regression analysis for gene discovery and pattern recognition for noncyclic short timecourse microarray experiments
 Hua Liu^{1}Email author,
 Sergey Tarima^{1},
 Aaron S Borders^{2},
 Thomas V Getchell^{2, 4},
 Marilyn L Getchell^{3, 4} and
 Arnold J Stromberg^{1}
DOI: 10.1186/147121056106
© Liu et al; licensee BioMed Central Ltd. 2005
Received: 25 August 2004
Accepted: 25 April 2005
Published: 25 April 2005
Abstract
Background
Cluster analyses are used to analyze microarray timecourse data for gene discovery and pattern recognition. However, in general, these methods do not take advantage of the fact that time is a continuous variable, and existing clustering methods often group biologically unrelated genes together.
Results
We propose a quadratic regression method for identification of differentially expressed genes and classification of genes based on their temporal expression profiles for noncyclic short timecourse microarray data. This method treats time as a continuous variable, therefore preserves actual time information. We applied this method to a microarray timecourse study of gene expression at short time intervals following deafferentation of olfactory receptor neurons. Nine regression patterns have been identified and shown to fit gene expression profiles better than kmeans clusters. EASE analysis identified overrepresented functional groups in each regression pattern and each kmeans cluster, which further demonstrated that the regression method provided more biologically meaningful classifications of gene expression profiles than the kmeans clustering method. Comparison with Peddada et al.'s orderrestricted inference method showed that our method provides a different perspective on the temporal gene profiles. Reliability study indicates that regression patterns have the highest reliabilities.
Conclusion
Our results demonstrate that the proposed quadratic regression method improves gene discovery and pattern recognition for noncyclic short timecourse microarray data. With a freely accessible Excel macro, investigators can readily apply this method to their microarray data.
Background
Microarray timecourse experiments allow researchers to explore the temporal expression profiles for thousands of genes simultaneously. The premise for pattern analysis is that genes sharing similar expression profiles might be functionally related or coregulated [1]. Due to the large number of genes involved and the complexity of gene regulatory networks, clustering analyses are popular for analyzing microarray timecourse data. Heuristicbased cluster analyses group genes based on distance measures; the most commonly used methods include hierarchical clustering [2], kmeans clustering [3], selforganizing maps [4], and support vector machines [5]. Due to the lack of statistical properties of these heuristicbased clustering methods, statistical models, especially analysis of variance (ANOVA) models and mixed models are often implemented as a precursor to clustering to ensure the genes used for clustering are statistically meaningful [6, 7]. Only genes identified to be significantly regulated by statistical models are used for further clustering. Fitting statistical models prior to clustering usually dramatically reduces the number of genes used for clustering, which in general will improve the performance of the clustering method. An alternative way of clustering is statistical modelbased clustering methods, which assume that the data is from a mixture of probability distributions such as multivariate normal distributions and describe each cluster using a probabilistic model [8, 9].
In microarray timecourse studies, time dependency of gene expression levels is usually of primary interest. Since time can affect the gene expression levels, it is important to preserve time information in timecourse data analysis. However, most methods for analyzing microarray timecourse data treat time as a nominal variable rather than a continuous variable, and thus ignore the actual times at which these points were sampled. Peddada et al. (2003) proposed a method for gene selection and clustering using orderrestricted inference, which preserves the ordering of time but treats time as nominal [1]. Recently, a number of algorithms treating time as a continuous variable have been introduced. Xu et al. (2002) applied a piecewise regression model to identify differentially expressed genes [10]. Both Luan and Li (2003) and BarJoseph et al (2003) proposed Bsplines based approaches [11, 12], which are appropriate for microarray data with relatively long timecourse, but their application to short timecourse data is questionable. New methods for analyzing short timecourse microarray data are needed [13].
In this paper, we propose a modelbased approach, step down quadratic regression, for gene identification and pattern recognition in noncyclic short timecourse microarray data. This approach takes into account time information because time is treated as a continuous variable. It is performed by initially fitting a quadratic regression model to each gene; a linear regression model will be fit to the gene if the quadratic term is determined to have no statistically significant relationship with time. Significance of gene differential expression and classification of gene expression patterns can be determined based on relevant Fstatistics and least squares estimates. Major advantages of our approach are that it not only preserves the ordering of time but also utilizes the actual times at which they were sampled; it identifies differentially expressed genes and classifies these genes based on their temporal expression profiles; and the temporal expression patterns discovered are readily understandable and biologically meaningful. A free Excel macro for applying this method is available at http://www.mc.uky.edu/UKMicroArray/bioinformatics.htm[14]. The proposed quadratic regression method is applied to a microarray timecourse study of olfactory receptor neurons [15]. Biologically meaningful temporal expression patterns have been obtained and shown to be more effective classifications than ANOVAprotected kmeans clusters. Comparison with Peddada et al.'s orderrestricted inference method [1] showed that our method provides a different and interesting insight into the temporal gene profiles. Reliabilities of the results from all 3 methods were assessed using a bootstrap method [16] and regression patterns were shown to have the highest reliabilities.
Results
Stepdown quadratic regression
We propose a stepdown quadratic regression method for gene discovery and pattern recognition for noncyclic short timecourse microarray experiment. The first step is to fit the following quadratic regression model to the j^{th} gene:
y_{ ij }= β_{0j}+ β_{1j}x + β_{2j}x^{2} + ε_{ ij } (1)
 1.
If overall model (1) pvalue >α_{0}, the j^{th} gene is considered to have no significant differential expression over time. The expression pattern of the gene is "flat".
 2.
If overall model (1) pvalue ≤ α_{0}, the j^{th} gene will be considered to have significant differential expression over time. The patterns are then determined based on the pvalues obtained from F tests (Table 1).
Type I Sum of Squares used to construct F test for pattern determination.
Type I Sum of Squares  Interpretations  F tests 

SS(linear)  total variability in the experiment due to the linear effect of time 

SS(quadratic  linear)  total variability in the experiment due to the quadratic effect of time that is not contained in SS(linear) 

SS(residual)  SS(total)  SS(linear)  SS(quadratic  linear) 
 a.
If both pvalue of quadratic effect ≤ α_{1} and pvalue of linear effect ≤ α_{1}, the j^{th} gene is considered to be significant in both the quadratic and linear terms. The expression pattern of the gene is "quadraticlinear".
 b.
If pvalue of quadratic effect ≤ α_{1} and pvalue of linear effect >α_{1}, the j^{th} gene is considered to be significant only in the quadratic term. The expression pattern of the gene is "quadratic".
 c.
If pvalue of quadratic effect > α_{1}, the j^{th} gene is considered to be nonsignificant in the quadratic term. The quadratic term will be dropped and a linear regression model will be fitted to the gene:
y_{ ij }= β_{0j}+ β_{1j}x + ε_{ ij } (2)
From fitting model (2),

If pvalue of linear effect ≤ α_{1}, the j^{th} gene is considered to be significant in the linear term. The expression pattern of the gene is "linear".

If pvalue of linear effect > α_{1}, the j^{th} gene is considered to be nonsignificant in the linear term. The expression pattern of the gene is "flat".
Determination of gene temporal expression patterns by the proposed regression method.
Regression Patterns  Sign of  Sign of  Predicted Signals  

Linear  up (LU)  +  N/A  N/A 
down (LD)    N/A  N/A  
Quadratic  concave (QC)  N/A    N/A 
convex (QV)  N/A  +  N/A  
QuadraticLinear  concave up (QLCU)  N/A   

concave down (QLCD)  N/A   
 
convex up (QLVU)  N/A  + 
 
convex down (QLVD)  N/A  + 

Application of the quadratic regression method
Comparison with Peddada et al.'s method
Comparisons of regression patterns and Peddada et al.'s profiles obtained from 3834 present genes.
Regression patterns  Peddada et al.'s profiles  

MI (48)  MD (16)  UD2 (155)  UD3 (25)  UD4 (34)  DU2 (62)  DU3 (26)  DU4 (13)  
LU (228)  41  0  0  0  13  11  0  1 
LD (123)  0  11  8  0  0  0  0  0 
QC (69)  0  0  14  7  1  0  0  0 
QV (15)  0  0  0  0  0  0  3  1 
QLCU (20)  2  0  1  3  3  0  0  0 
QLCD (214)  0  3  102  9  1  0  0  0 
QLVU (125)  2  0  0  0  0  44  13  2 
QLVD (4)  0  0  0  0  0  0  0  1 
Comparison with ANOVAprotected kmeans clustering
Comparisons of regression patterns and kmeans clusters obtained from 770 ANOVA significant genes.
Regression Patterns  Kmeans Clusters  

K1 (163)  K2 (126)  K3 (107)  K4 (42)  K5 (81)  K6 (95)  K7 (41)  K8 (64)  K9 (51)  
FLAT (211)  12  30  36  18  19  8  41  20  27 
LU (165)  0  68  0  0  51  10  0  36  0 
LD (72)  19  0  53  0  0  0  0  0  0 
QC (43)  5  0  0  20  0  0  0  0  18 
QV (8)  0  1  0  0  0  7  0  0  0 
QLCU (13)  0  0  0  0  4  0  0  8  1 
QLCD (151)  127  0  15  4  0  0  0  0  5 
QLVU (104)  0  27  0  0  7  70  0  0  0 
QLVD (3)  0  0  3  0  0  0  0  0  0 
EASE functional analysis on regression patterns and kmeans clusters
Overrepresented gene categories in some regression patterns from EASE functional analysis.
Reg. Patterns  Gene Category  List Hits  List Total  Pop. Hits  Pop. Total  EASE Score 

LD  Cell adhesion  10  64  52  699  3.53E02 
LU  Immune regulation  
defense response  43  159  98  699  9.77E07  
response to biotic stimulus  44  159  104  699  2.38E06  
immune response  37  159  88  699  2.77E05  
response to external stimulus  50  159  142  699  1.79E04  
immune cell activation  5  159  6  699  2.58E02  
cell activation  5  159  6  699  2.58E02  
lymphocyte activation  5  159  6  699  2.58E02  
Bcell activation  4  159  4  699  3.79E02  
Other biological functions  18  159  43  699  7.52E03  
QLCD  Coenzyme and prosthetic group metabolism  7  139  12  699  1.73E02 
QLCU  Signaling  
cyclicnucleotidemediated signaling  3  12  4  699  1.33E03  
secondmessengermediated signaling  3  12  5  699  2.20E03  
Gprotein signaling, coupled to cyclic nucleotide second messenger  2  12  3  699  4.65E02  
cAMPmediated signaling  2  12  3  699  4.65E02  
Protein metabolism  7  12  173  699  3.16E02  
QLVU  Immune regulation  
response to pest/pathogen/parasite  21  98  61  699  5.80E05  
response to wounding  14  98  40  699  1.54E03  
inflammatory response  12  98  32  699  2.19E03  
innate immune response  12  98  32  699  2.19E03  
defense response  24  98  98  699  3.88E03  
response to biotic stimulus  25  98  104  699  3.98E03  
immune response  22  98  88  699  4.82E03  
response to stress  23  98  98  699  8.67E03  
response to chemical substance  7  98  18  699  2.80E02  
humoral defense mechanism (sensu Vertebrata)  6  98  14  699  3.32E02  
response to external stimulus  28  98  142  699  3.53E02  
Cell surface receptor linked signal transduction  18  98  80  699  3.64E02  
Cellmatrix adhesion  4  98  6  699  3.78E02 
Overrepresented gene categories in some kmeans clusters from EASE functional analysis.
Kmeans Clusters  Gene Category  List Hits  List Total  Pop. Hits  Pop. Total  EASE Score 

K2  Immune Regulation  
immune response  23  121  88  699  3.02E02  
defense response  25  121  98  699  3.02E02  
response to biotic stimulus  26  121  104  699  3.38E02  
K4  Humoral immune regulation  5  37  24  699  3.01E02 
K5  Immune Regulation  
immune response  18  80  88  699  1.25E02  
response to biotic stimulus  20  80  104  699  1.51E02  
defense response  19  80  98  699  1.72E02  
Cell death  
apoptosis  9  80  34  699  2.91E02  
programmed cell death  9  80  35  699  3.43E02  
Ion Homeostasis  
ion homeostasis  5  80  13  699  4.88E02  
cell ion homeostasis  5  80  13  699  4.88E02  
Embryogenesis and morphogenesis  4  80  6  699  2.16E02  
Other biological functions  12  80  43  699  5.39E03  
K6  Immune Regulation  
innate immune response  14  87  32  699  3.03E05  
inflammatory response  14  87  32  699  3.03E05  
response to pest/pathogen/parasite  20  87  61  699  3.32E05  
response to wounding  15  87  40  699  1.04E04  
defense response  24  87  98  699  6.22E04  
response to biotic stimulus  24  87  104  699  1.57E03  
immune response  21  87  88  699  2.41E03  
response to stress  22  87  98  699  4.07E03  
response to chemical substance  7  87  18  699  1.59E02  
acutephase response  4  87  6  699  2.73E02  
chemotaxis  6  87  16  699  3.65E02  
taxis  6  87  16  699  3.65E02  
response to external stimulus  25  87  142  699  4.55E02  
Regulation  
regulation of biological process  11  87  39  699  1.45E02  
regulation of cellular process  11  87  39  699  1.45E02  
regulation of cell proliferation  9  87  30  699  2.25E02  
Cell surface receptor linked signal transduction  17  87  80  699  2.50E02 
Reliability analysis
Simulation study
We investigated the false positive rate (gene specific) of our method via a simulation study. The data were generated randomly from N(0,1), containing expression signals of 10000 "null" genes (no gene differentially expressed over time), with 5 time points and 3 replications per time point per gene. 50 of such data were generated. The regression approach was applied to each gene in each simulated data at α_{0} = 0.01 and the numbers of significant genes in each of the 50 data were obtained. The average proportion of significance (average false positive rate) is 1.01% with standard deviation 0.01%. This demonstrates that the false positive rate of the regression method is accurate because 1% of 10000 genes would be expected to be significant at 0.01 level by chance. The false positive rates of the regression patterns LU, LD, QC, QV are all approximately equal to 1/6 of the average false positives, and those of QLCU, QLCD, QLVU, and QLVD are all approximately equal to 1/12 of the average false positives.
Discussion
The proposed stepdown quadratic regression method is an effective statistical approach for gene discovery and pattern recognition. It utilizes the actual time information, and provides biologically meaningful classification of temporal gene expression profiles. Furthermore, it does not require replication at each time point, which ANOVAtype methods do require. Also, this method can identify genes with subtle changes over time and therefore discover genes that might be undetectable by other methods, eg, ANOVAtype methods. However, there are several limitations to this method. Firstly, it is designed to fit timecourse data with a small number of time points. We recommend this method when there are 4 to 10 time points in the data. For an experiment with more time points, splinetype methods [11, 12] could be a possible choice; for an experiment with 2 or 3 time points, ANOVAtype method is recommended. Secondly, the 9 regression patterns are rather limited considering the complexity of gene regulatory networks. For example, certain proportion of genes show cubic, "M", and "W" shaped patterns in 211 regression FLAT genes which are ANOVA significant (Table 4). These patterns could be caused by random chance, but they could also be real patterns. Fitting a higher order polynomial regression model may discover these types of genes profiles. Theoretically, one could fit a 4^{th}order polynomial regression model to this data (the highest order of the polynomial one can fit is the number of time points minus one). The model with 4^{th}order polynomial will work similarly to connecting the mean at each time point, therefore will provide a good fit to the data with smallest R^{2} and minimum Mean Squared Error, compared with lowerorder polynomials. However, the purpose of pattern analysis is to cluster the data instead of fitting models, so the quadratic fit is useful even though the goodness of fit may not be great. Also, the use of highorder polynomials (higher than the secondorder) should be avoided if possible [20], particularly in cases such as this where the regression coefficients are used primarily for classification. Another issue is the transformation of the experimental time. Transformation should be considered when the sampling time is unequally spaced. The choice for the type of transformation (logtransformation, squareroot transformation, etc) is not critical because the resulting pattern classification will in general not be impacted.
In the reliability curves, at 95% reliability, regression patterns, Peddada et al.'s profiles, and kmeans clusters have 33%, 12%, and 0% of genes, respectively; and at 80% reliability, the percentage of genes are 55%, 32%, and 0%, respectively (Figure 9). Even though the regression patterns have the highest reliability, only 33% of genes have 95% reliabilities. We examined the overall model (1) pvalues of 770 genes by the regression method and found that genes that have the smallest overall model (1) pvalues all have 95% reliabilities. This suggests that we could reduce the level of significance α_{0} to increase the stability of regression patterns. α_{0} could be reduced using various multiple testing pvalue adjustment procedures, for example, Westfall and Young's step down method [21], and False Discovery Rate (FDR) [17]. Application of the FDR method can be done as follows (assuming FDR is controlled at level of α): let p_{(1)} <p_{(2)} < … <p_{(m)}be the ordered overall model (1) pvalues, start from the largest pvalue p_{(m)}, compare each p_{(i)}with α *i/m; let k be the largest i that p_{(k)}≤ α *k/m, conclude p_{(1)}, …, p_{(k)}to be significant.
Both our quadratic regression method and Peddada et al.'s method serve the same overall goal: gene selection and classification. Peddada et al.'s method provides more choices of temporal profiles than our method. While our regression method offers less choice of patterns, it may provide deeper insight into the gene expression profiles than Peddada et al.'s method. Our method distinguishes patterns with different rates of change and provides more information on the relative relationship among the expression levels of all time points. For example, specifying a profile of updown with maximum at one time point does not provide much information on the relative relationships among other time points (Figure 5). A further refinement of Peddada et al.'s method may provide such information about the relationship of other time points besides the maximum/minimum. However, it is less likely to separate the patterns in Figure 5a, b, and 5c by their method. Another fact is that Peddada et al.'s method provides exactly the location of the maximum/minimum, whereas our method provides the neighborhood of the location of the maximum/minimum. Furthermore, their method is based on bootstrap, which is computationally intensive. The result of their method, for example, the reliability curves, might be improved by applying more bootstrap, which is 4000 in this paper due to the computational difficulties and time constraints. Moreover, their method depends on the ordering of time but not the actual time at which the samples were taken, whereas the regression method accounts for both.
Kmeans is an iterative clustering algorithm [22]. The first step of this method is to randomly assign the data points to the k clusters. Next, the distance to the center of each cluster is calculated for each data point, and the data point is moved into the closest cluster. This step will be repeated until no data point is moving from one cluster to another. In kmeans, the number of clusters, k, needs to be prespecified. Researchers usually choose several different k and find the one which has the most biologically meaningful clusters. There are methods of finding the "optimal" k, for example, Bayesian Information Criterion [23]. In this paper, k was arbitrarily chosen to be 9. Since the kmeans clustering does not perform well (Table 4; Figures 6, 7, and 8), we investigated different choice of k based on the Bayesian Information Criterion and identified that the "optimal" k is 15. However, as we examined these 15 kmeans clusters, the pattern classification does not seem to be improved, the same problem exists as with k = 9. For example, Prom1, Clu, and D17H6S56E5 (Figure 5c, Figure 7c and 7d) all have similar temporal profiles and are all classified to be QLVU, but they were separated into 3 of the 15 kmeans clusters. This could be related to the distance measure used (Pearson correlation coefficient). As we discovered, genes in the same cluster do not necessarily have higher correlation than genes in different clusters. For example, Sfpi1 and Anxa2 (Figure 7a and 7b) are highly correlated (Pearson correlation coefficient is 0.9934) and their expression patterns are similar, but they are in different kmeans clusters. A possible reason might be that the timecourse in olfactory receptor neuron data is too short for correlation to perform well. Even though there are a total of 15 observations for each gene, correlation calculations are based on the 5 mean signals, which could be too few to describe the relationship between temporal profiles. There is also concern about using correlation as the distance measure. A large correlation coefficient does not necessarily indicate two similarly shaped profiles, nor does a small correlation coefficient necessarily indicate differently shaped profiles [1].
A number of regression algorithms have been proposed recently, which treat time as a continuous variable. Several of them are based on cubic Bsplines [11, 12]. Bsplines are defined as a linear combination of a set of basis polynomials. In order to fit cubic Bsplines to timecourse data, the entire duration of experimental time needs to be divided into several segments by "knots" (the point to separate segments), and each segment will be fit by cubic polynomial. The successful application of these methods to microarray timecourse data depends heavily on having a relatively large numbers of time points. The Bspline based methods will not be effective when there are a small number of time points in the timecourse experiment [13]. For a data with 5 time points, cubic Bspline type methods would not be appropriate because it is recommended that there should be at least 4 or 5 experimental time points in each segment [24]. Xu et al used a piecewise quadratic regression model to identify differentially expressed genes [10]. In their approach, expression levels at 0 hour and 2 hours after treatment are fit differently from the rest of time points after treatment. Although appropriate for their data, their method cannot be applied to the dataset used in this paper.
The quadratic regression method that we applied to the olfactory receptor neuron data relies on the normality assumption. This is supported by the result of the ShapiroWilk normality test, which indicates that most of the genes used for the analysis follow a normal distribution. This might be due to the fact that we removed genes that are called "A" (absent) by Affymetrix across all chips. "A" calls are often assigned to low expression signals, which tend to be nonnormal in general. Therefore removing genes with a high proportion of "A" calls may reduce the possibility of violation of the normality assumption, which will then make the test based on distributional assumption more likely to be valid, and thus avoid computational intensive resampling procedures, for example, bootstrap and permutation. If desired, experimenters could also try various types of data transformation to make their data closer to normal when the data are shown to have large departure from normality. However, the log transformation performed on the olfactory receptor neuron data was not to reduce the possible nonnormality, but solely to make a fair comparison of our regression method and kmeans method because it is the default transformation in Genespring. When the normality assumption (ε_{ ij }~ N(0, )) does not hold, the bootstrap method [25] can be used to avoid the distributional assumption. For an experiment with m genes, T time points, and r replications per time point, the bootstrap procedure can be performed in the following way: form the data into a matrix of m × rT, each column in the matrix contains expressions of m genes in one chip and each row contains rT expressions of one gene; randomly draw rT columns with replacement to form a bootstrap sample; apply stepdown quadratic regression procedure to the bootstrap sample to obtain F statistics from F tests; repeat the above steps 1000 times to form a bootstrap F distribution for each gene; claim a gene to be significance at level of α if its observed F statistics is greater than the upper (α / 2)^{ th }percentile or less than the lower (α / 2)^{ th }percentile of its bootstrap F distribution. One concern about using bootstrap here is that the bootstrap F distribution might be too discrete due to the small number of time points. However, the fact that we are bootstrapping both the explanatory and response variables mitigates this issue by using all the data points, not just the time points. Additionally, in a small simulation study, we observed that the bootstrap F distribution is rather smooth (result not shown).
Conclusion
The proposed stepdown quadratic regression approach is shown to be effective for gene discovery and pattern recognition for noncyclic short timecourse microarray experiment. Major advantages of this method are that it preserves the actual time information, and provides a useful tool for gene identification and pattern recognition. The nine regression patterns, obtained when applied to the olfactory receptor neuron data, are shown to be more reasonable classifications compared to ANOVAprotected kmeans clustering method. EASE analysis further showed that our regression patterns are more biologically meaningful than the kmeans clusters. Comparison with Peddada et al.'s method showed that our method provides a different perspective on the temporal gene profiles. Reliability study indicates that regression patterns are most reliable. In conclusion, this method should improve gene discovery and pattern recognition for microarray timecourse data. With the freely accessible Excel macro, investigators can readily apply this method to their research data.
Methods
ANOVAprotected kmeans clustering
Oneway ANOVA model y_{ ijk }= μ_{ j }+ τ_{ k }+ ε_{ ijk }was fitted to each gene in SAS v9, where y_{ ijk }denotes the gene expression level of the j^{th} gene at the i^{th} replication of the k^{th} time point, μ_{ j }denotes the overall mean signal of the j^{th} gene, τ_{ k }denotes the effect of k^{th} time point, ε_{ ijk }denotes the random error associated with the i^{th} replication at the k^{th} time point of the j^{th} gene and is assumed to be independently distributed normal with mean 0 and variance . Genes that have overall ANOVA model pvalues ≤ α_{0} will be used for kmeans clustering. Kmeans clustering was performed in Genespring V6.1 (Silicon Genetics. Redwood City, CA) with k = 9. The similarity measure was chosen to be Pearson correlation coefficient, which was calculated from vectors of length 5 containing mean signals of 3 replications at each of the 5 time points. 500 additional random clusters were tested and the best clusters were selected by the software.
EASE functional analysis
EASE software was used to identify the overrepresented categories of genes [19]. Gene Ontology Biological Process was chosen as the categorization system in EASE analysis. A functional gene category with an EASE score of less than 0.05 is considered to be overrepresented. The EASE software is available at: http://david.niaid.nih.gov/david/ease.htm.
Data description
The data used here are from a study of olfactory receptor neurons [15]. The goal is to investigate the induction of gene regulation at short time intervals following deafferentation of olfactory receptor neurons by target ablation at 2, 8, 16, and 48 hrs compared with the sham control. Total RNA was isolated from 3 male littermate mice per time point. Following hybridization with Affymetrix GeneChips MGU74Av2, 3 chips per time point, the signals were generated by GeneChip Analysis Suite v5.0. The data was filtered before statistical tests were performed. First, 66 Affymetrix quality control probesets and 6432 expressed sequence tags were removed. Next, the absent call (A) provided by Affymetrix was considered. 2156 genes that are called "A" across all 15 chips were removed from the data. The remaining 3834 present genes were used for the regression analysis (Figure 3). The hybridization signals of these 3834 genes were logtransformed in Genespring. Because the time points in this experiment are not equally spaced, ln(t+1) transformation was performed to each of the 5 time points, where t stands for the time point.
List of abbreviations
 ANOVA:

analysis of variance.
 LD:

linear down regulated regression pattern.
 LU:

linear up regulated regression pattern.
 QC:

quadratic concave regulated regression pattern.
 QV:

quadratic convex regulated regression pattern.
 QLCD:

quadraticlinear concave down regulated regression pattern.
 QLCU:

quadraticlinear concave up regulated regression pattern.
 QLVD:

quadraticlinear convex down regulated regression pattern.
 QLVU:

quadraticlinear convex up regulated regression pattern.
 MI:

monotone increasing.
 MD:

monotone decreasing.
 UD2/UD3/UD4:

updown with maximum at the second/third/forth time point.
 DU2/DU3/DU4:

downup with maximum at the second/third/forth time point.
Declarations
Acknowledgements
We wish to thank Christopher P. Saunders for his help on the statistical analysis, Dr. KueyChu Chen for her help in the application of EASE analysis, Dr. Shyamal D. Peddada for his help in the application of his method, and referees for their thoughtful comments. This work is supported by NIHAG01682423 (TVG), NIH1P20RR1648103 (AJS), and NSFEPS0132295 (AJS).
Authors’ Affiliations
References
 Peddada SD, Lobenhofer EK, Li L, Afshari CA, Weinberg CR, Umbach DM: Gene selection and clustering for timecourse and doseresponse microarray experiments using orderrestricted inference. Bioinformatics 2003, 19(7):834–841. 10.1093/bioinformatics/btg093View ArticlePubMedGoogle Scholar
 Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genomewide expression patterns. PNAS 1998, 95(25):14863–14868. 10.1073/pnas.95.25.14863PubMed CentralView ArticlePubMedGoogle Scholar
 Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM: Systematic determination of genetic network architecture. Nature Genet 1999, 22(3):281–285. 10.1038/10343View ArticlePubMedGoogle Scholar
 Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR: Interpreting patterns of gene expression with selforganizing maps: Methods and application to hematopoietic differentiation. PNAS 1999, 96(6):2907–2912. 10.1073/pnas.96.6.2907PubMed CentralView ArticlePubMedGoogle Scholar
 Brown MPS, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares M Jr, Haussler D: Knowledgebased analysis of microarray gene expression data by using support vector machines. PNAS 2000, 97(1):262–267. 10.1073/pnas.97.1.262PubMed CentralView ArticlePubMedGoogle Scholar
 Wolfinger RD, Gibson G, Wolfinger ED, Bennett L, Hamadeh H, Bushel P, Afshari C, Paules RS: Assessing Gene Significance from cDNA Microarray Expression Data via Mixed Models. Journal of Computational Biology 2001, 8(6):625–637. 10.1089/106652701753307520View ArticlePubMedGoogle Scholar
 Park T, Yi SG, Lee S, Lee SY, Yoo DH, Ahn JI, Lee YS: Statistical tests for identifying differentially expressed genes in timecourse microarray experiments. Bioinformatics 2003, 19(6):694–703. 10.1093/bioinformatics/btg068View ArticlePubMedGoogle Scholar
 Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL: Modelbased clustering and data transformations for gene expression data. Bioinformatics 2001, 17(10):977–987. 10.1093/bioinformatics/17.10.977View ArticlePubMedGoogle Scholar
 Pan W, Lin J, Le C: Modelbased cluster analysis of microarray geneexpression data. Genome Biology 2002, 3(2):research0009.0001research0009.0008. 10.1186/gb200232research0009View ArticleGoogle Scholar
 Xu XL, Olson JM, Zhao LP: A regressionbased method to identify differentially expressed genes in microarray time course studies and its application in an inducible Huntington's disease transgenic model. Hum Mol Genet 2002, 10(17):1977–1985. 10.1093/hmg/11.17.1977View ArticleGoogle Scholar
 Luan Y, Li H: Clustering of timecourse gene expression data using a mixedeffects model with Bsplines. Bioinformatics 2003, 19(4):474–482. 10.1093/bioinformatics/btg014View ArticlePubMedGoogle Scholar
 BarJoseph Z, Gerber G, Simon I, Gifford DK, Jaakkola TS: Comparing the continuous representation of timeseries expression profiles to identify differentially expressed genes. PNAS 2003, 100(18):10146–10151. 10.1073/pnas.1732547100PubMed CentralView ArticlePubMedGoogle Scholar
 BarJoseph Z: Analyzing time series gene expression data. Bioinformatics 2004, 20(16):2493–2503. 10.1093/bioinformatics/bth283View ArticlePubMedGoogle Scholar
 The Excel macro for the stepdown quadratic regression method[http://www.mc.uky.edu/UKMicroArray/bioinformatics.htm]
 Getchell TV, Liu H, Vaishnav RA, Kwong K, Stromberg AJ, Getchell ML: Temporal profiling of gene expression during neurogenesis and remodeling in the olfactory epithelium at short intervals after target ablation. Journal of Neuroscience Research 2005, 80(3):309–329. 10.1002/jnr.20411View ArticlePubMedGoogle Scholar
 Kerr MK, Churchill GA: Bootstrapping cluster analysis: Assessing the reliability of conclusions from microarray experiments. PNAS 2001, 98(16):8961–8965. 10.1073/pnas.161273698PubMed CentralView ArticlePubMedGoogle Scholar
 Benjamini Y, Hochberg Y: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B (Methodological) 1995, 57(1):289–300.Google Scholar
 SAS v9 online document Cary, NC, USA: SAS Institute Inc;
 Hosack D, Dennis G, Sherman B, Lane H, Lempicki R: Identifying biological themes within lists of genes with EASE. Genome Biology 2003, 4(10):R70. 10.1186/gb2003410r70PubMed CentralView ArticlePubMedGoogle Scholar
 Montgomery DC, Peck EA, Vining GG: Introduction to linear regression analysis. 3rd edition. John Wiley &Sons, Inc; 2001.Google Scholar
 Westfall PH, Young SS: ResamplingBased Multiple Testing: Examples and Methods for pValue Adjustment. John Wiley &Sons, Inc; 1993.Google Scholar
 Hartigan JA: Clustering Algorithms. John Wiley &Sons, Inc; 1975.Google Scholar
 Schwarz G: Estimating the dimension of a model. Annals of Statistics 1978, 6: 461–464.View ArticleGoogle Scholar
 Seber GA, Lee AJ: Linear regression analysis, second edition. John Wiley &Sons, Inc; 2003.View ArticleGoogle Scholar
 Efron B, Tibshirani RJ: An Introduction to the Bootstrap. Chapman and Hall; 1993.View ArticleGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.