The following description of the methods includes some technical material that is not essential for understanding the results in the subsequent sections, but is provided so as to give a complete account of the functional regression methodology. The methodology described below and used for the analysis of gene expression trajectories has been implemented in Matlab and is freely available in the PACE (principal analysis by conditional expectation) package, downloadable from the internet [14].

In this program, if one chooses default values, the only required input for the functional regression part PACE-REG are the measurements of gene expressions for predictor and response trajectories, denoted in the previous section by (*s*_{ij}, *X*_{ij}) and (*t*_{ik}, *Y*_{ik}), where *i* ranges over the genes, *j* over the predictor measurements (its range may depend on *i*), and *k* over the response measurements (whose range may also depend on *i*). As outputs one obtains the quantities shown in the results section.

### Preliminaries on functional linear regression

Denoting the random predictor and response functions by *X*(*s*) resp. *Y*(*t*), their mean functions by *μ*_{X}(*s*) = *E*(*X*(*s*)), *μ*_{Y}(*t*) = *E*(*Y*(*t*)) and their covariance functions by *G*_{X}(*s*_{1}, *s*_{2}) = cov(*X*(*s*_{1}), *X*(*s*_{2})), *G*_{Y}(*t*_{1}, *t*_{2}) = cov(*Y*(*t*_{1}), *Y*(*t*_{2})), one obtains under mild conditions the Karhunen-Loève expansions for trajectories *X* and *Y*, given by

X(s)={\mu}_{X}(s)+{\displaystyle \sum _{j=1}^{\infty}{\xi}_{j}{\phi}_{j}(s)},

(1)

Y(t)={\mu}_{Y}(t)+{\displaystyle \sum _{k=1}^{\infty}{\zeta}_{k}{\psi}_{k}(t)}

(2)

[15, see Appendix for further explanations]. These representations provide a convenient way to implement the necessary dimension reduction for the trajectories *X* and *Y*, by truncating the sums on the r.h.s. at a finite number of terms, where the truncation point needs to be chosen data-adaptively (flatter and more simply structured trajectories requiring fewer components, and more complex trajectories requiring more components to be included). The trajectories are represented by their overall mean function, random coefficients *ξ*_{j} resp. *ζ*_{k} and the sequences of basis functions *φ*_{j} resp. *ψ*_{k}, with indices *j* and *k* ranging between 1 and the truncation value, say 1 ≤ *j* ≤ *J* for *X* and 1 ≤ *k* ≤ *K* for *Y*. The functions *φ*_{j} and *ψ*_{k} are chosen as eigenfunctions and are often referred to as "modes of variation": They represent the main "directions" in function space in which the trajectories vary. Representations (1) and (2) are analogous to expressing a centered random vector in terms of the basis of the vector space that consists of the eigenvectors of its covariance matrix. The random effects *ξ*_{j}, *ζ*_{k} are centered at zero and are referred to as functional principal component scores, or simply scores. Additional mathematical details can be found in the Appendix.
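To make representations (1) and (2) concrete, the following Python sketch (simulated data on a dense common grid; not the PACE algorithm, which additionally handles sparse, noisy measurements) computes an empirical Karhunen-Loève expansion by eigendecomposing the discretized sample covariance. All data, grid choices, and the two-mode structure are invented for illustration.

```python
import numpy as np

# Sketch: empirical Karhunen-Loève expansion on a dense grid (simulated data).
# Eigenfunctions phi_j come from the eigendecomposition of the sample covariance;
# scores xi_ij are projections of the centered trajectories onto phi_j.
rng = np.random.default_rng(0)
s = np.linspace(0.0, 1.0, 50)                    # common time grid
n = 200
# simulate n trajectories: mean function plus two random modes (illustrative)
X = (np.sin(2 * np.pi * s)
     + rng.normal(0, 2, (n, 1)) * np.sqrt(2) * np.cos(np.pi * s)
     + rng.normal(0, 1, (n, 1)) * np.sqrt(2) * np.sin(np.pi * s))

mu_X = X.mean(axis=0)                            # estimated mean function
Xc = X - mu_X                                    # centered trajectories
ds = s[1] - s[0]
G = (Xc.T @ Xc) / (n - 1)                        # discretized covariance G_X(s1, s2)
evals, evecs = np.linalg.eigh(G)
order = np.argsort(evals)[::-1]                  # largest eigenvalues first
lam = evals[order] * ds                          # eigenvalues of the integral operator
phi = evecs[:, order] / np.sqrt(ds)              # eigenfunctions, L2-normalized
J = 2                                            # truncation point
xi = Xc @ phi[:, :J] * ds                        # scores xi_ij = ∫ (X_i - mu_X) phi_j
X_trunc = mu_X + xi @ phi[:, :J].T               # truncated KL reconstruction
```

Because the simulated trajectories contain exactly two random modes, the truncated reconstruction with *J* = 2 recovers them up to numerical error; for real expression data the truncation point is chosen data-adaptively, as described above.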

The functional linear regression model with response function *Y* and predictor function *X* is

E(Y(t)|X)={\mu}_{Y}(t)+{\displaystyle {\int}_{S}\beta (s,t)(X(s)-{\mu}_{X}(s))ds},

(3)

where the bivariate regression parameter function *β*(*s*, *t*) is smooth and square integrable [12]. This model emerges as a generalization of the multivariate linear regression model *E*(*Y*|*X*) = *BX*, where *X* and *Y* are random vectors and *B* is a parameter matrix. The function *β* is central to this functional regression model. For fixed *t*, the value of the response trajectory at *t*, which is *Y*(*t*), is obtained by integrating the predictor trajectory over its domain with the weight function *β*(*s*, *t*), viewed as a function of *s* for fixed *t*.
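As a numerical illustration of model (3), the sketch below evaluates the fitted response for a single predictor trajectory by approximating the integral with a Riemann sum on a grid; the regression surface *β*(*s*, *t*), the mean functions, and the predictor trajectory are all hypothetical choices made for the example.

```python
import numpy as np

# Sketch: evaluating E(Y(t)|X) = mu_Y(t) + ∫ beta(s,t) (X(s) - mu_X(s)) ds
# on a grid, with the integral replaced by a Riemann sum (illustrative data).
s = np.linspace(0.0, 1.0, 101)          # domain of the predictor trajectory
t = np.linspace(0.0, 1.0, 51)           # domain of the response trajectory
ds = s[1] - s[0]
mu_X = np.zeros_like(s)                  # hypothetical mean functions
mu_Y = np.zeros_like(t)
beta = np.outer(np.sin(np.pi * s), np.cos(np.pi * t))  # hypothetical beta(s, t)
X = np.sin(np.pi * s)                    # one hypothetical predictor trajectory
# for each fixed t, integrate the centered predictor against the weight beta(., t)
Y_hat = mu_Y + ds * ((X - mu_X) @ beta)
```

Here ∫ sin²(*πs*) d*s* = 1/2, so the fitted response is 0.5·cos(*πt*); inspecting *β*(·, *t*) for fixed *t* shows which parts of the predictor are weighted positively or negatively.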

One way to interpret the functional regression is therefore to fix various values of *t* and then to inspect the corresponding weight functions by taking cross-sections through the surface *β*(*s*, *t*) at the selected values of *t*. Each such weight function indicates which parts of the predictor trajectory contribute positively or negatively to the outcome *Y*(*t*). Under regularity assumptions, the regression parameter surface *β* has the following basis representation,

\begin{array}{ll}\beta (s,t)={\displaystyle \sum _{k=1}^{\infty}{\displaystyle \sum _{j=1}^{\infty}{\beta}_{kj}{\phi}_{j}(s){\psi}_{k}(t)}},\hfill & {\beta}_{kj}=\frac{E({\xi}_{j}{\zeta}_{k})}{E({\xi}_{j}^{2})}.\hfill \end{array}

(4)

A detailed description of how the mean functions *μ*_{X}, *μ*_{Y}, eigenfunctions *φ*_{j}, *ψ*_{k} and eigenvalues *λ*_{j}, *τ*_{k} can be consistently estimated from noisy data by using nonparametric smoothing methods can be found in references [13, 16]. These estimation steps involve pooling the measurements for all predictor (resp. response) trajectories and then applying a smoothing method to obtain the mean function *μ*_{X}, and smoothing the pointwise "raw" covariances (omitting the diagonal variances) to obtain the smooth covariance function cov(*X*(*s*_{1}), *X*(*s*_{2})). The eigenfunctions are then obtained by numerical discretization and spectral decomposition of the resulting covariance matrix, which is projected onto a positive definite covariance function. This leads to estimates {\widehat{\xi}}_{j}, {\widehat{\zeta}}_{k} of the functional principal component scores *ξ*_{j}, *ζ*_{k}. These estimates must cope with noise in the observed expression data and are therefore implemented as estimated conditional expectations. The resulting estimates are then plugged into the r.h.s. of eq. (4) to obtain an estimate of the regression parameter function *β*.

### Decomposing functional linear regression into simple linear components

As is shown in the Appendix, the relationships between response and predictor scores within the functional linear model are simple: They are simple linear regressions through the origin, i.e., *E*(*ζ*_{k}|*ξ*_{j}) = *β*_{kj}*ξ*_{j}.

Here the slope coefficients *β*_{kj} of these simple linear regressions define the functional regression parameter function according to (4). This leads to the conclusion that functional linear regression can be decomposed into a series of simple linear regressions of the functional principal component scores of response processes on those of predictor processes.

This fact substantially simplifies the analysis and provides an alternative interpretation of the resulting functional linear regression model. Note that representation (4) implies that for given slopes *β*_{kj}, the regression parameter surface *β*(·,·) is uniquely determined. It follows from (12) in the Appendix that the converse also holds. Therefore, a functional linear relationship between random trajectories *Y* and *X* can be described equivalently by the regression parameter function *β*(*s*, *t*) or by the doubly-indexed sequence of all slope coefficients *β*_{kj}, *k*, *j* ≥ 1.

A consequence of this equivalence is that estimates of the slope coefficients can be used directly to obtain straightforward estimates of the regression parameter function. Another consequence is that it opens up two different perspectives for the interpretation of a functional linear regression relation: either through features of the shape of the regression parameter surface *β*(*s*, *t*), or through the ensemble of the regression slopes *β*_{kj}, *k*, *j* = 1, 2,.... For the latter, the interpretation is in terms of changes towards the directions of the corresponding eigenfunctions. For example, if the slope coefficient *β*_{11} relating the first predictor score to the first response score is positive and large, this implies that as the predictor trajectories move increasingly in the direction marked by *φ*_{1}, i.e., from the mean *μ*_{X} towards *μ*_{X} + *γφ*_{1} for increasing *γ*, the response trajectories move increasingly in the direction *ψ*_{1}, i.e., from *μ*_{Y} towards *μ*_{Y} + *γψ*_{1} for increasing *γ*. The other slope coefficients can be interpreted analogously.

Often there will be interest in measuring the strength of association between predictors and responses in a functional regression. The classical measure in a multiple linear regression relationship is the coefficient of determination *R*^{2}. Extensions to the functional case were discussed in reference [13]. When applying the decomposition into simple linear regressions, this functional coefficient of determination assumes a particularly simple form.

### Implementing the decomposition

Given estimates {\widehat{\phi}}_{j}, {\widehat{\psi}}_{k}, {\widehat{\xi}}_{ij} and {\widehat{\zeta}}_{ik} of eigenfunctions and functional principal component scores, the regression parameter surface is estimated by

\widehat{\beta}(s,t)={\displaystyle \sum _{k=1}^{K}{\displaystyle \sum _{j=1}^{J}{\widehat{\beta}}_{kj}{\widehat{\phi}}_{j}(s){\widehat{\psi}}_{k}(t)}},

(6)

where the unknown slopes {\widehat{\beta}}_{kj} can be obtained by least squares regression of the scores {{\widehat{\zeta}}_{ik}} on {{\widehat{\xi}}_{ij}}, fitted without intercept, for all *k* = 1,..., *K* and *j* = 1,..., *J*. These least squares estimates are however subject to attenuation, and therefore bias, due to the presence of errors in the predictors {{\widehat{\xi}}_{ij}} (corresponding to an errors-in-variables situation), as these need to be estimated from the data and are therefore imprecise. An alternative estimate that is less subject to this problem is given by

\begin{array}{cc}{\widehat{\beta}}_{kj}=\frac{1}{(n-1){\widehat{\lambda}}_{j}}{\displaystyle \sum _{i=1}^{n}({\widehat{\xi}}_{ij}-{\overline{\widehat{\xi}}}_{\cdot j})({\widehat{\zeta}}_{ik}-{\overline{\widehat{\zeta}}}_{\cdot k})},& 1\le j\le J,\;1\le k\le K,\end{array}

(7)

as the denominator is estimated directly from the covariance surface, where the measurement errors are confined to the diagonal, which allows their effect to be eliminated to a large extent. If the denominator is instead estimated from the empirically observed functional principal component scores, as is the case for the usual least squares estimators, then contamination by error may lead to inflated empirical variances and, as a result, to attenuation of the estimated regression coefficients. In the following we therefore adopt the alternative estimate (7).
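A sketch of estimate (7) on simulated scores, with the eigenvalue estimate treated as given (in practice it comes from the smoothed covariance surface, as explained above); the data and the true slope are invented for illustration.

```python
import numpy as np

# Sketch of eq. (7): sample covariance of the estimated scores divided by
# the eigenvalue estimate lambda_j, rather than by the empirical score
# variance, to reduce errors-in-variables attenuation (simulated data).
rng = np.random.default_rng(2)
n = 400
lam_j = 4.0                                  # given eigenvalue estimate lambda_j
xi = rng.normal(0.0, np.sqrt(lam_j), n)      # estimated predictor scores xi_ij
zeta = 0.5 * xi + rng.normal(0.0, 0.2, n)    # estimated response scores zeta_ik
xi_bar, zeta_bar = xi.mean(), zeta.mean()
beta_kj = np.sum((xi - xi_bar) * (zeta - zeta_bar)) / ((n - 1) * lam_j)
```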

Note that *J* and *K* denote the numbers of random components of the predictor resp. response processes to be included in the functional regression analysis. These numbers can be determined in practice by scree plots, which display the fraction of variance explained as the number of included components increases. This simple and fast approach, choosing the smallest number of components that explains 85% of the variation, leads to adequate results. Alternative selectors are discussed in references [13, 17]. Based on (13), we insert the chosen values of *J* and *K* to obtain the estimated coefficient of determination

\begin{array}{ccc}{\widehat{R}}^{2}={\displaystyle \sum _{j=1}^{J}\frac{{\displaystyle {\sum}_{k=1}^{K}{\widehat{R}}_{kj}^{2}{\widehat{\tau}}_{k}}}{{\displaystyle {\sum}_{k=1}^{K}{\widehat{\tau}}_{k}}},}& \text{with}& {\widehat{R}}_{kj}^{2}=\frac{{\widehat{\beta}}_{kj}^{2}{\widehat{\lambda}}_{j}}{{\widehat{\tau}}_{k}},\end{array}

(8)

using estimates (7). Here the {\widehat{R}}_{kj}^{2} are the estimated coefficients of determination of the simple linear regressions of *ζ*_{ik} on *ξ*_{ij}, which target (14).
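The 85% fraction-of-variance rule for choosing *J* and *K*, together with formula (8), can be sketched with hypothetical eigenvalue and slope estimates:

```python
import numpy as np

def n_components(eigenvalues, threshold=0.85):
    # smallest number of leading components whose cumulative fraction of
    # variance explained reaches the threshold (scree criterion)
    fve = np.cumsum(eigenvalues) / np.sum(eigenvalues)
    return int(np.argmax(fve >= threshold) + 1)

lam = np.array([4.0, 1.5, 0.3])   # hypothetical predictor eigenvalues lambda_j
tau = np.array([3.0, 1.0, 0.2])   # hypothetical response eigenvalues tau_k
J, K = n_components(lam), n_components(tau)    # both equal 2 here
beta = np.array([[0.8, 0.1],                   # hypothetical slopes, beta[k-1, j-1]
                 [0.2, 0.5]])
# eq. (8): R^2_kj = beta_kj^2 lambda_j / tau_k, combined with weights tau_k
R2_kj = (beta ** 2) * lam[None, :J] / tau[:K, None]
R2 = np.sum(np.sum(R2_kj * tau[:K, None], axis=0) / np.sum(tau[:K]))
```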

### Bootstrap inference

Inference for the functional regression that overcomes the high dimensionality and the multiple testing problem is obtained by the bootstrap. In the face of the complex dependencies between estimated functional principal component scores and estimated eigenfunctions, the regression bootstrap, with resampling from the sample of all pairs of predictor and response trajectories, emerges as a feasible approach. The starting point for generating the bootstrap samples is to randomly sample *n* units from the trajectory indices {1, 2,..., *n*} by sampling with replacement and, for each sampled index *i**, entering all observations for the corresponding predictor and response trajectories. This sampling procedure is repeated until *B* bootstrap samples, each consisting of predictor and response data for *n* trajectories, have been assembled.

For each of these bootstrap samples, we then carry out the functional linear regression procedure, obtaining all relevant estimates. The resulting estimates from the *B* bootstrap samples are used to construct pointwise confidence intervals for the regression parameter function *β*(·,·), simply by locating the corresponding lower and upper quantiles in the bootstrap distribution of the estimated surface values \widehat{\beta}(*s*, *t*) (6) for each fixed (*s*, *t*). The resulting upper and lower confidence surfaces provide an idea of how well the surface is actually determined from the data. Analogously, local confidence bands can be constructed for individual predicted trajectories, given any predictor function.
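The pairs (regression) bootstrap and percentile intervals can be sketched with a scalar stand-in for \widehat{\beta}(*s*, *t*); in the functional setting the same scheme is applied at every grid point (*s*, *t*). The data and the refitted statistic below are purely illustrative.

```python
import numpy as np

# Sketch: resample (predictor, response) pairs with replacement, refit the
# estimate on each bootstrap sample, and read off percentile confidence
# limits (scalar toy version of the pointwise intervals for beta(s, t)).
rng = np.random.default_rng(3)
n, B = 100, 500
x = rng.normal(1.0, 1.0, n)
y = 2.0 * x + rng.normal(0.0, 0.5, n)    # toy data with true slope 2
stats = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)          # n indices drawn with replacement
    xb, yb = x[idx], y[idx]              # pairs are kept together
    stats[b] = np.sum(xb * yb) / np.sum(xb ** 2)   # refit on the resample
lo, hi = np.quantile(stats, [0.025, 0.975])        # pointwise 95% limits
```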

In addition to confidence intervals, it is a common objective in regression analysis to establish the "significance of the regression relation", i.e., to infer whether, within the assumed model, the predictors indeed influence the responses. The classical test for a multiple regression corresponds to the null hypothesis that all regression slope coefficients vanish, or equivalently, that the coefficient of determination *R*^{2} = var(*E*(*Y*|*X*))*/* var(*Y*) vanishes. The extension to the functional case is not straightforward. As a first attempt to assess the overall significance of the functional regression, one can use pointwise bootstrap confidence intervals, say at the 95% level, and check whether there are areas where 0 is not included in these intervals. However, this does not properly account for the multiple confidence statements that are made. To obtain an overall bootstrap test for the significance of the functional regression relation, we use an alternative resampling procedure in which predictors and responses are resampled separately under the null hypothesis of no regression relationship. For each resample, the data for a predictor trajectory and the data for a response trajectory are selected separately by sampling with replacement from all predictor and all response trajectories. As predictors and responses are unrelated under this scheme, applying functional regression to *B* such null bootstrap samples (each of size *n*) provides a null distribution for the test statistic.
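The null resampling scheme can be sketched with toy scalar data, using a squared correlation as a stand-in for the functional *R*^{2}; predictors and responses are resampled separately, so the resulting null distribution reflects no regression relation.

```python
import numpy as np

# Sketch: bootstrap test of "no regression relation". Predictor and response
# samples are drawn independently with replacement, destroying any pairing;
# the p-value is the proportion of null statistics at or above the observed one.
rng = np.random.default_rng(4)
n, B = 80, 400
x = rng.normal(0.0, 1.0, n)
y = 1.5 * x + rng.normal(0.0, 1.0, n)    # toy data with a genuine relation

def r_squared(u, v):
    c = np.corrcoef(u, v)[0, 1]          # scalar stand-in for functional R^2
    return c * c

r2_obs = r_squared(x, y)
null_r2 = np.empty(B)
for b in range(B):
    xb = x[rng.integers(0, n, n)]        # predictors resampled alone
    yb = y[rng.integers(0, n, n)]        # responses resampled alone
    null_r2[b] = r_squared(xb, yb)
p_value = np.mean(null_r2 >= r2_obs)
```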

In our approach we specifically select the functional coefficient of determination *R*^{2} (13) as the test statistic, as it summarizes the regression effect. Accordingly, we obtain the overall *p*-value of the functional regression from the position of the observed value of *R*^{2} within the null bootstrap distribution for *R*^{2}, i.e., as the proportion of null bootstrap values that exceed the observed value. Bootstrap inference for the overall regression effect based on this device will be illustrated in the application to gene expression profiles in the next section.