### MPTGA and extension

Lin et al. [6] proposed a Multivariate Polynomial Time-dependent Genetic Association (MPTGA) method to detect genetic effects in temporal gene expression trajectories. The MPTGA method assumes that for each genotype *j*, the temporal gene expression trait **y** across *m* time points follows a multivariate normal distribution \(\mathcal {N}(\mathbf {g}_{j},\,\Sigma)\), with density function

$$f_{j}(\mathbf{y})=\frac{1}{(2\pi)^{m/2}|\Sigma|^{1/2}}\exp{\left[-\left(\mathbf{y}-\mathbf{g}_{j}\right)\Sigma^{-1}\left(\mathbf{y}-\mathbf{g}_{j}\right)^{T}/2\right]}. $$

The mean vector **g**_{j} for genotype *j* is modeled with a polynomial curve

$$\mathbf{g}_{j}=\left[\mathbf{g}_{j}(t)\right]_{1\times m}=\left[\Sigma_{k=0}^{K} \beta_{kj}t^{k}\right]_{1\times m},$$

where K is the degree of the polynomial function. In this study, K was set to be 3.

In particular, MPTGA assumes a first order auto-regression model to take into account the auto-correlation between different time points, with the covariance matrix specified as

$$\Sigma=\sigma_{e}^{2}\begin{bmatrix} 1 & \rho & \dots & \rho^{m-1} \\ \rho & 1 & \dots & \rho^{m-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{m-1} & \rho^{m-2} & \dots & 1 \end{bmatrix}. $$

For each sample *i*, its gene expression trajectory **y**_{i}(*t*) across *m* time points can be written as

$$\mathbf{y}_{i}(t)=\delta_{i0}\sum_{k=0}^{K} \beta_{k0}t^{k} + \delta_{i1}\sum_{k=0}^{K} \beta_{k1}t^{k} +\epsilon_{i},$$

where *δ*_{i0} and *δ*_{i1} are the indicator variables of sample *i* taking genotype 0 or 1.

Given the joint likelihood

$$\mathcal{L}(\Theta)=\prod_{i=1}^{N}[\!\delta_{i0}f_{0}(\mathbf{y}_{i}) + \delta_{i1}f_{1}(\mathbf{y}_{i})]$$

The maximum likelihood estimates of the parameter set \(\Theta =\left (\beta _{kj},\rho,\sigma _{e}^{2}\right)\) can be derived as described in the Lin et al. [6] paper.

Lin et al. [6] compared longitudinal trajectories between two different conditions (e.g. two possible genotypes in a haploid system). The method can be extended to accommodate three or more conditions (e.g. effects of three possible genotypes in a diploid system; e.g. effects of different viruses infection). This comparison can be done using done using a pairwise test between conditions. Alternatively, we can construct a full model to detect the (genetic) effects:

$$y_{i}(t)=\delta_{i0}\sum\limits_{k=0}^{K} \beta_{k0}t^{k} + \delta_{i1}\sum\limits_{k=0}^{K} \beta_{k1}t^{k} + \delta_{i2}\sum_{k=0}^{K} \beta_{k2}t^{k} + \epsilon_{i}$$

for a given trait Y, then the reduced model *H*_{0} (single gene expression trait curve)

$$H_{0}: \beta_{k0}=\beta_{k1}=\beta_{k2} \qquad \text{for all} k$$

can be compared against the full model *H*_{1} (different gene expression trait curve for different conditions (genotypes/viruses infection)):

$$H_{1}: \text{at least one of the equalities does not hold}$$

to test the hypothesis of the existence of eQTL at a locus or difference between effects of viruses by estimating these parameters with a MLE procedure and performing a likelihood ratio test. The full model approach differs from the pairwise test in that it imposes an extra condition of a shared covariance structure among all three conditions, which is estimated in the full model taking all three conditions into account; whereas the pairwise test only requires the covariance structure to be shared between each pair, which is estimated separately for each pair.

Similar to the MPTGA model with two genotypes, we assume that for each genotype *j*=0,1,2, the mean vector **g**_{j} for genotype *j* is modeled with a polynomial curve

$$\mathbf{g}_{j}=\left[\mathbf{g}_{j}(t)\right]_{1\times m}=\left[\Sigma_{k=0}^{K} \beta_{kj}t^{k}\right]_{1\times m}\qquad j=0,1,2.$$

Given the joint log-likelihood

$$\begin{array}{*{20}l} \log\mathcal{L}(\Theta)=&\sum\limits_{i=1}^{N}\delta_{i0}\log f_{0}(\mathbf{y}_{i}) +\sum\limits_{i=1}^{N} \delta_{i1}\log f_{1}(\mathbf{y}_{i}) \\ &+ \sum\limits_{i=1}^{N}\delta_{i2}\log f_{2}(\mathbf{y}_{i}), \end{array} $$

the maximum likelihood estimates of the parameter set \(\Theta =\left (\beta _{0j},\beta _{1j},\beta _{2j},\beta _{3j},\rho,\sigma _{e}^{2}\right),j=0,1,2\) can be derived by first looking for the critical point of \(\log \mathcal {L}(\Theta)\) by taking its derivative with respect to *β*’s and \(\sigma _{e}^{2}\), finding that both \(\beta 's,\sigma _{e}^{2}\) can be expressed as functions of *ρ* at the critical point, therefore \(\log \mathcal {L}(\Theta)\) can be expressed as functions of *ρ* at the critical point as well; then the MLE of *ρ* can be derived by taking derivative of \(\log \mathcal {L}(\Theta)\) with respect to *ρ*, and thus *β*’s and \(\sigma _{e}^{2}\) can be obtained accordingly.

The detailed derivation is as follows. Define \(\mathbf {T}_{j}=\sum _{i=1}^{N}\delta _{ij}\mathbf {y}_{i},j=0,1,2\); **I**_{0}= [1⋯1]_{1×m},**I**_{1}= [1⋯*m*], \(\mathbf {I}_{2}=\mathbf {I}_{1.}^{2}=\ [1\cdots m^{2}]\), \(\mathbf {I}_{3}=\mathbf {I}_{1.}^{3}=\ [1\cdots m^{3}]\); \(Q(\rho,\mathbf {U},\mathbf {V})=\frac {1}{1-\rho ^{2}}(U_{1}V_{1}+U_{m}V_{m})-\frac {\rho }{1-\rho ^{2}}\left [\sum _{i=1}^{m-1}U_{i}V_{i+1}+U_{i+1}V_{i}\right ]+\frac {1+\rho ^{2}}{1-\rho ^{2}}\sum _{i=2}^{m-1}U_{i}V_{i}\), where **U**= [*U*_{1},⋯,*U*_{m}] and **V**= [*V*_{1},⋯,*V*_{m}].

Taking derivative of \(\log \mathcal {L}(\Theta)\) with respect to *β*_{.0}’s gives the following linear system:

$$\begin{bmatrix} \alpha_{11}\beta_{00}+\alpha_{12}\beta_{10} + \alpha_{13}\beta_{20} + \alpha_{14}\beta_{30}=b_{1} \\ \alpha_{21}\beta_{00}+\alpha_{22}\beta_{10} + \alpha_{23}\beta_{20} + \alpha_{24}\beta_{30}=b_{2} \\ \alpha_{31}\beta_{00}+\alpha_{32}\beta_{10} + \alpha_{33}\beta_{20} + \alpha_{34}\beta_{30}=b_{3} \\ \alpha_{41}\beta_{00}+\alpha_{42}\beta_{10} + \alpha_{43}\beta_{20} + \alpha_{44}\beta_{30}=b_{4} \end{bmatrix},$$

where *α*_{ij}=*n*_{0}*Q*(*ρ*,**I**_{i−1},**I**_{j−1}), *b*_{i}=*Q*(*ρ*,**T**_{0},**I**_{i−1}), \(n_{0}=\sum _{i=1}^{N}\delta _{i0}\). Then *β*_{.0}’s can be estimated by

$$\begin{bmatrix} \beta_{00}\\ \beta_{10}\\ \beta_{20}\\ \beta_{30} \end{bmatrix}=\begin{bmatrix} \alpha_{11}&\alpha_{12}& \alpha_{13}& \alpha_{14} \\ \alpha_{21}&\alpha_{22}& \alpha_{23}& \alpha_{24}\\ \alpha_{31}&\alpha_{32}&\alpha_{33}&\alpha_{34}\\ \alpha_{41}&\alpha_{42}&\alpha_{43}&\alpha_{44} \end{bmatrix}^{-1}*\begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ b_{4} \end{bmatrix}.$$

Similarly, define \(n_{1}=\sum _{i=1}^{N}\delta _{i1}\), \(\alpha _{ij}^{(1)}=n_{1}Q(\rho,\mathbf {I}_{i-1},\mathbf {I}_{j-1})\), \(b_{i}^{(1)}=Q(\rho,\mathbf {T}_{1},\mathbf {I}_{i-1})\); \(n_{2}=\sum _{i=1}^{N}\delta _{i2}\), \(\alpha _{ij}^{(2)}=n_{2}Q(\rho,\mathbf {I}_{i-1},\mathbf {I}_{j-1})\), \(b_{i}^{(2)}=Q(\rho,\mathbf {T}_{2},\mathbf {I}_{i-1})\); then *β*_{.}’s can be derived accordingly.

Next, taking derivative with respect to \(\sigma _{e}^{2}\) gives

$$\begin{array}{*{20}l} \sigma_{e}^{2}=&\left\{\sum_{i=1}^{N}Q(\rho,\mathbf{y}_{i},\mathbf{y}_{i})+\sum_{i=0}^{2}\left[Q(\rho,\mathbf{T}_{i},\mathbf{T}_{i})\right.\right.\\ &-\sum_{j=0}^{3} 2\beta_{ji}Q\left(\rho,\mathbf{T}_{i},\mathbf{I}_{j}\right)\\ &\left.+\sum_{j=0}^{3}\sum_{k=0}^{3} n_{i}\beta_{ji}\beta_{ki}Q(\rho,\mathbf{I}_{j},\mathbf{I}_{k})]\right\}/(mN) \end{array} $$

With the previously derived estimators of *β*’s and \(\sigma _{e}^{2}\) also as functions of *ρ*, the log-likelihood can now be written as

$$\begin{array}{*{20}l} \log\mathcal{L}(\Theta)=&-\frac{mN}{2}\log 2\pi-\frac{N}{2}\left[(m-1)\log\left(1-\rho^{2}\right)\right.\\ &\left.+m\log\sigma_{e}^{2}\right]-\frac{mN}{2} \end{array} $$

Based on the MLE of the unknown parameters, a LRT \(\left (-2\log \Lambda \sim \chi _{8}^{2}\right)\) can be conducted comparing the aforementioned full model *H*_{1} with the null model *H*_{0} to test for virus-specific effects.

### Detecting virus-specific effects on post-infection temporal expression

The MPTGA method can be applied to a broader setting that compares the temporal gene expression traits between two different conditions, whereas the original setting proposed by Lin et al. [6] is a special case that compares two possible genotypes in a haploid system. In this paper, we are interested in detecting the virus-specific effects on the post-infection temporal gene expression trajectories. For a given virus *v*, we define *j*=0,1 as the indicator variable of a sample infected by virus *v* or other viruses. The null hypothesis *H*_{0} assumes that the temporal gene expression trait **y**_{j} share the same trajectory pattern after infected by virus *v* and other viruses:

$$H_{0}: \beta_{k0}=\beta_{k1} \qquad \text{for all}\ k$$

which is tested against the full model *H*_{1}, assuming that the temporal post-infection gene expression trait **y**_{j} with virus *v* has a unique trajectory pattern distinct from the post-infection trajectories with other viruses:

$$H_{1}: \text{at least one of the equalities does not hold}$$

The hypothesis can be tested using a likelihood ratio test.

Similarly, MPTGA can be applied to cases with three or more groups as outlined above.