Detecting virus-specific effects on post-infection temporal gene expression

Background Different types of viruses have different envelope proteins, and may have their shared or distinctive host-virus interactions which result in various post-infection effects in humans and animals. These effects often do not appear at once but take time to unfold. To characterize the virus-specific effects, we applied a Multivariate Polynomial Time-dependent Genetic Association (MPTGA) method, previously proposed for detecting differences in temporal gene expression traits, to test for the differences in mouse lung transcriptome response to infection of different subtypes of influenza A viruses. Results We compared two methods: the Multivariate Polynomial Time-dependent Genetic Association (MPTGA) method, and the conventional modified t-test, to study the virus-specific effects on mouse lung gene expression. Both methods found H3N2 to be the most different virus among the three viruses tested, with the largest number of genes with H3N2-specific effects. However, the MPTGA method demonstrated much higher power of detection, and the detected genes with virus-specific effects showed better biological relevance. Conclusions Transcriptome response to virus infection is dynamic. MPTGA which leverages temporal gene expression traits showed increased power in detecting biologically relevant virus-specific effects comparing with conventional t-test method.


Background
Influenza viruses have various types and subtypes that differ in their morbidity, virulence and pathogenicity. Influenza A virus is the most virulent type and can be further divided into several subtypes [1]. These subtypes also include different strains which may evolve over time and vary in their manifestations: zoonotic, pandemic and seasonal [2]. The highly pathogenic avian influenza H5N1 virus, for example, is a zoonotic influenza virus that is transmitted from poultry to humans. The H5N1 virus strain A/Vietnam/1203/04 was the most pathogenic strain from the 2004 outbreaks of H5N1 compare temporal gene expression traits, we proposed a multivariate polynomial temporal genetic association (MPTGA) method [6] previously in the context of genetic association with temporal gene expression. We further extended this method to a more general setting that compares the temporal gene expression trait under different conditions, which applies to the specific aim in this study-to detect the differences in the way each virus affects temporal gene expression traits over time. We also compared the MPTGA method with the conventional modified t-test to detect differential gene expression post infection of different viruses, and demonstrated significant increase in the power and specificity of our temporal method.

Detecting virus-specific effects on post-infection temporal expression
We applied MPTGA to all 24,421 genes profiled in the mouse lung dataset, testing for differential temporal postinfection gene expression between infection with each virus and the other two. In addition, we applied the moderated t-test using an empirical Bayes method implemented in the R package Limma [8] to test for differential expression based on the last time point.
Genes have different temporal trajectories in response to different subtypes of viruses. For example, Fig. 1 shows expression of Mill2 (MHC I like leukocyte 2), an mouse ortholog for the MICB gene in human. Mill2 is highly expressed in mouse lung according to Mouse ENCODE transcriptome data [9]. It involves in immunoregulatory interactions between a lymphoid and a non-lymphoid cell and innate immune system. Its related GO processes include immune response and immune response-activating cell surface receptor signaling pathway. Figures 1 and 2 [10][11][12][13][14] have investigated the immune-related functions of Mill2 (or its orthologs) and have characterized the underlying mechanisms. MICB activates the cytolytic response of natural killer (NK) cells and CD8 T-cells. As is shown in Fig. 1 with H3N2 infection, Mill2 expression trajectories demonstrate an evidently distinct pattern from the trajectories post-infection with the other two viruses, consistent with previous observations that H3N2 is less effective in induction of immune response in lung [15]. However, at the last time point, all trajectories return to similar levels of gene expression values, which suggests that the standard differential expression test based on the last time point was not able to detect the difference, whereas the MPTGA method was able to detect the subtype specific effect of post-infection temporal expression of Mill2, as shown in Fig. 2. Figure 3 provides a summary of the genes identified by MPTGA with differential post-infection temporal gene expression specific to each virus. Figure 4 provides a summary of the genes identified by the conventional differential gene expression test implemented by the R package Limma [8] as a moderated t-test based on the  last time point. MPTGA identified more temporal differential expression genes than the conventional test, with 3260 genes showing differential temporal gene expression specific to at least one virus, while only 401 genes were identified by the conventional method. In addition, the conventional test results in three disjoint sets of genes with differential gene expression specific to each virus, suggesting that the test is only able to distinguish the post-infection effect on the gene expression between at most one of the viruses and the others. On the contrary, MPTGA identified 24 genes with differential temporal expression specific to multiple viruses. These results demonstrate significant superiority of the MPTGA method over the conventional method in both the power of detection and the specificity of virus-specific effects.
In spite of the differences, both MPTGA and the conventional test resulted in the largest number of differentially expressed genes specific to H3N2. The same observation also holds for the genes uniquely specific to each virus, with H3N2 having the largest number of genes that are only found to take on a post-infection effect specific to H3N2 but not to H5N1 nor H1N1. On the contrary, H1N1 appears to be the least specific virus, with the smallest number of a temporal effect specific to H1N1.

Functional enrichment
We looked into the functional enrichment in the identified differentially expressed genes specific to H3N2 only (without being specific to H1N1 or H5N1), as H3N2 was shown to be the virus with the most specific post-infection effects by both the MPTGA and the conventional method. The MPTGA method identified 2882 genes of only H3N2specific effects, whereas the conventional method identified 390 genes of only H3N2-specific effects. Among them, 381 (98%) genes were in the common between the two. We compare the significance level of the top enriched GO categories for the 2882 genes identified by MPTGA with the significance level of these GO categories for the 390 genes identified by the conventional method in Fig. 5. The majority of the top enriched GO terms from the MPTGA identified 2882 genes of only H3N2-specific effects were related to immune or defense mechanisms. Comparing with Limma, the MPTGA method shows a consistently stronger enrichment in the immune related categories. We also look into the 9 genes identified by Limma but not by MPTGA and find no function enrichment in any immune or defense related GO categories. Next, we show the GO enrichment of the 2501 genes among the set of 2882 genes identified by MPTGA but not by Limma in Fig. 6. Most of the top enriched GO terms are related to immune/defense mechanisms or cell cycle regulation.
Previous studies [16][17][18][19][20][21][22] have compared the pathogenesis of different viruses in human cases and animal models. The H5N1 and H1N1 subtype viruses are more pathogenic and have a larger effect on lower respiratory tract comparing the H3N2 subtype virus. The pandemic 2009 H1N1 virus is shown to have similar pathogenesis to other influenza A viruses with high virulence such as the H5N1 viruses, with extension of its inflammation process of the larger airways into the alveoli causing diffuse alveolar damage, while the less virulent seasonal H3N2 virus is found to have less extension of its inflammatory process to alveoli. The difference in the morbidity and pathogenesis observed in these viruses is consistent with what we find in comparing the effect of each of the viruses on the temporal gene expression after infection, and finding the most number of genes showing specific effects unique to H3N2. Our results showed that H3N2 specific effects in lung were related to immune response and cell cycle regulation, indicating that the MPTGA method did not only detect more genes with virus-specific effects, but also revealed significant biological pathways underlying virus-specific effects.
It has been shown that innate lymphoid cells (ILCs) accumulated in lung after virus infection to promote lung tissue homeostasis [23]. Monticelli et al. [23] reports a ILC enriched signature consisting of 121 genes. We compared the signature with our virus specific genes and identified 41, 1, and 0 genes in the overlap with H3N2, H1N1, and H5N1 specific genes, respectively. The significant overlap with H3N2 specific genes (Fisher's Exact test p-value = 5.0e-10) suggests a potential mechanism explaining why H3N2 is least pathogenic to lung tissues.

Discussion
Virus infection in humans and animals is a dynamic process, with wildly different virulence and pathology specific to types or subtypes of viruses. For health-care professionals, it is essential to understand the virus-specific effects over a temporal dimension in order to accurately identify the virus and to proceed with appropriate treatment strategies.
In this paper, we studied the temporal gene expression response in mouse lung to infection with a seasonal H3N2 virus strain, a pandemic H1N1 virus strain, and a zoonotic H5N1 virus strain. We aimed to identify the virus-specific effects on the post-infection gene expression. For this purpose, we compared two methods, a Multivariate Polynomial Time-dependent Genetic Association (MPTGA) method proposed for temporal gene expression traits, and the conventional modified t-test for differential gene expression detection. Both methods showed that the seasonal H3N2 virus was the most different among the three tested, showing distinct response in the expression of the largest number of genes. The MPTGA method identified significantly more genes with virus-specific effects for each of the three viruses comparing with the modified ttest. We looked into specific examples and observed that the effects of the viruses infection varied over time, which explained why the modified t-test failed to capture the post-infection effects for many of the genes based on a single static time point. Moreover, by comparing the genes with virus-specific effects identified by the two methods, we showed that the MPTGA method identified more genes involved in immune response and cell cycle regulation pathways, consistent with previous studies comparing patho-physiological effects of these virus subtypes. Nevertheless, what virus-host interactions drive the virus specific effects can not be derived from this study. Further studies are needed to elucidate causal regulations leading to the virus specific effects.
We also generalized the MPTGA model based on two genotypes/groups to three or more genotypes/groups (the MPTGA models based on 2 or 3 genotypes/groups are noted as MPTGA2 and MPTGA3, respectively). We applied MPTGA3 to the mouse lung data set and compared results based on MPTGA2 and MPTGA3 (Fig. 7). MPTGA3 identified more genes differentially regulated among 3 subtypes of viruses. There were also genes identified by MPTGA2 but not by MPTGA3, suggesting the model selection should be based on underlying data distributions. For results based on MPTGA2, it is easier to interpret common of viruses and unique to General GO terms with more than 1500 genes associated are not displayed here each specific virus. However, it is not easy to biologically interpret the results based on MPTGA3. Thus, we focused on the results based on MPTGA2 in the "Results" section.
In the MPTGA model, we assumed expression levels of a gene followed a group mean trajectory with variance at each time following a normal distribution. To assess the performance of MPTGA in the cases where variances deviate from a normal distribution, we simulated data sets based on the empirical patterns identified in the mouse lung data and added residuals sampling from different distributions (normal, uniform, exponential distributions, and Student's t-distribution with 3, 5, and 7 degrees). Then, we applied MPTGA to the simulated data sets. The accuracy were greater than 99% for T3, T5, and exponential distributions and 100% for other distributions tested. The results are summarized in Fig. 8. In general, MPTGA was robust to residuals deviated from a normal distribution.
It is worth to note that there were lots of assumptions made in the MPTGA model, such as cubic polynomial function for mean trajectories and covariances related in first order auto correlation. The choices were made mainly due to limited number of time points in a time series data set. If a long time series is available, high degree polynomial functions and higher order correlation structure should be explored.

Conclusions
Transcriptome response to virus infection may vary between viruses and over time. The Multivariate Polynomial Time-dependent Genetic Association (MPTGA) method can be applied to detect virus-specific effects on temporal gene expression traits. It is shown to enhance the power and specificity of detection subtype specific effects, which could result a more accurate target gene set for understanding the virus pathology.

MPTGA and extension
Lin et al. [6] proposed a Multivariate Polynomial Timedependent Genetic Association (MPTGA) method to detect genetic effects in temporal gene expression trajectories. The MPTGA method assumes that for each genotype j, the temporal gene expression trait y across m time points follows a multivariate normal distribution  N (g j , ), with density function The mean vector g j for genotype j is modeled with a polynomial curve where K is the degree of the polynomial function. In this study, K was set to be 3. In particular, MPTGA assumes a first order autoregression model to take into account the auto-correlation between different time points, with the covariance matrix specified as For each sample i, its gene expression trajectory y i (t) across m time points can be written as where δ i0 and δ i1 are the indicator variables of sample i taking genotype 0 or 1.
Given the joint likelihood The maximum likelihood estimates of the parameter set = β kj , ρ, σ 2 e can be derived as described in the Lin et al. [6] paper.
Lin et al. [6] compared longitudinal trajectories between two different conditions (e.g. two possible genotypes in a haploid system). The method can be extended to accommodate three or more conditions (e.g. effects of three possible genotypes in a diploid system; e.g. effects of different viruses infection). This comparison can be done using done using a pairwise test between conditions. Alternatively, we can construct a full model to detect the (genetic) effects: for a given trait Y, then the reduced model H 0 (single gene expression trait curve) for allk can be compared against the full model H 1 (different gene expression trait curve for different conditions (genotypes/viruses infection)): H 1 : at least one of the equalities does not hold to test the hypothesis of the existence of eQTL at a locus or difference between effects of viruses by estimating these parameters with a MLE procedure and performing a likelihood ratio test. The full model approach differs from the pairwise test in that it imposes an extra condition of a shared covariance structure among all three conditions, which is estimated in the full model taking all three conditions into account; whereas the pairwise test only requires the covariance structure to be shared between each pair, which is estimated separately for each pair. Similar to the MPTGA model with two genotypes, we assume that for each genotype j = 0, 1, 2, the mean vector g j for genotype j is modeled with a polynomial curve Given the joint log-likelihood the maximum likelihood estimates of the parameter set = β 0j , β 1j , β 2j , β 3j , ρ, σ 2 e , j = 0, 1, 2 can be derived by first looking for the critical point of log L( ) by taking its derivative with respect to β's and σ 2 e , finding that both β s, σ 2 e can be expressed as functions of ρ at the critical point, therefore log L( ) can be expressed as functions of ρ at the critical point as well; then the MLE of ρ can be derived by taking derivative of log L( ) with respect to ρ, and thus β's and σ 2 e can be obtained accordingly. The detailed derivation is as follows. Define T j = N i=1 δ ij y i , j = 0, 1, 2; I 0 = [ 1 · · · 1] 1×m , I 1 = [ 1 · · · m], Taking derivative of log L( ) with respect to β .0 's gives the following linear system: ⎡ ⎢ ⎢ ⎣ α 11 β 00 + α 12 β 10 + α 13 β 20 + α 14 β 30 = b 1 α 21 β 00 + α 22 β 10 + α 23 β 20 + α 24 β 30 = b 2 α 31 β 00 + α 32 β 10 + α 33 β 20 + α 34 β 30 = b 3 α 41 β 00 + α 42 β 10 + α 43 β 20 + α 44

Detecting virus-specific effects on post-infection temporal expression
The MPTGA method can be applied to a broader setting that compares the temporal gene expression traits between two different conditions, whereas the original setting proposed by Lin et al. [6] is a special case that compares two possible genotypes in a haploid system. In this paper, we are interested in detecting the virus-specific effects on the post-infection temporal gene expression trajectories. For a given virus v, we define j = 0, 1 as the indicator variable of a sample infected by virus v or other viruses. The null hypothesis H 0 assumes that the temporal gene expression trait y j share the same trajectory pattern after infected by virus v and other viruses: for all k which is tested against the full model H 1 , assuming that the temporal post-infection gene expression trait y j with virus v has a unique trajectory pattern distinct from the post-infection trajectories with other viruses: H 1 : at least one of the equalities does not hold The hypothesis can be tested using a likelihood ratio test. Similarly, MPTGA can be applied to cases with three or more groups as outlined above.