tigaR: integrative significance analysis of temporal differential gene expression induced by genomic abnormalities

Miok, Viktorian; Wilting, Saskia M; van de Wiel, Mark A; Jaspers, Annelieke; van Noort, Paula I; Brakenhoff, Ruud H; Snijders, Peter JF; Steenbergen, Renske DM; van Wieringen, Wessel N

doi:10.1186/1471-2105-15-327

Methodology article
Open access
Published: 02 October 2014

tigaR: integrative significance analysis of temporal differential gene expression induced by genomic abnormalities

Viktorian Miok^1,2,
Saskia M Wilting²,
Mark A van de Wiel^1,3,
Annelieke Jaspers²,
Paula I van Noort⁴,
Ruud H Brakenhoff⁵,
Peter JF Snijders²,
Renske DM Steenbergen² &
…
Wessel N van Wieringen^1,3

BMC Bioinformatics volume 15, Article number: 327 (2014) Cite this article

2379 Accesses
2 Citations
Metrics details

Abstract

Background

To determine which changes in the host cell genome are crucial for cervical carcinogenesis, a longitudinal in vitro model system of HPV-transformed keratinocytes was profiled in a genome-wide manner. Four cell lines affected with either HPV16 or HPV18 were assayed at 8 sequential time points for gene expression (mRNA) and gene copy number (DNA) using high-resolution microarrays. Available methods for temporal differential expression analysis are not designed for integrative genomic studies.

Results

Here, we present a method that allows for the identification of differential gene expression associated with DNA copy number changes over time. The temporal variation in gene expression is described by a generalized linear mixed model employing low-rank thin-plate splines. Model parameters are estimated with an empirical Bayes procedure, which exploits integrated nested Laplace approximation for fast computation. Iteratively, posteriors of hyperparameters and model parameters are estimated. The empirical Bayes procedure shrinks multiple dispersion-related parameters. Shrinkage leads to more stable estimates of the model parameters, better control of false positives and improvement of reproducibility. In addition, to make estimates of the DNA copy number more stable, model parameters are also estimated in a multivariate way using triplets of features, imposing a spatial prior for the copy number effect.

Conclusion

With the proposed method for analysis of time-course multilevel molecular data, more profound insight may be gained through the identification of temporal differential expression induced by DNA copy number abnormalities. In particular, in the analysis of an integrative oncogenomics study with a time-course set-up our method finds genes previously reported to be involved in cervical carcinogenesis. Furthermore, the proposed method yields improvements in sensitivity, specificity and reproducibility compared to existing methods. Finally, the proposed method is able to handle count (RNAseq) data from time course experiments as is shown on a real data set.

Background

Cervical cancer is caused by infection with high-risk types of the human papillomavirus (HPV) followed by additional changes in the host cell genome. Insight in genes that are consistently altered over time will improve our understanding of the molecular mechanisms driving cervical carcinogenesis. These genes may provide novel biomarkers for early detection of cervical cancer as well as potential therapeutic targets. High-throughput techniques, such as microarrays and next generation sequencing, are tools for fast high-resolution genome-wide molecular profiling. Applying these techniques to measure genes at consecutive moments in time at multiple molecular levels generates a description of the occurrence of molecular abnormalities during cervical carcinogenesis.

A longitudinal in vitro system of four independent cell lines immortalized with either HPV16 or HPV18, previously shown to faithfully mimic cervical carcinogenesis at the (epi)genetic level [1–3], was used in this study. Cell lines were assayed for gene expression (mRNA) and gene copy number (DNA) with microarrays at consecutive moments in time, representing distinct stages of transformation. Abnormalities in DNA copy number were previously shown to directly affect expression of the genes located within these abnormalities and are believed to facilitate the identification of functionally relevant gene expression changes [4]. Integrating these two molecular levels will yield models of cancer development and progression, thereby reducing the complexity of (cervical) carcinogenesis [5]. We present a method that, in contrast to existing methods, is able to integrate DNA copy number and gene expression over time, while identifying temporal differential gene expression.

Available methods in current literature for time-course differential gene expression analysis can only be applied to a single molecular level. Since microarrays have become widely used for studying genome-wide gene expression, a range of statistical methods have been tailored for the identification of differentially expressed genes in microarray time-course experiments. Several of these methods are developed in an empirical Bayes framework [6–9]. Tai and Speed [9] use multivariate empirical Bayes statistics to rank time-course gene expression profiles. Their method is applicable to both single-condition and multiple-condition datasets and includes a variance stabilization imposing common matrix as a gene-specific variance-covariance matrix. Alternative approaches involve spline-based methods which fit a smoothed curve to the longitudinal data to use for statistical testing [10–12]. Storey et al. [11] use a population average time curve based on natural cubic splines to capture dynamics in gene expression levels and employ the F-test to identify significant genes. On the other hand, BATS [13] combines these two: it employs gene-wise functional modelling to explain temporal differential gene expression, which is casked in a hierarchical Bayesian framework.

In this article we present a method for identification of temporal differential gene expression driven by genomic abnormalities, introducing several new concepts. First, employing low-rank thin-plate splines and empirical Bayes shrinkage, identification of temporal differential gene expression is improved in terms of sensitivity, specificity and reproducibility. Second, including DNA copy number as a time-varying molecular covariate reduces residual variance and allows for the identification of genes which have variation in expression over time caused by genomic abnormalities. Genes with expression levels affected by DNA copy number aberrations have the capability to contribute to malignant cell growth [4, 14]. Identification of these genes is therefore essential for a better understanding of cancer development in general. Third, we impose a multivariate spatial prior for the DNA copy number effect to make the estimate more stable, borrowing information from neighboring features. Furthermore, by changing the link function our method can straightforwardly deal both with continuous and count data. To illustrate the wide applicability of our method we applied it to HPV-induced transformation (microarray) and head & neck cancer (RNA-seq) data.

Methods

A method for the identification of temporal variation in gene expression (due to genomic abnormalities) from an integrative genomics study with a time-course set-up is presented. The variation in gene expression over time is described by a generalized linear mixed model employing low-rank thin-plate splines. Hyperparameters of the model are estimated from the data with an empirical Bayes procedure. With parameter estimates at hand, we describe how relevant hypotheses may be tested. The section concludes with extensions of the model and practical considerations for its application.

Model

Consider a time-course microarray experiment where n cell lines are assayed repeatedly over consecutive time points. At each time point both the DNA copy number and mRNA gene expression of each cell line are measured. Let the random variables X_i,j,t and Y_i,j,t represent the DNA copy number and the expression level, respectively, of gene j of cell line i at time point t. Their realizations are denoted by x_i,j,t and y_i,j,t. For both random variables the index j runs from j=1 to p. This is due to the application of a matching procedure [15], which matches probes from two high-throughput platforms on the basis of their genomic location.

The expression levels of gene j are assumed to be normally distributed: $Y_{i, j, t} \sim N (μ_{i, j, t}, σ_{ε, j}^{2})$ , with $σ_{ε, j}^{2}$ the error variance of the expression levels of gene j. The mean of this distribution is modelled as:

\begin{array}{lcr} μ_{i, j, t} = E [Y_{i, j, t} | i, x_{i, j, t}, α_{j}, β_{j}, γ_{j}] \\ = f (i, x_{i, j, t}; α_{j}, β_{j}) + h (t; γ_{j}), \end{array}

(1)

where f(·;·) and h(·;·) model the fixed and random effects, respectively, of the expression level of gene j. Both f(·;·) and h(·;·) are specified next.

The fixed effects, encompassing both cell line and DNA copy number effects, are modelled by a linear regression component:

\begin{array}{lcr} f (i, x_{i, j, t}; α_{j}, β_{j}) = α_{i, j} + β_{j} x_{i, j, t}, \end{array}

where α_i,j is the effect of cell line i in gene j and β_j the DNA copy number effect on the expression levels of gene j.

The random effect captures the dynamics in the expression levels over time and is modelled nonparametrically by low-rank thin-plate splines [16]. These splines provide enough flexibility for modelling gene expression variation over time. Furthermore, the low rank approximation avoids heavy computational costs due to a large number of unknown parameters. The random effect is then modelled by:

\begin{array}{lcr} h (t; γ_{j}) = \sum_{k = 1}^{K} γ_{j, k} {|t - κ_{k}|}^{3} = Z_{t} γ_{j}, \end{array}

where Z_t=(|t−κ₁|³,…,|t−κ_K|³) and γ_j=(γ_j,…,γ_j,K)^T, the vector of coefficients of the spline. These coefficients are randomly distributed as $γ_{j} \sim N (0_{K \times 1}, Σ_{γ, j})$ . The κ_k, κ₁<κ₂< ⋯<κ_K, are fixed knots, equally distributed over the interval $[1, T]$ .

Model (1) is recasked as a semi-parametric mixed model. In a mixed model the random effects γ_j,k are assumed independent and all stemming from the same distribution $N (0, σ_{γ, j}^{2})$ . Within the context of thin plate splines this is achieved by assuming a particular parameterization of Σ_γ,j. This requires the definition of the matrix Ω with ${(Ω)}_{k_{1}, k_{2}} = | κ_{k_{1}} - κ_{k_{2}} |^{3}$ for k₁,k₂=1,…,K. Furthermore, let $U_{ω} D_{ω} V_{ω}^{T}$ be the singular value decomposition of Ω with U_ω and V_ω containing its left and right singular vectors as columns and diagonal matrix D_ω of its singular values. It is then assumed that Σ_γ,j can be written as $V_{ω} D_{ω} V_{ω}^{T}$ . Under this assumption Model (1) can be reformulated as the mixed model:

\begin{array}{lcr} Y_{i, j, t} = α_{i, j} + β_{j} x_{i, j, t} + {\tilde{Z}}_{t} {\tilde{γ}}_{j} + ε_{i, j, t}, \end{array}

(2)

where ${\tilde{Z}}_{t} = Z_{t} V_{ω} D_{ω}^{1 / 2}$ and ${\tilde{γ}}_{j} = D_{ω}^{- 1 / 2} V_{ω}^{T} γ_{j}$ . The independent random variables γ_j,k and ε_i,j,t of this model are both multivariately normal with mean zero and covariances $Cov (ε_{i_{1}, j, t_{1}}, ε_{i_{2}, j, t_{2}}) = σ_{ε}^{2}$ only if i₁=i₂ and t₁=t₂ and zero otherwise, and $Var (γ_{j}) = σ_{γ, j}^{2} I_{K \times K}$ . The choice of Ω results in a covariance matrix with higher covariances of random effects of neighboring knots (than those of more distant knots).

If we denote with Y_∗,j,∗ matrix of measurements for gene j which are first ordered by cell lines i and within a cell line by time. The likelihood for gene j thus is:

\begin{array}{l} L (Y_{*, j, *}, x_{j}; α_{j}, β_{j}, σ_{ε, j}^{2}, σ_{γ, j}^{2}) \\ = \prod_{i = 1}^{n} \prod_{t = 1}^{T} P (Y_{i, j, t}, x_{i, j, t}, Z_{t}; α_{j}, β_{j}, σ_{ε, j}^{2} | γ_{j}) P (γ_{j} | σ_{γ, j}^{2}), \end{array}

which is to be used in the estimation.

Estimation

The parameters of Model (2) are estimated by means of an empirical Bayes procedure. Empirical Bayes enables us to exploit the high-dimensionality of the data by ‘borrowing information across genes’, which yields more reproducible results. Information will be shared among genes via common hyperparameters of the priors of the model parameters. Here this sharing is done only for parameters that will be subject to inference. Other parameters are considered confounders that need to be taken into account but are not of central interest. In principle, our estimation allows common hyperparameters for these confounders, but at a computational cost.

For the fixed DNA copy number effect a Dirac-Gaussian mixture prior (a mixture of a point mass on zero and Gaussian distributions) is used:

π (β_{j}) = p_{0} I_{{β_{j} = 0}} + (1 - p_{0}) N (β_{j}; 0, τ^{2}),

where $N (β_{j}; 0, τ^{2})$ denotes the Gaussian density with parameters (0,τ²). The point mass accommodates the proportion of genes without a DNA copy number effect. As this proportion is likely to comprise the majority of genes, it shrinks the β_j to zero for those genes with a gene dosage effect.

The random effect γ_j,k and error ε_i,j,t are endowed with normal priors: $N (γ_{j, k}; 0, η_{j}^{2})$ and $N (ε_{i, j, t}; 0, ξ_{j}^{2})$ . Shrinkage is applied on the dispersion parameters $η_{j}^{2}$ and $ξ_{j}^{2}$ for two reasons: a) more stable parameter estimates and b) protection against over-fitting. The reciprocals of these parameters, $η_{j}^{- 2}$ and $ξ_{j}^{- 2}$ , represent precision, for each a Gamma distribution is used as a conjugate prior. The parameters a and b of these hyperprior Gamma distributions are estimated using the method of [17]. The amount of shrinkage via this prior is determined by the data. The procedure is initiated with a and b resulting in a very flat prior on the precision. This corresponds to a very narrow prior on the variance of the random effect, that is, a flat spline. Iteratively, should the data give rise to it, the prior of precision becomes more informative. As a result the variance moves away from zero, increasing the flexibility of the spline. Would one desire more shrinkage our procedure allows the employment of a hyperprior composed of a Gamma and a point mass at zero.

For the fixed cell line effect α_i,j a Gaussian prior is assumed: $N (α_{i, j}; μ_{i, j}, ν_{j}^{2})$ . No shrinkage (via information borrowing) is applied to α_i,j as it is considered a confounder, for which an unbiased estimate is preferred.

We temporarily assume that the cell lines are merely biological replicates and no inferential statement with respect to the cell line effect will be made in the Section ‘Head-and-neck cancer’. Hence, for the moment the prior of the cell line effect α_i,j is $N (μ_{i, j}, ν_{j})$ . The variance of this prior depends on index j: the hyperparameter of this prior is different for each gene.

Given the hyperparameters shared by all genes, the model parameters of the individual genes are estimated by the mean of their posterior distributions. These are obtained by means of integrated nested Laplace approximations (INLA) [18]. INLA yields the marginal posterior distribution of a model parameter through integration of the posterior distribution over all remaining model parameters. This can be done computationally efficient by means of Laplace approximations under the assumption of a Gaussian prior on the model parameters. The use of approximated posterior distributions (instead of the possibly more exact posterior produced by Markov chain Monte Carlo (MCMC)) is motivated computationally: the posterior distributions of model parameters of many genes need evaluation.

It remains to choose the hyperparameters. An informed choice of the hyperparameters is made through application of the empirical Bayes procedure of [17]. That is, the hyperparameters are estimated from the data rather than set prior to the analysis. Apart from a less subjective choice of the hyperprior, this has two favorable consequences. First, it yields more reproducible results (confer the Section ‘Comparison’). Second, the hyperpriors employed cause shrinkage of the model parameter estimates. In particular, for the random effect it shrinks the dispersion parameter, which controls the smoothness of the fit [19]. Hence, it constrains the flexibility of the spline and thus reduces the risk of overfitting. An illustration of this effect on the fitted spline can be found in Additional file 1, Section 1.

The conventional empirical Bayesian estimate maximizes the product of the marginal likelihoods:

\begin{array}{l} \prod_{j = 1}^{p} \int_{0}^{\infty} \int_{R} \int_{R^{n}} L (Y_{*, j, *}, x_{j}; α_{j}, β_{j}, σ_{ε, j}^{2}, σ_{γ, j}^{2}) π (α_{j}) π (β_{j}) \\ \times π (σ_{γ, j}^{2}) d α_{j} d β_{j} d σ_{γ, j}^{2}, \end{array}

(3)

where π(·) denotes the prior of its argument (hyperparameters of the priors are suppressed for ease of notation). The hyperparameters of the priors of the α_j, β_j, and $σ_{γ, j}^{2}$ are estimated by maximization of loss function (3). Hereto equate the derivative of (3) with respect to the hyperparameters to zero and solve. The iterative procedure of [17] yields an approximate solution to estimating equation (3). The procedure of [17] is computationally fast and allows non-parametric hyperpriors.

From expression (3) it becomes clear how the parameter estimates are shrunken. The prior of (say) β_j is shared by all genes. Hence, the choice of the hyperparameters affects the posterior distribution of all β_j. In particular, if the mass of π(β_j) is more concentrated around zero, the posterior has more probability mass close to zero. Only if the data contains enough evidence (in favor of a non-zero β_j) to outweigh the prior, the posterior will center around a non-zero value. The prior of β_j puts (via the spike at zero) more mass at zero. Moreover, the precision of the other mixture component is estimated from all genes. Under the assumption that a (vast) majority of genes do not exhibit an effect, the precision is under-estimated for the minority. This, together with the spike, yields a conservative prior leading to shrunken estimates.

Finally, we point out that the procedure described in [17] assumes covariates to be identical over features. In our setting the DNA copy number covariate varies over the features. It would be more appropriate to assume a different Dirac-Gaussian prior for each group of features that shares the same aberration pattern over the samples. These mixture priors differ in the variance of their Gaussian part. This could – in principle – be accommodated by a mixture of a point mass at zero and a lot of Gaussians, all with mean zero but different variances. However, a mixture of Gaussians with the same location but different variance may be approximated by a t-distribution with degrees of freedom equal to the number of Gaussians [20]. However, this approximation improves as the number of Gaussians in the mixture increases. On the other hand, a t-distribution with many degrees approaches a Gaussian. As there are usually many different genomic aberration patterns, we exploit this approximation and assume a common variance in our Dirac-Gaussian mixture prior. In a simulation study (Additional file 1, Section 2) we show that the t-distribution approximation does not differ substantially from the mixture of Gaussians with common mean but different variances.

Hypothesis testing

From a biological point two questions are of main interest: i) does DNA copy number drive gene expression, and ii) is there differential expression over time? The former question can be answered by testing whether the DNA copy number effect β_j differs significantly from zero, evaluating the null hypothesis H₀:β_j=0 versus its alternative H₀:β_j≠0. The second question is addressed by testing $H_{0} : σ_{γ, j}^{2} = 0$ against the alternative $H_{A} : σ_{γ, j}^{2} > 0$ . The alternative implies that there is at least one γ_j,k≠0, resulting in a non-constant spline. Additionally, one may ask whether there is a difference between the cell lines, but this question is not considered here (although it could straightforwardly be addressed within the presented framework).

Both hypotheses are evaluated by means of the likelihood ratio statistic. For the first question on DNA copy number the statistic is:

\begin{array}{l} D_{j} = log [L ({\hat{α}}_{j}^{(H_{A})}, {\hat{β}}_{j}^{(H_{A})}, {\hat{σ}}_{γ, j}^{2, (H_{A})}, {\hat{σ}}_{ε, j}^{2, (H_{A})}] \\ - log [L ({\hat{α}}_{j}^{(H_{0})}, 0, {\hat{σ}}_{γ, j}^{2, (H_{0})}, {\hat{σ}}_{ε, j}^{2, (H_{0})}], \end{array}

where e.g. ${\hat{α}}_{j}^{(H_{A})}$ is the estimate of α_j under the alternative hypothesis H_A. P-values are then obtained from the asymptotic (chi-square) distribution of the likelihood ratio statistics. The degrees of freedom of this chi-square distribution are equal to the difference in the number of parameters of the models compared. Note that this test is likely to be somewhat conservative, because ${\hat{β}}_{j}^{(H_{A})}$ shrinks towards the null domain. The Dirac-Gaussian mixture prior, however, prevents overly conservative behavior, because it better separates the null genes from the non-null ones than a simple Gaussian prior. When testing $H_{0} : σ_{γ, j}^{2} = 0$ , the degrees of freedom consumed by the penalized splines are estimated by the trace of hat matrix (details: Additional file 1, Section 3). Finally, to account for multiplicity the False Discovery Rate (FDR) is controlled by means of the procedure of [21].

This hybrid testing procedure (empirical Bayesian estimation with p-value based inference) mimics Limma [22, 23]. Limma obtains a shrunken (empirical Bayes) estimate of the variance and detects differential expression using a classical t-test (after adjusting the degrees of freedom). Here too we borrow strength (information) across genes to arrive at shrunken parameter estimates, and provide classical p-values. The latter meets the wishes of medical researchers. They commonly report p-values rather than Bayes factors. In addition, multiple testing corrections are generally more rigorous in a classical setting than in a Bayesian one, because the latter typically provides FDR estimation rather than control.

Spatial multivariate prior

DNA copy number aberrations are often not confined to a single gene but span a large region of the genome that harbors multiple genes. Consequently, neighboring genes may share the same genomic aberration signature. At the transcriptomic level this results in co-expression of these genes [24]. Put differently, the DNA copy number effect (β_j) of gene j may be correlated with that of neighboring genes (e.g. β_j−1 and β_j+1). This phenomenon is not accommodated by the aforementioned prior of the β_j.

In this section we describe an extension of our procedure that incorporates the possible spatial correlation among the DNA copy number effects. To this end it is assumed that the β_j follows a first-order autoregressive (AR(1)) process along the genome: β_j=ρ β_j−1+ε_j. The relevant parameter of this process is simply estimated by regression of the β_j on the β_j−1.

Having obtained an estimate of the spatial correlation among the β_j, it rests to refit the model. However, the assumption of an AR(1) process on the gene dosage effect complicates the refitting as this should now be done simultaneously for all p genes. This is computationally too demanding. To approximate the joint fit the model is refitted per triplet of neighboring genes (e.g. for β_j−1,β_j,β_j+1). For each triplet a trivariate normal prior is assumed (all other priors as before):

\begin{array}{l} (\begin{array}{l} β_{j - 1} \\ β_{j} \\ β_{j + 1} \end{array}) \sim N ((\begin{array}{l} 0 \\ 0 \\ 0 \end{array}), (\begin{array}{l} σ_{j - 1}^{2} & σ_{j - 1} σ_{j} ρ & σ_{j - 1} σ_{j + 1} ρ^{2} \\ σ_{j - 1} σ_{j} ρ & σ_{j}^{2} & σ_{j} σ_{j + 1} ρ \\ σ_{j - 1} σ_{j + 1} ρ^{2} & σ_{j} σ_{j + 1} ρ & σ_{j + 1}^{2} \end{array})), \end{array}

where the correlation structure of the covariance matrix follows an AR(1) process. For the re-estimated vector of β_js only the middle one is conserved. More details are provided in Additional file 1, Section 4.

Besides doing more justice to the underlying biology the ‘spatial prior’ above reduces the variation of the DNA copy number effect. This is achieved as the assumption of an AR(1) process effectively ‘averages’ the DNA copy number effect over neighbouring genes. As such, it is also a way of borrowing information across genes.

Practical considerations

A straightforward extension of Model (2) is to allow for a different spline in each cell line. This reflects the biological plausibility of different dynamical behaviour in different cell lines. In particular, we may then test for differences in the behavior over time between the cell lines. When rewriting Model (2) to a vector notation, the incorporation of different splines per cell line amounts to the replacement of $\tilde{Z} = \tilde{Z} \otimes 1_{n \times n}$ by $\tilde{Z} = \tilde{Z} \otimes I_{n \times n}$ (the operator ⊗ is the Kronecker product), and adjusting the parameter vector γ_j accordingly. Figure 1 illustrates the difference between the same and different spline models.

Within mixed Model (2) DNA copy number and time (via the spline) compete to explain the gene expression. Due to its flexibility the spline may consume variation in expression levels actually due to DNA copy number changes. Moreover, with DNA copy number changes being a more clearly delineated cause (than time in the form of a nonparametric spline), we prefer to attribute variation in expression levels to the genomic aberration. To let DNA copy number changes prevail over the spline, the design matrix Z of the latter is orthogonalized to the DNA copy number data. This orthogonalization does not affect the overall fit, but ensures that the spline captures only variation in expression levels that cannot be explained by DNA copy number changes. The effect of the orthogonalization is illustrated in the Section ‘HPV-induced transformation’.

To determine the optimal number of knots, we employ the deviation information criterion (DIC) [25]. The DIC is a measure for the balance between fit and complexity. For each gene the DIC of the model is estimated for different numbers of knots. The number of knots that over all genes yields the best DIC is chosen as the optimal k and applied to all genes in the analysis. More details are provided in Additional file 1, Section 5.

Results and discussion

HPV-induced transformation

The proposed method is demonstrated on data of an experiment on HPV-induced transformation. The experiment intends to faithfully mimic cervical cancer development employing a HPV-immortalized in vitro cell line model. Hereto two cell lines are affected with HPV16 and two with HPV18 [26]. Over time these cell lines acquire genomic and transcriptomic changes. To assess these changes, the genomic and transcriptomic characteristics of the cell lines are measured at eight time points by means of oligonucleotide microarrays. The preprocessing of the DNA copy number data comprises of median normalization and segmentation using the circular binary segmentation (CBS) method [27]. Similarly, the gene expression data are background corrected using robust multi-array average (RMA) [28] and between-array normalized by the robust quantile method. Finally, the resulting expression intensity values are transformed using the variance stabilizing transformation [29]. DNA copy number is assigned to each transcript from the expression array by the overlapPlus matching procedure of [15] which uses chromosomal location information. The final data set contains genomic and transcriptomic information on 37768 features, however in this section analysis is performed only on one chromosome which contains 2202 features.

We now turn to the identification of genes with differential expression over time. To this end only the expression data features are used, ignoring the effect of genomic aberrations in Model (2). Gene j exhibits temporal gene expression if $H_{0} : σ_{γ, j}^{2} = 0$ is rejected. This corresponds to a spline differing from a flat line, and thus indicating changes in expression levels over time. Temporal gene expression is identified both with a common and different spline(s) for the cell lines. In both analyses the optimal number of knots equals two (determined by the procedure described in the Section ‘Practical considerations’). Specification of the prior distribution and description of the hyperparameter estimations for $η_{j}^{2}$ can be found in the Section ‘Estimation’.

Table 1 (the first row) shows the number of features that exhibit significant temporal differential expression at a 5% false-discovery rate. Considerably more significant genes are identified when allowing for a different spline per cell line. This is in line with expectations as it is the more flexible model. Often it also does more justice to the biology, as many of its selected genes indeed show different behavior over time among cell lines. The common spline model may yield a poor fit for these genes, in particular if there is opposing behavior among the cell lines. However, the common spline model is effective in identifying genes which are consistently up- or down-regulated in all cell lines. The model with a common spline found 74 significant features not identified by the model with a different spline per cell line. In the latter analysis these genes have p-values close to but just short of the significance threshold.

Table 1 The number of significant probes identified in the analysis for temporal differential expression and copy number (CN) effect

Full size table

The common and different spline models employed for the identification of temporal differential expression are now extended to include DNA copy number (as originally proposed in the Section ‘Model’). As noted in the Section ‘Practical considerations’ the flexibility of the spline may consume part of the DNA copy number effect. The proposed remedy limits (via projection) the spline basis to the space orthogonal to space spanned by the DNA copy number information. To assess the potential gain of the orthogonalization each analysis, with common and different spline(s), is done with and without orthogonalization. Prior distributions are as before. The number of knots is determined as done previously (optimal number still equals two). For each analysis hyperparameters of DNA copy number and spline(s) are re-estimated by the empirical Bayes procedure.

We first discuss the number of features with differential temporal expression (given in the second row of Table 1). As in the analysis without DNA copy number, the different spline model identifies substantially more features than the model with a common spline for the cell lines. Orthogonalization of the (common) spline basis onto the DNA copy number data misses one feature in comparison to the non-orthogonalized analysis. This feature is found only with the non-orthogonalized spline basis. In the latter analysis it only passes the significance threshold by a small margin. Hence, in these data the effect of orthogonalization on the identification of temporal differential expression is limited.

Turning to the effect of DNA copy number, we first analyzed the data with Model (2) containing only the fixed cell line and DNA copy number effect. This analysis identified 568 features with a significant gene dosage effect on expression. Inclusion of the time effect in Model (2) reduces the number of features with a significant gene dosage effect on expression (third row of Table 1). This is due to the fact that the spline competes with DNA copy number to explain the variation in expression levels: the former (being more flexible) captures variation caused by the latter. That aside, here too we see that the improved fit (now due to the increased flexibility of a different spline per cell line) yields a surge in the number of findings. The orthogonalization of the spline basis identify only one additional feature (using common splines) compared to the standard analysis. The effect of orthogonalization is more visible in the gene dosage effect β_j, as can be witnessed from Figure 2. Clearly, orthogonalization moves the distribution of the β_j’s to the right (the positive domain, which corroborates with the biologically expected direction of the effect). Finally, on the full data set (not shown) the analysis using the orthogonalized spline basis gives a modest improvement in the number of genes significantly affected by DNA copy number.

The effect of DNA copy number changes is also analyzed with the spatial prior discussed in the Section ‘Spatial multivariate prior’. This prior aims to capture roughly the spatial dependency between consecutive features, thus hoping to do more justice to the underlying biology. Clearly, estimation of the gene dosage effects with the spatial prior improves the lag one partial correlation among these effects (confer Figure 3). Changes in the actual parameters are noticeable but small. This has limited effect on the fit of the model. Consequently, the significance analysis of the DNA copy number effect identifies almost the same number of significant features (fourth row of Table 1). Spatial prior imposed reduces the variation among contiguous features as, effectively, it has a smoothing effect on estimated DNA copy number effects. Figure 4 illustrates the spatial effect by showing the fit of the model with and without spatial priors on the DNA copy number parameter (in Additional file 1, Section 7 illustrates this effect in all four cell lines). As in the temporal differential expression analysis, the number of features identified with standard or orthogonal spline basis hardly differs.

Another striking feature of Table 1 is the difference between the number of significant features with differential temporal expression and those with a gene dosage effect: the former exceeds the latter (in case of the full model with different splines per cell line). In part this difference is explained by the flexibility of the splines. But also by the presence of many other regulators of gene expression (e.g. microRNA, methylation, transcription factors) that result in temporal differential expression are captured by the splines. On the other hand, comparing the analyses with a common spline more significant gene dosage effect than temporal differential expression is found. This is due to the fact that DNA copy numbers may strongly correlate with gene expression over time. Features with expression levels that do not consistently (over cell lines) co-vary with DNA copy number are missed by the common spline model (and not including the gene dosage effect).

To give some more tangible insight into the results from the analyses above, we single out the CADM1 gene. It is among the genes with the highest significance for temporal differential expression (using Model (2) with a common spline). Indeed, CADM1 is down-regulated in all four cell lines over time (confer Figure 5). This gene is well-known in cervical cancer to be down-regulated during progression due to an increasingly methylated promoter region [1, 30]. The fits of Model (2) with and without DNA copy number hardly differ. The estimate of the DNA copy number parameter (obtained with the common spline model) confirms this ( ${\hat{β}}_{CADM1} = 0.1$ ). This suggests that the down-regulation of CADM1 is not due to a DNA copy number loss, which corresponds with previous findings (e.g. [1]).

To contrast the results for CADM1, we focus on the SLC25A36 gene. SLC25A36 exhibits temporal differential expression (irrespective of the choice for common or different spline model). SLC25A36 is also identified as a gene with a significant DNA copy number effect with estimate ${\hat{β}}_{SLC25A36} = 1.6$ . Inclusion of DNA copy number in the model improves the fit substantially (confer Figure 6, illustration on all four cell lines in Additional file 1, Section 7). This is seen in Figure 6 as the difference in fit for the model with and without DNA copy number. These results for SLC25A36 corroborate with existing medical literature: the gene dosage effect for this gene has already been reported in cervical cancer [31].

Finally, we want to assess the sensitivity of the results with respect to the choice of the prior distribution. For illustration purposes we focus on the hyperprior of the random effect γ_k,j. In the Section ‘Estimation’ we suggest to use the Gamma distribution. However, we have also implemented a mixture of a point mass at zero and Gamma distribution, which one expects to lead to more shrinkage. Model (2) using different splines and a standard design matrix is refitted now with this mixture prior for the random spline effect. Application of our empirical Bayes procedure with the Gamma prior identified 421 features, while the Dirac-Gamma mixture prior selected 396 features. The latter 396 are all included in the former 421 features. The slight reduction in the number of selected features is of course due to the inclusion of the point mass at zero. The fit of both resulting models is almost identical for most features, but for some features with a slightly less flexible spline as in Figure 7 (Additional file 1, Section 7 illustrates effect in all four cell lines).

Head-and-neck cancer

To illustrate the wide applicability of our framework, we present the analysis of sequencing data from a head-and-neck cancer study endowed with a time-course set-up. Oshlack et al. [32] noticed that there are currently no appropriate methods for the analysis of RNA-seq data from time-course experiments. The head-and-neck study aims to identify temporal differential expression due to overexpression of a particular microRNA. MicroRNAs are small 20-22nt non-coding RNAs that inhibit expression of their target genes. The microRNAs recognize these genes by their seed sequence that is complementary to a sequence in the 3’ untranslated region (UTR) of the target transcripts. One gene can be targeted by multiple microRNAs and one microRNA may target a multitude of genes. It is therefore not simple to find a specific relevant target gene of a microRNA. In a previous functional screen microRNAs were identified that specifically kill head and neck cancer cells, but not normal cells [33]. The respective target genes of these microRNAs were to be identified in a follow-up experiment. In this follow-up experiment cells of a squamous cell carcinoma cell line were transfected by a microRNA mimic and a control. The transfected cells were grown in vitro and sampled at six time points. Transcript levels of the 2×6 samples were sequenced. Data were mapped to the human genome and raw count data (reads) per gene transcript were used and not summarized per gene. Their normalization comprises rescaling by a the trimmed mean of each sample’s library (following [32]). Normalized data are rounded to the closest integer to retain the count interpretation of the data.

Model (2) cannot be directly applied to the sequencing data, as the normal distribution is often a poor approximation for the distribution of counts. The normality assumption is replaced by the (zero-inflated) negative binomial [17, 34]: $Y_{i, j, t} \sim Z I$ - $N ℬ (μ_{i, j, t}, ϕ_{ε, j})$ . The mean μ_i,j,t of the counts is (after transformation by the inverse of the link function) still modeled by the right-hand side of Model (2) with assumptions on model parameters in place. Hyperparameters are then estimated via the empirical Bayes procedure previously described.

The analysis of the head-and-neck cancer data concentrates on two main questions: identification of tags with temporal variation and those different between the two conditions. To answer this, Model (2) is used without the DNA copy number term (which is not included in the experiment). Common and different spline models are employed as in the Section ‘HPV-induced transformation’. Parameter γ_j is the main parameter of interest and the analysis compares the model with and without time effect. The optimal number of knots (again two, for both models) is determined using the procedure described in the Section ‘Practical considerations’. Prior distributions for cell line and time effect are as in the Section ‘Estimation’. Hyperparameters are estimated for each analysis separately, but only the variance of the random time effect is shrunken via the empirical Bayes procedure. Counts of are fitted using the model with same and different splines as illustrated for one RNA-seq tag in Figure 8.

The number of tags with a significant (at the 5% FDR level) temporal variation identified equals 8416 (10951) for the common (different) spline model. As observed in the analysis of the HPV-induced transformation data, the use of a different spline leads to many more findings. Again, this is explained by the improved fit due to a more flexible model. In particular, all the tags identified with the same spline model are also found by its flexible counterpart.

Comparison

The proposed method is compared to three well-known alternatives for significance analysis of time-course microarray data: EDGE, [11], timecourse, [9], BATS [13, 35], and a reference method. The reference method comprises a standard frequentist approach. It fits a linear mixed-effect model, while the null hypothesis is evaluated through an analysis of variance approach (anova), comparing two nested models (with and without random effects).

These competitors have not been designed for the analysis of integrative genomics studies with a time-course set-up. Hence, our method is applicable to a wider class of studies. Besides this qualitative argument, we wish to have a quantitative comparison of the methods. To this end the comparison is restricted to time-course genomics studies involving only a single molecular level. Moreover, to avoid bias of any of the methods by a particular model choice, the comparison is done on two real data sets. The first is the HPV-induced transformation data from the Section ‘HPV-induced transformation’, limited to the gene expression levels only. The other data set is included in the EGDE-package [11], where gene expression has been monitored in four individuals from a control and endotoxin-treated group. Samples have been collected at five different time points: 2, 4, 6, 9 and 24 hours after treatment. One individual lacks data for the control group at two time points (4 and 6 hours). Since the timecourse method cannot deal with missing time-points this individual is omitted from the analysis. At each time point expression levels of 800 genes are available.

We now briefly describe the other methods used in the comparison: EDGE, BATS and timecourse. For a more detailed description please refer to the corresponding references.

EDGE ([11]) captures the temporal variation in the expression levels of gene j by means of a p-dimensional B-spline basis. Temporal differential expression is evaluated by an F-statistic measuring the goodness-of-fit of the null hypothesis (a flat or constant spline) in comparison to the alternative hypothesis. In the comparison EDGE is used with default parameter settings.

Method timecourse ([9]) uses novel multivariate empirical Bayes statistic to rank time-course gene expression profiles. Gene expression in timecourse method is assumed to follow a multivariate normal distribution with gene-specific mean and covariance. Conjugate priors are assumed on the unknown parameters. Hyperparameters of the conjugates are estimated from the data. Timecourse yields stable variance estimates by borrowing (co)variance information across genes. The posterior distribution and test statistics are obtained in an analytic form. Genes may be ranked using either Hotelling T² or MB-statistics. For the comparison we used the timecourse R-package with standard settings and Hotelling T²-statistic, due to the balancedness of the study design (equal number of replicates per gene).

Finally, BATS ([13]) which combines characteristics from previously described methods. Similar to EDGE gene expression variation over time is modelled by a polynomial function, while imposing a hierarchical Bayesian model on the parameters (as timecourse). BATS is flexible in its choice of the prior for dispersion related parameters, it offers delta, inverse Gamma and exponential priors. Significance analysis is based on the genes’ Bayes factors, while multiplicity correction is addressed in Bayesian manner ([36]).

Sensitivity and specificity of the four aforementioned methods are compared in both data sets. Hereto knowledge of the genes with true temporal differential expression is needed. In its absence we constructed a consensus set which fulfills this role. That consensus set comprises of the features identified by all four methods. Sensitivity is then the proportion of features with temporal differential expression correctly identified as such. On the other hand, specificity is the proportion of features which are correctly identified as features without differential expression over time (hence, rightly not significant). Sensitivity and specificity of each method are assessed for various numbers of significant features.

Figure 9 presents the resulting sensitivity and specificity for the HPV-induced transformation data. The left panel of Figure 9 compares the sensitivity. While BATS, timecourse and EDGE are more or less on a par, they all have a lower true positive rate than tigaR. With respect to the specificity, the methods are more or less on a par with tigaR having a slightly lower false positive rate than the other methods. This is confirmed (though much less pronounced) by the results from the EDGE-package data (see Additional file 1, Section 6). For both sensitivity and specificity the methods perform similarly with tigaR having a marginal lead.

Furthermore, we compared the reproducibility of the five methods (now including reference method). Hereto each data set was divided into two equally sized groups. We assessed how well the results of the two splits coincided. This boils down to the application of each method on both splits. The overlap in significant features for each method was determined.

Figure 10 shows the reproducibility of each method on the data set from the Section ‘HPV-induced transformation’. It reveals that BATS and tigaR reproduce substantially better than the competitors. This is confirmed in the other data set (see Additional file 1, Section 6). The superior reproducibility is most likely a consequence of the empirical Bayes approach (borrowing of information stabilizes estimates) in tigaR and very informative priors in BATS. The tigaR, BATS and timecourse-methods do much better than EDGE in both data sets. The former three all exploit the Bayes principle (in different ways though) which improves estimates of the variance parameters, while EDGE does not.

Conclusions

We presented a method for the analysis of integrative (onco) genomics studies with a time-course experimental design. The method identifies temporal differential gene expression while accounting for time-varying molecular covariates like DNA copy number changes. Simultaneously, the method assesses which of these covariates significantly contributes to temporal differential gene expression. The method employs a mixed model describing the temporal changes in gene expression in terms of DNA copy number and a (low-rank thin-plate) spline which captures additional temporal variation in the transcript levels. The method estimates the parameters of this model by means of an empirical Bayes procedure that ‘borrows information’ across genes. The empirical Bayesian procedure shrinks the parameter estimates (towards zero), thus accounting for multiplicity. This shrinkage enhances the reproducibility of the results. In a direct comparison with other methods for the identification of temporal differential expression, the proposed method proved to be a strong competitor, particularly in terms of reproducibility. In addition existing methods cannot incorporate additional genomics data. Furthermore, our method is straightforwardly applicable to count data resulting from RNA-seq experiments. Application to an integrative oncogenomics study, involving HPV-transformed cell lines, confirmed genes CADM1 and SLC25A36, known to be implicated in the development of cervical cancer. The presented methodology also identified other, novel and potentially interesting genes. These are currently under investigation and will be reported in a follow-up medical paper. Preliminary pathway analysis already showed that genes identified from this dataset by tigaR but not by the other methods were enriched for genes involved in cellular transformation.

Our ongoing research concentrates on two extensions of the proposed method. First, we are considering the inclusion of microRNA data. MicroRNAs affect expression levels post-transcriptionally. However, which microRNA targets which mRNA is only partially known. Hence, integration of temporal microRNA expression data also needs to address the problem of selecting the microRNAs targets. With the number of microRNAs known and typically measured in time-course integrative genomics studies being larger than the number of samples (# time points × # cell lines) this adds an additional layer of complexity to the problem.

The second extension comprises the integration of pathway information. This requires a multivariate formulation of the model for temporal changes in gene expression. Next to DNA copy number changes now the changes in transcript levels of other genes in the pathway may need to be included. A key challenge here is to ‘borrow information’ within and between pathways.

The methodology described in this paper is implemented in the R-package tigaR available upon request from the first author (v.miok@vumc.nl).

References

Steenbergen RDM, Kramer D, Braakhuis BJM, Stern PL, Verheijen RHM, Meijer CJLM, Snijders PJF: TSLC1 gene silencing in cervical cancer cell lines and cervical neoplasia. J Natl Cancer Inst. 2004, 96: 294-305. 10.1093/jnci/djh031.
Article PubMed CAS Google Scholar
Wilting SM, Snijders PJF, Meijer GA, Ylstra B, Van den Ijssel PR, Snijders AM, Albertson DG, Coffa J, Schouten JP, Van de Wiel MA, Meijer CJLM, Steenbergen RDM: Increased gene copy number at chromosome 20q are frequent in both squamous cell carcinomas and adenocarcinomas of the cervix. J Pathol. 2006, 209: 220-230. 10.1002/path.1966.
Article PubMed CAS Google Scholar
Henken FE, Wilting SM, Overmeer RM, Van Rietschoten JG, Nygren AOH, Errami A, Schouten JP, Meijer CJLM, Snijders PJF, Steenbergen RDM: Sequential gene promoter methylation during HPV-induced cervical carcinogenesis. Br J Cancer. 2007, 97: 1457-1464. 10.1038/sj.bjc.6604055.
Article PubMed Central PubMed CAS Google Scholar
Vogelstein B, Kinzler KW: Cancer genes and the pathways they control. Nat Med. 2004, 10: 789-799. 10.1038/nm1087.
Article PubMed CAS Google Scholar
Hanash S: Integrated global profile of cancer. Nat Rev Canc. 2004, 4: 638-644. 10.1038/nrc1414.
Article CAS Google Scholar
Efron B, Tibshirani R, Storey JD, Tusher V: Empirical Bayes analysis of a microarray experiment. J Am Stat Assoc. 2001, 96: 1151-1160. 10.1198/016214501753382129.
Article Google Scholar
Lönnstedt I, Speed TP: Replicated microarray data. Statistica Sinica. 2002, 12: 31-46.
Google Scholar
Eckel JE, Gennings C, Chinchilli VM, Burgoon LD, Zacharewski TR: Empirical Bayes gene screening tool for time-course or dose-response microarray data. J Biopharm Stat. 2004, 14: 647-670. 10.1081/BIP-200025656.
Article PubMed CAS Google Scholar
Tai YC, Speed TP: A multivariate empirical Bayes statistics for replicated microarray time course data. Ann Stat. 2006, 34: 2387-2412. 10.1214/009053606000000759.
Article Google Scholar
Bar-Joseph Z, Gerber GK, Lee TI, Rinaldi NJ, Yoo JY, Robert F, Gordon DB, Fraenkel E, Jaakkola TS, Young RA, Gifford DK: Computational discovery of gene modules and regulatory networks. Nat Biotechnol. 2003, 21: 1337-1324. 10.1038/nbt890.
Article PubMed CAS Google Scholar
Storey JD, Xiao W, Leek JT, Tompkins RG, Davis RW: Significance analysis of time course microarray experiments. Proc Natl Acad Sci U S A. 2005, 102: 12837-12842. 10.1073/pnas.0504609102.
Article PubMed Central PubMed CAS Google Scholar
Hong F, Li H: Functional hierarchical models for identifying genes with different time-course expression profiles. Biometrics. 2006, 62: 534-544. 10.1111/j.1541-0420.2005.00505.x.
Article PubMed CAS Google Scholar
Angelini C, De Canditiis D, Mutarelli M, Pensky M: A Bayesian approach to estimation and testing in time-course microarray experiments. Stat Appl Genet Mol Biol. 2007, 6: 1-30.
Google Scholar
Bierkens M, Krijgsman O, Wilting SM, Bosch L, Jaspers A, Meijer GA, Meijer CJLM, Snijders PJF, Ylstra B, Steenbergen RDM: Focal aberrations indicate EYA2 and hsa-miR-375 as oncogene and tumor suppressor in cervical carcinogenesis. Gene Chromosome Canc. 2013, 52: 56-68. 10.1002/gcc.22006.
Article CAS Google Scholar
Van Wieringen WN, Unger K, Leday GGR, Krijgsman O, de Menezes R, Ylstra B, Van de Wiel MA: Matching of array CGH and gene expression microarray features for the purpose of integrative genomic analysis. BMC Bioinformatics. 2012, 13: 80-10.1186/1471-2105-13-80.
Article PubMed Central PubMed Google Scholar
Crainiceanu CM, Ruppet D, Wand MP: Bayesian analysis for penalized spline regression using WinBUGS. J Stat Software. 2005, 14: 1-47.
Article Google Scholar
Van de Wiel MA, Leday GGR, Pardo L, Rue H, Van der Vaart AW, Van Wieringen WN: Bayesian analysis of RNA sequencing data by estimating multiple shrinkage priors. Biostatistics. 2013, 14: 113-128. 10.1093/biostatistics/kxs031.
Article PubMed Google Scholar
Rue H, Matrino S, Chopin N: Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximation. J Roy Stat Soc B Stat Meth. 2009, 71: 319-392. 10.1111/j.1467-9868.2008.00700.x.
Article Google Scholar
Ruppert D, Wand MP, Carroll RJ: Semiparametric regression during 2003–2007. Electron J Stat. 2009, 3: 1193-10.1214/09-EJS525.
Article PubMed Central PubMed Google Scholar
Peel D, McLachlan GJ: Robust mixture modeling using the t distribution. Stat Comput. 2000, 10: 339-348. 10.1023/A:1008981510081.
Article Google Scholar
Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B. 1995, 57: 289-300.
Google Scholar
Smyth GK: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004, 3: 3-
Google Scholar
Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010, 26: 139-140. 10.1093/bioinformatics/btp616.
Article PubMed Central PubMed CAS Google Scholar
Van Wieringen WN, Berkhof J, Van de Wiel MA: A random coefficients model for regional co-expression associated with DNA copy number. Stat Appl Genet Mol Biol. 2010, 9: 1-30.
Google Scholar
Spiegelhalter DJ, Best NG, Carlin BP, van der Liner A: Bayesian measures of model complexity and fit. J Roy Stat Soc B Stat Meth. 2002, 64: 583-639. 10.1111/1467-9868.00353.
Article Google Scholar
Steenbergen RDM, Walboomers JMM, Meijer CJLM, Van der Raaij-Helmer EMH, Parker JN, Chow LT, Broker TR, Snijders PJF: Transition of human papillomavirus type 16 and 18 transfected human foreskin keratinocytes towards immortality: activation of telomerase and allele loss at 3p, 10p, 11q and/or 18q. Oncogene. 1996, 13: 1249-1257.
PubMed CAS Google Scholar
Olshen AB, Venkatraman ES, Lucito R, Wigler M: Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004, 5: 557-572. 10.1093/biostatistics/kxh008.
Article PubMed Google Scholar
Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003, 4: 249-264. 10.1093/biostatistics/4.2.249.
Article PubMed Google Scholar
Huber W, Von Heydebreck A, Sültmann H, Poustka A, Vingron M: Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics. 2002, 18: 96-104. 10.1093/bioinformatics/18.suppl_1.S96.
Article Google Scholar
Overmeer RM, Henken FE, Snijders PJF, Claassen-Kramer D, Berkhof J, Helmerhorst TJM, Heideman DAM, Wilting SM, Murakami Y, Ito A, Meijer CJLM, Steenbergen RDM: Association between dense CADM1 promoter methylation and reduced protein expression in high-grade CIN and cervical SCC. J Pathol. 2008, 215: 388-397. 10.1002/path.2367.
Article PubMed CAS Google Scholar
Wilting SM, de Wilde J, Meijer CJL, Berkhof J, Yi Y, Van Wieringen WN, Braakhuis BJM, Meijer GA, Ylstra B, Snijders PJF, Steenbergen RDM: Integrated genomic and transcriptional profiling identifies chromosomal loci with altered gene expression in cervical cancer. Gene Chromosome Canc. 2008, 47: 890-905. 10.1002/gcc.20590.
Article CAS Google Scholar
Oshlack A, Robinson MD, Young MD: From RNA-seq reads to differential expression results. Genome Biol. 2010, 11: 220-10.1186/gb-2010-11-12-220.
Article PubMed Central PubMed CAS Google Scholar
Lindenbergh-Van der Plas M, Martens-de Kemp SR, de Maaker M, Van Wieringen WN, Ylstra B, Agami R, Cerisoli F, Leemans CR, Braakhuis BJM, Brakenhoff RH: Identification of lethal microRNAs specific for head and neck cancer. Clin Cancer Res. 2013, 19: 5647-5657. 10.1158/1078-0432.CCR-12-2295.
Article PubMed CAS Google Scholar
Robinson MD, Smyth GK: Moderated statistical tests for assessing differences in tag abundance. Bioinformatics. 2007, 23: 2881-2887. 10.1093/bioinformatics/btm453.
Article PubMed CAS Google Scholar
Mutarelli M, Cicatiello L, Ferraro L, Grober OMV, Ravo M, Facchiano AM, Angelini C, Weisz A: Time-course analysis of genome-wide gene expression data from hormone-responisve human breast cancer cells. BMC Bioinformatics. 2008, 9: 12-10.1186/1471-2105-9-12.
Article Google Scholar
Abramovich F, Angelini C: Bayesian maximum a posteriori multiple testing procedure. Indian J Stat. 2006, 68: 436-460.
Google Scholar

Download references

Acknowledgements

This research was funded by a grant from the VU University Medical Center-Cancer Center Amsterdam (VUMC-CCA, project CCA2011-5-02).

Author information

Authors and Affiliations

Department of Epidemiology and Biostatistics, VU University Medical Center, P.O. Box 7057, 1007, MB, Amsterdam, The Netherlands
Viktorian Miok, Mark A van de Wiel & Wessel N van Wieringen
Department of Pathology, VU University Medical Center, P.O. Box 7057, 1007, MB, Amsterdam, The Netherlands
Viktorian Miok, Saskia M Wilting, Annelieke Jaspers, Peter JF Snijders & Renske DM Steenbergen
Department of Mathematics, VU University Amsterdam, De Boelelaan 1081a, 1081, HV, Amsterdam, The Netherlands
Mark A van de Wiel & Wessel N van Wieringen
InteRNA Technologies BV, Padualaan 8, 3584, CH, Utrecht, The Netherlands
Paula I van Noort
Department of Otolaryngology/Head-Neck Surgery, VU University Medical Center, P.O. Box 7057, 1007, MB, Amsterdam, The Netherlands
Ruud H Brakenhoff

Authors

Viktorian Miok
View author publications
You can also search for this author in PubMed Google Scholar
Saskia M Wilting
View author publications
You can also search for this author in PubMed Google Scholar
Mark A van de Wiel
View author publications
You can also search for this author in PubMed Google Scholar
Annelieke Jaspers
View author publications
You can also search for this author in PubMed Google Scholar
Paula I van Noort
View author publications
You can also search for this author in PubMed Google Scholar
Ruud H Brakenhoff
View author publications
You can also search for this author in PubMed Google Scholar
Peter JF Snijders
View author publications
You can also search for this author in PubMed Google Scholar
Renske DM Steenbergen
View author publications
You can also search for this author in PubMed Google Scholar
Wessel N van Wieringen
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Wessel N van Wieringen.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

VM, WNvW, SMW wrote the manuscript. VM implemented the method in R, and performed the data analysis. This work uses an original idea by WNvW, further developed by WNvW, SMW and VM. WNvW, MAvdW, SMW, RDMS and PJFS supervised the work. SMW, AJ, RDMS and PJFS generated HPV-induced transformation data and helped with the biological interpretation of the results. RHB and PIvN generated head and neck cancer data and helped with biological interpretation of the results. All authors read and approved the final manuscript.

Electronic supplementary material

12859_2014_6640_MOESM1_ESM.pdf

Additional file 1: Supplementary material. Supplementary document containing details about the simulation study setup, additional figures and tables. (PDF 1013 KB)

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Authors’ original file for figure 4

Authors’ original file for figure 5

Authors’ original file for figure 6

Authors’ original file for figure 7

Authors’ original file for figure 8

Authors’ original file for figure 9

Authors’ original file for figure 10

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Miok, V., Wilting, S.M., van de Wiel, M.A. et al. tigaR: integrative significance analysis of temporal differential gene expression induced by genomic abnormalities. BMC Bioinformatics 15, 327 (2014). https://doi.org/10.1186/1471-2105-15-327

Download citation

Received: 30 May 2014
Accepted: 25 September 2014
Published: 02 October 2014
DOI: https://doi.org/10.1186/1471-2105-15-327

tigaR: integrative significance analysis of temporal differential gene expression induced by genomic abnormalities

Abstract

Background

Results

Conclusion

Background

Methods

Model

Estimation

Hypothesis testing

Spatial multivariate prior

Practical considerations

Results and discussion

HPV-induced transformation

Head-and-neck cancer

Comparison

Conclusions

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Competing interests

Authors’ contributions

Electronic supplementary material

Authors’ original submitted files for images

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Bioinformatics

Contact us