A new insight into underlying disease mechanism through semi-parametric latent differential network model

He, Yong; Ji, Jiadong; Xie, Lei; Zhang, Xinsheng; Xue, Fuzhong

doi:10.1186/s12859-018-2461-2

Research
Open access
Published: 28 December 2018

A new insight into underlying disease mechanism through semi-parametric latent differential network model

Yong He¹,
Jiadong Ji¹,
Lei Xie^2,3,
Xinsheng Zhang⁴ &
…
Fuzhong Xue⁵

BMC Bioinformatics volume 19, Article number: 493 (2018) Cite this article

2246 Accesses
6 Citations
1 Altmetric
Metrics details

Abstract

Background

In genomic studies, to investigate how the structure of a genetic network differs between two experiment conditions is a very interesting but challenging problem, especially in high-dimensional setting. Existing literatures mostly focus on differential network modelling for continuous data. However, in real application, we may encounter discrete data or mixed data, which urges us to propose a unified differential network modelling for various data types.

Results

We propose a unified latent Gaussian copula differential network model which provides deeper understanding of the unknown mechanism than that among the observed variables. Adaptive rank-based estimation approaches are proposed with the assumption that the true differential network is sparse. The adaptive estimation approaches do not require precision matrices to be sparse, and thus can allow the individual networks to contain hub nodes. Theoretical analysis shows that the proposed methods achieve the same parametric convergence rate for both the difference of the precision matrices estimation and differential structure recovery, which means that the extra modeling flexibility comes at almost no cost of statistical efficiency. Besides theoretical analysis, thorough numerical simulations are conducted to compare the empirical performance of the proposed methods with some other state-of-the-art methods. The result shows that the proposed methods work quite well for various data types. The proposed method is then applied on gene expression data associated with lung cancer to illustrate its empirical usefulness.

Conclusions

The proposed latent variable differential network models allows for various data-types and thus are more flexible, which also provide deeper understanding of the unknown mechanism than that among the observed variables. Theoretical analysis, numerical simulation and real application all demonstrate the great advantages of the latent differential network modelling and thus are highly recommended.

Background

In genomic studies, graphical model has been an important tool to capture dependence among different genes. Particularly, Gaussian graphical model has been widely applied to infer the relationship between genes at the transcriptional level [1–4]. Under the Gaussian assumption, estimating the structure of the graphical model is equivalent to recover the support of precision matrix which is defined to be the inverse of the covariance matrix. However, in some cases, compared to focusing on a particular network, it is of greater interest to investigate how the network of connected gene pairs change from one experimental condition to another, which provides deeper insights on an underlying biological process such as identification of pathways that correspond to such a change. For instance, medical experiment usually involves two groups: the patient group and the control group. The analysis of group difference in biological networks or pathways may offer us a new insight into the underlying disease mechanism, which have extensive biomedical and clinical applications, such as identifying effective targets for drug development in a cost-effective and timely manner. Indeed, differential networking modelling has recently emerged as an important tool to analyze a set of changes in graph structure between two conditions (see, for example; [5–17]). In the context of genomic analysis, it is reasonable to assume that two genes are defined to be connected in the differential network if the magnitude of their conditional dependency relationship changes between two conditions. The precision matrix which is defined as the inverse of covariance matrix can capture the conditional dependency relationship. Thus the differential network is typically modelled as the difference of two precision matrices and this type of modelling has been widely used [7–9, 14, 15]. Figure 1a, b, c illustrate the definition of differential network. Each node represents a gene. For two groups depicted in (a) and (b), there is an edge between genes (i,j) if and only if (i,j)-th element of Ω is nonzero. For each edge, there exists a weight which is the magnitude of (i,j)-th element of Ω. Gene pair (i,j) is defined to be connected in the differential network in (c) if the magnitudes of (i,j)-th elements of two precision matrices change between two groups.

One straightforward approach to estimate the difference of two precision matrices is to separately estimate the precision matrices and then subtract the estimates. In the high dimensional setting where the dimension p is much larger than the sample size n, which is often the case for genomic study, many estimation approaches for the precision matrix have been proposed and proved to enjoy nice theoretical properties and computation advantage under the key assumption of sparsity. And this topic has been an active area of research in recent years [18–22].

Another type of approach to estimate the difference of two precision matrices is to jointly estimate the precision matrices. Guo et al. [23] penalized the joint loglikelihood with a hierarchical penalty that targets the removal of common zeros in the inverse covariance matrices across categories. Danaher et al. [24] proposed the joint graphical Lasso, which is based upon maximizing a penalized log-likelihood with generalized fused Lasso or group Lasso penalty. Motivated by the constrained ℓ₁ minimization approach to precision matrix estimation of [22], Zhao et al. [7] proposed an estimation approach to directly estimate the difference of the precision matrices.

For the separately estimating methods, Liu et al. [25] proposed the nonparanormal family to relax the Gaussian assumption. While the nonparanormal family is much larger than the standard parametric Gaussian family, the independence relations among the variables are still encoded in the precision matrix. In addition, Liu et al. [26] proposed a semiparametric approach called nonparanormal SKEPTIC to estimate high dimensional undirected graphical models efficiently and robustly and proved that the nonparanormal SKEPTIC achieves the optimal parametric rates of convergency in terms of precision matrix estimation and graph recovery. Xue and Zou [27] proposed a similar regularized rank-based estimation idea for estimating nonparanormal graphical models and analyzed adaptive versions of rank-based Dantzig selector and CLIME estimators. He et al. [28] proposed a multiple testing procedure to estimate high-dimensional nonparanormal graphical model and proved that the proposed procedure can control the false discovery rate (FDR) asymptotically.

The disadvantage of Gaussian or nonparanormal graphical models lies in that they are only tailored for modeling continuous data. However, in genomic studies, we may encounter discrete data (e.g. CNV data and SNP data), continuous data (e.g. gene expression and methylation data) or data of hybrid types with both discrete and continuous variables. Besides, in some circumstances, even if the data are continuous, we still need to transform the data into discrete data to remove the heterogeneity (e.g. batch effect, outliers and population stratification). For instance, in the analysis of gene expression data collected from different platforms, to remove the unwanted variation among different experiments known as the batch effects, numerical expression data are often transformed into 0/1 binary data, where lower expression values are encoded as 0 and higher expression values are encoded as 1. In this setting, it is reasonable to assume that the discrete variable is obtained by discretizing a latent variable. Fan et al. [29] proposed a general model named the latent Gaussian copula graphical model, assuming that the observed discrete data are generated by discretizing a latent continuous variable at some unknown cutoff.

In this paper, we consider estimating differential network for various types of biological data in a joint way. We propose a unified semi-parametric latent variable differential network model. The latent differential network model is illustrated in Fig. 1e-f. For biological data, there exist continuous data, discrete data or data of hybrid types with both continuous and discrete data. It is assumed that these data are collected by transforming latent continuous variables which are unobservable. We are interested in the differential network of the latent variables, which provide deeper understanding of the unknown mechanism than that among the observed variables. To the best of our knowledge, our work provides the first method for differential network estimation for binary or mixed data with theoretical guarantees under the high dimensional scaling. The advantages of the proposed methods lie in the following aspects: (I) Our method provides a way to infer the differential network structure among latent variables, which provides deeper understanding of the unknown mechanism than that among the observed variables. (II) Theoretical analysis shows that the proposed methods achieve the same parametric rates of convergence for both difference matrix estimation and differential graph recovery, as if the latent variables were observed. (III) The proposed methods are much more robust to outliers due to the rank-based correlation matrix estimator. (IV) The proposed approaches do not require precision matrices to be sparse, and thus can allow the individual networks to contain hub nodes. Simulation result shows that the proposed method performs much better and more robustly than several state-of-the-art methods. The proposed methods are applied on a gene expression data set associated with lung cancer. A target gene WIF1 stands out by the proposed method, which indeed is verified as a frequent target for epigenetic silencing in various human cancers [30]. The real data example illustrates the great usefulness of the current work.

Methods

In this part, we propose novel definitions of latent differential network model for various types of data. In essence, we define the differential network as the difference of two precision matrices of the latent variables, which greatly generalizes the applicability in areas such as bioinformatics, medical research and so on.

Gaussian copula differential graphical model

We first review the definition of the Gaussian copula distribution. Let f={f₁,…,f_p} be a set of strictly increasing univariate functions. A p dimensional random variable X=(X₁,…,X_p)^⊤ is said to follow the Gaussian copula distribution if and only if f(X):=(f₁(X₁),…,f_p(X_p))^⊤ :=Z∼N_p(μ,Σ) and is noted as X∼NPN(μ,Σ,f), where μ=(μ₁,…,μ_p),Σ=[Σ_jk] are respectively the mean vector and the correlation matrix of the Gaussian variate Z. The conditional independence structure of X is encoded by the sparsity pattern of Ω=Σ⁻¹. Specifically, it can be shown that X_i is conditionally independent of X_j given all other variables if and only if ω_ij=0, where ω_ij is the (i,j)-th element of Ω. Therefore, the differential network of the Gaussian copula variables can be defined to be the difference between the two precision matrices, just the same as for the parametric Gaussian case.

Assume X_i=(X_i1,…,X_ip)^⊤ for i=1,…,n_X are independent observations of the expression levels of p genes from one group denoted by X and Y_i=(Y_i1,…,Y_ip)^⊤ for i=1,…,n_Y from the other denoted by Y, X∼NPN(μ^X,Σ^X,f^X) and Y∼NPN(μ^Y,Σ^Y,f^Y). The differential network is defined to be the difference between the two precision matrices, denoted by Δ₀=Ω^Y−Ω^X, where Ω^Y and Ω^X are the inverse matrices of Σ^Y and Σ^X separately.

We propose a rank-based estimator of Σ^X. It is known that if Z∼NPN(μ,Σ,f), then we have $\Sigma _{jk}=\sin \left (\frac {\pi }{2}\tau _{jk}\right)$, where τ_jk is Kendall’s tau correlation between Z_j and Z_k. Thus we can estimate the unknown correlation matrix Σ^X by:

$$ \hat{S}^{X}_{jk}=\left\{ \begin{array}{ccc} \sin\left(\frac{\pi}{2}\hat{\tau}^{X}_{jk}\right) & & {j\neq k}\\ 1 & & {j=k} \end{array} \right., $$

(1)

where $\hat {\tau }^{X}_{jk}$ is the sample Kendall’s tau correlation between X_j and X_k. Similarly, we can estimate Σ^Y in the same way and obtain the estimator $\hat {\boldsymbol {S}}^{Y}$. Motivated by the direct estimation method of the difference of two precision matrices proposed by [7], one can obtain the estimator of Δ₀ by solving

$$\text{arg\ min} |\boldsymbol{\Delta}|_{1}, \ \ \text{subject to} \ \ |\hat{\boldsymbol{S}}^{X}\boldsymbol{\Delta}\hat{\boldsymbol{S}}^{Y}-\hat{\boldsymbol{S}}^{X}+\hat{\boldsymbol{S}}^{Y}|_{\infty}\leq \lambda_{n}, $$

which is equivalent to the optimization problem:

$$ \begin{array}{ll} \text{arg\ min} |\boldsymbol{\Delta}|_{1}, \ \ \ \text{subject to} \\ \\ \left|\left(\hat{\boldsymbol{S}}^{X}\otimes \hat{\boldsymbol{S}}^{Y}\right)\text{Vec}(\boldsymbol{\Delta})-\text{Vec} \left(\hat{\boldsymbol{S}}^{X}-\hat{\boldsymbol{S}}^{Y}\right)\right|_{\infty}\leq \lambda_{n}, \end{array} $$

(2)

where ⊗ denotes the Kronecker product, $|\boldsymbol {\Delta }|_{1}=\sum _{jk}\delta _{jk}$ is the element-wise ℓ₁ norm of the matrix Δ. Here, for a matrix A=[A_jk], |A|_∞= maxjk|A_jk| and for a vector a=(a_j), |a|_∞= maxj|a_j|.

As seen from Eq. (2), the proposed approach can directly estimate the difference matrix without implicitly estimating the individual precision matrices. Thus there is no need to assume the sparsity of (Σ^Y)⁻¹ and (Σ^X)⁻¹. We only need to assume that Δ₀ is sparse. Besides, compared to the sample covariance matrix, the rank-based estimators here can enjoy modelling flexibility and estimation robustness, especially when outliers exist.

Latent Gaussian copula differential graphical model for binary data

In the analysis of gene expression data, to remove the batch effects, numerical expression data are often transformed into 0/1 binary data, where lower expression values are encoded as 0 and higher expression values are encoded as 1. To estimate the underlying differential network for the binary data from two different groups, we assume that the observed discrete data are generated by discretizing a latent continuous variable at some unknown cutoff. To make the model more flexible, we assume the latent continuous variable is Gaussian copula distributed instead of Gaussian. Let B=(B₁,B₂,…,B_p)^⊤∈{0,1}^p be a p-dimensional 0/1-random vector. The 0/1-random vector B satisfies the latent Gaussian copula model (LGCM) for binary data, if there exists a p dimensional random vector X∼NPN(0,Σ,f) such that

$$B_{j}=I\left(X_{j}>C_{j}\right), \ \ \text{for} \ \ \text{all} \ \ j=1,\ldots,p, $$

where I(·) is the indicator function and the cutoff C=(C₁,…,C_p) is a vector of constants. Then we denote B∼ LGCM(Σ,f,C). We call Σ the latent correlation matrix. The latent Gaussian copula model involves parameters (Σ,f,C). Merely based on the binary random vector B, only f_j(C_j),j=1,…,p are identifiable. Denote Λ=(Λ₁,…,Λ_p), where Λ_j=f_j(C_j). For notational simplicity, we write LGCM(Σ,Λ) for LGCM(Σ,f,C).

Assume ${\boldsymbol {B}}^{1}_{i}=\left (B^{1}_{i1},\ldots,B^{1}_{ip}\right)^{\top }$ for i=1,…,n₁ are independent observations of the binary expression levels of p genes from one group denoted by B¹ and $\boldsymbol {B}^{2}_{i}=\left (B^{2}_{i1},\ldots,B^{2}_{ip}\right)^{\top }$ for i=1,…,n₂ from the other denoted by B², where B¹∼LGCM(Σ¹,Λ¹) and B²∼LGCM(Σ²,Λ²). The differential network is defined to be the difference between the two precision matrices, denoted by $\boldsymbol {\Delta }_{0}^{\mathrm {B}}=\left ({\boldsymbol {\Sigma }^{2}}\right)^{-1}-\left ({\boldsymbol {\Sigma }^{1}}\right)^{-1}$. Motivated by Eq. (2), we should first derive estimators for Σ¹ and Σ². For ease of presentation, we only present the procedure to construct the estimator for Σ¹, estimator for Σ² can be obtained similarly. Denote the Kendall’s tau correlation between $B_{j}^{1}$ and $B_{k}^{1}$ by $\tau _{jk}^{1}$, it can be shown that $\tau _{jk}^{1}$ satisfies:

$$\tau_{jk}^{1}=2\left\{\Phi_{2}\left(\Lambda_{j}^{1},\Lambda_{k}^{1},\Sigma^{1}_{jk}\right)-\Phi\left(\Lambda_{j}^{1}\right)\Phi\left(\Lambda_{k}^{1}\right)\right\}, $$

where

$$\Phi_{2}(u,v,t)=\int_{x_{1}< u}\int_{x_{2}< v}\phi_{2}(x_{1},x_{2};t){dx}_{1}{dx}_{2}, $$

is the cumulative distribution function of the standard bivariate normal distribution, ϕ₂(x₁,x₂;t) is the probability density function of the standard bivariate normal distribution with correlation t. Denote by

$$ F\left(t;\Lambda_{j}^{1},\Lambda_{k}^{1}\right)=2\left\{\Phi_{2}\left(\Lambda_{j}^{1},\Lambda_{k}^{1},t\right)-\Phi \left(\Lambda_{j}^{1}\right)\Phi\left(\Lambda_{k}^{1}\right)\right\}. $$

For any fixed $\Lambda _{j}^{1}$ and $\Lambda _{k}^{1}$, it can be shown that $F\left (t;\Lambda _{j}^{1},\Lambda _{k}^{1}\right)$ is a strictly monotonic increasing function on t∈(−1,1) and thus is invertible. Given $\Lambda _{j}^{1}$ and $\Lambda _{k}^{1}$, one can estimate $\Sigma _{jk}^{1}$ by $F^{-1}\left (\hat {\tau }_{jk}^{1};\Lambda _{j}^{1},\Lambda _{k}^{1}\right)$. However, the cutoff values are unknown in practice. As $E\left (B^{1}_{ij}\right)=1-\Phi \left (\Lambda _{j}^{1}\right)$, we can estimate $\Lambda _{j}^{1}$ by $\hat {\Lambda }_{j}^{1}=\Phi ^{-1}\left (1-\bar {B}^{1}_{j}\right)$, where $\bar {B}_{j}^{1}=\sum _{i=1}^{n}B^{1}_{ij}/n$. Thus the Kendall’s tau rank-based correlation matrix estimator $\hat {\boldsymbol {b}}^{1}=\left [\hat {\mathrm {R}}^{1}_{jk}\right ]$ for Σ¹ is a p×p matrix with element entry given by

$$ \hat{\mathrm{R}}_{jk}^{1}=\left\{ \begin{array}{ccc} F^{-1}\left(\hat{\tau}_{jk}^{1};\hat{\Lambda}_{j}^{1},\hat{\Lambda}_{k}^{1}\right) & & {j\neq k},\\ 1, & & {j=k}. \end{array} \right. $$

(3)

Similarly, the Kendall’s tau rank-based correlation matrix estimator $\hat {\boldsymbol {b}}^{2}=\left [\hat {\mathrm {R}}^{2}_{jk}\right ]$ for Σ² is a p×p matrix with element entry given by

$$ \hat{\mathrm{R}}_{jk}^{2}=\left\{ \begin{array}{ccc} F^{-1}\left(\hat{\tau}_{jk}^{2};\hat{\Lambda}_{j}^{2},\hat{\Lambda}_{k}^{2}\right) & & {j\neq k},\\ 1, & & {j=k}. \end{array} \right. $$

(4)

Motivated by Eq. (2), we can obtain an estimator of $\boldsymbol {\Delta }_{0}^{\mathrm {B}}$ by solving the following optimization problem:

$$ \begin{array}{ll} \text{arg\ min} |\boldsymbol{\Delta}|_{1}, \ \ \text{subject to} \\ \left|\left(\hat{\boldsymbol{b}}^{1}\otimes \hat{\boldsymbol{b}}^{2}\right)\text{Vec}(\boldsymbol{\Delta})-\text{Vec}\left(\hat{\boldsymbol{b}}^{1}-\hat{\boldsymbol{b}}^{2}\right)\right|_{\infty}\leq \lambda_{n}. \end{array} $$

(5)

For the binary data, we aim to infer the differential network among latent variables, which provides deeper understanding of the unknown mechanism than that among the observed binary variables. Thus, our model complements the existing work on high dimensional differential network estimation, which mostly focused on learning differential network among observed variables including, for example, the Ising model.

Latent Gaussian copula differential graphical model for mixed data

In the analysis of biological data, there also exists the case where some biological data are discrete while some others are continuous. For instance, multi-level omics data integrative analysis involves gene mutation, expression, methylation, metabolome and phenome data. In this case, mixed data appear naturally. We start with the definition of the latent Gaussian copula model for mixed data. Assume that M=(M₁,M₂), where M₁ represents the p₁-dimensional binary variables and M₂ represents the p₂-dimensional continuous variables. The random vector M satisfies the latent Gaussian copula model for mixed data, if there exists a p₁ dimensional random vector X₁ such that X=(X₁,M₂)∼ NPN(0,Σ,f) and

$$M_{j}=I\left(X_{j}>C_{j}\right) \ \ \text{for} \ \ \text{all} \ \ j=1,\ldots,p_{1}, $$

where $\boldsymbol {C}=\left (C_{1},\ldots,C_{p_{1}}\right)$ is a vector of constants. Then we denote M∼ LGCM(0,Σ,f,C), and call Σ the latent correlation matrix. In the latent Gaussian copula regression model, the binary components M₁ are generated by a latent continuous random vector X₁ truncated at C, and combining with the continuous components M₂, X=(X₁,M₂) satisfies the Gaussian copula model. For the binary data M₁, only Λ_j=f_j(C_j),j=1,…,p₁ are identifiable. For the continuous components M₂, the marginal transformations f_j(·),j=p₁+1,…,p are identifiable.

Assume ${{\boldsymbol {M}}^{1}_{i}=\left (M^{1}_{i1},\ldots,M^{1}_{ip}\right)^{\top }}$ for i=1,…,n₁ are independent observations of the expression levels of p genes from one group denoted by M¹ and ${\boldsymbol {M}^{2}_{i}=\left (M^{2}_{i1},\ldots,M^{2}_{ip}\right)^{\top }}$ for i=1,…,n₂ from the other denoted by M², where M¹∼LGCM(Σ¹,Λ¹) and M²∼LGCM(Σ²,Λ²). The differential network is defined to be the difference between the two precision matrices, denoted by $\boldsymbol {\Delta }_{0}^{\mathrm {M}}=\left ({\boldsymbol {\Sigma }^{2}}\right)^{-1}-\left ({\boldsymbol {\Sigma }^{1}}\right)^{-1}$. Similar to the discussions in the last sections, we first need to construct estimators for Σ¹ and Σ². For ease of presentation, we only present the procedure to construct the estimator for Σ¹, estimator for Σ² can be obtained similarly. For discrete components $M^{1}_{ij},M^{1}_{ik}(1\leq j,k\leq p_{1})$, as what we have discussed in the last subsection with a slight change of notation, we can estimate $\Sigma ^{1}_{jk}$ by:

$$ \hat{\mathrm{T}}^{1}_{jk}=\left\{ \begin{array}{ccc} F^{-1}\left(\hat{\tau}^{1}_{jk};\hat{\Lambda}_{j}^{1},\hat{\Lambda}_{k}^{1}\right) & & 1\leq{j\neq k}\leq p_{1},\\ 1, & & 1\leq {j=k}\leq p_{1}. \end{array} \right. $$

(6)

For continuous components $M_{ij}^{1},M_{ik}^{1}$, as what we have discussed, we can estimate $\Sigma ^{1}_{jk}$ by:

$$ \hat{\mathrm{T}}^{1}_{jk}=\left\{ \begin{array}{ccc} \text{sin}\left(\frac{\pi}{2}\hat{\tau}_{jk}\right) & & p_{1}+1\leq {j\neq k}\leq p,\\ 1, & & p_{1}+1\leq {j=k}\leq p. \end{array} \right. $$

(7)

where $\hat {\tau }_{jk}$ is defined as follows:

$$ \hat{\tau}_{jk}^{1}\,=\,\frac{2}{n_{1}(n_{1}\,-\,1)}\sum_{1\leq i\leq i^{\prime}\leq n_{1}}\text{sign}\!\left(\!M^{1}_{ij}\,-\,M^{1}_{i^{\prime} j}\!\right)\cdot\text{sign}\!\left(M^{1}_{ik}\,-\,M^{1}_{i^{\prime} k}\right)\!. $$

We still need to consider the mixed case. Without loss of generality, we assume that $M^{1}_{ij}$ is binary and $M^{1}_{ik}$ is continuous. In this case, the Kendall’s tau correlation can be expressed by

$$ \hat{\tau}_{jk}^{1}=\frac{2}{n_{1}(n_{1}-1)}\sum_{1\leq i\leq i^{\prime}\leq n_{1}}\left(M^{1}_{ij}-M^{1}_{i^{\prime} j}\right)\cdot\text{sign}\left(M^{1}_{ik}\,-\,M^{1}_{i^{\prime} k}\right). $$

The population version of Kendall’s tau correlation $\tau _{jk}^{1}=E\left (\hat {\tau }^{1}_{jk}\right)$ can be expressed by $\tau ^{1}_{jk}=H\left (\Sigma ^{1}_{jk};\Lambda ^{1}_{j}\right)$, where

$$H\left(t;\Lambda_{j}^{1}\right)=4\Phi_{2}\left(\Lambda_{j}^{1},0,t/\sqrt{2}\right)-2\Phi\left(\Lambda_{j}^{1}\right). $$

Moreover, for fixed $\Lambda _{j}^{1}$, $H\left (t;\Lambda _{j}^{1}\right)$ is an invertible function of t. The parameter $\Lambda _{j}^{1}$ could be estimated by $\Lambda _{j}^{1}=\Phi ^{-1}\left (1-\bar {M}^{1}_{j}\right)$, where $\bar {M}^{1}_{j}=\sum _{i=1}^{n}M^{1}_{ij}/n$. Thus when $M^{1}_{ij}$ is binary and $M^{1}_{ik}$ is continuous, $\Sigma ^{1}_{jk}$ can be estimated by the Kendall’ tau rank-based estimator:

$$ \hat{\mathrm{T}}_{jk}^{1}= H^{-1}\left(\hat{\tau}^{1}_{jk};\hat{\Lambda}^{1}_{j}\right), \ \ 1\leq j\leq p_{1} < k\leq p, $$

(8)

where $H^{-1}\left (\tau,\Lambda ^{1}_{j}\right)$ is the inverse function of $H\left (t,\Lambda ^{1}_{j}\right)$ for fixed $\Lambda ^{1}_{j}$. Thus the Kendall’s tau rank-based correlation matrix estimator $\hat {\mathbf {T}}^{1}=\left [\hat {\mathrm {T}}^{1}_{jk}\right ]$ for Σ¹ is a p×p matrix with corresponding element entry given by Eqs. (6), (7), and (8) respectively. Similarly, we can obtain estimator $\hat {\mathbf {T}}^{2}$ for Σ². Motivated by Eq. (2), we can obtain an estimator of Δ₀ by solving the following optimization problem:

$$ \begin{array}{ll} \text{arg\ min} |\boldsymbol{\Delta}|_{1}, \ \ \text{subject to}\\ \\ \left|\left(\hat{\mathbf{T}}^{1}\otimes \hat{\mathbf{T}}^{2}\right)\text{Vec}(\boldsymbol{\Delta})-\text{Vec}\left(\hat{\mathbf{T}}^{1}-\hat{\mathbf{T}}^{2}\right)\right|_{\infty}\leq \lambda_{n}. \end{array} $$

(9)

We show that the rank-based covariance matrix estimators achieve the same parametric rate of convergence for both difference matrix estimation and differential graph recovery in the Additional file 1. Thus the extra modelling flexibility comes at almost no cost of statistical efficiency. Besides, for the binary data or data of hybrid types with both binary and continuous variables, the differential network among latent variables can be well estimated, which provides deeper understanding of the unknown mechanism than that among the observed variables.

Implementation

In this section we will present how to solve the optimization problems in Eqs. (2), (5), and (9). For ease of presentation, we only present the procedure to obtain the solution to optimization problem in Eq. (2) and solutions to optimization problems in Eqs. (5) and (9) can be obtained in the similar way.

Recall that in Eq. (2), the optimization problem is

$$\begin{array}{ll} \text{arg\ min} |\boldsymbol{\Delta}|_{1}, \ \ \text{subject to} \\ \\ \left|\left(\hat{\boldsymbol{S}}^{X}\otimes \hat{\boldsymbol{S}}^{Y}\right)\text{Vec}(\boldsymbol{\Delta})-\text{Vec}\left(\hat{\boldsymbol{S}}^{X}-\hat{\boldsymbol{S}}^{Y}\right)\right|_{\infty}\leq \lambda_{n}. \end{array} $$

Let Δ= [δ_jk]_1≤j,k≤p and define θ to be the p(p+1)/2×1 vector with θ=(δ_jk)_{1≤j≤k≤p}. Estimating a symmetric Δ is thus equivalent to estimating θ, which alleviates the computation burden especially when p is large. Define the p²×p(p+1)/2 matrix Γ with columns indexed by 1≤j≤k≤p and with rows indexed by l=1,…,p and m=1,…,p, so that each entry is labeled by Γ_lm,jk. For j≤k, let Γ_jk,jk=Γ_kj,jk=1 and set all other entries of Γ equal to zero. With these notations, one may consider the following optimization problem:

$$ \begin{array}{ll} \hat{\boldsymbol{\theta}}=\text{arg\ min}|\boldsymbol{\theta}|_{1} \ \ \text{subject to} \\ \left\{ \begin{array}{l} \left|\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{S}}\boldsymbol{\Gamma}\boldsymbol{\theta}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{s}}\right|_{O\infty}\leq \lambda_{n},\\ \left|\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{S}}\boldsymbol{\Gamma}\boldsymbol{\theta}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{s}}\right|_{D\infty}\leq \lambda_{n}/2, \end{array} \right. \end{array} $$

(10)

where $\hat {\boldsymbol {S}}=\hat {\boldsymbol {S}}^{X}\otimes \hat {\boldsymbol {S}}^{Y}$, $\hat {\boldsymbol {s}}=\text {Vec}\left (\hat {\boldsymbol {S}}^{X}-\hat {\boldsymbol {S}}^{Y}\right)$ and for a p(p+1)/2×1 vector c, |c|_O∞ denotes the sup-norm of the entries of c corresponding to the off diagonal elements of its matrix form, while |c|_D∞ denotes the sup-norm of the entries of c corresponding to the diagonal elements. The matrix form of $\hat {\boldsymbol {\theta }}$ will be denoted by $\hat {\boldsymbol {\Delta }}$ in the following sections. The optimization problem in Eq. (10) can be solved by the alternating direction method of multipliers (ADMM), for a thorough discussion, we refer to [31]. For the optimization problem in Eq. (10), to apply the ADMM algorithm, we rewrite it as:

$$\begin{array}{ll} \hat{\boldsymbol{\theta}}=\text{arg\ min}_{\boldsymbol{\theta}, \boldsymbol{z}} \left\{|\boldsymbol{\theta}|_{1}+g(\boldsymbol{z})\right\} \\ \text{subject to} ~~\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{S}}\boldsymbol{\Gamma}\boldsymbol{\theta}+\boldsymbol{z}=\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{s}}, \end{array} $$

where the function g(·) is defined by

$$g(\boldsymbol{z})=\left\{ \begin{array}{ccc} \infty & &|\boldsymbol{z}_{O\infty}|>\lambda_{n}\ \ \text{or} \ \ |\boldsymbol{z}_{D\infty}|>\lambda_{n}/2.\\ 0, & & \text{otherwise}. \end{array} \right.$$

The augmented Lagrangian can be written as

$$ \begin{array}{ll} L_{\rho}(\boldsymbol{\theta},\boldsymbol{z},\boldsymbol{u})&=\boldsymbol{u}^{\top}\left(\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{S}}\boldsymbol{\Gamma}\boldsymbol{\theta}+\boldsymbol{z}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{s}}\right)+|\boldsymbol{\theta}|_{1}\\ &\quad+\frac{\rho}{2} \left|\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{S}}\boldsymbol{\Gamma}\boldsymbol{\theta}+\boldsymbol{z}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{s}}\right|_{2}^{2}+g(\boldsymbol{z}), \end{array} $$

(11)

where u is the Lagrange multiplier and ρ is a positive penalty parameter which can be specified by users. The ADMM algorithm is based on minimizing the augmented Lagrangian in (11) over θ and z and then applying a dual variable update to the Lagrange multiplier u, which yields the updates

$$\begin{array}{ccl} \boldsymbol{z}^{(t+1)}& = &\underset{\boldsymbol{z}}{\text{arg\ min}}\left|\boldsymbol{u}^{(t)}/\rho + \boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{s}}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{S}} \boldsymbol{\Gamma}\boldsymbol{\theta}^{(t)}-\boldsymbol{z}\right|_{2}^{2}\\ \\ &&+2{g(\boldsymbol{z})}/{\rho}\\ \boldsymbol{\theta}^{(t+1)}& = &\underset{\boldsymbol{\theta}}{\text{arg\ min}}\Big|\frac{\boldsymbol{u}^{(t)}}{\rho} + \boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{s}}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{S}} \boldsymbol{\Gamma}\boldsymbol{\theta}-\boldsymbol{z}^{(t+1)}\Big|_{2}^{2}\\ \\ &&+2{|\boldsymbol{\theta}|_{1}}/{\rho}\\ \boldsymbol{u}^{(t+1)}& = &\boldsymbol{u}^{(t)}+\rho\left(\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{s}}-\boldsymbol{\Gamma}^{\top} \hat{\boldsymbol{S}}\boldsymbol{\Gamma}\boldsymbol{\theta}^{(t+1)}-\boldsymbol{z}^{(t+1)}\right)\\ \end{array} $$

for iterations t=0,1,2…. As for the tuning parameter λ_n in (10), it can be chosen by an approximate Akaike information criterion (AIC). λ_n is chosen to minimize

$$(n_{X}+n_{Y})L(\lambda_{n})+2k, $$

where k is the effective degrees of freedom that can be approximated by $|\hat {\boldsymbol {\theta }}|_{0}$ and L(λ_n) represents the loss function either L_∞ or L_F which are defined by

$$L_{\infty}(\lambda_{n})=\left|\hat{\boldsymbol{S}}^{X}\hat{\boldsymbol{\Delta}} (\lambda_{n})\hat{\boldsymbol{S}}^{Y}-\hat{\boldsymbol{S}}^{X}+\hat{\boldsymbol{S}}^{Y}\right|_{\infty}, $$

$$\,\, L_{\mathrm{F}}(\lambda_{n})=\left\|\hat{\boldsymbol{S}}^{X}\hat{\boldsymbol{\Delta}}(\lambda_{n}) \hat{\boldsymbol{S}}^{Y}-\hat{\boldsymbol{S}}^{X}+\hat{\boldsymbol{S}}^{Y}\right\|_{\mathrm{F}}. $$

In this paper we focus on the loss functions with the supremum and Frobenius norms for further theoretical development. One may also use other matrix norms, such as spectral norm:

$$L_{\text{sp}}(\lambda_{n})=\left\|\hat{\boldsymbol{S}}^{X}\hat{\boldsymbol{\Delta}}(\lambda_{n})\hat{\boldsymbol{S}}^{Y}-\hat{\boldsymbol{S}}^{X}+\hat{\boldsymbol{S}}^{Y}\right\|_{2}. $$

Similarly, for the latent Gaussian copula model for binary data, one can solve the following optimization problem:

$$ \begin{array}{ll} \hat{\boldsymbol{\theta}}^{\mathrm{B}}=\text{arg\ min}|\boldsymbol{\theta}|_{1} \ \ \text{subject to} \\ \left\{ \begin{array}{l} \left|\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{b}}\boldsymbol{\Gamma}\boldsymbol{\theta}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{r}}\right|_{O\infty}\leq \lambda_{n},\\ \left|\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{b}}\boldsymbol{\Gamma}\boldsymbol{\theta}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{r}}\right|_{D\infty}\leq \lambda_{n}/2, \end{array} \right. \end{array} $$

(12)

where $\hat {\boldsymbol {b}}=\hat {\boldsymbol {b}}^{1}\otimes \hat {\boldsymbol {b}}^{2}$, $\hat {\boldsymbol {r}}=\text {Vec}\left (\hat {\boldsymbol {b}}^{1}-\hat {\boldsymbol {b}}^{2}\right)$. The matrix form of $\hat {\boldsymbol {\theta }}^{\mathrm {B}}$ will be denoted by $\hat {\boldsymbol {\Delta }}^{\mathrm {B}}$ in the following sections. For the latent Gaussian copula model for mixed data, one can solve the following optimization problem:

$$ \begin{array}{ll} \hat{\boldsymbol{\theta}}^{\mathrm{M}}=\text{arg\ min}|\boldsymbol{\theta}|_{1} \ \ \text{subject to} \\ \left\{ \begin{array}{l} \left|\boldsymbol{\Gamma}^{\top}\hat{\mathbf{T}}\boldsymbol{\Gamma}\boldsymbol{\theta}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{t}}\right|_{O\infty}\leq \lambda_{n},\\ \left|\boldsymbol{\Gamma}^{\top}\hat{\mathbf{T}}\boldsymbol{\Gamma}\boldsymbol{\theta}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{t}}\right|_{D\infty}\leq \lambda_{n}/2, \end{array} \right. \end{array} $$

(13)

where $\hat {\mathbf {T}}=\hat {\mathbf {T}}^{1}\otimes \hat {\mathbf {T}}^{2}$, $\hat {\boldsymbol {t}}=\text {Vec}\left (\hat {\mathbf {T}}^{1}-\hat {\mathbf {T}}^{2}\right)$. The matrix form of $\hat {\boldsymbol {\theta }}^{\mathrm {M}}$ will be denoted by $\hat {\boldsymbol {\Delta }}^{\mathrm {M}}$ in the following sections. Besides, corresponding Akaike information criterion can be proposed to choose the tuning parameter λ_n.

Simulation

Simulation for Gaussian copula differential graphical model In this part, we conduct simulation study for differential network estimation under Gaussian copula model. We mainly focus on the graphs that contain hub nodes. First we generate the edge set E^X for the group X. We partition p features into 5 equally-sized and non-overlapping sets: C₁∪C₂⋯∪C₅={1,…,p}, |C_k|=p/5, C_i∩C_j=∅. For the smallest i∈C_k, we set (i,j)∈C_k for all {j≠i:j∈C_k}. The non-zero entries of Ω^X is then determined by the edge set E^X, where Ω^X=(Σ^X)⁻¹. Next, the value of each nonzero entry of Ω^X was generated from a uniform distribution with support [−0.75,−0.25]∪[0.25,0.75]. To ensure positive definiteness of Ω^X, let Ω^X=Ω^X+(0.2+|λ_min(Ω^X)|)I. At last the Ω^X is rescaled such that Σ^X is a correlation matrix. Then we proceed to generate the differential network Δ₀. We randomly select two hub nodes from the 5 equally-sized and non-overlapping sets. The differential network Δ₀ is generated such that the connections of these two hub nodes change sign between Ω^X and Ω^Y. The correlation matrix Σ^X and Σ^Y are generated by (Ω^X)⁻¹ and (Ω^Y)⁻¹ respectively. Finally we generate n_X i.i.d observations of Z^X from the N(0,Σ^X) distribution and n_Y i.i.d observations of Z^Y from the N(0,Σ^Y) distribution. Next we sample n_X i.i.d samples from the nonparanormal distribution NPN(0,Σ^X,f^X) and n_Y i.i.d samples from the nonparanormal distribution NPN(0,Σ^Y,f^Y). For simplicity, we use the same univariate transformations on each dimension: $f_{1}^{X}=f_{2}^{X}=\cdots =f_{p}^{X}=f$ and f^X=f^Y. To sample data from the nonparanormal distribution, we also need g:=f⁻¹. We consider the Gaussian CDF Transformation of g which is used in [26].

In the simulation study, we let p= 50,80,100,120 and n_X=n_Y=100. The simulation result is based on 100 replications. For each simulated data set, we apply three estimation methods. That is, the direct differential network estimator (DDN) in [7], the rank-based differential network estimator (RDN) and the direct differential network estimator based on the latent variable Z and Pearson correlation (ZP-DDN). In ZP-DDN, we assume that Z^X and Z^Y are observed and the Pearson correlation estimator of cov(Z^X) and cov(Z^Y) are plugged into the direct estimation procedure. While ZP-DDN are often not available in real applications, we use ZP-DDN as benchmarks for quantifying the information loss of the remaining estimators.

We evaluate the performance of the estimation methods from two aspects: support recovery and estimation error. The support recovery results are evaluated by true positive rate (TPR) and true negative rate (TNR) along a range of tuning parameter λ. Suppose the true difference matrix Δ₀ has the support $\mathcal {S}_{0}=\{(j,k):\delta _{jk}^{0}\neq 0, \ \text {and} \ j\neq k\}$ and its estimator $\hat {\boldsymbol {\Delta }}$ has the support set $\hat {\mathcal {S}}$. TPR and TNR are defined as follows:

$$\text{{tpr}}=\frac{\text{{tp}}}{{|\mathcal{S}_{0}|}}, \ \ \ \text{{tnr}}=\frac{\text{{tn}}}{p(p-1)-|\mathcal{S}_{0}|}, $$

where TP and TN are the numbers of true positives and true negatives respectively, which are defined as

$$\begin{array}{*{20}l} \text{{tp}}&=\#\left\{(i,j):(i,j)\in \mathcal{S}_{0}\cap\hat{\mathcal{S}}\right\}, \\ \text{{tn}}&=\#\left\{(i,j):(i,j)\in \mathcal{S}_{0}^{c}\cap\hat{\mathcal{S}}^{c}\right\}. \end{array} $$

To evaluate the support recovery performance, we use the true discovery rate, which is defined as TD =TP$/{{|\hat {\mathcal {S}}_{0}|}}$. As for the estimation error, we calculate the element-wise L_∞ norm and Frobenius norm of $\hat {\boldsymbol {\Delta }}-\boldsymbol {\Delta }_{0}$.

Simulation for latent Gaussian copula differential graphical model In this part, we conduct simulation study for differential network estimation under Latent Gaussian copula model. We assume that the cutoff vector C∼Unif [0,1] and let Σ¹ and Σ² be generated in the same way as Σ^X and Σ^Y described in the last subsection. We consider the following three Scenarios:

∙Scenario 1 Generate data $\boldsymbol {B}^{1}=\left (B_{1}^{1},\ldots,B_{p}^{1}\right)^{\top }$, where $B^{1}_{j}=I(X_{j}>C_{j}),j=1,\ldots,p$ and X∼NPN(0,Σ¹,f¹); Generate data $\boldsymbol {B}^{2}=\left (B_{1}^{2},\ldots,B_{p}^{2}\right)^{\top }$, where $B^{2}_{j}=I(Y_{j}>C_{j}),j=1,\ldots,p$ and Y∼NPN(0,Σ²,f²). The transformation functions f¹ and f² are Gaussian CDF transformation.

∙Scenario 2 Generate data $\boldsymbol {M}^{1}=\left (M_{1}^{1},\ldots,M_{p}^{1}\right)^{\top }$, where $M^{1}_{j}=I\left (X_{j}>C_{j}\right),j=(p/2+1),\ldots,p$, X∼NPN(0,Σ¹,f¹) and $M^{1}_{j}=X_{j},j=1,\ldots,p/2$;

Generate data $\boldsymbol {M}^{2}=\left (M_{1}^{2},\ldots,M_{p}^{2}\right)^{\top }$, where $M^{2}_{j}=I\left (Y_{j}>C_{j}\right),j=p/2+1,\ldots,p$, and Y∼NPN(0,Σ²,f²) and $M^{2}_{j}=Y_{j},j=1,\ldots,p/2$. The transformation functions f¹ and f² are Gaussian CDF transformation.

∙Scenario 3 Generate data $\boldsymbol {B}^{1}=\left (B_{1}^{1},\ldots,B_{p}^{1}\right)^{\top }$, where $B^{1}_{j}=I\left (Z^{1}_{j}>C_{j}\right),j=1,\ldots,p$ and Z¹∼N(0,Σ¹), where 10 entries in each Z¹ is randomly sampled and replaced by -5 or 5;

Generate data $\boldsymbol {B}^{2}=\left (B_{1}^{2},\ldots,B_{p}^{2}\right)^{\top }$, where $B^{2}_{j}=I\left (Z^{2}_{j}>C_{j}\right),j=1,\ldots,p$ and Z²∼N(0,Σ²), where 10 entries in each Z² is randomly sampled and replaced by -5 or 5.

In Scenario 1 and Scenario 3, we generate binary data. Scenario 1 corresponds to the latent Gaussian copula model and Scenario 3 corresponds to the setting where the binary data can be misclassified due to the outliers of the latent Gaussian variable. Scenario 3 is designed to investigate the robustness of the proposed approach. Scenario 2 corresponds to the mixed data generated from the latent Gaussian copula model.

Application to gene expression data sets related to lung cancer

In this section we consider the differential network estimation for a gene expression data set related to lung cancer. The data set is publicly available from the Gene Expression Omnibus at accession number GDS2771 and was studied in [24]. It includes 22,283 microarray-derived gene expression measurements from large airway epithelial cells sampled from 97 patients with lung cancer and 90 controls in the data set. It is of interest to investigate how the structure of the gene co-expression network differs between the group of patients with lung cancer and the control group. It may shed light on underlying lung cancer mechanisms. In this real example study, we limited our analysis to the 122 genes in the Wnt signaling pathway. The Wnt signaling pathway has recently emerged as a critical pathway in lung carcinogenesis as already demonstrated in many cancers and particularly in colorectal cancer [32]. The Gene expression levels were analyzed on a logarithmic scale. Each gene feature was standardized to have mean zero and standard deviation 1 within the cancer samples and the controls separately.

Results

Simulation results for Gaussian copula differential graphical model

The receiver operating characteristic (ROC) curves of the three estimation methods are depicted in Fig. 2. It shows that the proposed method RDN compares favourably with the benchmark method ZP-DDN, which means that the information loss is negligible. Besides, Fig. 2 also shows that DDN performs pretty bad in the non-Gaussian case.

Table 1 gives the true discovery rates with different loss functions. The results also show the method RDN compares favourably with the benchmark method ZP-DDN. For all the methods, tuning using the L_F gives better true discovery rates than tuning using the L_∞. Table 1 depicts the elementwise L_∞ norm estimation accuracies of the thresholded estimators tuned using the loss functions L_∞ and L_F. From Table 1, we can see that the L_F loss function gives slightly better results than the L_∞ loss function. For all the methods, the elementwise L_∞ norm estimation accuracy are comparable. We point out that it is possible for RDN to simultaneously give better support recovery but similar estimation than DDN. The reason is that estimation error depends on the magnitudes of the estimated entries, while support recovery depends only on whether the entries are nonzero. Besides, RDN has comparable performance with the benchmark method ZP-DDN in terms of both support recovery and estimation accuracy, which indicates that the information loss of the estimator RDN is negligible.

Table 1 Average true discovery rates (%) and average estimation errors over 100 simulations

Full size table

Simulation results for Latent Gaussian copula differential graphical model

The ROC curves for Scenario 1 and Scenario 2 with different dimensionality p (varying from 50 to 120) is presented in Fig. 3. Table 2 give the true discovery rates with different loss functions and the elementwise L_∞ norm estimation accuracies of the thresholded estimators tuned using the loss functions L_∞ and L_F, respectively. For method ZR-RDN, we assume that the latent Gaussian copula variables are observed. In particular, the rank-based correlation matrix estimator of the latent Gaussian copula variables are plugged into the direct estimation procedure. With a slight abuse of notation, the RDN method here refers to either the rank-based method for binary data or for mixed data. The ROC curves in Fig. 3 show that the rank-based methods RDN proposed for latent Gaussian copula model (binary and mixed) perform pretty well even when the dimensionality is larger than the sample size.

Table 2 Simulation results over 100 replications for Scenario 1 and Scenario 2

Full size table

By the ROC curves in Fig. 4, we can find that RDN is more robust to the data misclassification than the benchmark estimator ZP-DDN. The robustness of RDN to outliers illustrates the advantage of the dichotomization method. In the absence of misclassification, it is seen that the ROC curves of RDN and ZR-RDN are similar, which indicates little information loss for differential network recovery due to the dichotomization procedure. Table 3 gives the true discovery rates with different different loss functions for Scenario 3 and presents the elementwise L_∞ norm estimation accuracies of the thresholded estimators tuned using the loss functions L_∞ and L_F for Scenario 3. From Table 3, we can see that the L_F loss function gives slightly better results than the L_∞ loss function. Besides, we can see that the elementwise L_∞ norm estimation accuracy are comparable. This is also true for Scenario 1 and Scenario 2.

Table 3 Simulation results over 100 replications for Scenario 3

Full size table

Theoretical results

The estimators $\hat {\boldsymbol {\Delta }}$, $\hat {\boldsymbol {\Delta }}^{\mathrm {B}}$ and $\hat {\boldsymbol {\Delta }}^{\mathrm {M}}$, after an additional threshold step, are shown to be able to recover not only the support of the true Δ₀ but also the signs of its nonzero entries as long as those entries are sufficiently large. Besides, under mild conditions, the estimation errors bounds in terms of matrix Frobenius norm and elementwise ℓ_∞ norm both achieve the parametric rate $\sqrt {{\log p}/{\min (n_{X},n_{Y})}}$, see details in Additional file 1. It indicates that the extra modeling flexibility and robustness come at almost no cost of statistical efficiency and it seems as if the latent variable can be observed. Thus these new estimators can be used as a safe replacement of Gaussian estimators even when the data are truly Gaussian. Compared to the separate and joint approaches to estimating differential networks (e.g. [22, 23],) which require sparsity on each Σ⁻¹, the proposed direction estimation methods for different types of data only require the sparsity of the difference matrix Δ₀. The detailed theorems and proofs are in the Additional file 1 available online.

Results of application

In the real application part, we compare three estimation methods. The first method is the Gaussian copula RDN method, which we denote as C-RDN. The second method is the latent Gaussian copula RDN method, which we denote as B-RDN. In specific, we first apply the adaptive dichotomization method implemented by the ArrayBin package in R to remove the batch effect in the gene expression data. The adaptive dichotomization method transforms the numerical gene expression data into 0/1 binary data. The genes with high expression level are encoded as 1 and the genes with lower expression level are encoded as 0. Then we apply the B-RDN to the 0/1 binary data. The third method is the direct differential network estimation method proposed by [7] with Gaussian assumption, which we denote as DDN.

We conduct Shapiro-Wilk test on the gene data set and 63% of the genes reject the normality null hypothesis. Therefore, the Gaussian assumption of DDN method is violated in this real data example. Thus we expect that C-RDN which relaxes the Gaussian assumption may provide a more reliable result. The deficiency of the C-RDN method lies in that it does not take the batch effect of the genes expression data from different platforms into consideration. For the B-RDN method, it removes the batch effect.

Figure 5 depicts the differential network estimated by the three methods. Table 4 gives the hub genes selected out by different estimation methods. For method C-RDN, the tuning parameter λ is selected by the AIC criterion with the elementwise ℓ₁ norm loss function. To ensure a fair comparison, the tuning parameter λ for method B-RDN and DDN are selected such that the number of edges in the estimated differential graphs by all three methods are almost the same. The number of edges selected by the three methods are 56, 59 and 52, respectively. From Fig. 5, we can see that B-RDN identifies an obvious hub gene WIF1 that is an extracellular antagonist of WNT. WIF1 is a frequent target for epigenetic silencing in various human cancers [30]. WIF1 promoter is frequently methylated in non-small cell lung cancer (NSCLC) cells to down-regulate its mRNA expression [33]. Both C-RDN and B-RDN select out a common hub gene APC. APC expression in lung cancer are associated with survival time and is also related to cancer metastasis [34]. Both C-RDN and DDN select out a common hub gene, MAPK8, which plays a significant role in the promotion of lung inflammation and tumorigenesis subsequent to tobacco smoke exposure [35]. The expression level of DVL2 was reported significantly higher in lung adenocarcinomas than in squamous carcinomas, and was associated with poor tumor differentiation [36]. Winn et al. [37] reported that the restoration of FZD9 signaling inhibited both cell proliferation and anchorageindependent growth, promoted cellular differentiation, and reversed the transformed phenotype in NSCLC. The overexpression of MMP7 was associated with tumor proliferation, and a poor prognosis in NSCLC [38]. RAC1 generally plays an important role in cancer progression and metastasis [39].

Table 4 Hub genes selected by different methods

Full size table

By comparing (a) and (b) in Fig. 5, we can see that the estimated differential network can be very different with/without considering the batch effect. Although it is inevitable to result in information loss in the discretization procedure for method B-RDN, [40] argued that this procedure can potentially improve the accuracy of the statistical analysis. In real data example, we recommend to use the B-RDN method to remove the batch effect despite the little information loss. At last we argue that statistical comparison of group difference in this biological network or pathway can provide new insight into the underlying lung cancer mechanism, which may further offer more effective targets for drug development.

To further interpret the underlying biological implications of the identified hub genes, we conducted Gene Ontology (GO) enrichment analysis. Table 5 shows the common GO terms enriched by C-RDN, B-RDN and DDN. The GO enrichment analysis is performed using R package “clusterProfiler” with the P-value adjusted by Benjamini-Hochberg method. It shows that our methods (C-RDN, B-RDN) have smaller P-value than DDN. The common molecular function and cellular component suggest that the change of frizzled binding, Wnt-protein binding and beta-catenin destruction complex are important in the etiology of lung cancer. These predictions are supported by the literatures [41–43], which indicates that the proposed differential network model can provide biological meaningful underlying signals.

Table 5 Gene Ontology (GO) enrichment analysis result

Full size table

Discussion

A complex disease phenotype (e.g. diabetes, cancer) often reflects various pathobiological processes that interact in a network rather than the abnormality of a single gene. Such interactions are not static processes, instead they are dynamic in response to changing genetic, epigenetic and environmental factors, which further entails the analysis of differential network. In this paper, we propose adaptive estimation approaches for latent variable differential network model with the assumption that the true differential network is sparse, which do not require precision matrices to be sparse. The latent variable differential network model is fundamentally different from the existing ones in the literature in the sense that the differential structure in the unobserved latent variables are of primary interest. Theoretical analysis shows that the proposed methods achieve the same parametric convergence rate for both the difference of the precision matrices estimation and differential structure recovery, which means that the extra modelling flexibility comes at almost no cost of statistical efficiency. The unified latent variable differential network model provides deeper understanding of the unknown genomic mechanism than that among the observed variables.

The current work could be extended in the following two aspects. First, in this paper, we consider the following optimization problem to directly estimate the difference matrix Δ:

$$\begin{array}{ll} \text{arg\ min} |\boldsymbol{\Delta}|_{1}, \ \ \ \text{subject to} \\ \left|\hat{\boldsymbol{S}}^{X}\boldsymbol{\Delta}\hat{\boldsymbol{S}}^{Y}-\hat{\boldsymbol{S}}^{X}+\hat{\boldsymbol{S}}^{Y}\right|_{\infty}\leq \lambda_{n}, \end{array} $$

where $\hat {\boldsymbol {S}}^{X}$ and $\hat {\boldsymbol {S}}^{Y}$ denote the rank-based estimators of the covariance matrices. The D-trace loss function [15, 44] can also be applied to to directly estimate the precision matrix difference. Thus, we may also consider the D-trace loss function to estimate the Gaussian copula and latent Gaussian copula differential graphical models. In specific, the difference matrix Δ could be eatimated by:

$$ \text{arg\ min}_{\boldsymbol{\Delta}}\frac{1}{2}\text{Tr} \left(\boldsymbol{\Delta}\hat{\boldsymbol{S}}^{X}\boldsymbol{\Delta}\hat{\boldsymbol{S}}^{Y}\right) - \text{Tr}\left(\boldsymbol{\Delta}\left(\hat{\boldsymbol{S}}^{X}-\hat{\boldsymbol{S}}^{Y}\right)\right) +\mathcal{G}_{\lambda}(\boldsymbol{\Delta}), $$

where λ>0 is a regularization parameter and $\mathcal {G}_{\lambda }$ is a decomposable non-convex penalty function which has the form $\mathcal {G}_{\lambda }=\sum _{j,k}g_{\lambda }\left (\Delta _{jk}\right)$, such as smoothly clipped absolute deviation (SCAD) penalty [45]. The theoretical guarantees are still needed to be investigated, but we expect that the empirical performance could be comparable.

Second, for the latent Gaussian copula differential graphical model, we focus on the binary data. In fact, the methods can be extended to the discrete data with more than two categories. The properties of this procedure are left for future investigation as there are a lot of work still needed to be done.

Conclusions

The proposed latent variable differential network models are very flexible and provide deeper understanding of the unknown biological mechanism. It is demonstrated latent differential network models enjoy great advantages over existing models and thus are highly recommended in real application.

Abbreviations

CNV:: Copy Number Variation
FDR:: False Discovery Rate
GO:: Gene Ontology
ROC:: Receiver Operating Characteristic
SNP:: Single Nucleotide Polymorphisms
TDR:: True Discovery Rate
TNR:: True Negative Rate
TPR:: True Positive Rate

References

Li H, Gui J. Gradient directed regularization for sparse gaussian concentration graphs, with applications to inference of genetic networks. Biostatistics. 2006; 7(2):302–17.
Article PubMed Google Scholar
Segal E, Friedman N, Kaminski N, Regev A, Koller D. From signatures to models: understanding cancer using microarrays. Nat Genet. 2005; 37:38–45.
Article CAS Google Scholar
Peng J, Wang P, Zhou N, Zhu J. Partial correlation estimation by joint sparse regression models. J Am Stat Assoc. 2009; 104(486):735.
Article CAS PubMed PubMed Central Google Scholar
Cai T, Li H, Liu W, Xie J. Covariate-adjusted precision matrix estimation with an application in genetical genomics. Biometrika. 2013; 100(1):139–56.
Article PubMed Google Scholar
Fuente ADL. From differential expression to differential networking–identification of dysfunctional regulatory networks in diseases. Trends Genet. 2010; 26(7):326–33.
Article PubMed CAS Google Scholar
Ideker T, Krogan NJ. Differential network biology. Mol Syst Biol. 2012; 8(1):565.
Article PubMed PubMed Central Google Scholar
Zhao SD, Cai T, Li H. Direct estimation of differential networks. Biometrika. 2014; 101(2):253–68.
Article PubMed PubMed Central Google Scholar
Tian D, Gu Q, Jian M. Identifying gene regulatory network rewiring using latent differential graphical models. Nucleic Acids Res. 2016; 44(17):140.
Article CAS Google Scholar
Xia Y, Cai T, Cai T. Testing differential networks with applications to detecting gene-by-gene interactions. Biometrika. 2015; 102(2):247–66.
Article PubMed PubMed Central Google Scholar
Ji J, Yuan Z, Zhang X, Li F, Xu J, Liu Y, Li H, Wang J, Xue F. Detection for pathway effect contributing to disease in systems epidemiology with a case-control design. Bmj Open. 2015; 5(1):006721.
Article Google Scholar
Ji J, Yuan Z, Zhang X, Xue F. A powerful score-based statistical test for group difference in weighted biological networks. BMC Bioinformatics. 2016; 17(1):86.
Article PubMed PubMed Central CAS Google Scholar
Yuan Z, Ji J, Zhang T, Liu Y, Zhang X, Chen W, Xue F. A novel chi-square statistic for detecting group differences between pathways in systems epidemiology. Stat Med. 2016; 35(29):5512–24.
Article PubMed Google Scholar
Yuan Z, Ji J, Zhang X, Xu J, Ma D, Xue F. A powerful weighted statistic for detecting group differences of directed biological networks. Sci Rep. 2016; 6:34159.
Article CAS PubMed PubMed Central Google Scholar
Liu W. Structural similarity and difference testing on multiple sparse gaussian graphical models. Ann Stat. 2017; 45(6):2680–2707.
Article Google Scholar
Yuan H, Xi R, Chen C, Deng M. Differential network analysis via the lasso penalized d-trace loss. Biometrika. 2017; 104:755–70.
Article Google Scholar
He Y, Zhang X, Ji J, Liu B. Joint estimation of multiple high-dimensional gaussian copula graphical models. Aust N Z J Stat. 2017; 59:289–310.
Article Google Scholar
Ji J, He D, Feng Y, He Y, Xue F, Xie L. Jdinac: joint density-based non-parametric differential interaction network analysis and classification using high-dimensional sparse omics data. Bioinformatics. 2017; 33(19):3080–87.
Article CAS PubMed PubMed Central Google Scholar
Meinshausen N, Bühlmann P. High-dimensional graphs and variable selection with the lasso. Ann Stat. 2006; 34:1436–62.
Article Google Scholar
Yuan M, Lin Y. Model selection and estimation in the gaussian graphical model. Biometrika. 2007; 94(1):19–35.
Article Google Scholar
Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008; 9(3):432–41.
Article PubMed Google Scholar
Yuan M. High dimensional inverse covariance matrix estimation via linear programming. J Mach Learn Res. 2010; 11(12):2261–86.
Google Scholar
Cai T, Liu W, Luo X. A constrained ℓ ₁ minimization approach to sparse precision matrix estimation. J Am Stat Assoc. 2011; 106(494):594–607.
Article CAS Google Scholar
Guo J, Levina E, Michailidis G, Zhu J. Joint estimation of multiple graphical models. Biometrika. 2011; 98(1):1–15.
Article PubMed PubMed Central Google Scholar
Danaher P, Wang P, Witten DM. J R Stat Soc Ser B (Stat Methodol). 2014; 76(2):373–97.
Liu H, Lafferty J, Wasserman L. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. J Mach Learn Res. 2009; 10(3):2295–328.
Google Scholar
Liu H, Han F, Yuan M, Lafferty J, Wasserman L. High-dimensional semiparametric gaussian copula graphical models. Ann Stat. 2012; 40(4):2293–326.
Article Google Scholar
Xue L, Zou H. Regularized rank-based estimation of high-dimensional nonparanormal graphical models. Ann Stat. 2012; 40(5):2541–71.
Article Google Scholar
He Y, Zhang X, Wang P, Zhang L. High dimensional Gaussian copula graphical model with FDR control. Comput Stat Data Anal. 2017; 113:457–74.
Article Google Scholar
Fan J, Liu H, Ning Y, Zou H. High dimensional semiparametric latent graphical model for mixed data. J R Stat Soc. 2017; 79(2):405–21.
Article Google Scholar
Ying Y, Tao Q. Epigenetic disruption of the wnt/ß-catenin signaling pathway in human cancers. Epigenetics. 2009; 4(5):307–12.
Article CAS PubMed Google Scholar
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn. 2011; 3(1):1–122.
Article Google Scholar
Mazieres J, He B, You L, Xu Z, Jablons DM. Wnt signaling in lung cancer. Cancer Lett. 2005; 222(1):1–10.
Article CAS PubMed Google Scholar
Lee SM, Park J, Kim DS. Wif1 hypermethylation as unfavorable prognosis of non-small cell lung cancers with egfr mutation. Mol Cells. 2013; 36(1):69–73.
Article CAS PubMed PubMed Central Google Scholar
Brabender J, Usadel H, Danenberg KD, Metzger R, Schneider PM, Lord RV, Wickramasinghe K, Lum CE, Park J, Salonga D, et al. Adenomatous polyposis coli gene promoter hypermethylation in non-small cell lung cancer is associated with survival. Oncogene. 2001; 20(27):3528–32.
Article CAS PubMed Google Scholar
Takahashi H, Ogata H, Nishigaki R, Broide DH, Karin M. Tobacco smoke promotes lung tumorigenesis by triggering ikkbeta- and jnk1-dependent inflammation. Cancer Cell. 2010; 17(1):89.
Article CAS PubMed PubMed Central Google Scholar
Wei Q, Zhao Y, Yang ZQ, Dong QZ, Dong XJ, Han Y, Zhao C, Wang EH. Dishevelled family proteins are expressed in non-small cell lung cancer and function differentially on tumor progression. Lung Cancer. 2008; 62(2):181–92.
Article PubMed Google Scholar
Winn RA, Marek L, Han SY, Rodriguez K, Rodriguez N, Hammond M, Scoyk MV, Acosta H, Mirus J, Barry N. Restoration of wnt-7a expression reverses non-small cell lung cancer cellular transformation through frizzled-9-mediated growth inhibition and promotion of cell differentiation. J Biol Chem. 2005; 280(20):19625.
Article CAS PubMed Google Scholar
Liu D, Nakano J, Ishikawa S, Yokomise H, Ueno M, Kadota K, Urushihara M, Huang CL. Overexpression of matrix metalloproteinase-7 (mmp-7) correlates with tumor proliferation, and a poor prognosis in non-small cell lung cancer. Lung Cancer. 2007; 58(3):384–91.
Article PubMed Google Scholar
Kaneto N, Yokoyama S, Hayakawa Y, Kato S, Sakurai H, Saiki I. Rac1 inhibition as a therapeutic target for gefitinib-resistant non-small-cell lung cancer. Cancer Sci. 2014; 105(7):788–94.
Article CAS PubMed PubMed Central Google Scholar
McCall MN, Irizarry RA. Thawing frozen robust multi-array analysis (frma). BMC Bioinformatics. 2011; 12(1):1.
Article Google Scholar
Stewart DJ. Wnt signaling pathway in non-small cell lung cancer. J Natl Cancer Inst. 2014; 106(1):356.
Article CAS Google Scholar
Nakayama S, Sng N, Carretero J, Welner R, Hayashi Y, Yamamoto M, Tan AJ, Yamaguchi N, Yasuda H, Li D. β-catenin contributes to lung tumor development induced by egfr mutations. Cancer Res. 2014; 74(20):5891–902.
Article CAS PubMed PubMed Central Google Scholar
Rapp J, Jaromi L, Kvell K, Miskei G, Pongracz JE. Wnt signaling-lung cancer is no exception. Respir Res. 2017; 18(1):167.
Article PubMed PubMed Central CAS Google Scholar
Wadsworth JL, Tawn JA. Sparse precision matrix estimation via lasso penalized d-trace loss. Biometrika. 2014; 1(1):103–20.
Google Scholar
Fan J, Li R. Variable selection via nonconvave penalized likelihood and its oracle properties. J Am Stat Assoc. 2001; 96(456):1348–60.
Article Google Scholar

Download references

Funding

This work was supported by grants from the National Natural Science Foundation of China (grant number 81803336, 11801316, 11571080 and 81573259) and Natural Science Foundation of Shandong Province (ZR2018BH033). Publication of this article was sponsored by 81803336 grant. The funding body played no role in the design, writing or decision to publish this manuscript.

Availability of data and materials

The gene expression data set related to lung cancer is publicly available from the Gene Expression Omnibus at accession number GDS2771.

About this supplement

This article has been published as part of BMC Bioinformatics Volume 19 Supplement 17, 2018: Selected articles from the International Conference on Intelligent Biology and Medicine (ICIBM) 2018: bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-19-supplement-17.

Author information

Authors and Affiliations

School of Statistics, Shandong University of Finance and Economics, Jinan, 250014, China
Yong He & Jiadong Ji
Department of Computer Science, Hunter College, The City University of New York, New York, 10065, USA
Lei Xie
Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, 10016, USA
Lei Xie
School of Management, Fudan University, Shanghai, 200433, China
Xinsheng Zhang
School of Public Health, Shandong University, Jinan, 250012, China
Fuzhong Xue

Authors

Yong He
View author publications
You can also search for this author in PubMed Google Scholar
Jiadong Ji
View author publications
You can also search for this author in PubMed Google Scholar
Lei Xie
View author publications
You can also search for this author in PubMed Google Scholar
Xinsheng Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Fuzhong Xue
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

YH and JJ contributed to the study design, analytical preparation and the writing of the manuscript. YH and JJ performed the simulation studies. JJ analyzed the data, LX, XZ and FX wrote and revised the manuscript. All authors read and approved this version of the manuscript.

Corresponding author

Correspondence to Jiadong Ji.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional file

Additional file 1

Contains the theoretical guarantee of of the proposed methods and proofs. (PDF 284 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article

He, Y., Ji, J., Xie, L. et al. A new insight into underlying disease mechanism through semi-parametric latent differential network model. BMC Bioinformatics 19 (Suppl 17), 493 (2018). https://doi.org/10.1186/s12859-018-2461-2

Download citation

Published: 28 December 2018
DOI: https://doi.org/10.1186/s12859-018-2461-2

A new insight into underlying disease mechanism through semi-parametric latent differential network model

Abstract

Background

Results

Conclusions

Background

Methods

Gaussian copula differential graphical model

Latent Gaussian copula differential graphical model for binary data

Latent Gaussian copula differential graphical model for mixed data

Implementation

Simulation

Application to gene expression data sets related to lung cancer

Results

Simulation results for Gaussian copula differential graphical model

Simulation results for Latent Gaussian copula differential graphical model

Theoretical results

Results of application

Discussion

Conclusions

Abbreviations

References

Funding

Availability of data and materials

About this supplement

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Publisher’s Note

Additional file

Additional file 1

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Bioinformatics

Contact us