A new insight into underlying disease mechanism through semi-parametric latent differential network model

Background: In genomic studies, investigating how the structure of a genetic network differs between two experimental conditions is an interesting but challenging problem, especially in the high-dimensional setting. The existing literature mostly focuses on differential network modelling for continuous data. In real applications, however, we may encounter discrete or mixed data, which calls for a unified differential network model covering various data types. Results: We propose a unified latent Gaussian copula differential network model, which provides a deeper understanding of the unknown mechanism than modelling the observed variables alone. Adaptive rank-based estimation approaches are proposed under the assumption that the true differential network is sparse. These approaches do not require the individual precision matrices to be sparse, and thus allow the individual networks to contain hub nodes. Theoretical analysis shows that the proposed methods achieve the parametric convergence rate for both estimation of the difference of the precision matrices and recovery of the differential structure, which means that the extra modelling flexibility comes at almost no cost in statistical efficiency. Beyond the theoretical analysis, thorough numerical simulations are conducted to compare the empirical performance of the proposed methods with other state-of-the-art methods. The results show that the proposed methods work well for various data types. The proposed method is then applied to gene expression data associated with lung cancer to illustrate its empirical usefulness. Conclusions: The proposed latent variable differential network models allow for various data types and are thus more flexible, while providing a deeper understanding of the unknown mechanism than modelling the observed variables alone.
Theoretical analysis, numerical simulations and a real-data application all demonstrate the advantages of latent differential network modelling, which is therefore highly recommended. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2461-2) contains supplementary material, which is available to authorized users.


Web Appendix A: Theoretical analysis
In Web Appendix A we present the theoretical analysis of the estimators ∆̂, ∆̂^B and ∆̂^M separately. We investigate the properties of the proposed estimators by considering the convergence rates of ∆̂ − ∆_0, ∆̂^B − ∆^B_0 and ∆̂^M − ∆^M_0, including estimation error bounds and support recovery. Before presenting the theoretical results, we first introduce some notation used in the following sections. Throughout, |·| and ‖·‖ are used to denote element-wise norms and matrix norms, respectively. For any vector µ = (µ_1, …, µ_d)^T ∈ R^d, let µ_{−i} denote the (d−1)×1 vector obtained by removing the i-th entry from µ.
We use λ_min(A) and λ_max(A) to denote the smallest and largest eigenvalues of A, respectively. Vec(A) denotes the d²×1 vector obtained by stacking the columns of A. For a set H, we use |H| to denote the cardinality of H. For two sequences of real numbers {a_n} and {b_n}, we write a_n = O(b_n) if there exists a constant C such that |a_n| ≤ C|b_n| holds for all n, write a_n = o(b_n) if lim_{n→∞} a_n/b_n = 0, and write a_n ≍ b_n if there exist constants c and C such that c ≤ a_n/b_n ≤ C for all n.
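To fix ideas, the notation above can be illustrated with a small numpy sketch (our own toy example, not part of the paper):

```python
import numpy as np

# A small symmetric matrix to illustrate the norms, eigenvalues and Vec operator
A = np.array([[3.0, -1.0],
              [-1.0, 2.0]])

elem_max = np.abs(A).max()            # element-wise sup norm: largest |entry|
fro = np.linalg.norm(A, 'fro')        # Frobenius norm
lam = np.linalg.eigvalsh(A)           # eigenvalues of the symmetric A
lam_min, lam_max = lam[0], lam[-1]    # lambda_min(A), lambda_max(A)

vec_A = A.flatten(order='F')          # Vec(A): stack columns into a d^2 x 1 vector
print(elem_max, round(fro, 3))        # 3.0 3.873
print(vec_A)                          # [ 3. -1. -1.  2.]
```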

Theoretical properties for Gaussian copula model
First we introduce some basic conditions required to obtain the theoretical results for ∆̂. Condition (C1) requires the difference matrix to have essentially constant sparsity. This is reasonable in gene expression data analysis, as the underlying genetic networks are expected to share many common edges and not differ much between the two conditions. Condition (C2) requires that the covariates not be too highly correlated, which is closely related to the mutual incoherence property introduced by Donoho and Huo (2001).
(C1) The true difference matrix ∆_0 has s < p nonzero entries in its upper triangular part.
Under these conditions, an additional thresholding step on the estimator ∆̂ with a carefully chosen threshold τ_n leads to more accurate recovery of the differential network. Define the thresholded estimator ∆̂^{τ_n} = [δ̂^{τ_n}_{jk}] with δ̂^{τ_n}_{jk} = δ̂_{jk} 1{|δ̂_{jk}| ≥ τ_n}. Denote by M(∆̂^{τ_n}) = {sgn(δ̂^{τ_n}_{jk}) : j = 1, …, p; k = 1, …, p} and M(∆_0) = {sgn(δ^0_{jk}) : j = 1, …, p; k = 1, …, p} the vectors of the signs of the entries of the estimated and true difference matrices, respectively. The following theorem establishes that ∆̂^{τ_n} recovers not only the support of ∆_0 but also the signs of its nonzero entries, provided those entries are sufficiently large.
Theorem 1.1. Assume that conditions (C1) and (C2) hold. If min(n_X, n_Y) > log p and min_{j,k: δ^0_{jk} ≠ 0} |δ^0_{jk}| > 2τ_n, then M(∆̂^{τ_n}) = M(∆_0) with probability at least 1 − p^{−1}, where C is a sufficiently large constant independent of (p, n_X, n_Y) and σ_{P,min}, σ are defined in condition (C2). In the context of genetic networks, Theorem 1.1 guarantees that, under these regularity conditions, ∆̂^{τ_n} identifies the genes whose conditional dependencies change between the two conditions, together with the magnitudes and directions of those changes. The tuning parameter τ_n is set to 10^{−4} in the simulations and the data analysis.
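The thresholding-plus-sign-comparison step of Theorem 1.1 can be sketched as follows. This is a minimal illustration under our own assumptions (the hard-threshold form δ̂^{τ_n}_{jk} = δ̂_{jk} 1{|δ̂_{jk}| ≥ τ_n}, a toy 3×3 matrix, and an artificial noise level well below τ_n), not the paper's code:

```python
import numpy as np

def threshold_sign_pattern(delta_hat, tau_n):
    """Hard-threshold an estimated difference matrix and return its sign pattern M(.)."""
    delta_tau = np.where(np.abs(delta_hat) >= tau_n, delta_hat, 0.0)
    return np.sign(delta_tau)

# Toy true difference matrix: two differential edges, (1,2) and (2,3)
delta0 = np.array([[0.0,  0.5,  0.0],
                   [0.5,  0.0, -0.4],
                   [0.0, -0.4,  0.0]])

# A noisy estimate whose estimation error is well below the threshold tau_n = 1e-4
rng = np.random.default_rng(0)
delta_hat = delta0 + 1e-5 * rng.standard_normal((3, 3))

M_hat = threshold_sign_pattern(delta_hat, tau_n=1e-4)
print(np.array_equal(M_hat, np.sign(delta0)))  # sign pattern recovered: True
```

Once the minimal nonzero entry of ∆_0 dominates the noise level by more than the threshold, both the support and the signs are recovered exactly, which is precisely the content of the theorem.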
The following theorem establishes the convergence rate of ∆̂ − ∆_0 in the Frobenius norm.
Theorem 1.2. Suppose that conditions (C1) and (C2) hold. If min(n_X, n_Y) > log p, then with probability at least 1 − 2p^{−2}, where C is a sufficiently large constant independent of (p, n_X, n_Y) and σ_{P,min}, σ are defined in condition (C2).
The following theorem establishes the elementwise ℓ_∞ norm bound of the estimation error, which is critical in the proofs of Theorems 1.1 and 1.2. Theorem 1.3. Suppose that conditions (C1) and (C2) hold. If min(n_X, n_Y) > log p, then with probability at least 1 − 2p^{−2}, where C is a sufficiently large constant independent of (p, n_X, n_Y) and σ_{P,min}, σ are defined in condition (C2).
The same parametric convergence rate for the differential network under the Gaussian assumption was established by Zhao et al. (2014). This shows that the estimator ∆̂ for the Gaussian copula model achieves the optimal parametric rate √(log p / min(n_X, n_Y)) in terms of difference matrix estimation. The extra modelling flexibility and robustness come at almost no cost in statistical efficiency, so the new estimator can serve as a safe replacement for Gaussian estimators even when the data are truly Gaussian. This is one main contribution of the current paper.
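The rank-based estimation underlying ∆̂ exploits the classical relation Σ_{jk} = sin(π τ_{jk}/2) between the latent correlation and Kendall's tau in a Gaussian copula, which is what makes the estimator invariant to unknown monotone marginal transforms. A minimal sketch of this building block (our own code; the helper name and the use of scipy.stats.kendalltau are our choices, not the paper's implementation):

```python
import numpy as np
from scipy.stats import kendalltau

def rank_based_correlation(X):
    """Estimate the latent correlation matrix of a Gaussian copula model
    via the sine transform of pairwise Kendall's tau."""
    n, p = X.shape
    S = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            tau, _ = kendalltau(X[:, j], X[:, k])
            S[j, k] = S[k, j] = np.sin(0.5 * np.pi * tau)
    return S

# Kendall's tau is invariant to strictly monotone marginal transforms,
# so the latent correlation 0.6 is recovered from the transformed data.
rng = np.random.default_rng(1)
Z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=2000)
X = np.column_stack([np.exp(Z[:, 0]), Z[:, 1] ** 3])  # monotone transforms of Z
S_hat = rank_based_correlation(X)
print(abs(S_hat[0, 1] - 0.6) < 0.1)  # close to the latent correlation: True
```

A Gaussian sample-correlation estimate applied to X directly would be badly biased here; the rank-based estimate is not, which is the robustness referred to above.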

Theoretical properties for latent Gaussian copula model for binary data
First we introduce some basic conditions required to obtain the theoretical results for binary data. Conditions (B1) and (B2) are similar to conditions (C1) and (C2) in the last section. Conditions (B3) and (B4) are mainly adopted for technical reasons and impose little restriction in practice. Specifically, condition (B3) rules out the singular case in which some variables are perfectly collinear. Condition (B4) is used to control the variation of F^{−1}(τ; Λ_j, Λ_k) with respect to (τ; Λ_j, Λ_k).
(B1) The true difference matrix ∆^B_0 has s < p nonzero entries in its upper triangular part.
Under these conditions, an additional thresholding step on the estimator ∆̂^B with a carefully chosen threshold τ_n leads to more accurate recovery of the differential network. Define the thresholded estimator ∆̂^{B,τ_n} = [δ̂^{B,τ_n}_{jk}] with δ̂^{B,τ_n}_{jk} = δ̂^B_{jk} 1{|δ̂^B_{jk}| ≥ τ_n}. Denote by M(∆̂^{B,τ_n}) = {sgn(δ̂^{B,τ_n}_{jk}) : j = 1, …, p; k = 1, …, p} and M(∆^B_0) = {sgn(δ^{B,0}_{jk}) : j = 1, …, p; k = 1, …, p} the vectors of the signs of the entries of the estimated and true difference matrices, respectively. The following theorem establishes that ∆̂^{B,τ_n} recovers not only the support of ∆^B_0 but also the signs of its nonzero entries, provided those entries are sufficiently large.
In the context of genetic networks, Theorem 2.1 guarantees that, under these regularity conditions, ∆̂^{B,τ_n} identifies the genes whose conditional dependencies change between the two conditions, together with the magnitudes and directions of those changes, even though only the binary 0/1 data are observed. The tuning parameter τ_n is set to 10^{−4} in the simulations and the data analysis.
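For intuition, the binary-data construction can be sketched numerically. The code below (our own sketch, not the paper's implementation) dichotomizes latent Gaussian data and recovers the latent correlation by inverting the bridge function F(σ; Λ_j, Λ_k) = 2{Φ₂(Λ_j, Λ_k; σ) − Φ(Λ_j)Φ(Λ_k)} at the sample Kendall's tau, following the form in Fan et al. (2017); the helper names, the τ-a form of Kendall's tau, and the use of scipy are our assumptions:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import brentq

def tau_a(x, y):
    """Kendall's tau (tau-a form) for binary vectors, via pair counts."""
    n = len(x)
    c11 = np.sum((x == 1) & (y == 1))
    c00 = np.sum((x == 0) & (y == 0))
    c10 = np.sum((x == 1) & (y == 0))
    c01 = np.sum((x == 0) & (y == 1))
    return 2.0 * (c11 * c00 - c10 * c01) / (n * (n - 1))

def bridge(sigma, lam_j, lam_k):
    """F(sigma; L_j, L_k) = 2 * (Phi2(L_j, L_k; sigma) - Phi(L_j) * Phi(L_k))."""
    phi2 = multivariate_normal(mean=[0.0, 0.0],
                               cov=[[1.0, sigma], [sigma, 1.0]]).cdf([lam_j, lam_k])
    return 2.0 * (phi2 - norm.cdf(lam_j) * norm.cdf(lam_k))

def latent_corr_binary(x_j, x_k):
    """Invert the bridge function at the sample Kendall's tau of two binary vectors."""
    lam_j = norm.ppf(1.0 - x_j.mean())     # threshold estimate: P(x_j = 1) = 1 - Phi(Lambda_j)
    lam_k = norm.ppf(1.0 - x_k.mean())
    t = tau_a(x_j, x_k)
    return brentq(lambda s: bridge(s, lam_j, lam_k) - t, -0.99, 0.99)

# Dichotomize correlated Gaussians (latent correlation 0.5) and recover it
rng = np.random.default_rng(2)
Z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=20000)
x_j = (Z[:, 0] > 0.0).astype(float)
x_k = (Z[:, 1] > 0.3).astype(float)
print(abs(latent_corr_binary(x_j, x_k) - 0.5) < 0.1)  # True
```

The bridge function is strictly increasing in σ, so the scalar root-finding step is well defined; repeating this for every pair (j, k) yields the rank-based latent correlation matrices entering ∆̂^B.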
The following theorem establishes the convergence rate of ∆̂^B − ∆^B_0 in the Frobenius norm.
The following theorem establishes the elementwise ℓ_∞ norm bound of the estimation error, which is critical in the proofs of Theorems 2.1 and 2.2. Theorem 2.3. Suppose that conditions (B1)–(B4) hold. If min(n_1, n_2) > log p, then with probability at least 1 − p^{−1}, where C is a sufficiently large constant independent of (p, n_1, n_2) and σ_{P,min}, σ are defined in condition (B2).
Theorems 2.1–2.3 establish that the proposed differential network estimator ∆̂^B achieves the same rate of convergence, for both matrix estimation and graph recovery, as if the latent Gaussian copula random variable X were observed.
Theoretical properties for latent Gaussian copula model for mixed data
Under these conditions, an additional thresholding step on the estimator ∆̂^M with a carefully chosen threshold τ_n leads to more accurate recovery of the differential network. Define the thresholded estimator ∆̂^{M,τ_n} = [δ̂^{M,τ_n}_{jk}] with δ̂^{M,τ_n}_{jk} = δ̂^M_{jk} 1{|δ̂^M_{jk}| ≥ τ_n}. Denote by M(∆̂^{M,τ_n}) = {sgn(δ̂^{M,τ_n}_{jk}) : j = 1, …, p; k = 1, …, p} and M(∆^M_0) = {sgn(δ^{M,0}_{jk}) : j = 1, …, p; k = 1, …, p} the vectors of the signs of the entries of the estimated and true difference matrices, respectively. The following theorem establishes that ∆̂^{M,τ_n} recovers not only the support of ∆^M_0 but also the signs of its nonzero entries, provided those entries are sufficiently large.
The following theorem establishes the convergence rate of ∆̂^M − ∆^M_0 in the Frobenius norm.
The following theorem establishes the elementwise ℓ_∞ norm bound of the estimation error, which is critical in the proofs of Theorems 3.1 and 3.2.
Theorems 3.1–3.3 establish that the proposed differential network estimator ∆̂^M achieves the same parametric rate of convergence for both matrix estimation and graph recovery.
Remark 3.4. Compared with the separate and joint approaches to estimating differential networks (Cai et al., 2011; Guo et al., 2011), which require each precision matrix Σ^{−1} to be sparse, the proposed direct estimation methods for the different data types only require sparsity of the difference matrix ∆_0. Thus the theoretical results in Theorems 1.1–3.3 continue to hold in the presence of hub nodes.
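Remark 3.4 can be made concrete with a toy construction (our own example; the sign convention ∆_0 = Ω_Y − Ω_X and all numeric values are assumptions): two individual precision matrices sharing a dense hub row still have a sparse difference.

```python
import numpy as np

p = 20

# Condition X: node 0 is a hub connected to all other nodes,
# so Omega_X itself is far from sparse in its first row/column.
Omega_X = np.eye(p)
Omega_X[0, 1:] = Omega_X[1:, 0] = 0.1

# Condition Y: identical network except for one extra edge (1,2)
Omega_Y = Omega_X.copy()
Omega_Y[1, 2] = Omega_Y[2, 1] = 0.25

Delta0 = Omega_Y - Omega_X   # difference of the precision matrices (sign convention assumed)

print(int(np.count_nonzero(Omega_X)))  # 58 nonzeros: the hub makes the network dense
print(int(np.count_nonzero(Delta0)))   # 2 nonzeros: the difference is sparse
# Both precision matrices remain positive definite:
print(bool(np.linalg.eigvalsh(Omega_X).min() > 0 and np.linalg.eigvalsh(Omega_Y).min() > 0))
```

Separate estimation of Ω_X and Ω_Y would be penalized by the dense hub row, whereas direct estimation of ∆_0 only pays for the two differential entries.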

Web Appendix B: Proofs of Main Theorems
Before giving the detailed proofs of the main theorems in Web Appendix A, we first present some useful lemmas. Lemma 3.5 is established in Zhao et al. (2014).
Lemma 3.6. With probability at least 1 − 2/p², we have where S^X_{jk}, S^Y_{jk} are defined in Equation (2.1) of the main paper and C is a constant independent of (n_X, n_Y, p).
Proof. We prove the result for S^X_{jk}; the result for S^Y_{jk} can be obtained in a similar way. By definition, Denote T^X = [τ^X_{jk}] and T̂^X = [τ̂^X_{jk}]. By a Taylor expansion, we have where ξ_{jk} ∈ [min{τ̂^X_{jk}, τ^X_{jk}}, max{τ̂^X_{jk}, τ^X_{jk}}]. Denote T̃^X = [ξ_{jk}]; then we have where ∘ denotes element-wise (Hadamard) multiplication, which implies that By Hoeffding's inequality, we have that Letting t = 4√(log p/n_X), the above inequality implies that with probability 1 − 2p^{−2}, Thus with probability at least 1 − 2p^{−2}, where C_1 = (π/2)C. This concludes the proof of Lemma 3.6.
Lemma 3.7. Suppose that conditions (B3) and (B4) hold. With probability at least 1 − 1/p, we have where R^1_{jk}, R^2_{jk} are defined in Equations (2.5) and (2.6) of the main paper and C is a constant independent of (n_1, n_2, p).
Proof. We prove the result for R^1_{jk}; the result for R^2_{jk} can be obtained in a similar way. By the argument in Fan et al. (2017), we have that for any t > 0, where L_1, L_2 are two positive constants independent of (n_1, p). Letting t = C√(log p/n_1) with a sufficiently large constant C, we obtain the result.
Lemma 3.8 (Fan et al. (2017)). Suppose that conditions (M3) and (M4) hold. With probability greater than 1 − 1/p, we have that where T^1_{jk}, T^2_{jk} are defined in Equations (2.8), (2.9) and (2.10) of the main paper and C is a constant independent of (n_1, n_2, p).

Proof of Theorem 1.3
Let the entries of ∆_0 be denoted by δ^0_{jk} and define the p(p+1)/2 × 1 vector of its upper-triangular entries. The result is then obtained by following an argument similar to that in Zhao et al. (2014).

Proof of Theorem 1.2
In the proof of Theorem 1.3 we obtained that |w_{Q_0^c}|_1 ≤ |w_{Q_0}|_1. Cai et al. (2010) showed that |w|_2 ≤ 2|w_{Q_0 ∪ Q_*}|_2, where Q_* is the set of indices corresponding to the s/4 largest components of w_{Q_0^c}. Since |Q_0 ∪ Q_*| ≤ 1.25s, it follows that |w|_2 ≤ 2(1.25s)^{1/2}|w|_∞, and combining this with Theorem 1.3 concludes the proof.
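The chain of inequalities above can be sanity-checked numerically. The construction below (our own) builds a vector w satisfying the premise |w_{Q_0^c}|_1 ≤ |w_{Q_0}|_1 with |Q_0| = s and verifies the conclusion |w|_2 ≤ 2(1.25s)^{1/2}|w|_∞:

```python
import numpy as np

rng = np.random.default_rng(3)
p, s = 200, 10

# Support Q0 = first s coordinates, with the l1 mass concentrated on Q0
w = np.zeros(p)
w[:s] = rng.uniform(1.0, 2.0, size=s)
off = rng.uniform(0.0, 1.0, size=p - s)
w[s:] = off * (0.9 * np.abs(w[:s]).sum() / off.sum())  # enforce |w_{Q0^c}|_1 <= |w_{Q0}|_1

assert np.abs(w[s:]).sum() <= np.abs(w[:s]).sum()       # premise of the inequality

lhs = np.linalg.norm(w, 2)                              # |w|_2
rhs = 2.0 * np.sqrt(1.25 * s) * np.abs(w).max()         # 2 * (1.25 s)^{1/2} * |w|_inf
print(bool(lhs <= rhs))  # True
```

Of course a numerical check on one vector is no proof; it merely illustrates how the bound trades the ℓ₁ domination on Q_0 for an ℓ₂ bound depending only on s and the sup norm.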

Proof of other Theorems
For binary and mixed data, the theoretical analysis can be carried out in a similar way to the continuous-data case, thanks to the critical results established in Lemmas 3.7 and 3.8. To save space, we omit the proofs here.