Disease gene prioritization by integrating tissue-specific molecular networks using a robust multi-network model

Background
Accurately prioritizing candidate disease genes is an important and challenging problem. Various network-based methods have been developed to predict potential disease genes by utilizing the disease similarity network and molecular networks such as protein interaction or gene co-expression networks. Although successful, the existing methods share a common limitation: they assume that all diseases share the same molecular network, and they use a single generic molecular network to predict candidate genes for all diseases. However, different diseases tend to manifest in different tissues, and the molecular networks in different tissues are usually different. An ideal method should be able to incorporate tissue-specific molecular networks for different diseases.

Results
In this paper, we develop a robust and flexible method to integrate tissue-specific molecular networks for disease gene prioritization. Our method allows each disease to have its own tissue-specific network(s). We formulate candidate gene prioritization as an optimization problem based on network propagation. When multiple tissue-specific networks are available for a disease, our method automatically infers the relative importance of each tissue-specific network, which makes it robust to noisy and incomplete network data. To solve the optimization problem, we develop fast algorithms whose time complexity is linear in the number of nodes in the molecular networks. We also provide rigorous theoretical foundations for our algorithms in terms of their optimality and convergence properties. Extensive experimental results show that our method can significantly improve the accuracy of candidate gene prioritization compared with the state-of-the-art methods.

Conclusions
In our experiments, we compare our methods with 7 popular network-based disease gene prioritization algorithms on diseases from the Online Mendelian Inheritance in Man (OMIM) database. The experimental results demonstrate that our methods recover true associations more accurately than the other methods in terms of AUC values, and the performance differences are significant (paired t-test p-values less than 0.05). This validates the importance of integrating tissue-specific molecular networks for disease gene prioritization and shows the advantage of our network models and ranking algorithms for this purpose. The source code and datasets are available at http://nijingchao.github.io/CRstar/.

Electronic supplementary material
The online version of this article (doi:10.1186/s12859-016-1317-x) contains supplementary material, which is available to authorized users.

Generally, $h$ is much smaller than $n$ and can be regarded as a constant. Hence the time and space complexities of Algorithm 1 are $O(T(m + n))$ and $O(m + n)$, respectively.

Matrix Form of $J_{CR}$
The objective function $J_{CR}$ is jointly convex in $r_1, \ldots, r_h$. This can be shown by first deriving its matrix form.
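Let $r = (r_1^T, \ldots, r_h^T)^T$ and $e = (e_1^T, \ldots, e_h^T)^T$, i.e., we concatenate all ranking vectors and all seed vectors. Let $\tilde{G} = \mathrm{diag}(\tilde{G}_1, \ldots, \tilde{G}_h)$ be a block-diagonal matrix. Then the within-network terms of $J_{CR}$ can be written as

$$c\, r^T (I_n - \tilde{G})\, r + (1 - c)\, \|r - e\|^2 \quad (1)$$

where $I_n$ is an $n \times n$ identity matrix and $n = \sum_{i=1}^{h} n_i$.

Define a common gene mapping matrix $O_{ij} \in \{0, 1\}^{n_i \times n_j}$, where $O_{ij}(x, y) = 1$ if node $x$ in $G_i$ and node $y$ in $G_j$ represent the same gene, and $O_{ij}(x, y) = 0$ otherwise. Then $Y$ is a block matrix whose $(i, j)$th block is [equation not preserved in the source].

According to Eq. (1) and Eq. (2), we have the following theorem.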
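The construction above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the gene names, the two small adjacency matrices, and the helper names `common_gene_mapping` and `block_diag` are all hypothetical, and the matrices are unnormalized stand-ins for the per-network matrices.

```python
import numpy as np

def common_gene_mapping(genes_i, genes_j):
    """Build O_ij: entry (x, y) is 1 when node x of G_i and node y of G_j are the same gene."""
    index_j = {gene: y for y, gene in enumerate(genes_j)}
    O = np.zeros((len(genes_i), len(genes_j)), dtype=int)
    for x, gene in enumerate(genes_i):
        y = index_j.get(gene)
        if y is not None:
            O[x, y] = 1
    return O

def block_diag(blocks):
    """Place square matrices along the diagonal of one larger matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    start = 0
    for b in blocks:
        k = b.shape[0]
        out[start:start + k, start:start + k] = b
        start += k
    return out

# Hypothetical node lists of two tissue-specific networks.
genes_1 = ["BRCA1", "TP53", "EGFR"]
genes_2 = ["TP53", "MYC"]
O_12 = common_gene_mapping(genes_1, genes_2)

# Hypothetical stand-ins for the two per-network matrices.
G_1 = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
G_2 = np.array([[0.0, 1.0], [1.0, 0.0]])
G_tilde = block_diag([G_1, G_2])

# Concatenated seed vector e = (e_1^T, e_2^T)^T.
e = np.concatenate([np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0])])
print(O_12)
print(G_tilde.shape, e.shape)
```

Only TP53 appears in both node lists, so `O_12` has a single nonzero entry linking the second node of the first network to the first node of the second network.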
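Theorem 1 Matrix Form of $J_{CR}$. $J_{CR}$ has the following matrix form

$$J_{CR} = c\, r^T (I_n - \tilde{G})\, r + (1 - c)\, \|r - e\|^2 + 2\beta\, r^T (I_n - \tilde{Y})\, r \quad (3)$$

Proof The proof of Theorem 1 consists of validating two equivalences. Equivalence (1) is obvious, so we only need to prove equivalence (2).
According to the definitions of $X$ and $r$, we have [derivation not preserved in the source]. This completes the proof.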
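The first equivalence can be checked numerically. The sketch below assumes that each network contributes a term of the form $c\, r_i^T (I_{n_i} - \tilde{G}_i) r_i + (1 - c) \|r_i - e_i\|^2$, which is an assumption about the per-network objective rather than something stated in this section. The matrices and vectors are random placeholders, and $c = 0.85$ is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
c = 0.85          # arbitrary value in (0, 1), chosen for illustration
sizes = [3, 2]    # hypothetical network sizes n_1, n_2

def random_symmetric(k):
    """Random symmetric matrix used as a placeholder for one network."""
    a = rng.random((k, k))
    return (a + a.T) / 2

G_blocks = [random_symmetric(k) for k in sizes]
r_blocks = [rng.random(k) for k in sizes]
e_blocks = [rng.random(k) for k in sizes]

# Sum of the assumed per-network terms.
per_network = sum(
    c * r_i @ (np.eye(len(r_i)) - G_i) @ r_i + (1 - c) * np.sum((r_i - e_i) ** 2)
    for G_i, r_i, e_i in zip(G_blocks, r_blocks, e_blocks)
)

# Same quantity with concatenated vectors and a block-diagonal matrix.
n = sum(sizes)
G = np.zeros((n, n))
start = 0
for G_i in G_blocks:
    k = G_i.shape[0]
    G[start:start + k, start:start + k] = G_i
    start += k
r = np.concatenate(r_blocks)
e = np.concatenate(e_blocks)
stacked = c * r @ (np.eye(n) - G) @ r + (1 - c) * np.sum((r - e) ** 2)

print(per_network, stacked)
```

The two printed values agree up to floating-point error because a block-diagonal matrix never mixes entries belonging to different networks.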

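Optimization Solution to $J_{CR}$
From Theorem 1, $J_{CR}$ is a quadratic function of $r$, so we can derive a power method to minimize it as follows. For symmetric $\tilde{G}$ and $\tilde{Y}$, the gradient of Eq. (3) is

$$\frac{\partial J_{CR}}{\partial r} = 2\left[(1 + 2\beta) I_n - c\tilde{G} - 2\beta\tilde{Y}\right] r - 2(1 - c)\, e$$

Using gradient descent with the update $r \leftarrow r - \eta\, \frac{\partial J_{CR}}{\partial r}$, where $\eta = \frac{1}{2(1 + 2\beta)}$, we have

$$r \leftarrow \frac{c}{1 + 2\beta}\, \tilde{G} r + \frac{2\beta}{1 + 2\beta}\, \tilde{Y} r + \frac{1 - c}{1 + 2\beta}\, e \quad (5)$$

Eq. (5) is a fixed-point iteration for $r$ that converges to the global optimal solution of $J_{CR}$. Algorithm 1 summarizes our approach according to this optimization solution.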
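The update rule can be sketched as a short NumPy loop. This is an illustration under stated assumptions, not a reproduction of Algorithm 1: the function names, the stopping rule, the values $c = 0.85$ and $\beta = 0.5$, and the two toy matrices are hypothetical. In particular, `Y` is a symmetrically normalized stand-in, since the definition of $\tilde{Y}$ is not available here.

```python
import numpy as np

def sym_normalize(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; assumes every node has positive degree."""
    inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * inv_sqrt[:, None] * inv_sqrt[None, :]

def cr_fixed_point(G, Y, e, c=0.85, beta=0.5, tol=1e-10, max_iter=1000):
    """Iterate r <- (c G r + 2 beta Y r + (1 - c) e) / (1 + 2 beta) until r stops changing."""
    r = e.copy()
    for _ in range(max_iter):
        r_new = (c * (G @ r) + 2 * beta * (Y @ r) + (1 - c) * e) / (1 + 2 * beta)
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r

# Toy inputs: a 4-node path graph and a second symmetric matrix standing in for Y.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
B = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
G = sym_normalize(A)
Y = sym_normalize(B)
e = np.array([1.0, 0.0, 0.0, 0.0])   # a single seed node

r_star = cr_fixed_point(G, Y, e)
print(r_star)
```

Each iteration needs only two matrix-vector products, so with sparse matrices the cost per iteration is proportional to the number of stored nonzero entries plus the number of nodes.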

Theoretical Analysis of CR
In this section, we show that Algorithm 1 converges to the global minimum of $J_{CR}$ by Theorem 2 and Theorem 3.
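Theorem 2 Convergence of CR. Algorithm 1 converges to the closed-form solution

$$r = (1 - c)\left[(1 + 2\beta) I_n - c\tilde{G} - 2\beta\tilde{Y}\right]^{-1} e$$

Proof First, the closed-form solution can be obtained by solving $\frac{\partial J_{CR}}{\partial r} = 0$. Then, letting $M = \frac{c}{1 + 2\beta}\tilde{G} + \frac{2\beta}{1 + 2\beta}\tilde{Y}$, the CR updating rule in Eq. (5) becomes $r = Mr + \frac{1 - c}{1 + 2\beta} e$. Next, we show that the eigenvalues of $M$ are in the range $(-1, 1)$ [argument not preserved in the source].

Based on this property, we can show the convergence of the fixed-point approach. Without loss of generality, let $r^{(0)} = e$, and let $t$ be the iteration index ($t \geq 1$). According to the CR updating rule in Eq. (5), we have

$$r^{(t)} = M^t e + \frac{1 - c}{1 + 2\beta} \sum_{j=0}^{t-1} M^j e$$

Since the eigenvalues of $M$ are in $(-1, 1)$, $M^t \to 0$ and $\sum_{j=0}^{t-1} M^j \to (I_n - M)^{-1}$ as $t \to \infty$. Hence $r^{(t)}$ converges to $\frac{1 - c}{1 + 2\beta}(I_n - M)^{-1} e$, which equals the closed-form solution above.

Theorem 3 Optimality of CR. At convergence, Algorithm 1 gives the global minimum of $J_{CR}$.

Proof This can be proved by showing that the function in Eq. (3) is convex. The Hessian matrix of Eq. (3) is $\nabla^2 J_{CR} = 2\left((1 + 2\beta) I_n - (c\tilde{G} + 2\beta\tilde{Y})\right)$. Following an idea similar to the proof of Theorem 2, the eigenvalues of $\nabla^2 J_{CR}$ are no less than $2(1 - c)$. Since $0 < c < 1$, $\nabla^2 J_{CR}$ is positive-definite. Therefore, Eq. (3) is convex.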
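The two spectral claims used in these proofs can be checked numerically on a small example. The sketch below assumes that $\tilde{G}$ and $\tilde{Y}$ are symmetrically normalized nonnegative matrices, so that their eigenvalues lie in $[-1, 1]$. That is an assumption made for illustration, and the random matrices and the values of $c$ and $\beta$ are hypothetical.

```python
import numpy as np

def sym_normalize(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; assumes every node has positive degree."""
    inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * inv_sqrt[:, None] * inv_sqrt[None, :]

def random_weighted_graph(n, rng):
    """Random symmetric nonnegative matrix with a zero diagonal."""
    w = rng.random((n, n))
    a = (w + w.T) / 2
    np.fill_diagonal(a, 0.0)
    return a

rng = np.random.default_rng(1)
n, c, beta = 5, 0.85, 0.5     # illustrative values with 0 < c < 1 and beta > 0

G = sym_normalize(random_weighted_graph(n, rng))   # stand-in for G-tilde
Y = sym_normalize(random_weighted_graph(n, rng))   # stand-in for Y-tilde

# Iteration matrix of Eq. (5) and Hessian of Eq. (3).
M = (c * G + 2 * beta * Y) / (1 + 2 * beta)
hessian = 2 * ((1 + 2 * beta) * np.eye(n) - (c * G + 2 * beta * Y))

# Both matrices are symmetric, so eigvalsh applies.
eig_M = np.linalg.eigvalsh(M)
eig_H = np.linalg.eigvalsh(hessian)

print("eigenvalue range of M:", eig_M.min(), eig_M.max())
print("smallest Hessian eigenvalue:", eig_H.min(), "lower bound 2(1 - c):", 2 * (1 - c))
```

Under these assumptions the eigenvalues of $M$ are bounded in absolute value by $(c + 2\beta)/(1 + 2\beta) < 1$, which is the same bound that gives the $2(1 - c)$ floor on the Hessian spectrum.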
Author details 1 Department of Electrical Engineering and Computer Science, Case Western Reserve University, 10900 Euclid