Efficient proximal gradient algorithm for inference of differential gene networks

Background Gene networks in living cells can change depending on conditions such as environment, tissue type, disease state, and developmental stage. Identifying differential changes in gene networks is very important for understanding the molecular basis of various biological processes. While existing algorithms can infer two gene networks separately from gene expression data under two different conditions and then identify network changes, such an approach does not exploit the similarity between the two gene networks and is thus suboptimal. A more desirable approach is to infer the two gene networks jointly, which can yield improved estimates of network changes. Results In this paper, we developed a proximal gradient algorithm for differential network (ProGAdNet) inference that jointly infers two gene networks under different conditions and then identifies changes in the network structure. Computer simulations demonstrated that ProGAdNet outperformed existing algorithms in inference accuracy and was much faster than a similar approach for joint inference of gene networks. Gene expression data of breast tumors and normal tissues in the TCGA database were analyzed with ProGAdNet, revealing that 268 genes were involved in the changed network edges. Gene set enrichment analysis identified a significant number of gene sets related to breast cancer or other types of cancer that are enriched in this set of 268 genes. Network analysis of the kidney cancer data in the TCGA database with ProGAdNet also identified a set of genes involved in network changes, and the majority of the top genes identified have been reported in the literature to be implicated in kidney cancer. These results corroborate that the gene sets identified by ProGAdNet are highly informative about the cancer disease status.
A software package implementing ProGAdNet, the computer simulations, and the real data analysis is available as Additional file 1. Conclusion With its superior performance over existing algorithms, ProGAdNet provides a valuable tool for finding changes in gene networks, which may aid the discovery of gene-gene interactions that change under different conditions. Electronic supplementary material The online version of this article (10.1186/s12859-019-2749-x) contains supplementary material, which is available to authorized users.


Background
Genes in living cells interact and form a complex network to regulate molecular functions and biological processes. Gene networks can undergo topological changes depending on the molecular context in which they operate [1,2]. For example, it has been observed that transcription factors (TFs) can bind to, and thus regulate, different target genes under varying environmental conditions. Although a number of computational methods have been developed to infer the structure of gene regulatory networks from gene expression and related data [9][10][11][12], they are mainly concerned with the static structure of gene networks under a single condition. These methods rely on similarity measures such as correlation or mutual information [13,14], Gaussian graphical models (GGMs) [15,16], Bayesian networks [17,18], or linear regression models [19][20][21][22]. Refer to [12] for a description of more network inference methods and their performance. Existing methods for the analysis of differential gene interactions under different conditions typically attempt to identify differential co-expression of genes based on correlations between their expression levels [23]. While it is possible to use an existing method to infer a gene network under each condition separately, and then compare the inferred networks to determine their changes, such an approach does not jointly leverage the data under the different conditions; thus, it may markedly sacrifice accuracy in the inference of network changes.
In this paper, we develop a very efficient proximal gradient algorithm for differential network (ProGAdNet) inference, which jointly infers gene networks under two different conditions and then identifies changes in these two networks. To overcome the challenge of the small sample size and the large number of unknowns, which is common in the inference of gene networks, our method exploits two important attributes of gene networks: i) sparsity in the underlying connectivity, meaning that the number of gene-gene interactions in a gene network is much smaller than the number of all possible interactions [19,[24][25][26]; and ii) similarity in the gene networks of the same organism under different conditions [4,7], meaning that the number of interactions changed in response to different conditions is much smaller than the total number of interactions present in the network. A similar network inference setup was considered in [27] for inferring multiple gene networks, but no new algorithm was developed there; instead, [27] adopted the lqa algorithm of [28], which was designed for generalized linear models. Our computer simulations demonstrated superior performance of our ProGAdNet algorithm relative to existing methods, including the lqa algorithm. Analysis of a set of RNA-seq data from normal tissues and breast tumors with ProGAdNet identified genes involved in changes of the gene network.
The differential gene-gene interactions identified by our ProGAdNet algorithm yield a list of genes that may not be differentially expressed under the two conditions. Compared with the set of differentially expressed genes, this set of genes may provide additional insight into the molecular mechanism behind the phenotypic difference of the tissue under different conditions. Alternatively, the two gene networks inferred by our ProGAdNet algorithm can be used for further differential network analysis (DiNA). DiNA has received much attention recently; the performance of ten DiNA algorithms was assessed in [29] using gene networks and gene/microRNA networks. Given two networks with the same set of nodes, a DiNA algorithm computes a score for each node based on the difference of global and/or local topologies of the two networks, and then ranks nodes based on these scores. Clearly, DiNA relies on two input networks that typically need to be constructed from data. Our ProGAdNet algorithm provides an efficient and effective tool for constructing two gene networks over the same set of genes from gene expression data under two different conditions, which can then be used by a DiNA algorithm for further analysis.

Gene network model
Suppose that expression levels of p genes have been measured with microarray or RNA-seq, and let X_i be the expression level of the ith gene, i = 1, ..., p. To identify the possible regulatory effect of other genes on the ith gene, we employ the following linear regression model, as also used in [19][20][21][22]:

    X_i = μ_i + Σ_{j≠i} b_ji X_j + E_i,   (1)

where μ_i is a constant, E_i is the error term, and the unknown regression coefficients b_ji reflect the correlation between X_i and X_j after adjusting for the effects of the other variables X_k, k ∉ {i, j}. This adjusted correlation may be the result of a possible interaction between genes i and j. The nonzero b_ji's define the edges in the gene network. Suppose that n samples of gene expression levels of the same organism (or the same type of tissue of an organism) under two different conditions are available, and let the n × 1 vectors x_i and x̃_i contain these n samples of the ith gene under the two conditions, respectively. Define n × p matrices X = [x_1, ..., x_p] and X̃ = [x̃_1, ..., x̃_p], p × 1 vectors μ = [μ_1, ..., μ_p]^T and μ̃ = [μ̃_1, ..., μ̃_p]^T, and p × p matrices B and B̃ whose elements in the ith column and jth row are b_ji and b̃_ji, respectively. Letting b_ii = b̃_ii = 0, model (1) yields the following:

    X = 1μ^T + XB + E,   X̃ = 1μ̃^T + X̃B̃ + Ẽ,   (2)

where 1 is a vector with all elements equal to 1, and the n × p matrices E and Ẽ contain error terms. Matrices B and B̃ characterize the structure of the gene networks under the two conditions. Our main goal is to identify the changes in the gene network between the two conditions, namely, those edges from gene j to gene i such that b_ji − b̃_ji ≠ 0, j ≠ i. One straightforward way to do this is to estimate B and B̃ separately from the two linear models in (2), and then find the gene pairs (i, j) for which b_ji − b̃_ji ≠ 0. However, this approach may not be optimal, since it does not exploit the fact that the network structure does not change significantly between the two conditions, that is, most entries of B and B̃ are identical.
A better approach is apparently to infer gene networks under two conditions jointly, which can exploit the similarity between two network structures and thereby improve the inference accuracy.
If we denote the ith columns of B and B̃ by b_i and b̃_i, we can also write model (2) for each gene separately as follows:

    x_i = 1μ_i + Xb_i + e_i,   x̃_i = 1μ̃_i + X̃b̃_i + ẽ_i,

where e_i and ẽ_i are the ith columns of E and Ẽ, respectively. We can estimate μ_i and μ̃_i using the least squares method and substitute the estimates into the linear regression model, which is equivalent to centering each column of X and X̃, i.e., subtracting the mean of each column from each element of the column. From now on, we will drop μ_i and μ̃_i from the model and assume that the columns of X and X̃ have been centered. To remove the constraints b_ii = 0, i = 1, ..., p, we define the matrices X_{−i} and X̃_{−i}, obtained by deleting the ith columns of X and X̃, and the (p − 1) × 1 vectors β_i and β̃_i, obtained by deleting the ith (zero) entries of b_i and b̃_i. The regression model for the gene network under the two conditions can then be written as

    x_i = X_{−i} β_i + e_i,   x̃_i = X̃_{−i} β̃_i + ẽ_i.   (3)

Based on (3), we will develop a proximal gradient algorithm to infer β_i and β̃_i jointly, and identify changes in the network structure.
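The per-gene setup above (column centering to absorb the intercepts, and deleting the ith column to form X_{−i}) can be sketched in a few lines of NumPy. This is an illustrative sketch under our own naming, not code from the ProGAdNet package:

```python
import numpy as np

def center_columns(X):
    """Column-center X, absorbing the intercepts mu_i of model (2)."""
    return X - X.mean(axis=0)

def design_for_gene(X, i):
    """Return X_{-i}: X with its ith column deleted, as used in model (3)."""
    return np.delete(X, i, axis=1)

# toy example: 6 samples of 4 genes
rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, size=(6, 4))
Xc = center_columns(X)
X_minus_2 = design_for_gene(Xc, 2)   # regressors for gene i = 2
```

With this preprocessing, inferring the network column by column reduces to p separate regression problems of the form (3).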

Network inference

Optimization formulation
As argued in [19,30,31], gene regulatory networks, and biochemical networks more generally, are sparse, meaning that a gene directly regulates or is regulated by a small number of genes relative to the total number of genes in the network. Taking sparsity into account, only a relatively small number of entries of B and B̃, or equivalently of β_i and β̃_i, i = 1, ..., p, are nonzero. These nonzero entries determine the network structure and the regulatory effect of one gene on other genes. As mentioned earlier, the gene network of an organism is expected to have a similar structure under two different conditions. For example, the gene network of a tissue in a disease (such as cancer) state may have changed compared with that of the same tissue under the normal condition, but such change in the network structure is expected to be small relative to the overall network structure. Therefore, it is reasonable to expect that the number of edges that change between the two conditions is small compared with the total number of edges of the network. Taking into account the sparsity of B and B̃ and the similarity between B and B̃, we formulate the following optimization problem to jointly infer the gene networks under two conditions:

    min_{β_i, β̃_i}  ||x_i − X_{−i}β_i||² + ||x̃_i − X̃_{−i}β̃_i||² + λ_1(||β_i||_1 + ||β̃_i||_1) + λ_2 ||β_i − β̃_i||_1,   (4)

where ||·|| stands for the Euclidean norm, ||·||_1 stands for the ℓ1 norm, and λ_1 and λ_2 are two positive constants. The objective function in (4) consists of the squared errors of the linear regression models (3) and two regularization terms, λ_1(||β_i||_1 + ||β̃_i||_1) and λ_2||β_i − β̃_i||_1. Note that unlike the GGM, the regularized least-squares approach here does not rely on the Gaussian assumption. The two regularization terms induce sparsity in the inferred networks and in the network changes, respectively. This optimization problem is convex, and therefore has a globally optimal solution. Note that the term λ_2||β_i − β̃_i||_1 is reminiscent of the fused Lasso [32].
However, all regression coefficients in the fused Lasso are essentially coupled, whereas here the term λ_2||β_i − β̃_i||_1 only couples each pair of regression coefficients, β_ij and β̃_ij. As will be described next, this enables us to develop an algorithm for solving optimization problem (4) that is different from, and more efficient than, the algorithm for solving the general fused Lasso problem. Note that an optimization problem similar to (4) was formulated in [27] for inferring multiple gene networks, but no new algorithm was developed there; instead, the problem was solved with the lqa algorithm [28], which was developed for general penalized maximum likelihood inference of generalized linear models, including the fused Lasso. Our computer simulations showed that our algorithm not only is much faster than the lqa algorithm, but also yields much more accurate results.

Proximal gradient solver
Define α_i = [β_i^T, β̃_i^T]^T, and let us separate the objective function in (4) into the differentiable part g_1(α_i) and the non-differentiable part g_2(α_i), given by

    g_1(α_i) = ||x_i − X_{−i}β_i||² + ||x̃_i − X̃_{−i}β̃_i||²,
    g_2(α_i) = λ_1(||β_i||_1 + ||β̃_i||_1) + λ_2||β_i − β̃_i||_1.   (5)

Applying the proximal gradient method [33] to the optimization problem (4), we obtain an expression for α_i in the rth step of the iterative procedure as follows:

    α_i^(r+1) = prox_{λ^(r) g_2}(α_i^(r) − λ^(r) ∇g_1(α_i^(r))),   (6)

where prox stands for the proximal operator, defined as prox_{λf}(t) := arg min_x f(x) + (1/2λ)||x − t||² for a function f(x) and a constant vector t, and ∇g_1(α_i) is the gradient of g_1(α_i). Generally, the step size λ^(r) can be found using a line search, or can be determined from the Lipschitz constant of ∇g_1 [33]. For our problem, we will provide a closed-form expression for λ^(r) later. Since g_1(α_i) is in quadratic form, its gradient is obtained readily as

    ∇g_1(α_i) = [−2X_{−i}^T(x_i − X_{−i}β_i); −2X̃_{−i}^T(x̃_i − X̃_{−i}β̃_i)].

Letting [t^T, t̃^T]^T = α_i^(r) − λ^(r) ∇g_1(α_i^(r)), (6) can be written as

    α_i^(r+1) = prox_{λ^(r) g_2}([t^T, t̃^T]^T).   (7)

It is seen that the optimization problem in the proximal operator (7) can be decomposed into p − 1 separate problems as follows:

    min_{β_ij, β̃_ij}  (1/2)(β_ij − t_j)² + (1/2)(β̃_ij − t̃_j)² + λ^(r)λ_1(|β_ij| + |β̃_ij|) + λ^(r)λ_2|β_ij − β̃_ij|,   (8)

where β_ij and β̃_ij are the jth elements of β_i and β̃_i, and t_j and t̃_j are the jth elements of t and t̃, respectively. The optimization problem (8) is in the form of the fused Lasso signal approximator (FLSA) [34]. The general FLSA problem has many variables, and numerical optimization algorithms were developed to solve it [34,35]. However, our problem has only two variables, which enables us to find the solution of (8) in closed form; this is then used in each step of our proximal gradient algorithm for network inference. Let us define the soft-thresholding operator

    S(x, a) := sign(x) max(|x| − a, 0),   (9)

where a is a positive constant. Then, as shown in [34], if the solution of (8) at λ_1 = 0 is (β°_ij, β̃°_ij), the solution of (8) at λ_1 > 0 is given by

    β_ij = S(β°_ij, λ^(r)λ_1),   β̃_ij = S(β̃°_ij, λ^(r)λ_1).   (10)

Therefore, if we can solve problem (8) at λ_1 = 0, we can find the solution of (8) at any λ_1 > 0 from (10). It turns out that the solution of (8) at λ_1 = 0 can be found as

    β°_ij = β̃°_ij = (t_j + t̃_j)/2,   if |t_j − t̃_j| ≤ 2λ^(r)λ_2;
    β°_ij = t_j − λ^(r)λ_2 s, β̃°_ij = t̃_j + λ^(r)λ_2 s, s = sign(t_j − t̃_j),   otherwise.   (11)
Therefore, our proximal gradient method can solve the network inference problem (4) efficiently through an iterative process, where each step of the iteration solves the optimization problem (6) in closed form, as specified by (10) and (11). To obtain a complete proximal gradient algorithm, we need to find the step size λ^(r), as described next.
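The closed-form proximal step for one coefficient pair can be sketched as follows. This is an illustrative implementation of the two-variable FLSA solution; the function names (`soft`, `prox_pair`) are our own, and the thresholds passed in are assumed to be pre-scaled by the step size λ^(r):

```python
import numpy as np

def soft(x, a):
    """Soft-thresholding operator S(x, a) = sign(x) * max(|x| - a, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - a, 0.0)

def prox_pair(t, t_tilde, lam1, lam2):
    """Closed-form solution of the two-variable FLSA subproblem (8).

    Step 1 (lam1 = 0): fuse the pair to their average if |t - t_tilde|
    <= 2*lam2; otherwise move each value toward the other by lam2.
    Step 2: soft-threshold the fused solution by lam1 (the result of [34]).
    """
    if abs(t - t_tilde) <= 2.0 * lam2:
        b = bt = 0.5 * (t + t_tilde)
    else:
        s = np.sign(t - t_tilde)
        b, bt = t - lam2 * s, t_tilde + lam2 * s
    return soft(b, lam1), soft(bt, lam1)
```

For example, a close pair such as (1.0, 1.2) is fused to a common value when 2·lam2 exceeds their gap, which is exactly how the λ_2 penalty declares an edge unchanged between the two conditions.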

Stepsize
As mentioned in [33], if the step size λ^(r) ∈ (0, 1/L], where L is the Lipschitz constant of ∇g_1(α_i), then the proximal gradient algorithm converges to the optimal solution. We next derive an expression for L. Specifically, we need to find L such that, for any α_i^(1) and α_i^(2),

    ||∇g_1(α_i^(1)) − ∇g_1(α_i^(2))|| ≤ L ||α_i^(1) − α_i^(2)||.

Since ∇g_1 is linear in α_i, L can be chosen as 2(γ + γ̃), where γ and γ̃ are the largest eigenvalues of X_{−i}^T X_{−i} and X̃_{−i}^T X̃_{−i}, respectively. Note that X_{−i}^T X_{−i} and X_{−i} X_{−i}^T have the same set of nonzero eigenvalues, and thus γ can be found using a numerical algorithm with a computational complexity of O((min(n, p))²). After obtaining L, the step size of our proximal gradient algorithm can be chosen as λ^(r) = 1/L. Note that λ^(r) does not change across iterations, and it only needs to be computed once. Since the sum of the eigenvalues of a matrix is equal to its trace, another possible value for L is 2(tr(X_{−i}^T X_{−i}) + tr(X̃_{−i}^T X̃_{−i})), which saves the cost of computing γ and γ̃. However, this value of L is generally greater than 2(γ + γ̃), which reduces the step size λ^(r) and may slow the convergence of the algorithm.
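The step-size computation can be sketched as follows; this is an illustrative NumPy version (function names are our own) that forms the Gram matrix on the smaller dimension, exploiting the fact that X^T X and X X^T share their nonzero eigenvalues:

```python
import numpy as np

def step_size(X_mi, Xt_mi):
    """Constant step size lam = 1/L, with L = 2*(gamma + gamma_tilde),
    where gamma and gamma_tilde are the largest eigenvalues of
    X_{-i}^T X_{-i} and X~_{-i}^T X~_{-i}, respectively."""
    def top_eig(A):
        n, p = A.shape
        # build the Gram matrix on the smaller of the two dimensions
        G = A @ A.T if n < p else A.T @ A
        return np.linalg.eigvalsh(G)[-1]   # eigvalsh returns ascending order
    return 1.0 / (2.0 * (top_eig(X_mi) + top_eig(Xt_mi)))
```

Because the design matrices are fixed, this step size is computed once before the iterations start.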

Algorithm
The proximal gradient solver of (4) for inference of differential gene networks is abbreviated as ProGAdNet, and is summarized in Algorithm 1.
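A minimal NumPy sketch of the ProGAdNet iteration for a single gene, combining the gradient step on g_1 with the closed-form proximal step, might look as follows. The function name, convergence test, and defaults are our own assumptions, and this is a sketch rather than the packaged implementation:

```python
import numpy as np

def progadnet_gene(x, X, xt, Xt, lam1, lam2, iters=500, tol=1e-8):
    """Sketch of the ProGAdNet proximal gradient iteration for one gene:
    jointly estimates beta_i and beta_i~ in problem (4).  Assumes the
    columns of the design matrices X and Xt have been centered."""
    p = X.shape[1]
    b, bt = np.zeros(p), np.zeros(p)
    # constant step size 1/L with L = 2*(gamma + gamma_tilde)
    L = 2.0 * (np.linalg.eigvalsh(X.T @ X)[-1] + np.linalg.eigvalsh(Xt.T @ Xt)[-1])
    lam = 1.0 / L
    soft = lambda v, a: np.sign(v) * np.maximum(np.abs(v) - a, 0.0)
    for _ in range(iters):
        # gradient step on the smooth part g1
        t = b - lam * (-2.0 * X.T @ (x - X @ b))
        tt = bt - lam * (-2.0 * Xt.T @ (xt - Xt @ bt))
        # proximal step: two-variable FLSA, solved pairwise in closed form
        fuse = np.abs(t - tt) <= 2.0 * lam * lam2
        s = np.sign(t - tt)
        b_new = np.where(fuse, 0.5 * (t + tt), t - lam * lam2 * s)
        bt_new = np.where(fuse, 0.5 * (t + tt), tt + lam * lam2 * s)
        b_new, bt_new = soft(b_new, lam * lam1), soft(bt_new, lam * lam1)
        done = max(np.max(np.abs(b_new - b)), np.max(np.abs(bt_new - bt))) < tol
        b, bt = b_new, bt_new
        if done:
            break
    return b, bt
```

On identical data for both conditions, the fusion penalty makes the two estimates coincide, so no spurious network changes are reported.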

Maximum values of λ 1 and λ 2
The ProGAdNet solver of (4) is outlined in Algorithm 1 with a specific pair of values of λ 1 and λ 2 . However, we typically need to solve the optimization problem (4) over a set of values of λ 1 and λ 2 , and then either use cross validation to determine the optimal values of λ 1 and λ 2 , or use the stability selection technique to determine nonzero elements of β i andβ i , as will be described later. Therefore, we also need to know the maximum values of λ 1 and λ 2 . In the following, we will derive expressions for the maximum values of λ 1 and λ 2 .
When we determine the maximum value of λ_1 (denoted λ_1max), λ_2 can be omitted from the optimization problem, since when λ_1 = λ_1max we have β_ij = 0 and β̃_ij = 0, ∀i, j. Thus, we can use the same method as for determining the maximum value of λ in the Lasso problem [36] to find λ_1max, which leads to

    λ_1max = 2 max_j max(|z_j^T x_i|, |z̃_j^T x̃_i|),

where z_j (z̃_j) denotes the jth column of X_{−i} (X̃_{−i}). The maximum value of λ_2, λ_2max, depends on λ_1. It is difficult to find λ_2max exactly; instead, we will find an upper bound for it. Let us denote the objective function in (4) by J(β_i, β̃_i). If the optimal solution of (4) is β_i = β̃_i = β*, then the subgradient of J(β_i, β̃_i) at the optimal solution should contain the zero vector, which yields, for each j,

    2z_j^T(X_{−i}β* − x_i) + λ_1 s_j + λ_2 u_j = 0,   2z̃_j^T(X̃_{−i}β* − x̃_i) + λ_1 s̃_j − λ_2 u_j = 0,

where s_j, s̃_j are subgradients of the absolute value at β*_j and u_j ∈ [−1, 1], from which an upper bound on the maximum value of λ_2 can be written as

    λ_2max ≤ max_j { |2z_j^T(x_i − X_{−i}β*)| + λ_1 }.   (15)

To find λ_2max from (15), we need to know β*, which can be obtained by solving the Lasso problem that minimizes

    ||x_i − X_{−i}β||² + ||x̃_i − X̃_{−i}β||² + 2λ_1||β||_1

using an efficient algorithm such as glmnet [37].
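The hyperparameter bounds above are cheap to compute in practice. The sketch below follows the standard Lasso bound for λ_1max and the subgradient-based upper bound for λ_2max; the function names are our own, and the λ_2 bound assumes β* (the common fused solution) is supplied by an external Lasso solver:

```python
import numpy as np

def lambda1_max(x, X, xt, Xt):
    """Smallest lam1 at which beta_i = beta_i~ = 0; the factor 2 comes
    from the unnormalized squared error in problem (4)."""
    return 2.0 * max(np.max(np.abs(X.T @ x)), np.max(np.abs(Xt.T @ xt)))

def lambda2_max_bound(x, X, beta_star, lam1):
    """Upper bound on lam2_max from (15), given the common fused
    solution beta* of the Lasso problem with beta_i = beta_i~."""
    r = x - X @ beta_star           # residual x_i - X_{-i} beta*
    return np.max(np.abs(2.0 * X.T @ r)) + lam1
```

These bounds are used to build the grids of (λ_1, λ_2) values over which cross validation or stability selection is run.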

Stability selection
As mentioned earlier, parameter λ 1 encourages sparsity in the inferred gene network, while λ 2 induces sparsity in the changes of the network under two conditions. Generally, larger values of λ 1 and λ 2 induce a higher level of sparsity. Therefore, appropriate values of λ 1 and λ 2 need to be determined, which can be done with cross validation [37]. However, the nonzero entries of matrices B andB, estimated with a specific pair of values of λ 1 and λ 2 determined by cross validation, may not be stable in the sense that small perturbation in the data may result in considerably different B andB. We can employ an alternative technique, named stability selection [38], to select stable variables, as described in the following.
Let K be the number of (λ_1, λ_2) pairs used, and let N be the number of runs per pair. For the kth pair of (λ_1, λ_2), let m̂_ij^(k), m̃_ij^(k), and m_ij^(k) be the number of runs with nonzero estimates b̂_ij and b̃_ij, and with nonzero differences (b̂_ij − b̃_ij), respectively. Then,

    r̂_ij = Σ_k m̂_ij^(k)/(NK),   r̃_ij = Σ_k m̃_ij^(k)/(NK),   r_ij = Σ_k m_ij^(k)/(NK)

give the frequencies with which an edge from gene j to gene i is detected under the two conditions, and the frequency with which a change in the edge from gene j to gene i is detected, respectively. A larger r̂_ij, r̃_ij, or r_ij indicates a higher likelihood that an edge from gene j to gene i exists, or that the edge has changed. Therefore, we use r̂_ij, r̃_ij, and r_ij to rank the reliability of the detected edges and of the changes in edges, respectively. Alternatively, we can declare that an edge from gene j to gene i exists if r̂_ij ≥ c or r̃_ij ≥ c, and similarly that the edge from gene j to gene i has changed if r_ij ≥ c, where c is a constant that can take any value in [0.6, 0.9] [38].
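Computing the selection frequencies amounts to averaging boolean selection masks over runs and hyperparameter pairs. A small sketch, with our own (hypothetical) data layout:

```python
import numpy as np

def selection_frequencies(masks):
    """masks: boolean array of shape (K, N, p, p), where masks[k, n, i, j]
    is True when the (i, j) coefficient (or coefficient difference) was
    nonzero in run n at hyperparameter pair k.
    Returns r[i, j] = sum_k m_ij^(k) / (N * K)."""
    K, N = masks.shape[0], masks.shape[1]
    return masks.sum(axis=(0, 1)) / (N * K)
```

The same routine is applied to the masks of b̂_ij, b̃_ij, and their differences to obtain r̂, r̃, and r, which are then thresholded at c.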
The software package in Additional file 1 includes computer programs that implement Algorithm 1, as well as stability selection and cross validation. The default values of the parameters α_1, α_2, k_1, and k_2 in stability selection are 0.7, 0.8, 10, and 10, respectively. In cross validation, a set S_1 of k_1 values of λ_1 and a set S_2(λ_1) of k_2 values of λ_2 for each λ_1 are created similarly, and the default values of α_1, α_2, k_1, and k_2 are 0.6952, 0.3728, 20, and 8, respectively.

Software glmnet and lqa
Two software packages, glmnet and lqa, were used in the computer simulations. The software glmnet [37] for solving the Lasso problem is available at https://cran.r-project.org/web/packages/glmnet. The software lqa [28], used in [27] for inferring multiple gene networks, is available at https://cran.r-project.org/web/packages/lqa/.

Computer simulation with linear regression model
We generated data from one of the p pairs of linear regression models in (3), instead of all p pairs of simultaneous equations in (2), as follows. Without loss of generality, consider the first equation in (3). The goal was to estimate β_1 and β̃_1, and then identify the pairs (β_i1, β̃_i1) with β_i1 ≠ β̃_i1. Entries of the n × (p − 1) matrices X_{−1} and X̃_{−1} were generated independently from the standardized Gaussian distribution. In the first simulation setup, we chose n = 100 and p − 1 = 200. Taking into account the sparsity of β_1, we let 10% of its entries be nonzero: twenty randomly selected entries of β_1 were generated from a random variable uniformly distributed over the intervals [0.5, 1.5] and [−1.5, −0.5], and the remaining entries were set to zero. The vector β̃_1 was similarly sparse. Since the two regression models are similar, meaning that most entries of β̃_1 are identical to those of β_1, β̃_1 was generated by randomly changing 10 entries of β_1 as follows: 4 randomly selected nonzero entries were set to zero, and 6 randomly selected zero entries were changed to a value uniformly distributed over the intervals [0.5, 1.5] and [−1.5, −0.5]. Of note, since the number of zero entries of β_1 is greater than the number of nonzero entries, the number of entries changed from zero to nonzero (6) is greater than the number of entries changed from nonzero to zero (4). The noise vectors e_1 and ẽ_1 were generated from a Gaussian distribution with mean zero and variance σ² varying over 0.01, 0.05, 0.1, and 0.5, and then x_1 and x̃_1 were calculated from (3).
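The first simulation setup can be reproduced with a short NumPy routine; this is an illustrative sketch with our own function name and argument defaults, following the recipe above:

```python
import numpy as np

def simulate_pair(n=100, p_minus_1=200, n_nonzero=20, n_to_zero=4,
                  n_to_nonzero=6, sigma2=0.1, rng=None):
    """Generate (x1, X_{-1}, x1~, X~_{-1}) for the first simulation setup:
    sparse beta_1 with entries uniform on [-1.5,-0.5] U [0.5,1.5], and
    beta_1~ obtained by changing 10 entries of beta_1."""
    if rng is None:
        rng = np.random.default_rng()

    def draw(k):
        # uniform over [-1.5, -0.5] U [0.5, 1.5]
        return rng.choice([-1.0, 1.0], size=k) * rng.uniform(0.5, 1.5, size=k)

    beta = np.zeros(p_minus_1)
    support = rng.choice(p_minus_1, n_nonzero, replace=False)
    beta[support] = draw(n_nonzero)
    beta_t = beta.copy()
    beta_t[rng.choice(support, n_to_zero, replace=False)] = 0.0   # 4 edges removed
    zeros = np.setdiff1d(np.arange(p_minus_1), support)
    beta_t[rng.choice(zeros, n_to_nonzero, replace=False)] = draw(n_to_nonzero)  # 6 added
    X = rng.normal(size=(n, p_minus_1))
    Xt = rng.normal(size=(n, p_minus_1))
    x = X @ beta + np.sqrt(sigma2) * rng.normal(size=n)
    xt = Xt @ beta_t + np.sqrt(sigma2) * rng.normal(size=n)
    return x, X, xt, Xt, beta, beta_t
```

By construction, β_1 has 20 nonzero entries, β̃_1 has 22, and exactly 10 entries differ between the two vectors.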
Simulated data x_1, x̃_1, X_{−1}, and X̃_{−1} were analyzed with our ProGAdNet, lqa [28], and glmnet [37]. Since lqa was employed by [27], the results of lqa represent the performance of the network inference approach in [27]. The glmnet algorithm implements the Lasso approach of [39]. Both ProGAdNet and lqa estimate β_1 and β̃_1 jointly by solving the optimization problem (4), whereas glmnet estimates β_1 and β̃_1 separately, by solving the following two problems:

    β̂_1 = arg min_{β_1} ||x_1 − X_{−1}β_1||² + λ_1||β_1||_1,   β̃̂_1 = arg min_{β̃_1} ||x̃_1 − X̃_{−1}β̃_1||² + λ_2||β̃_1||_1.

The lqa algorithm uses a local quadratic approximation of the nonsmooth penalty term [40] in the objective function, and therefore cannot shrink variables to exactly zero. To alleviate this problem, we set β̂_i1 = 0 if |β̂_i1| < 10^−4, and similarly β̃̂_i1 = 0 if |β̃̂_i1| < 10^−4, where β̂_i1 and β̃̂_i1 represent the estimates of β_i1 and β̃_i1, respectively. Five-fold cross validation was used to determine the optimal values of the parameters λ_1 and λ_2 in the optimization problem. Specifically, for ProGAdNet and lqa, the prediction error (PE) was estimated at each pair of values of λ_1 and λ_2, and the smallest PE, along with the corresponding values λ_1min and λ_2min, was determined. Then, the optimal values of λ_1 and λ_2 were the values corresponding to the PE that was two standard errors (SE) greater than the minimum PE and that were greater than λ_1min and λ_2min, respectively. For glmnet, the optimal values of λ_1 and λ_2 were determined separately, also with the two-SE rule. The inference process was repeated for 50 replicates of the data, and the detection power and false discovery rate (FDR) for (β_1, β̃_1) and Δβ = β_1 − β̃_1, calculated from the results of the 50 replicates in the first simulation setup, are plotted in Fig. 1. It is seen that all three algorithms offer almost identical power, equal or close to 1, but exhibit different FDRs.
The FDR of lqa is the highest, whereas the FDR of ProGAdNet is almost the same as that of glmnet for β_1 and β̃_1, and the lowest for Δβ. In the second simulation setup, we let the sample size n = 150, the noise variance σ² = 0.1, and the number of variables p − 1 be 500, 800, and 1000. Detection power and FDR are depicted in Fig. 2. Again, the three algorithms have almost identical power, and ProGAdNet offers an FDR similar to that of glmnet but lower than that of lqa for β_1 and β̃_1, and the lowest FDR for Δβ. The simulation results in Figs. 1 and 2 demonstrate that our ProGAdNet offers the best performance compared with glmnet and lqa. The CPU times of one run of ProGAdNet, lqa, and glmnet for inferring a linear model with n = 150, p − 1 = 1000, and σ² = 0.1 at the optimal values of λ_1 and λ_2 were 5.82, 145.15, and 0.0037 s, respectively.

Computer simulation with gene networks
The GeneNetWeaver software [41] was used to generate gene networks whose structures are similar to those of real gene networks. Note that GeneNetWeaver was also employed by the DREAM5 network inference challenge to simulate gold-standard networks [12]. GeneNetWeaver outputs an adjacency matrix that characterizes a specific network structure. We chose the number of genes in the network to be p = 50, and obtained a p × p adjacency matrix A through GeneNetWeaver. The number of nonzero entries of A, which determine the edges of the network, was 62. Hence the network is sparse, as the total number of possible edges is p(p − 1) = 2450. We randomly changed 6 entries of A to yield another matrix Ã as the adjacency matrix of the gene network under the other condition. Note that the number of changed edges is small relative to the number of existing edges.
After the two network topologies were generated, the next step was to generate gene expression data. Letting a_ij be the entry of A in the ith row and jth column, we generated a p × p matrix B such that b_ij = 0 if a_ij = 0, and b_ij was randomly sampled from a uniform random variable on the intervals [−1, 0) and (0, 1] if a_ij ≠ 0. Another p × p matrix B̃ was generated such that b̃_ij = b_ij if ã_ij = a_ij; otherwise, b̃_ij = 0 if ã_ij = 0, and b̃_ij was randomly sampled from a uniform random variable on the intervals [−1, 0) and (0, 1] if ã_ij ≠ 0. Note that (2), with the means dropped, gives X = E(I − B)^−1 and X̃ = Ẽ(I − B̃)^−1. These relationships suggest first generating the entries of E and Ẽ independently from a Gaussian distribution with zero mean and unit variance, and then finding the matrices X and X̃ from the two equations, respectively. With real data, gene expression levels X and X̃ are measured with techniques such as microarray or RNA-seq, and there are always measurement errors. Therefore, we simulated the measured gene expression data as Y = X + V and Ỹ = X̃ + Ṽ, where V and Ṽ model measurement errors and were independently generated from a Gaussian distribution with zero mean and variance σ², to be specified later. Fifty pairs of network replicates and their gene expression data were generated independently.
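Generating expression data from a coefficient matrix B then takes one matrix inversion per condition. A minimal sketch (our own function name), following X = E(I − B)^−1 with additive measurement noise:

```python
import numpy as np

def expression_from_network(B, n, sigma2, rng):
    """Generate measured expression data for one condition:
    model (2) with zero means gives X = E (I - B)^{-1};
    the measured data are Y = X + V with noise variance sigma2."""
    p = B.shape[0]
    E = rng.normal(size=(n, p))                      # unit-variance errors
    X = E @ np.linalg.inv(np.eye(p) - B)             # solve X(I - B) = E
    V = np.sqrt(sigma2) * rng.normal(size=(n, p))    # measurement noise
    return X + V
```

The same routine is called with B and B̃ to produce Y and Ỹ for the two conditions; (I − B) must be invertible, which holds for the sparse coefficient matrices generated above in all but degenerate cases.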
Finally, gene networks were inferred with our ProGAdNet algorithm by solving the optimization problem (4), where x_i, X_{−i}, x̃_i, and X̃_{−i} were replaced with the measured gene expression data y_i, Y_{−i}, ỹ_i, and Ỹ_{−i}. Stability selection was employed to rank the edges that changed between the two conditions. For comparison, we also used Lasso to infer the network topology under each condition separately, by solving the following two optimization problems:

    B̂ = arg min_B ||Y − YB||² + λ||B||_1 s.t. b_ii = 0,   B̃̂ = arg min_{B̃} ||Ỹ − ỸB̃||² + λ̃||B̃||_1 s.t. b̃_ii = 0.

Note that each optimization problem can be decomposed into p separate problems that can be solved with Lasso. The glmnet algorithm [37] was again used to implement Lasso, and the stability selection technique was again employed to rank the differential edges detected by Lasso. The lqa algorithm was not considered for the simulated gene networks, because it is very slow and its performance is worse than that of ProGAdNet and Lasso, as shown in the previous section. We also employed the GENIE3 algorithm [42] to infer B and B̃ separately, because GENIE3 gave the best overall performance in the DREAM5 challenge [12]. Finally, following the performance assessment procedure in [12], we used the precision-recall (PR) curve and the area under the PR curve (AUPR) to compare the performance of ProGAdNet with that of Lasso and GENIE3. For ProGAdNet and Lasso, the estimate of ΔB = B − B̃ was obtained, and the nonzero entries of ΔB were ranked based on their frequencies obtained in stability selection. Then, the PR curve for changed edges was obtained from the ranked entries of ΔB, pooled over the 50 network replicates. Two lists of ranked network edges were obtained from GENIE3: one for B and the other for B̃. For each cutoff value of the rank, we obtained an adjacency matrix A from B as follows: the (i, j)th entry a_ij = 1 if the rank of b_ij is above the cutoff value, and a_ij = 0 otherwise. Similarly, another adjacency matrix Ã was obtained from B̃.
Then, the PR curve for changed edges detected by GENIE3 was obtained from A − Ã, again pooled over the 50 network replicates. Figures 3 and 4 depict the PR curves of ProGAdNet, Lasso, and GENIE3 for measurement noise variance σ² = 0.05 and 0.5, respectively. The number of samples varies over 50, 100, 200, and 300. It is seen from Fig. 3 that our ProGAdNet offers much better performance than Lasso and GENIE3. When the noise variance increases from 0.05 to 0.5, the performance of all three algorithms degrades, but ProGAdNet still outperforms Lasso and GENIE3 considerably, as shown in Fig. 4. Table 1 lists the AUPRs of ProGAdNet, Lasso, and GENIE3, which again shows that ProGAdNet outperforms Lasso and GENIE3 consistently at all sample sizes.

Analysis of breast cancer data
We next use the ProGAdNet algorithm to analyze RNA-seq data of breast tumors and normal tissues. In The Cancer Genome Atlas (TCGA) database, there are RNA-seq data for 1098 breast invasive carcinoma (BRCA) samples and 113 normal tissues. The RNA-seq level 3 data for the 113 normal tissues and their matched BRCA tumors were downloaded. The TCGA IDs of these 226 samples are given in Additional file 2. The scaled estimates of gene expression levels in the dataset were extracted and multiplied by 10^6, which yielded the transcripts-per-million value of each gene. The batch effect was corrected with the removeBatchEffect function in the Limma package [43], based on the batch information in the TCGA barcode of each sample (the "plate" field in the barcode). The RNA-seq data include expression levels of 20,531 genes. Two filters were used to obtain informative genes for further network analysis. First, genes with expression levels in the lower 30th percentile were removed. Second, the coefficient of variation (CoV) was calculated for each of the remaining genes, and genes with CoVs in the lower 70th percentile were discarded. This resulted in 4310 genes, whose expression levels in the 113 normal tissues and 113 matched tumor tissues were used by the ProGAdNet algorithm to jointly infer the gene networks in normal tissues and tumors, and then to identify the difference between the two gene networks. The list of the 4310 genes is in Additional file 3, and their expression levels in tumors and normal tissues are in two data files in the software package in Additional file 1. Since small changes in b_ji in the network model (1) may not have much biological effect, we regarded the regulatory effect from gene j to gene i as changed using the following two criteria, rather than the simple criterion b̃_ji ≠ b_ji. The first criterion is |b̃_ji − b_ji| ≥ min{|b̃_ji|, |b_ji|}, which ensures that there is at least a one-fold change relative to min{|b̃_ji|, |b_ji|}.
However, when one of b̃_ji and b_ji is zero or near zero, this criterion does not filter out very small |b̃_ji − b_ji|. To avoid this problem, we further considered a second criterion. Specifically, all nonzero b̃_ji and b_ji were collected, and the 20th-percentile value T of all |b̃_ji| and |b_ji| was found. The second criterion is then max{|b̃_ji|, |b_ji|} ≥ T. As in the computer simulations, stability selection was employed to identify network changes reliably. As the number of genes, 4310, is quite large, it is time-consuming to repeat 100 runs per (λ_1, λ_2) pair. To reduce the computational burden, we used five-fold cross validation to choose the optimal values of λ_1 and λ_2 based on the two-SE rule used in the computer simulations, and then performed stability selection with 100 runs at the pair of optimal values. Note that stability selection at an appropriate hyperparameter point is as valid as stability selection performed along a path of hyperparameters [38]. The threshold on r_ij for determining network changes, as described in the Methods section, was chosen to be c = 0.9.
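The two filtering criteria can be applied to the estimated coefficient matrices in a few vectorized lines; the function name and matrix layout below are our own assumptions:

```python
import numpy as np

def changed_edges(B, Bt, pct=20):
    """Apply the two criteria for declaring a changed regulatory effect:
    (1) |b~ - b| >= min(|b~|, |b|)   -- at least a one-fold change;
    (2) max(|b~|, |b|) >= T, where T is the pct-th percentile of all
        nonzero |b| and |b~|        -- filters near-zero coefficients.
    Returns a boolean mask over the entries of B."""
    mags = np.concatenate([np.abs(B[B != 0]), np.abs(Bt[Bt != 0])])
    T = np.percentile(mags, pct)
    c1 = np.abs(Bt - B) >= np.minimum(np.abs(B), np.abs(Bt))
    c2 = np.maximum(np.abs(B), np.abs(Bt)) >= T
    return c1 & c2
```

Entries that pass both criteria are the candidate changed edges, which are then subjected to stability selection with threshold c.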
Our network analysis with ProGAdNet identified 268 genes that are involved in at least one changed edge. Names of these genes are listed in Additional file 4. We named the set of these 268 genes the dNet set. We also extracted the raw read count of each gene from the RNA-seq dataset and employed DESeq2 [44] to detect differentially expressed genes. The list of 4921 differentially expressed genes detected at FDR < 0.001 and fold change ≥ 1 is also in Additional file 4. Among the 268 dNet genes, 196 are differentially expressed and the remaining 72 are not, as shown in Additional file 4.
To assess whether the dNet genes relate to the disease status, we performed gene set enrichment analysis (GSEA) with the C2 gene sets in the Molecular Signatures Database (MSigDB) [45, 46]. The C2 collection consists of 3777 human gene sets that include pathways in major pathway databases such as KEGG [47], REACTOME [48], and BIOCARTA [49]. After excluding gene sets with more than 268 genes or fewer than 15 genes, 2844 gene sets remained. Of note, the default value for the minimum gene set size at the GSEA website is 15. Here we also excluded gene sets whose size is greater than 268 (the size of the dNet set), because large gene sets may tend to be enriched in a small gene set by chance. Searching the names of these 2844 gene sets with the key words "breast cancer", "breast tumor", "breast carcinoma", and "BRCA" through the "Search Gene Sets" tool at the GSEA website identified 258 gene sets related to breast cancer. Using Fisher's exact test, we found that 121 of the 2844 C2 gene sets were enriched in the dNet gene set at an FDR of < 10^−3. The list of the 121 gene sets is in Additional file 5. Of these 121 gene sets, 31 are among the 258 breast cancer gene sets, which is highly significant (Fisher's exact test p-value 2 × 10^−7). The top 20 enriched gene sets are listed in Table 2. As seen from the names of these gene sets, 11 of the 20 are breast cancer gene sets and 7 are related to other types of cancer. These GSEA results clearly show that the dNet gene set identified by our ProGAdNet algorithm is highly relevant to breast cancer.
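The enrichment test used above can be illustrated with a one-sided Fisher's exact p-value computed directly from the hypergeometric distribution (a simplified sketch with a hypothetical function name; it omits the FDR correction applied across the 2844 gene sets):

```python
from math import comb

def fisher_enrichment_p(a, b, c, d):
    """One-sided (greater) Fisher's exact p-value for the 2x2 table
    [[a, b], [c, d]], where a = genes in both the pathway gene set and
    the query (e.g. dNet) set, b = pathway only, c = query only, and
    d = background genes in neither.  Returns the probability of
    observing an overlap of at least `a` under the hypergeometric null.
    """
    n = a + b + c + d          # background size
    K = a + b                  # pathway gene-set size
    N = a + c                  # query-set size
    p = 0.0
    for k in range(a, min(K, N) + 1):
        p += comb(K, k) * comb(n - K, N - k) / comb(n, N)
    return p
```

For example, an overlap of 5 genes between a 5-gene pathway and a 10-gene query set drawn from a 100-gene background is highly significant, whereas an overlap of 0 is not.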

Analysis of kidney cancer data
We also analyzed another dataset in the TCGA database, the kidney renal clear cell carcinoma (KIRC) dataset, which contains RNA-seq data of 463 tumors and 72 normal tissues. The RNA-seq level 3 data for the 72 normal tissues and their matched tumors were downloaded. The TCGA IDs of these 144 samples are given in Additional file 6. We processed the KIRC data in the same way as the BRCA data. After the two filtering steps, we again obtained expression levels of 4310 genes. The list of the 4310 genes is in Additional file 7, and their expression levels in 72 tumors and 72 normal tissues are in two data files in Additional file 1.
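The two filtering steps applied to both the BRCA and KIRC data can be sketched as follows (an illustrative Python sketch; it assumes the mean expression of each gene is used for the first filter, and the function name is hypothetical):

```python
import numpy as np

def filter_genes(expr, expr_pct=30.0, cov_pct=70.0):
    """Two-stage gene filter used before network inference.

    expr: (n_genes, n_samples) array of TPM values.
    Stage 1 removes genes whose mean expression falls in the lower
    `expr_pct` percentile; stage 2 removes remaining genes whose
    coefficient of variation (std/mean) falls in the lower `cov_pct`
    percentile.  Returns the indices of the retained genes.
    """
    mean_expr = expr.mean(axis=1)
    idx = np.flatnonzero(mean_expr >= np.percentile(mean_expr, expr_pct))
    cov = expr[idx].std(axis=1) / expr[idx].mean(axis=1)
    idx = idx[cov >= np.percentile(cov, cov_pct)]
    return idx
```

With the paper's 30th/70th-percentile cutoffs, roughly 21% of genes survive both stages, consistent with 4310 of the 20,531 genes being retained.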
Analysis of the KIRC data with ProGAdNet identified 1091 genes that are involved in at least one changed edge. We chose the top 460 genes, those involved in at least 3 changed edges, for further GSEA. Names of these 460 genes are listed in Additional file 8. We named the set of these 460 genes the dNetK set. We also extracted the raw read count of each gene from the RNA-seq dataset and employed DESeq2 [44] to detect differentially expressed genes. The list of 5432 differentially expressed genes detected at FDR < 0.001 and fold change ≥ 1 is also in Additional file 8. Among the 460 dNetK genes, 395 are differentially expressed and the remaining 65 are not, as shown in Additional file 8.

Table 2 Top 20 MSigDB C2 gene sets that are enriched in the dNet gene set identified from the BRCA data. Eleven gene sets related to BRCA are highlighted.
After excluding gene sets with more than 460 genes or fewer than 15 genes from the 3777 human C2 gene sets in MSigDB, we obtained 3019 gene sets for GSEA. Using Fisher's exact test, we found that 251 of the 3019 C2 gene sets were enriched in the dNetK set of 460 genes at an FDR of < 10^−3. The list of the 251 gene sets is in Additional file 9. The top 20 enriched gene sets are listed in Table 3. Among the top 20 gene sets, 2 are related to kidney diseases, 8 are related to several different types of cancer, and 5 are related to the immune system. The 460 genes were ranked according to the number of changed edges that they are involved in, and the top 10 genes are NADH Dehydrogenase (Ubiquinone) 1 Alpha Subcomplex 4-Like 2 (NDUFA4L2), UMOD, ANGPTL4, NNMT, CA9, IGFBP3, APOE, vimentin (VIM), C3, and C4A.
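Ranking genes by the number of changed edges they touch, as done to form the dNetK set, can be sketched as follows (hypothetical function name; the actual implementation is in the software package of Additional file 1):

```python
from collections import Counter

def rank_genes_by_changed_edges(edges, min_edges=3):
    """Rank genes by the number of changed network edges they touch.

    edges: iterable of (gene_a, gene_b) pairs, one per changed edge.
    Returns (gene, count) pairs sorted with the most-involved gene
    first, keeping only genes on at least `min_edges` changed edges
    (the cutoff used above to select the top 460 genes).
    """
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return [(g, c) for g, c in counts.most_common() if c >= min_edges]
```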

Discussion
Computer simulations demonstrated that our ProGAdNet significantly outperformed three other algorithms, glmnet, GENIE3, and lqa, in detecting network changes. The performance gain of ProGAdNet over glmnet and GENIE3 is expected, because glmnet and GENIE3 infer the two gene networks separately, while ProGAdNet infers them jointly and takes into account the similarity between the two networks. Although ProGAdNet and lqa solve the same optimization problem, ProGAdNet significantly outperforms lqa. The performance gap is due to the fact that the lqa algorithm uses an approximation of the objective function, whereas our algorithm solves optimization problem (4) exactly. In other words, our ProGAdNet algorithm can always find the optimal solution to the optimization problem, since the objective function is convex, but the lqa algorithm generally cannot. Moreover, our computer simulations show that our ProGAdNet algorithm is much faster than the lqa algorithm.
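To illustrate why an exact proximal gradient iteration is possible here, the following sketch shows the classic ISTA iteration for a single lasso-type problem: a gradient step on the smooth least-squares term followed by the closed-form proximal (soft-thresholding) step on the l1 penalty. This is only a single-network illustration under simplified assumptions; ProGAdNet's actual objective (4) couples two networks with an additional penalty, so its proximal operator differs:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (elementwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_lasso(X, y, lam, step=None, n_iter=500):
    """Proximal gradient (ISTA) for 0.5*||y - X b||^2 + lam*||b||_1.

    Each iteration takes a gradient step on the smooth least-squares
    term and then applies the exact, closed-form proximal step on the
    l1 penalty, so no approximation of the objective is needed.
    """
    n, p = X.shape
    if step is None:
        # 1 / Lipschitz constant of the gradient of the smooth part
        step = 1.0 / np.linalg.norm(X, 2) ** 2
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = soft_threshold(b - step * grad, step * lam)
    return b
```

Because every proximal step is exact and the overall objective is convex, the iterates converge to a global minimizer, which is the structural advantage over lqa's quadratic approximation of the penalty.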
As mentioned earlier, eight of the top 10 genes in the differential gene network of KIRC (all except C3 and C4A) have been reported to be implicated in renal cell carcinoma (RCC). Specifically, NDUFA4L2 is overexpressed in clear cell RCC (ccRCC); its mRNA level is correlated with tumor stage and overall survival time [50, 51]; and the association of NDUFA4L2 with ccRCC is regulated by ELK1 [52]. UMOD expression is downregulated in RCC [53, 54], and the homozygous genotype of UMOD is associated with more aggressive RCC [55]. ANGPTL4 plays an important role in several cancers [56-60]. It has been shown that serum ANGPTL4 levels in RCC patients were higher than those in patients with other types of solid tumor, such as bladder cancer, breast cancer, gastrointestinal cancer, and lung adenocarcinoma, suggesting that serum ANGPTL4 may be a diagnostic and prognostic biomarker for RCC [61]. NNMT is overexpressed in RCC; its high expression level is significantly associated with unfavorable prognosis of RCC [62]; and it can potentially serve as a biomarker for early detection of RCC [63]. CA9 is a transmembrane member of the carbonic anhydrase family. It is not expressed in healthy renal tissue but is overexpressed in most ccRCC; it is a promising molecular marker for diagnosis, prognosis, and therapy of ccRCC [64, 65]. IGFBP3 shows overexpression in association with markers of poor prognosis in many tumor types [66], including RCC [67]. Single nucleotide polymorphisms (SNPs) at the APOE/C1 locus are associated with RCC risk [68]. The expression of vimentin is significantly upregulated in RCC [69, 70], and its expression level is positively correlated with the pathological grade and clinical stage of RCC [70]. These results show that the dNetK gene set that our ProGAdNet algorithm identified from the KIRC dataset is informative about the cancer disease status.

Conclusion
In this paper, we developed a very efficient algorithm, named ProGAdNet, that jointly infers two gene networks from gene expression data collected under two different conditions and then identifies differential changes between the networks. Computer simulations showed that our ProGAdNet offered much better inference accuracy than existing algorithms. Analysis of a set of RNA-seq data of breast tumors and normal tissues with ProGAdNet identified a set of genes involved in differential changes of the gene network. A number of gene sets related to breast cancer or other types of cancer are significantly enriched in the identified gene set. Network analysis of a kidney cancer dataset also identified a set of genes involved in network changes, and the majority of the top genes identified have been reported to be implicated in kidney cancer. These results show that the identified gene sets are very informative about the disease status of the tissues. As gene network rewiring occurs frequently under different molecular contexts, our ProGAdNet algorithm provides a valuable tool for identifying changed gene-gene interactions.