Volume 13 Supplement 9
Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2011: Bioinformatics
Inference of gene regulatory subnetworks from time course gene expression data
 XiJun Liang^{1},
 Zhonghang Xia^{2}Email author,
 LiWei Zhang^{1} and
 FangXiang Wu^{3}
DOI: 10.1186/1471210513S9S3
© Liang et al; licensee BioMed Central Ltd. 2012
Published: 11 June 2012
Abstract
Background
Identifying gene regulatory network (GRN) from time course gene expression data has attracted more and more attentions. Due to the computational complexity, most approaches for GRN reconstruction are limited on a small number of genes and low connectivity of the underlying networks. These approaches can only identify a single network for a given set of genes. However, for a largescale gene network, there might exist multiple potential subnetworks, in which genes are only functionally related to others in the subnetworks.
Results
We propose the network and community identification (NCI) method for identifying multiple subnetworks from gene expression data by incorporating community structure information into GRN inference. The proposed algorithm iteratively solves two optimization problems, and can promisingly be applied to largescale GRNs. Furthermore, we present the efficient Block PCA method for searching communities in GRNs.
Conclusions
The NCI method is effective in identifying multiple subnetworks in a largescale GRN. With the splitting algorithm, the Block PCA method shows a promosing attempt for exploring communities in a largescale GRN.
Background
Rapid advances in highthroughput DNA microarray technology generate a huge amount of time course gene expression data which, in turn, calls for efficient computational models to characterize the network of genetic regulatory interactions. A number of methods have been proposed to infer GRNs from gene expression data. Boolean networks [1] use two states, "ON" or "OFF" to represent the state of each gene, and each state at the next time step is determined by Boolean logical rules. Bayesian Networks [2] infer causal relationships between two genes according to conditional probability functions. The stochastic nature makes them more accurate in modeling the dynamics and nonlinearity of gene regulation in largescale systems. Bayesian Networks, however, usually do not include cycles and, thus, are difficult to deal with feedback motifs. Ordinary differential equations (ODEs) models [3–5] overcome this problem by modeling GRNs as a set of differential equations. Some other models such as signed directed graphs, multiple regression, state space model, etc., are addressed in the survey [6].
Whereas most of the existing work focuses on smallsized GRNs, limited attention has been given to interactions among large scale genes. Conventional approaches are usually designed for the network with connectivity less than a small fixed number [7]. Computational complexity is a major obstacle in reconstruction of large scale GRNs as determining the parameters in such a network is timeconsuming. Sparsity is a common assumption used in modeling GRNs to reduce the computational complexity. Typically, in a sparse network, one gene interacts with only a couple of genes [7].
Recently, Yuan et al. [8] proposed a directed partial correlation (DPC) method for regulatory network inference on largescale gene data. The DPC method combines the directed network inference approach and Granger causality concept for causal inference on time series data to reconstruct largescale GRNs. Although modular discovery was provided by biclustering in gene expression data, the DPC method cannot present multiple subnetworks simultaneously.
We propose the NCI method for subnetwork identification by detecting community structures from largescale gene expression data. Usually, GRNs have community structures: genes in the same groups are found with high density of "withingroup" interactions and genes in different groups with low density of "betweengroup" interactions [9]. Many algorithms have been proposed to detect community structures by clustering [9–15]. To accomodate the largescale GRN inference, we particularly propose a block principal component analysis (Block PCA) method, which explores community structure information for the NCI method.
The NCI method repeats two steps: (1) Nstep: identify possible gene regulatory networks; (2) Cstep: estimate community structure. At the Nstep, a convex quadratic programming, formulated for the community structure, is solved to infer possible GRNs. This quadratic programming can be identically divided into n (the total number of genes) subproblems, each of which has a much smaller dimension, and, thus, adapt to largescale networks. At the Cstep, the NCI method estimates community structure from the GRNs identified at the first step. When the algorithm terminates, a network with community structures is obtained.
Methods
An ODE model for GRNs
where A = e^{C Δt}+ (e^{C Δt} I)C^{1}SF.
where γ is a positive scalar, A_{1} = ∑_{ i, j }a_{ ij } is the l_{1}norm of matrix A.
The NCI method
Since rows of A are independent in the objective function and constraints, problem (7) can be divided into n subproblems and solved individually [16]. However, such a solution does not consider the information of community structure which implies multiple subnetworks. In this section, we propose the NCI method to overcome this problem. An observation is that interactions between genes in a community occur more frequently than those between different communities. We introduce a weighted matrix W = (w_{ ij })_{ n × m }to distinguish genes in the communities with those outside. w_{ ij } is assigned a small positive value or zero if gene i and j located in the same community; a relatively large value, otherwise.
Where μ > 0 is a penalty parameter, A = (a_{ ij })_{ n × m }, 〈W, A〉 = trace (W^{ T } A) = ∑_{ i, j }w_{ ij }a_{ ij }. All elements of matrix W are nonnegative.
Where ${\mathbb{1}}_{n,n}\in {\mathbb{R}}^{n\times n}$ is the matrix with all 1's.
Algorithm 1
Algorithm 1: The NCI algorithm  

Input: X;  
Output:  matrix A and communities of the GRN; 
Step 0.  (Initiation) W : = 0 ∈ ℝ^{ n × n }. Select μ > 0, γ_{ τ } > 0, λ_{1}∈ (0, λ_{2}). 
Step 1.  (Nstep: identify possible N etworks) Solve (8) to find an approximate matrix A. 
Step 2.  (Cstep: estimate C ommunity structure) Calculate weighted matrix W 1 by Eq. (13), then solve the proposed block PCA model (12) to calculate L*. 
Step 3.  (Update weight matrix) Update W by Eq. (9). If stop criteria are not satisfied, go to Step 1. 
Step 4.  (GRN identification) Identify the communities of the computed GRN by SSVD. 
Some additional details about Algorithm 1:
1. Stop criteria. The NCI algorithm stops when either of the following two criteria meet. (1) Weighted matrix W converges, that is, W^{(k)} W^{(k+1)} ≤ tol for a predefined constant tol > 0, where W^{(k)}denotes W at iteration k; (2) The number of iteration reaches the threshold.
After the iteration terminates, model (8) is solved again to compute the matrix A with γ = γ_{ τ }, where γ_{ τ }, is a parameter in Algorithm 1.
for i = 1, ..., n, where ${X}_{2,i}^{T}$ is the ith row of the matrix ${X}_{2},\phantom{\rule{0.3em}{0ex}}\mathbb{1}={\left[1,\phantom{\rule{2.77695pt}{0ex}}\dots ,\phantom{\rule{2.77695pt}{0ex}}1\right]}^{T}\in {\mathbb{R}}^{n},{w}_{i}^{T},{a}_{i}^{T}$ is the ith row of W and A, respectively. Subproblem (10) can be transformed into a standard (convex) quadratic programming, and solved by software packages such as Mosek or CVX [18].
The Block PCA model
which L_{*} is the nuclear norm of the matrix L, and D is a given matrix.
where W 1 is a weight matrix with all elements nonnegative.
p_{ ij } is the length of the shortest path between the node i and j, p_{0} ≥ 0 is a parameter less than the diameter of the network.
As in Robust PCA (11), the nuclear norm  · _{*} usually induces a low rank matrix and the l_{1} norm  · _{1} induces a sparse matrix [19, 20]. The constraint D = L + E enforces to split matrix D into a low rank matrix L and a sparse matrix E. Different with Robust PCA, the Block PCA adds an extra term λ_{1} 〈W 1, L〉 = λ_{1} ∑w 1_{ ij }·L_{ ij } to (11). The nonnegative weight matrix W 1 stands for the prior knowledge about low rank matrix L.
Splitting algorithm for solving Block PCA
Where θ_{ i }: ${\mathbb{R}}^{{n}_{i}}\to \mathbb{R}$ are closed convex functions, ${A}_{i}\in {\mathbb{R}}^{l\times {n}_{i}}$, b∈ℝ^{ l }.
By letting θ_{1}(·): = ·_{*}, θ_{2}(·): = λ_{1}〈W 1, ·〉, θ_{3}(·): = λ_{2}·_{1}, and $b=\left[\begin{array}{c}\hfill D\hfill \\ \hfill 0\hfill \end{array}\right],{\mathcal{A}}_{1}L=\left[\begin{array}{c}\hfill L\hfill \\ \hfill L\hfill \end{array}\right],{\mathcal{A}}_{2}U=\left[\begin{array}{c}\hfill 0\hfill \\ \hfill U\hfill \end{array}\right]$ and ${\mathcal{A}}_{3}E=\left[\begin{array}{c}\hfill E\hfill \\ \hfill 0\hfill \end{array}\right],$ Block PCA (12) can be treated as a generalized case of (15) with matrix variables L, E, U and linear operators ${\mathcal{A}}_{1},\phantom{\rule{0.3em}{0ex}}{\mathcal{A}}_{2},\phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}}{\mathcal{A}}_{3}.$
Under the framework of [21], we next present an implementable splitting algorithm for the Block PCA model (12).
t ∈ ℝ and ε > 0. It can be extended to an arbitrary matrix X ∈ ℝ^{ n, n }by applying elementwise operation, denoted by ${\mathcal{S}}_{\epsilon}\left[X\right]$.
Particularly, if W is the matrix with all elements 1, X_{ w }degenerates to X_{1}, and ${\mathcal{S}}_{\epsilon W}\left[X\right]$ degenerates to ${\mathcal{S}}_{\epsilon}\left[X\right]$.
Let $\beta >0,\mu >2,{\Lambda}^{k}=\left[\begin{array}{c}\hfill \begin{array}{c}{\Lambda}_{1}^{k}\\ {\Lambda}_{2}^{k}\end{array}\hfill \end{array}\right],$ where ${\Lambda}_{1}^{k},{\Lambda}_{2}^{k}\in {\mathbb{R}}^{n,n}$. Then, for the calculated (L^{ k }, E^{ k }, U^{k}, Λ^{ k }), the steps for each iterative (L^{k+1}, E^{k+1}, U^{k+1},Λ^{k+1}) for solving (12) are as follows.
where $\tau =\frac{1}{2\beta},Y=\frac{1}{2}\left[D{E}^{k}+{U}^{k}\right]+\frac{1}{2\beta}\left[{\Lambda}_{1}^{k}+{\Lambda}_{2}^{k}\right].$
where $\alpha =\frac{{\lambda}_{2}}{\beta \mu},\mathit{\u1ebc}={E}^{k}+\frac{1}{\beta \mu}{\Lambda}_{1}^{k+\frac{1}{2}}.$
for tolerance ε_{1} > 0, ε_{2} > 0, where ΔL^{ k } = L^{k+1} L^{ k }, ΔE^{ k } = E^{k+1} E^{ k }, ΔU^{ k } = U^{k+1} U^{ k }.
Algorithm 2
Algorithm 2: Splitting algorithm  

Input: W 1.  
Output:  low rank matrix L. 
Step 0.  Initiation. Set k = 0, β > 0, μ > 2, $D={\mathbb{1}}_{n,n}$. Set ε_{1} > 0, ε_{2} > 0, ${\lambda}_{2}=1/\sqrt{n}$, λ_{1} ∈ (0, λ_{2}), L^{0} = D, E^{0} = 0, U^{0} = D. 
Step 1.  Solve L^{k+1}by Eq. (22). 
Step 2.  Update the Lagrangian multiplier ${{\Lambda}^{k}}^{+\frac{1}{2}}$ by Eq. (23). 
Step 3.  
Step 4.  Update the Lagrange multiplier Λ^{k+1}via Eq. (26). 
Step 5.  Terminate if the stop criteria (27) and (28) are satisfied; otherwise, k: = k + 1, goto Step 1. 
In Algorithm 2, arguments β and μ are currently set constant. Adaptive settings of these arguments may speed up the convergence. The discussion of this issue in a simple case can be referred to [24].
Results and discussion
We examine the NCI method based on two synthetic gene regulatory networks with different sizes. The GRN in first test is a smallsized network consisting of 14 genes and 27 interactions. There exist two communities in this GRN. In the second test, the network consists of 50 genes and 100 interactions and the data come from the Artificial Gene Network database [25]. Since the gene network is synthetic, the corresponding matrix A in (5) is known beforehand. We solve the GRN by the NCI method and compare it with A to evaluate the performance of the algorithm. Moreover, we examine the performance of the proposed splitting algorithm in the third test.
where r_{11}, r_{22}, and r_{33} are the number of correctly identified positives, zeros and negatives, representing promotions, repressions, and no interaction, respectively.
The algorithm runs on a computer with Pentium (R) dualcore CPU E5200 2.50GHz, and RAM 2.0GB. The parameters of the algorithm are chosen as follows. In Test 1, γ in problem (8) is chosen from {0.05, 0.02, 0.008} to find possible GRNs and γ_{ τ } = 0.02. In Test 2, γ is chosen from {0.02, 0.005, 0.001} and γ_{ τ } = 0.005. In the first two tests, μ is chosen as 10γ for problem (8), λ_{1} as 0.2λ_{2} in the Block PCA model, and p_{0} as $\frac{1}{4}d$ for Eq. (9), where d is the diameter of the corresponding network. The algorithm terminates in 3 iterations.
Test 1. A small gene network with 14 genes
To show the effectiveness of the NCI method at Nstep in searching multiple possible GRNs, we compare the accuracy rates with the results of one iteration at Nstep (γ = γ_{ τ } at Nstep). As shown in Figure 2(C), the average accuracy is improved from 78.5% to 83.5% in the 30 runs. In noise case (Figure 2(D)), the average accuracy is improved from 87.6% to 88.9%. Thus, a number of iterations at Nstep is necessary for finding accurate GRNs with the NCI algorithm.
Test 2. A gene network with 50 genes
Characteristics of web100023 network
number of vertices  50  number of arcs  100 

density  0.04  indegree center  Node 1 
diameter  10  outdegree center  Node 1 
characteristic path length  13.0612  closeness center  Node 14 
average clustering coefficient  0.0747  betweeness center  Node 1 
Test 3. The performance of the Block PCA and splitting algorithm
Results of splitting algorithm, CVX and SDPNAL for solving the Block PCA model
Points30  Points50  Points100  Points200  Points500  

funVal  split  146.4654  301.4004  793.5010  2.1104e3  8.0980e3 
cvx  146.4654          
sdpnal  149.9589  307.0276  794.4018  2.1192e3  8.1217e3  
D  L  E  split  3.0041e5  1.0667e5  7.5166e5  1.4488e4  3.7525e4 
cvx  1.2590e6          
sdpnal  4.2797e5  2.3411e4  9.1722e4  0.0033  0.0057  
L_{*}  split  29.5328  54.8451  118.4007  260.8353  720.8204 
cvx  29.5323          
sdpnal  29.8736  55.2079  119.9717  261.0082  707.0530  
〈W, L〉  split  631.1912  1.5986e3  7.0579e3  2.7437e4  1.8056e5 
cvx  631.1811          
sdpnal  700.7054  1.7995e3  7.0395e3  2.8083e4  1.8329e5  
E_{1}  split  451.1092  1.2638e3  4.6336e3  1.7925e4  1.1079e5 
cvx  451.1147          
sdpnal  447.5222  1.2408e3  4.6324e3  1.7854e4  1.1081e5  
elapsed time(s)  split  4.88  19.66  80.43  687.65  1.4342e4 
cvx  6.74          
sdpnal  78.99  265.48  1.3231e3  6.9678e3  5.1222e4 
Conclusion
We have developed the NCI method for gene regulatory network reconstruction from gene expression data. Based on the convex programming technology, the NCI method has shown the capability to identify multiple subnetworks within a largescale gene regulatory network. The NCI method includes two main steps. At the first step, the algorithm infers a gene regulatory network. At the second step, the algorithm estimates potential community structures. These two steps repeat until the algorithm terminates. Furthermore, we have proposed an efficient Block PCA method for exploring communities within a GRN and the splitting algorithm for the Block PCA model. Numerical experiments have validated the effectiveness of the NCI method in identifying GRNs and inferring the communities.
Abbreviations
 GRN:

gene regulatory network
 NCI:

network and community identification
 Block PCA:

block principal component analysis.
Declarations
Acknowledgements
This article has been published as part of BMC Bioinformatics Volume 13 Supplement 9, 2012: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2011: Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S9.
This research is partially supported by the Natural Science Foundation of China, Grant 11071029 and 11171049.
Authors’ Affiliations
References
 Akutsu T, Miyano S, Kuhara S: Identification of genetic networks from a small number of gene expression patterns under the Boolean network model. Pac Symp Biocomput. 1999, 1728.Google Scholar
 Bernard A, Hartemink A: Informative structure priors: joint learning of dynamic regulatory networks from multiple types of data. Pac Symp Biocomput. 2005, 459470.Google Scholar
 Chen T, He H, Church G: Modeling gene expression with differential equations. Pac Symp Biocomput. 1999, 2940.Google Scholar
 De Hoon M, Imoto S, Kobayashi K, Ogasawara N, Miyano S: Inferring gene regulatory networks from timeordered gene expression data of Bacillus subtilis using differential equations. Pac Symp Biocomput. 2003, 1728.Google Scholar
 Hu X, Ng M, Wu F, Sokhansanj B: Mining, modeling, and evaluation of subnetworks from large biomolecular networks and its comparison study. IEEE Trans Inf Technol Biomed. 2009, 13 (2): 184194.View ArticlePubMedGoogle Scholar
 Huang Y, TiendaLuna I, Wang Y: A survey of statistical models for reverse engineering gene regulatory networks. IEEE Signal Process Mag. 2009, 26: 7697.PubMed CentralView ArticlePubMedGoogle Scholar
 Wu F: Inference of gene regulatory networks and its validation. Current Bioinformatics. 2007, 2 (2): 139144. 10.2174/157489307780618240.View ArticleGoogle Scholar
 Yuan Y, Li C, Windram O: Directed partial correlation: inferring largescale gene regulatory network through induced topology disruptions. PLoS One. 2011, 6 (4): e1683510.1371/journal.pone.0016835.PubMed CentralView ArticlePubMedGoogle Scholar
 Newman M: Fast algorithm for detecting community structure in networks. Phys Rev E Stat Nonlin Soft Matter Phys. 2004, 69 (6): 066133View ArticlePubMedGoogle Scholar
 Pothen A, Simon H, Liou K: Partitioning sparse matrices with eigenvectors of graphs. SIAM J Matrix Anal Applic. 1990, 11 (3): 430452. 10.1137/0611030.View ArticleGoogle Scholar
 Kernighan B, Lin S: An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal. 1970, 49 (2): 291307.View ArticleGoogle Scholar
 Girvan M, Newman M: Community structure in social and biological networks. Proc Natl Acad Sci USA. 2002, 99 (12): 78217826. 10.1073/pnas.122653799.PubMed CentralView ArticlePubMedGoogle Scholar
 Radicchi F, Castellano C, Cecconi F, Loreto V, Parisi D: Defining and identifying communities in networks. Proc Natl Acad Sci USA. 2004, 101 (9): 26582663. 10.1073/pnas.0400054101.PubMed CentralView ArticlePubMedGoogle Scholar
 Palla G, Derényi I, Farkas I, Vicsek T: Uncovering the overlapping community structure of complex networks in nature and society. Nature. 2005, 435: 814818. 10.1038/nature03607.View ArticlePubMedGoogle Scholar
 Newman M: Detecting community structure in networks. The European Physical Journal BCondensed Matter and Complex Systems. 2004, 38 (2): 321330. 10.1140/epjb/e200400124y.View ArticleGoogle Scholar
 Wu F, Liu L, Xia Z: Identification of gene regulatory networks from time course gene expression data. Conf Proc IEEE Eng Med Biol Soc. 2010, 795798.Google Scholar
 Lee M, Shen H, Huang J, Marron J: Biclustering via sparse singular value decomposition. Biometrics. 2010, 66: 10871095. 10.1111/j.15410420.2010.01392.x.View ArticlePubMedGoogle Scholar
 Grant M, Boyd S: CVX: Matlab software for disciplined convex programming, version 1.21 (2010).
 Candés E, Li X, Ma Y, Wright J: Robust principal component analysis?. ArXiv:0912.3599.
 Chandrasekaran V, Sanghavi S, Parrilo P, Willsky A: Ranksparsity incoherence for matrix decomposition. SIAM J Optim. 2011, 21 (2): 572596. 10.1137/090761793.View ArticleGoogle Scholar
 He B, Tao M, Yuan X: A splitting method for separate convex programming with linking linear constraints. Tech rep. 2010Google Scholar
 Cai J, Candés E, Shen Z: A singular value thresholding algorithm for matrix completion. SIAM J Optim. 2010, 20 (4): 19561982. 10.1137/080738970.View ArticleGoogle Scholar
 Lin Z, Chen M, Wu L, Ma Y: The augmented lagrange multiplier method for exact recovery of corrupted lowrank matrices. ArXiv:1009.5055.
 Boyd S, Parikh N, Chu E, Peleato B, Eckstein J: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning. 2010, 3: 1122. 10.1561/2200000016.View ArticleGoogle Scholar
 AGN. [http://www.compsysbio.org/AGN/index.html]
 Zhao X, Sun D, Toh K: A NewtonCG augmented Lagrangian method for semidefinite programming. SIAM J Optim. 2010, 20 (4): 17371765. 10.1137/080718206.View ArticleGoogle Scholar
 web100023. [http://www.compsysbio.org/AGN/Web/Web100023.html]
 SDPNAL. [http://www.math.nus.edu.sg/~matsundf/]
 CVX. [http://cvxr.com/cvx/]
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.