Volume 16 Supplement 13
Proceedings of the 12th Annual MCBIOS Conference
Efficient experimental design for uncertainty reduction in gene regulatory networks
- Roozbeh Dehghannasiri^{1, 2},
- Byung-Jun Yoon^{1, 2, 3} and
- Edward R Dougherty^{1, 2}Email author
https://doi.org/10.1186/1471-2105-16-S13-S2
© Dehghannasiri et al. 2015
Published: 1 December 2015
The Erratum to this article has been published in BMC Bioinformatics 2015 16:410
Abstract
Background
An accurate understanding of interactions among genes plays a major role in developing therapeutic intervention methods. Gene regulatory networks often contain a significant amount of uncertainty. The process of prioritizing biological experiments to reduce the uncertainty of gene regulatory networks is called experimental design. Under such a strategy, the experiments with high priority are suggested to be conducted first.
Results
The authors have already proposed an optimal experimental design method based upon the objective for modeling gene regulatory networks, such as deriving therapeutic interventions. The experimental design method utilizes the concept of mean objective cost of uncertainty (MOCU). MOCU quantifies the expected increase of cost resulting from uncertainty. The optimal experiment to be conducted first is the one which leads to the minimum expected remaining MOCU subsequent to the experiment. In the process, one must find the optimal intervention for every gene regulatory network compatible with the prior knowledge, which can be prohibitively expensive when the size of the network is large. In this paper, we propose a computationally efficient experimental design method. This method incorporates a network reduction scheme by introducing a novel cost function that takes into account the disruption in the ranking of potential experiments. We then estimate the approximate expected remaining MOCU at a lower computational cost using the reduced networks.
Conclusions
Simulation results based on synthetic and real gene regulatory networks show that the proposed approximate method has close performance to that of the optimal method but at lower computational cost. The proposed approximate method also outperforms the random selection policy significantly. A MATLAB software implementing the proposed experimental design method is available at http://gsp.tamu.edu/Publications/supplementary/roozbeh15a/.
Keywords
Experimental design gene regulatory networks mean objective cost of uncertainty objective-based network reduction Boolean networks structural interventionBackground
A key area in the field of translational genomics is to derive therapeutic intervention that can beneficially alter cell dynamics in such a way as to avoid cancerous phenotypes. The first step to derive therapeutic interventions is to understand the regulatory relationships among genes. The interactions among genes are studied in the context of gene regulatory networks (GRNs). GRN models often possess high uncertainty. This inherent uncertainty might be due to many factors such as the complex nature of biological phenomena, lack of enough training data, etc. Uncertainty in GRNs can affect the accuracy and performance of the therapeutic interventions. Therefore, biologists aim at reducing uncertainty of GRN model via conducting additional biological experiments. However, there are many limitations for carrying out experiments. Biological experiments are usually expensive and time consuming as they need to be done on living organisms. On the other hand, the resources are limited which prevents researchers from conducting all experiments they need for identification of GRN. Thus, it is prudent to prioritize potential experiments based on the information they provide and then conduct the most informative. This process is called experimental design.
Boolean networks (BNs) have been extensively used for the study of GRNs. They have been shown to be effective in capturing the multivariate relationships among entities within a cell. BNs also facilitate the use of the well-studied Markov decision theory for deriving beneficial interventions. To date, intervention methods for BNs have been categorized in two different groups: structural interventions, and dynamical interventions. While structural interventions [1–4] permanently change the behavior of a network via one-time change in the underlying regulatory structure, the goal in dynamical interventions [5–8] is to interfere with the signaling of GRN through flipping or not flipping the expression state of genes over time. The assumption behind most of these intervention methods is that the network being studied is perfectly known. Thus, presence of uncertainty can degrade the performance of the intervention methods.
From a translational perspective it is crucial to reduce uncertainty pertinent to the objective, such as intervention. Hence, uncertainty should be quantified in such a way that the objective for modeling GRN is taken into account. Mean objective cost of uncertainty (MOCU) [9] quantifies the uncertainty of model in terms of the expected increased cost due to the presence of uncertainty. An experimental design method based on MOCU has been proposed in [10]. The experimental design method in [10] evaluates the effect of each potential experiment in reducing the model uncertainty, which is measured in terms of MOCU, and suggests that the experiment which results in the minimum expected remaining MOCU should be conducted first. The long-run performance of the experimental design method in [10] is guaranteed to be optimal in terms of reducing the error of interventions obtained after conducting the chosen experiments.
Although, the method in [10] is optimal, it is computationally expensive. Because our final objective is to improve the performance of the therapeutic interventions, method in [10] involves finding optimal interventions for all networks which are compatible with the prior knowledge. Finding optimal interventions is computationally expensive whose complexity grows exponentially with the number of genes in network. Therefore, the computational complexity of finding optimal experiment can be prohibitively high for large networks. Thus, it is inevitable to construct a smaller network via deleting some genes from the original large size network and then estimate the optimal interventions using the resulting reduced network. Generally the goal in network reduction methods is to produce networks of smaller size while the dynamical behavior of the original network is preserved. There have been some efforts for network reduction to reduce the complexity of designing interventions [11–13].
In this paper, we propose a novel cost function for the gene deletion process which takes into account the disruption in the order of potential experiments when they are ranked according to the experimental design method in [10]. Since experiments are ranked based upon the expected remaining MOCU or the MOCU that is expected to remain after performing the experiment, we desire that the network reduction step has a low effect on the expected remaining MOCU corresponding to the potential experiments. When the gene (or genes) suggested by the cost function are deleted from network, the optimal (and robust) interventions are estimated using the reduced networks and then they are used for calculating expected remaining MOCU for prioritizing potential experiments. We show the effectiveness of our proposed cost effective experimental design method through simulations on synthetic and real networks. The simulation results verify that our method can perform comparable to the optimal experimental design method in [10] with much lower computations.
MOCU-based optimal experimental design is very general and does not even require a Markovian network [10]. As we will see, finding the best gene to delete is also very general; however, once the genes are deleted, the regulatory structure of the original network must be mapped onto a corresponding regulatory structure on the reduced network, an optimal intervention must be found on the reduced network, and that intervention must be induced to the full network. Reduction and inducement are nontrivial and depend on the nature of the regulatory structure. The problem has been addressed for Boolean networks in [13], to which we refer, and a theoretical analysis is given in [11], where it is noted that the methodology applies to probabilistic Boolean networks (PBNs) [14] by applying the reduction to each of constituent BN of the PBN. Moreover, whereas we will restrict intervention to rank-one perturbations [2], which provide a one-time alteration of the regulatory logic, the reduction-inducement paradigm applies to other forms of intervention [11, 13].
Methods
Boolean networks
Gene regulatory network models are increasingly used as a tool to study interactions among genes [15, 16]. Boolean networks (BNs) [17] and probabilistic Boolean networks (PBNs) [14] are widely used models for GRNs that have been shown to be effective in capturing these interactions [18–24]. A Boolean network on n genes is defined by a pair (V, F). V = {X_{1}, X_{2},..., X_{ n }} is a set of binary variables that represent the expression state of genes with X_{ i } = 0 and X_{ i } = 1 corresponding to gene i being OFF and ON, respectively. The ON state means that a gene is translated into a protein whereas the OFF state means that the gene is not translated. It has been shown that significant biological information can be extracted from binarized gene expression data [25, 26]. The gene values at each time step are updated according to the list of Boolean functions F = {f_{1}, f_{2},..., f_{ n }} with X_{ i } = f_{ i }(X_{i1}, X_{i2},..., ${X}_{i{k}_{i}}$) where X_{i1}, X_{i2},..., ${X}_{i{k}_{i}}$ are k_{ i }predictor genes for X_{ i }. The state of the BN at time t is the vector X(t) = (X_{1}(t), X_{2}(t),..., X_{ n }(t)), which is called gene activity profile (GAP). A BN with n genes possesses 2^{ n } different states. The state of a BN for the next time instant t + 1 is updated as X(t + 1) = F(X(t)). In a Boolean network with perturbation (BNp), there is a perturbation probability of each gene flipping its value independently from other genes. Thus, in the presence of perturbation X(t+1) = F(X(t)) with probability (1 − p)^{ n } and with probability 1 − (1 − p)^{ n } at least one gene flips its value. The sequence of states of a BNp over time can be viewed as the states of a Markov chain with transition probability matrix (TPM) $\mathbf{\text{P=}}{\left[{p}_{ij}\right]}_{i,j=1}^{{2}^{n}}$ where p_{ ij } is the probability that state i transitions into state j. Positive perturbation probability p >0 makes the Markov chain ergodic and irreducible, thereby possessing a unique steady-state distribution (SSD) π^{ T }= π^{ T }P, where π_{ i } is the steady-state probability of state i and T is the transpose operator. More details on how to compute TPM can be found in [2]. The long-run behavior of a BNp is characterized by its SSD, which is indicative of phenotypes. From a translational perspective, the states of a BNp can be partitioned into the set of desirable states D containing those states associated with healthy phenotypes and undesirable states U containing states associated with cancerous phenotypes. The goal of therapeutic interventions is to drive the network away from undesirable states and consequently reduce the steady-state probability mass of undesirable states, π_{ U } = ∑_{ U }π_{i}. Without loss of generality regarding the experimental design analysis, in this paper we will assume that the undesirable states are determined by a single gene, called the "target gene" with the aim of intervention being to flip an undesirable value of the target gene to a desirable value. Two intervention approaches are commonly considered: (1) Structural interventions change the long-run behavior of the network by altering its underlying rule-based structure [1–4], and (2) dynamical interventions affect the dynamical evolution of the network by manipulating the expression states of some "control genes" [5–8].
where ${\stackrel{\u0303}{\pi}}_{i}\left(u,v\right)$ is the steady-state probability mass of state i after single-gene perturbation (u, v) and z_{ vi }, z_{ wi }, z_{ vu }, and z_{ wu } belong to the fundamental matrix Z = [I − P+e π^{ T }]^{−1} where I is the n × n identity matrix. The undesirable steady state probability mass after single-gene function perturbation (u, v) is ${\stackrel{\u0303}{\pi}}_{U}\left(u,v\right)={\sum}_{i\in U}{\stackrel{\u0303}{\pi}}_{i}\left(u,v\right)$. To find the optimal intervention, one needs to search among all possible 2^{ n } × 2^{ n } state pairs (u, v).
Optimal experimental design
the expression inside the expectation being called the objective cost of uncertainty. MOCU is the expected difference between the performance of the robust intervention and the optimal intervention for each network inside Θ. MOCU is an uncertainty quantification scheme which measures uncertainty by taking into account the objective of operator design, herein, network intervention. MOCU can effectively capture the amount of uncertainty and tries to quantify the uncertainty based on a specific objective (such as intervention) to gauge how much it is going to affect the objective [9].
where ${\theta}_{\varphi}^{\left(i\right)}$, called the conditional uncertainty vector, has ith parameter equal to $\varphi $ and other parameters are still random, and Θ_{ i,ϕ } is the remaining uncertainty class when θ_{ i } = ϕ, i.e., Θ_{ i,ϕ } = {θ|θ ∈ Θ, θ_{ i } = ϕ}. The expectation is taken over the conditional density function $f\left({\theta}_{\varphi}^{\left(i\right)}\right)=f\left(\theta |{\theta}_{i}=\varphi \right)$ that governs the remaining uncertainty class Θ_{ i,ϕ }. ${\psi}_{IBR}\left({\Theta}_{i,\varphi}\right)$ is the robust intervention for Θ_{ i,ϕ } and is found in a similar way as (3).
E_{ i* } is the optimal experiment to be conducted first [10].
Approximate experimental design method
According to (5) and (6), calculating the expected remaining MOCU requires finding the optimal intervention ψ(θ) for each θ ∈ Θ and the robust intervention ${\psi}_{IBR}\left({\Theta}_{i,\varphi}\right)$ for each possible remaining uncertainty class. The complexity of finding optimal interventions grows exponentially with network size n. For finding an optimal single-gene structural intervention, we need to search among all possible 2^{ n } × 2^{ n } state pairs and calculate the new steady-state probability ${\stackrel{\u0303}{\pi}}_{i}$ for each state i in the set of undesirable states U. Thus, the complexity is $\mathcal{O}$(2^{3n}). This heavy computational cost motivates us to reduce the size of network in order to reduce the complexity of finding optimal interventions, thereby reducing the complexity of the experimental design.
Assuming that gene g is deleted from a network with regulatory function F, we define a regulatory function F_{ red } for the reduced network. Doing this for each network with uncertainty vector θ in Θ produces the uncertainty class, Θ^{g}, of reduced networks via the mapping θ → θ^{ g }.
To approximate the optimal intervention for a network in Θ, we use the corresponding network in Θ^{ g }, find the optimal intervention for the reduced network ψ(θ^{ g }), and then induce the intervention to the original network in Θ. This approximate optimal intervention denoted by ψ(θ;g) is called the induced optimal intervention. Also, to find the induced robust intervention, ${\psi}_{IBR}^{ind}\left(\Theta ;g\right)$, for Θ, first we find the robust intervention, ${\psi}_{IBR}\left({\Theta}^{g}\right)$, for Θ^{ g } using (3) and then find the induced robust intervention ${\psi}_{IBR}^{ind}\left(\Theta ;g\right)$ from ${\psi}_{IBR}\left({\Theta}^{g}\right)$.
Algorithm 1 Approximate experimental design
1: input: Θ, ψ, f (θ), θ = (θ_{1},..., θ_{ k })
2: output: E_{ i* }, i^{ ∗ } ∈ {1,..., k}: the estimated optimal experiment to be conducted first
3: for g = 1 : n do
4: cost(g) ← 0
5: for i = 1 : k do
6: for all θ_{ i } do
7: build remaining uncertainty class of reduced networks ${\Theta}_{i,{\theta}_{i}}^{g}$
8: compute conditional density function $f\left({\theta}_{{\theta}_{i}}^{\left(i\right)}\right)$
9: find induced robust intervention ${\psi}_{IBR}^{ind}\left({\Theta}_{i,{\theta}_{i}};g\right)$
10: ${h}_{g}\left({\theta}_{i}\right)\leftarrow {E}_{{\theta}_{{\theta}_{i}}^{\left(i\right)}}\left[{\xi}_{{\theta}_{{\theta}_{i}}^{\left(i\right)}}\left({\psi}_{IBR}^{ind}\left({{\Theta}_{i,\theta}}_{{}_{i}};g\right)\right)\right]$
11: cost(g) ← cost(g) + ${E}_{{\theta}_{i}}$[h_{ g } (θ_{ i })]
12: ${g}^{*}\leftarrow \underset{g=1,2,...,n}{\mathsf{\text{arg}}\phantom{\rule{0.3em}{0ex}}\mathsf{\text{min}}}\phantom{\rule{2.36043pt}{0ex}}\mathit{cost}\left(g\right)$
13: for i = 1 : k do
14: for all θ_{ i } do
15: build remaining uncertainty class ${\Theta}_{i,{\theta}_{i}}$
16: compute conditional density function $f\left({\theta}_{{\theta}_{i}}^{\left(i\right)}\right)$
17: compute ${M}_{\Psi}^{{g}^{*}}({\Theta}_{i,{\theta}_{i}})$ via equation (5) using ${\psi}_{IBR}^{ind}\left({\Theta}_{i,{\theta}_{i}};{g}^{*}\right)$ and ${\psi}^{ind}\left({\theta}_{{\theta}_{i}}^{\left(i\right)}{;g}^{*}\right)$
18: ${M}_{\Psi}^{{g}^{*}}(\Theta ,i)\leftarrow {E}_{{\theta}_{i}}\left[{M}_{\Psi}^{{g}^{*}}({\Theta}_{i,{\theta}_{i}})\right]$
19: ${i}^{*}\leftarrow argmin{M}_{\Psi}^{{g}^{*}}(\Theta ,i)$
20: return i^{ ∗ }
After removing gene g^{ ∗ }, we find the expected remaining MOCU corresponding to each experiment using equation (6) by replacing $\psi \left({\theta}_{{\theta}_{i}}^{\left(i\right)}\right)$ with ${\psi}^{ind}\left({\theta}_{{\theta}_{i}}^{\left(i\right)}{;g}^{*}\right)$ and ${\psi}_{{}_{IBR}}\left({{\Theta}_{i,\theta}}_{{}_{i}}\right)$ with ${\psi}_{{}_{{}_{IBR}}}^{ind}\left({{\Theta}_{i,\theta}}_{{}_{i}};{g}^{*}\right)$. An abstract form of the proposed experimental design method has been given in Algorithm 1. A step by step toy example illustrating Algorithm 1 is also provided in the Additional file 1.
This procedure for estimating optimal experiment via deleting one gene can be easily extended to the deletion of two or more genes. For example, to delete two genes, we need to evaluate the cost function in (14) for all possible two-gene combinations and delete the pair whose cost is minimum.
Reduction mappings and induced interventions
If we want to delete gene g from network, we need to find the regulatory function F_{ red } for the reduced network. Following [13], every two states of the original network that differ only in the value of gene g can be collapsed to find the transition rule of the reduced network. Let ${s}_{1}^{g}$ and ${s}_{0}^{g}$ be two states with value 1 and 0 for gene g, respectively, and identical values for other genes. State $\overline{{s}^{g}}$ can be obtained from either ${s}_{1}^{g}$ or ${s}_{0}^{g}$ by removing the value of gene g. If for the original network, the transition rules for these two states are $F\mathsf{\text{(}}{s}_{1}^{g}\mathsf{\text{)}}\phantom{\rule{0.3em}{0ex}}\mathsf{\text{=}}\phantom{\rule{0.3em}{0ex}}p$ and $F\mathsf{\text{(}}{s}_{0}^{g}\mathsf{\text{)}}\phantom{\rule{0.3em}{0ex}}\mathsf{\text{=}}\phantom{\rule{0.3em}{0ex}}q,$ then for the reduced network, ${F}_{red}\left(\overline{{s}^{g}}\right)=\overline{{p}^{g}}$ if ${\pi}_{{s}_{1}^{g}}>{\pi}_{{s}_{0}^{g}}$ and otherwise ${F}_{red}\left(\overline{{s}^{g}}\right)=\overline{{q}^{g}}$, where $\overline{{p}^{g}}$ and $\overline{{q}^{g}}$ are found from states p and q via removing the value of gene g, respectively. Following this procedure, we find the regulatory function F_{ red } for all states in the reduced network.
Algorithm 2 Finding induced optimal interventions
1: input: $\mathrm{\psi ;}({\theta}^{g})=(\widehat{u},\widehat{v})$, π
2: output: ${\psi}^{ind}\left(\theta ;g\right)=\left({u}_{g}^{ind},{v}_{g}^{ind}\right)$
3: ${\tilde{u}}_{1}^{g}$← place 1 in the gth coordinate of $\widehat{u}$
4: ${\tilde{u}}_{0}^{g}$← place 0 in the gth coordinate of $\widehat{u}$
5: if ${\pi}_{{\tilde{u}}_{1}^{g}}\ge {\pi}_{{\tilde{u}}_{0}^{g}}$ then
6: ${u}_{g}^{ind}\leftarrow {\tilde{u}}_{1}^{g}$
7: else
8: ${u}_{g}^{ind}\leftarrow {\tilde{u}}_{0}^{g}$
9: ${\stackrel{\u0303}{\upsilon}}_{1}^{g}$← place 1 in the gth coordinate of $\widehat{\upsilon}$
10: ${\stackrel{\u0303}{\upsilon}}_{0}^{g}$← place 0 in the gth coordinate of $\widehat{\upsilon}$
11: if ${\pi}_{{\tilde{v}}_{1}^{g}}\ge {\pi}_{{\tilde{v}}_{0}^{g}}$ then
12: ${\upsilon}_{g}^{ind}\leftarrow {\stackrel{\u0303}{\upsilon}}_{1}^{g}$
13: else
14: ${\upsilon}_{g}^{ind}\leftarrow {\stackrel{\u0303}{\upsilon}}_{0}^{g}$
As illustrated in Algorithm 2, we find the induced optimal intervention from the optimal intervention for the reduced network. Suppose that the optimal intervention for the reduced network θ^{ g } is $\psi \left({\theta}^{g}\right)=\left(\widehat{u},\widehat{v}\right).$ The two corresponding states to $\widehat{u}$ in the original network are ${\tilde{u}}_{1}^{g}$ and ${\tilde{u}}_{0}^{g}$, which are found by placing 1 and 0 in the gth coordinate of $\widehat{u}$, respectively. Similarly, there are two states ${\tilde{v}}_{1}^{g}$ and ${\tilde{v}}_{0}^{g}$ in the original network corresponding to state $\widehat{\upsilon}$. The induced optimal intervention for the original network is ${\psi}^{ind}\left(\theta ;g\right)=\left({u}_{g}^{ind},{v}_{g}^{ind}\right),$ where ${u}_{g}^{ind}$ is the one among ${\tilde{u}}_{1}^{g}$ and ${\tilde{u}}_{0}^{g}$ having larger steady-state probability in the original network and ${v}_{g}^{ind}$ is the one among ${\stackrel{\u0303}{\upsilon}}_{1}^{g}$ and ${\stackrel{\u0303}{\upsilon}}_{0}^{g}$ with larger steady-state probability in the original network.
Analogous to the induced optimal intervention, the induced robust intervention ${\psi}_{{}_{{}_{IBR}}}^{ind}\left(\Theta ;g\right)$ is found from the robust intervention ${\psi}_{{}_{{}_{IBR}}}\left({\Theta}^{g}\right)$ according to Algorithm 3; however, here we choose the two states possessing larger expected steady-state probability across Θ using the expected SSD, π(Θ) = E_{ θ } [π(θ)], where π(θ) is the SSD of the network with uncertainty vector θ in uncertainty class Θ. We can use this procedure to find the induced robust intervention for each remaining uncertainty class Θ_{ i,ϕ }.
Algorithm 3 Finding induced robust interventions
1: input: ${\psi}_{IBR}\left({\Theta}^{g}\right)=\left(\widehat{u},\widehat{\upsilon}\right),\mathbf{\pi}\left(\theta \right)\forall \theta \in \Theta $
2: output: ${\psi}_{IBR}^{ind}\left(\Theta ;g\right)=\left({u}_{g}^{ind},{\upsilon}_{g}^{ind}\right)$
3: $\mathbf{\pi}\leftarrow {E}_{\theta}\left[\mathbf{\pi}\left(\theta \right)\right]$
4: ${\tilde{u}}_{1}^{g}$← placing 1 in the gth coordinate of $\widehat{u}$
5: ${\tilde{u}}_{0}^{g}$← placing 0 in the gth coordinate of $\widehat{u}$
6: if ${\pi}_{{\tilde{u}}_{1}^{g}}\ge {\pi}_{{\tilde{u}}_{0}^{g}}$ then
7: ${u}_{g}^{ind}\leftarrow {\tilde{u}}_{1}^{g}$
8: else
9: ${u}_{g}^{ind}\leftarrow {\tilde{u}}_{0}^{g}$
10: ${\tilde{v}}_{1}^{g}$← placing 1 in the gth coordinate of $\widehat{\upsilon}$
11: ${\tilde{v}}_{0}^{g}$← placing 0 in the gth coordinate of $\widehat{\upsilon}$
12: if ${\pi}_{{\stackrel{\u0303}{\upsilon}}_{1}^{g}}\ge {\pi}_{{\stackrel{\u0303}{\upsilon}}_{0}^{g}}$ then
13: ${\upsilon}_{g}^{ind}\leftarrow {\stackrel{\u0303}{\upsilon}}_{1}^{g}$
14: else
15: ${\upsilon}_{g}^{ind}\leftarrow {\stackrel{\u0303}{\upsilon}}_{0}^{g}$
Preliminary gene elimination via the coefficient of determination
Genes possessing strong connection with the target gene in terms of CoD_{ X }(Y ; Θ) are not considered for deletion. When excluding genes using the CoD it is important to recognize the possibility of intrinsic multivariate prediction [28], where a set of genes may have low individual CoDs with respect to the target gene but may have significant CoD when used together for multivariate prediction. First we calculate CoD_{ X }(Y ; Θ) for all 3-gene combinations and pick the one with largest CoD. We compute CoD for 3-gene predictors because it has been shown in [17] that the average connectivity of the model cannot be too high providing that the model is not chaotic and it is commonplace to assume 3-gene predictivity in BNs. If we want to exclude less than 3 genes from the search space, then among the 3-gene combination with the largest expected CoD, we choose those genes that have larger expected individual CoD. If we want to exclude more than 3 genes, then in addition to the three genes in the combination with the largest CoD, we choose those genes in the 3-gene combination with the second largest CoD that have larger expected individual CoD and do not belong to the first 3-gene combination. We repeat this process until we reach the desired number of genes to exclude.
If there are initially n genes and we want to delete 3 genes, then we need to evaluate cost function (14) for all C(n, 3) 3-gene combinations, where C(n, k) denotes the number of combinations of n objects taken k at a time; however, if we exclude s genes from search space then the number of evaluations of (14) decreases to C(n − s, 3).
Having performed the CoD-based exclusion process and excluded s genes, ${X}_{1}^{\prime},{X}_{2}^{\prime},...,{X}_{s}^{\prime},$ we search for the genes to be deleted using the cost function in (14) among the remaining genes, {X_{1}, X_{2},..., X_{ n }} − $\left\{{X}_{1}^{\prime},{X}_{2}^{\prime},...,{X}_{s}^{\prime}\right\}$.
Computational complexity analysis
With n genes that take on binary expression levels, the network has 2^{ n } states. Finding an optimal single-gene function intervention requires searching among all possible 2^{2n}state pairs (u, v) according to (1). Assuming without loss of generality that states 2^{n−1 }to 2^{ n } are undesirable, (1) must be evaluated 2^{n−1 }times for each state pair. Thus, the complexity of finding the optimal intervention ψ(θ) is $\mathcal{O}$(2^{3n}). If there are k uncertain parameters and each can take on l different values, then the uncertainty class Θ contains l^{ k } different networks for which an optimal intervention must be found. Hence, the complexity of the optimal experimental design method in [10] is $\mathcal{O}$(l^{ k } × 2^{3n}).
To analyze the complexity of the proposed approximate method, suppose p genes are to be deleted. Then the cost function in (14) must be evaluated for all C(n − 1, p) p-gene combinations, n − 1 instead of n because the target gene cannot be deleted. The complexity of finding an induced optimal intervention for each network after deleting p genes is $\mathcal{O}$(2^{3(n−p)}). Therefore, the complexity of the approximate method is $\mathcal{O}$(C(n − 1, p) × l^{ k } × 2^{3n−3p}). For large n, it is possible that for small p the complexity of the approximate method can exceed that of the original method; however, by deleting more genes the complexity of the approximate method drops sharply because by deleting each additional gene the complexity of estimating the optimal intervention decreases by eight-fold.
which is the ratio of the complexity of the optimal method in [10] to the complexity of the approximate method when deleting p genes using the cost function in (14) and excluding s genes from the search space using the CoD-based gene exclusion step.
Comparing the approximate processing times (in seconds) of the optimal and approximate experimental design methods when p genes are deleted and s genes are excluded for networks of size n with 4 uncertain regulations
n= 10 | n= 11 | n= 12 | ||||
---|---|---|---|---|---|---|
[10] | 468 | 4846 | 60169 | |||
Proposed | s = 0 | s = 2 | s = 0 | s = 2 | s = 0 | s = 2 |
p = 3 | 81.71 | 39.64 | 830 | 407 | 9795 | 5026 |
p = 4 | 23.61 | 12.98 | 215 | 93 | 2355 | 1057 |
p = 5 | 8.68 | 8.36 | 58 | 35 | 450 | 181 |
Results and discussion
This section evaluates the performance of the approximate method for both synthetic and real GRNs where the majority vote rule is used as the transition rule. Majority vote rule [29–33] is popular in systems biology, especially when we are interested in the overall dynamics of the network. For example, majority vote is used in [32] to model the dynamics of yeast cell-cycle network. For the majority vote rule, a regulatory matrix R is defined component-wise by
${R}_{ij}=\left\{\begin{array}{cc}1\hfill & \mathsf{\text{gene}}\phantom{\rule{0.3em}{0ex}}j\phantom{\rule{0.3em}{0ex}}\mathsf{\text{activates}}\phantom{\rule{0.3em}{0ex}}\mathsf{\text{gene}}\phantom{\rule{0.3em}{0ex}}i\hfill \\ -1\hfill & \mathsf{\text{gene}}\phantom{\rule{0.3em}{0ex}}j\phantom{\rule{0.3em}{0ex}}\mathsf{\text{suppresses}}\phantom{\rule{0.3em}{0ex}}\mathsf{\text{gene}}\phantom{\rule{0.3em}{0ex}}i\hfill \\ 0\hfill & \mathsf{\text{no}}\phantom{\rule{0.3em}{0ex}}\mathsf{\text{relation}}\phantom{\rule{0.3em}{0ex}}\mathsf{\text{from}}\phantom{\rule{0.3em}{0ex}}\mathsf{\text{gene}}\phantom{\rule{0.3em}{0ex}}j\phantom{\rule{0.3em}{0ex}}\mathsf{\text{to}}\phantom{\rule{0.3em}{0ex}}\mathsf{\text{gene}}\phantom{\rule{0.3em}{0ex}}i\hfill \end{array}\right..$
If ρ > 0, then the chosen experiment outperforms the random experiment; if ρ < 0, then the random experiment outperforms the chosen experiment; and if ρ = 0, then they perform the same.
Simulation results based on the synthetic BNps
For the performance evaluation based on synthetic BNps, we generated 1000 networks randomly and chose 50 different sets of k regulations in each to be unknown - 50000 simulations in total. We assigned 3 random predictor genes to each gene where each one can be randomly activating or suppressive. The gene perturbation probability was set to 0.001. Without loss of generality, we assume that states with up-regulated X_{1} are undesirable. We removed the regulatory type of those regulations that have been assumed to be uncertain and retained other regulatory information of the network. We assume that all uncertain parameters are independent from each other and have uniform marginal distribution. The analysis can be easily extended to other distributions. Because X_{1} is the target gene, it was excluded from the reduction process. Hence, we look for the best p-gene set to be deleted among {X_{2},..., X_{ n }}.
where g^{ * } is the set of 5 genes with minimum cost of deletion. For this set of simulations, the average gain $\stackrel{\u0304}{\rho}$ is 0.0192. Note that here the average gain might not be very accurate owing to the small number of simulations. The approximate run time for each simulation was around 5700 seconds.
Percentage of finding the same experiment as [10] using the proposed approximate method with gene exclusion from the search space
n = 7, k = 2 | ||||||||
p = 1 | p = 2 | p = 3 | ||||||
Random | CoD | Random | CoD | Random | CoD | |||
s = 0 | 92.59 | 92.59 | 87.72 | 87.72 | 82.90 | 82.90 | ||
s = 1 | 91.78 | 91.96 | 85.85 | 86.44 | 80.05 | 80.85 | ||
s = 2 | 90.23 | 91.07 | 83.38 | 84.77 | 75.97 | 77.96 | ||
s = 3 | 88.31 | 89.89 | 79.28 | 82.01 | N/A | N/A | ||
s = 4 | 85.18 | 87.24 | N/A | N/A | N/A | N/A | ||
n = 7, k = 3 | ||||||||
p = 1 | p = 2 | p = 3 | ||||||
Random | CoD | Random | CoD | Random | CoD | |||
s = 0 | 90.84 | 90.84 | 84.19 | 84.19 | 77.87 | 77.87 | ||
s = 1 | 89.78 | 90.07 | 81.94 | 82.67 | 74.46 | 75.49 | ||
s = 2 | 88.19 | 89.08 | 78.67 | 80.42 | 69.56 | 71.58 | ||
s = 3 | 85.77 | 87.54 | 73.95 | 76.76 | N/A | N/A | ||
s = 4 | 82.15 | 84.79 | N/A | N/A | N/A | N/A | ||
n = 7, k = 4 | ||||||||
p = 1 | p = 2 | p = 3 | ||||||
Random | CoD | Random | CoD | Random | CoD | |||
s = 0 | 90.72 | 90.72 | 83.56 | 83.56 | 76.17 | 76.17 | ||
s = 1 | 89.65 | 90.06 | 81.11 | 82.00 | 72.44 | 73.67 | ||
s = 2 | 87.95 | 89.08 | 77.71 | 79.58 | 67.10 | 69.80 | ||
s = 3 | 85.54 | 87.57 | 72.51 | 76.07 | N/A | N/A | ||
s = 4 | 81.81 | 84.51 | N/A | N/A | N/A | N/A | ||
n = 7, k = 5 | ||||||||
p = 1 | p = 2 | p = 3 | ||||||
Random | CoD | Random | CoD | Random | CoD | |||
s = 0 | 91.28 | 91.28 | 83.71 | 83.71 | 76.46 | 76.46 | ||
s = 1 | 90.21 | 90.56 | 81.32 | 82.31 | 72.55 | 73.86 | ||
s = 2 | 88.60 | 89.63 | 78.17 | 79.96 | 67.09 | 69.72 | ||
s = 3 | 86.33 | 88.14 | 72.66 | 76.26 | N/A | N/A | ||
s = 4 | 82.64 | 85.58 | N/A | N/A | N/A | N/A | ||
n = 8, k = 4 | ||||||||
p = 1 | p = 2 | p = 3 | p = 4 | |||||
Random | CoD | Random | CoD | Random | CoD | Random | CoD | |
s = 0 | 92.43 | 92.43 | 86.98 | 86.98 | 80.90 | 80.90 | 74.97 | 74.97 |
s = 1 | 91.69 | 92.05 | 85.21 | 85.94 | 78.10 | 79.05 | 70.95 | 72.04 |
s = 2 | 90.59 | 91.40 | 82.53 | 84.13 | 74.53 | 76.29 | 65.81 | 67.94 |
s = 3 | 89.03 | 90.58 | 79.47 | 82.00 | 68.98 | 71.91 | N/A | N/A |
s = 4 | 86.83 | 88.85 | 74.51 | 77.93 | N/A | N/A | N/A | N/A |
s = 5 | 83.09 | 86.29 | N/A | N/A | N/A | N/A | N/A | N/A |
Performance evaluation based on the colon cancer pathways
EGF, HGF, and IL6 are three stimulation factors that carry the external signals generated by neighboring cells to downstream genes and activate downstream pathways. Signal transducers and activators of transcription (STATs) constitute a family of transcription factors that can be activated via extracellular signaling proteins such as cytokins and growth factors. These play a major role in regulating downstream processes such as cell growth, survival, and apoptosis [36]. STAT3 is an oncogene observed to be highly activated in many cancers, in particular, colon cancer [37, 38]. Hence, STAT3 has been recognized as a legitimate target for cancer therapy [39]. We considered states with up-regulated STAT3 (X_{1} = 1) as undesirable states, so that the set of undesirables states is U = {1024,..., 2047}. Before intervention the probability mass of undesirable states ${\pi}_{U}$ is 0.5525. The optimal intervention for this network is transitioning state 11111110101 to state 01011001101; that is, $\stackrel{\u0303}{\mathbf{\text{F}}}\left(11111110101\right)=01011001101$ for the regulatory function after intervention. The undesirable probability mass after intervention ${\stackrel{\u0303}{\pi}}_{U}$ is 0.3837.
Performance of the proposed approximate method on the colon cancer pathways when deleting p genes and excluding s genes from the search space
p = 3 | p = 4 | p = 5 | ||||
s = 0 | s = 2 | s = 0 | s = 2 | s = 0 | s = 2 | |
$\stackrel{\u0304}{\rho}$ | 0.0239 | 0.0235 | 0.0231 | 0.0229 | 0.0206 | 0.0198 |
Conclusion
We have proposed a computationally effective experimental design method for reducing uncertainty in gene regulatory networks. This method can effectively approximate the optimal experimental design method in [10], which is based on the mean objective cost of uncertainty (MOCU). To reduce computational complexity, we use network reduction to estimate the optimal and robust interventions needed for finding an optimal experiment. We introduced a novel cost function for gene deletion that takes into account the effect of gene deletion on the ranking of potential experiments. Because potential experiments are ranked based on the MOCU in [10], the proposed cost function is also based on the MOCU. Simulation results on both synthetic and real networks show that while our proposed method can greatly reduce computations, its performance is comparable to the optimal method in [10] and much better than random gene deletion. Greater computational reduction is achieved by excluding genes from the search space based on their CoD with the target gene whose expression the intervention is aimed at altering.
We have assumed a uniform distribution over the uncertainty class. If one has relevant prior knowledge, perhaps it can be used to construct a distribution reflecting it. Care must be taken because concentrating the mass of the distribution in the wrong place can lead to poorer results. In Bayesian terminology, the distribution on the uncertainty class is called a prior distribution. Putting a non-uniform prior on Θ does not change the reduction procedure introduced in the current paper; however, some calculations are altered by including the weights. Prior construction is a difficult problem and has been considered in the context of gene regulation, but not in the context of network construction. Rather, pathway knowledge has been used to construct prior distributions governing uncertainty classes of feature-label distributions for optimal Bayesian classification [40, 41]. Prior construction is particular to each application, examples being gene/protein signaling pathways in discrete phenotype classification [42] and model-based RNA-seq classification [43]. Prior construction for uncertainty classes of the kind considered in this paper constitutes an important issue for future study - and not just in relation to the specific problem considered herein.
Notes
Declarations
Acknowledgements
This work was supported in part by the National Science Foundation through the NSF Award CCF-1149544. It was also funded in part by a grant from the Los Alamos National Laboratory. Publication costs for this article were funded by the corresponding author's institution.
This article has been published as part of BMC Bioinformatics Volume 16 Supplement 13, 2015: Proceedings of the 12th Annual MCBIOS Conference. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/16/S13.
Authors’ Affiliations
References
- Shmulevich I, Dougherty ER, Zhang W: Control of stationary behavior in probabilistic boolean networks by means of structural intervention. Biological Systems. 2002, 10 (4): 431-446.View ArticleGoogle Scholar
- Qian X, Dougherty ER: Effect of function perturbation on the steady-state distribution of genetic regulatory networks: Optimal structural intervention. IEEE Transactions on Signal Processing. 2008, 56 (10): 4966-4976.View ArticleGoogle Scholar
- Xiao Y, Dougherty ER: The impact of function perturbations in boolean networks. Bioinformatics. 2007, 23 (10): 1265-1273.View ArticlePubMedGoogle Scholar
- Bouaynaya N, Shterenberg R, Schonfeld D: Inverse perturbation for optimal intervention in gene regulatory networks. Bioinformatics. 2011, 27 (1): 103-110.View ArticlePubMedGoogle Scholar
- Ching W-K, Zhang S-Q, Jiao Y, Akutsu T, Tsing N-K, Wong A: Optimal control policy for probabilistic boolean networks with hard constraints. IET Systems Biology. 2009, 3 (2): 90-99.View ArticlePubMedGoogle Scholar
- Pal R, Datta A, Dougherty ER: Robust intervention in probabilistic boolean networks. IEEE Transactions on Signal Processing. 2008, 56 (3): 1280-1294.View ArticleGoogle Scholar
- Pal R, Datta A, Dougherty ER: Bayesian robustness in the control of gene regulatory networks. IEEE Transactions on Signal Processing. 2009, 57 (9): 3667-3678.View ArticleGoogle Scholar
- Yang C, Wai-Ki C, Nam-Kiu T, Ho-Yin L: On finite-horizon control of genetic regulatory networks with multiple hard-constraints. BMC systems biology. 2010, 4 (Suppl 2): 14-View ArticleGoogle Scholar
- Yoon BJ, Qian X, Dougherty ER: Quantifying the objective cost of uncertainty in complex dynamical systems. IEEE Transactions on Signal Processing. 2013, 61 (9): 2256-2266.View ArticleGoogle Scholar
- Dehghannasiri R, Yoon B, Dougherty ER: Optimal experimental design for gene regulatory networks in the presence of uncertainty. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2014, doi:10.1109/TCBB.2014.2377733Google Scholar
- Ivanov I, Simeonov P, Ghaffari N, Qian X, Dougherty ER: Selection policy-induced reduction mappings for boolean networks. IEEE Transactions on Signal Processing. 2010, 58 (9): 4871-4882.View ArticleGoogle Scholar
- Ivanov I, Pal R, Dougherty ER: Dynamics preserving size reduction mappings for probabilistic boolean networks. IEEE Transactions on Signal Processing. 2007, 55 (5): 2310-2322.View ArticleGoogle Scholar
- Ghaffari N, Ivanov I, Qian X, Dougherty ER: A cod-based reduction algorithm for designing stationary control policies on boolean networks. Bioinformatics. 2010, 26 (12): 1556-1563.View ArticlePubMedGoogle Scholar
- Shmulevich I, Dougherty ER, Zhang W: From boolean to probabilistic boolean networks as models of genetic regulatory networks. Proceedings of the IEEE. 2002, 90 (11): 1778-1792.View ArticleGoogle Scholar
- Christensen C, Thakar J, Albert R: Systems-level insights into cellular regulation: inferring, analysing, and modelling intracellular networks. IET Systems Biology. 2007, 1 (2): 61-77.View ArticlePubMedGoogle Scholar
- Mohsenizadeh DN, Hua J, Bittner M, Dougherty ER: Dynamical modeling of uncertain interaction-based genomic networks. BMC Bioinformatics. 2015Google Scholar
- Kauffman SA: The Origins of Order. 1993, Oxford University Press, OxfordGoogle Scholar
- Trairatphisan P, Mizera A, Pang J, Tantar AA, Schneider J, Sauter T: Recent development and biomedical applications of probabilistic boolean networks. Cell communication and signaling. 2013, 4 (6): 1-25.Google Scholar
- Wuensche A: Genomic regulation modeled as a network with basins of attraction. Pacific Symposium on Biocomputing. 1998, 3: 44-Google Scholar
- Huang S: Gene expression profiling, genetic networks, and cellular states: an integrating concept for tumorigenesis and drug discovery. Journal of Molecular Medicine. 1999, 77 (6): 469-480.View ArticlePubMedGoogle Scholar
- Zhang Y, Qian M, Ouyang Q, Deng M, Li F, Tang C: Stochastic model of yeast cell-cycle network. Physica D: Nonlinear Phenomena. 2006, 219 (1): 35-39.View ArticleGoogle Scholar
- Flöttmann M, Scharp T, Klipp E: A stochastic model of epigenetic dynamics in somatic cell reprogramming. Frontiers in physiology. 2012, 3:Google Scholar
- Davidich MI, Bornholdt S: Boolean network model predicts cell cycle sequence of fission yeast. PloS one. 2008, 3 (2): 1672-View ArticleGoogle Scholar
- Fauré A, Naldi A, Chaouiya C, Thieffry D: Dynamical analysis of a generic boolean model for the control of the mammalian cell cycle. Bioinformatics. 2006, 22 (14): 124-13.View ArticleGoogle Scholar
- Shmulevich I, Zhang W: Binary analysis and optimization-based normalization of gene expression data. Bioinformatics. 2002, 18 (4): 555-565.View ArticlePubMedGoogle Scholar
- Zhang R, Shah MV, Yang J, Nyland SB, Liu X, Yun JK, Albert R, Loughran TP: Network model of survival signaling in large granular lymphocyte leukemia. Proceedings of the National Academy of Sciences. 2008, 105 (42): 16308-16313.View ArticleGoogle Scholar
- Dougherty ER, Kim S, Chen Y: Coefficient of determination in nonlinear signal processing. Signal Processing. 2000, 80 (10): 2219-2235.View ArticleGoogle Scholar
- Martins DC, Braga-Neto UM, Hashimoto RF, Bittner ML, Dougherty ER: Intrinsically multivariate predictive genes. IEEE Journal of Selected Topics in Signal Processing. 2008, 2 (3): 424-439.View ArticleGoogle Scholar
- Lau K, Ganguli S, Tang C: Function constrains network architecture and dynamics: A case study on the yeast cell cycle boolean network. Physical Review E. 2007, 75: 051907-View ArticleGoogle Scholar
- Wu Y, Zhang X, Yu J, Ouyang Q: Identification of a topological characteristic responsible for the biological robustness of regulatory networks. PLoS computational biology. 2009, 5 (7): 1000442-View ArticleGoogle Scholar
- Bornholdt S: Boolean network models of cellular regulation: prospects and limitations. Journal of the Royal Society Interface. 2008, 5 (Suppl 1): 85-94.View ArticleGoogle Scholar
- Li F, Long T, Lu Y, Ouyang Q, Tang C: The yeast cell-cycle network is robustly designed. Proceedings of the National Academy of Sciences of the United States of America. 2004, 101 (14): 4781-4786.View ArticlePubMedPubMed CentralGoogle Scholar
- Wu Y, Zhang X, Yu J, Ouyang Q: Identification of a topological characteristic responsible for the biological robustness of regulatory networks. PLoS computational biology. 2009, 5 (7): 1000442-View ArticleGoogle Scholar
- Hua J, Sima C, Cypert M, Gooden GC, Shack S, Alla L, Smith EA, Trent JM, Dougherty ER, Bittner ML: Tracking transcriptional activities with high-content epifluorescent imaging. Journal of biomedical optics. 2012, 17 (4): 0460081-04600815.View ArticleGoogle Scholar
- Esfahani MS, Dougherty ER: Incorporation of biological pathway knowledge in the construction of priors for optimal Bayesian classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2014, 11 (1): 202-218.View ArticlePubMedGoogle Scholar
- Levy DE, Darnell JE: STATs: transcriptional control and biological impact. Nature reviews Molecular cell biology. 2002, 3 (9): 651-662.View ArticlePubMedGoogle Scholar
- Kusaba T, Nakayama T, Yamazumi K, Yakata Y, Yoshizaki A, Inoue K, Nagayasu T, Sekine I: Activation of STAT3 is a marker of poor prognosis in human colorectal cancer. Oncology reports. 2006, 15 (6): 1445-1451.PubMedGoogle Scholar
- Ma XT, Wang S, Ye YJ, Du RY, Cui ZR, Somsouk M: Constitutive activation of STAT3 signaling pathway in human colorectal carcinoma. World Journal of Gastroenterology. 2004, 10 (11): 1569-1573.View ArticlePubMedPubMed CentralGoogle Scholar
- Klampfer L: Signal transducers and activators of transcription (STATs): Novel targets of chemopreventive and chemotherapeutic drugs. Current cancer drug targets. 2006, 6 (2): 107-121.View ArticlePubMedGoogle Scholar
- Dalton LA, Dougherty ER: Optimal classifiers with minimum expected error within a Bayesian framework--part I: discrete and Gaussian models. Pattern Recognition. 2013, 46 (5): 1301-1314.View ArticleGoogle Scholar
- Dalton LA, Dougherty ER: Optimal classifiers with minimum expected error within a Bayesian framework--part II: Properties and performance analysis. Pattern Recognition. 2013, 46 (5): 1288-1300.View ArticleGoogle Scholar
- Esfahani M, Dougherty E: An optimization-based framework for the transformation of incomplete biological knowledge into a probabilistic structure and its application to the utilization of gene/protein signaling pathways in discrete phenotype classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2015, doi:10.1109/TCBB.2015.2424407Google Scholar
- Knight JM, Ivanov I, Dougherty ER: MCMC implementation of the optimal Bayesian classifier for non-gaussian models: model-based RNA-seq classification. BMC bioinformatics. 2014, 15 (1): 401-View ArticlePubMedPubMed CentralGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.