 Proceedings
 Open Access
 Published:
Dynamical modeling of uncertain interactionbased genomic networks
BMC Bioinformatics volume 16, Article number: S3 (2015)
Abstract
Background
Most dynamical models for genomic networks are built upon two current methodologies, one processbased and the other based on Booleantype networks. Both are problematic when it comes to experimental design purposes in the laboratory. The first approach requires a comprehensive knowledge of the parameters involved in all biological processes a priori, whereas the results from the second method may not have a biological correspondence and thus cannot be tested in the laboratory. Moreover, the current methods cannot readily utilize existing curated knowledge databases and do not consider uncertainty in the knowledge. Therefore, a new methodology is needed that can generate a dynamical model based on available biological data, assuming uncertainty, while the results from experimental design can be examined in the laboratory.
Results
We propose a new methodology for dynamical modeling of genomic networks that can utilize the interaction knowledge provided in public databases. The model assigns discrete states for physical entities, sets priorities among interactions based on information provided in the database, and updates each interaction based on associated node states. Whenever uncertainty in dynamics arises, it explores all possible outcomes. By using the proposed model, biologists can study regulation networks that are too complex for manual analysis.
Conclusions
The proposed approach can be effectively used for constructing dynamical models of interactionbased genomic networks without requiring a complete knowledge of all parameters affecting the network dynamics, and thus based on a small set of available data.
Background
Cellular regulatory networks consist of thousands of molecular processes/reactions collected over years from many studies. Based on their relationships and functions, these interactions and their associated molecular entities have been categorized into various pathways. Biologists have long used pathway knowledge to help derive the dynamics of cellular interactions, generate plausible hypotheses, and design promising experiments; however, our knowledge of regulatory networks is growing quickly and manual exploration is becoming increasingly difficult. The challenge of understanding network dynamics has become overwhelming in cancer drug development because multiple pathways are commonly involved and the direct targets of drug intervention are often multiple steps away from the key mechanism to be controlled, which could be in a different compartment and different pathway. Thus, a salient goal of computational biology is to simulate the dynamics of genetic/molecular regulatory networks based on available biological knowledge and in doing so help biologists generate meaningful hypotheses and improve experimental design.
A considerable amount of pathway knowledge centered around biological interactions is currently being curated into well organized public databases, such as PID [1], Reactome [2, 3], and Pathway Commons [4], where the interactions and their associated physical entities are defined and shared via a standard language such as BioPAX [5]. The popular standard, Systems Biology Markup Language (SBML), has adopted systems of Ordinary Differential Equations (ODEs) or Partial Differential Equations (PDEs) as a natural way to describe the finegrained interactions inside the pathways [6]. Along with the emergence of SBML, there are softwares that can run ODE and/or PDEbased cell models, e.g. LibSBML [7], CellDesigner [8], and the Virtual Cell [9]. However, such an approach demands complete information on the rate equations and their parameters in all reactions to be determined before any simulation can be conducted [10]. This requirement is commonly not available in translational research, for instance, in drug discovery [11].
Taking a coarse perspective, qualitative methods such as Boolean networks for regulatory networks [12] or Petri net for metabolic networks [13], focus on the physical entities rather than interactions. In these methods genes and gene products are linked by logical or algebraic relationships. While there are some popular knowledge databases, such as KEGG [14], where direct relationships are readily available, these cannot be directly adapted to interactioncentered databases (KEGG allows both relation and reaction, although relations are commonly provided.). Although it is possible to manually convert such databases into a form fit for Boolean modeling, the translation is not unique and can be unstable in relation to small interaction alterations. There are also software tools that can be used to run such simulations, including CellNetAnalyzer [15], CellNOpt [16], BooleanNet [17], and CPN tools [18].
Furthermore, all the existing approaches, differential equations, Boolean, or Petri net, require a complete and exact knowledge of the network, including the networks readily available from the public databases, whereas such knowledge associated with the biological processes is often incomplete and admits multiple possibilities. Although biologists routinely take these uncertainties into their own consideration, current computational approaches do not. Hence, a huge amount of curated pathway knowledge cannot be effectively utilized by the computer.
To help remedy this situation, we proposed a general scheme in which uncertainty is considered as part of the system and all possible outcomes that are attainable due to different possible ways in which pathway steps can be carried out will be considered accordingly. Via this approach, one can apply knowledge retrieved from interactioncentered knowledge databases directly to perform systematic simulations. Ultimately, with more meaningful predictions of network dynamics, a better guide to experimentation can be achieved.
This paper is organized as follows. In the Methods section, we provide some definitions and introduce an interactionbased network model which can be generated from an input biological database. We discuss about the dynamic trajectories of a network and the concept of uncertainty in network dynamics. An abstract form of the dynamical modeling algorithm for computing the dynamic trajectories characterizing the dynamical behavior of a network for a given initial condition is also provided in this section. A study on the dynamics of a number of real data networks using our dynamical modeling methodology is given in the Results and discussion section. We also describe our current research activities. Finally, we provide our concluding remarks in the Conclusions section.
Methods
To produce our network model, we employ knowledge retrieved from public knowledge databases such as PID and Reactome. To simplify the discussion, we assume that knowledge is shared via the standard language BioPAX, specifically, BioPAX3. To facilitate efficient simulation, we first represent the interactions and associated physical entities shared in BioPAX using a set of nodes and edges.
An interactionbased network model
The model M, of the biological database under study, is a 2tuple M = (N, E), where N is the set of network nodes representing the biological entities in the database, including genes, proteins, complexes, molecules, transcripts, metabolites and chemicals, interacting with each other, and E is the set of network hyperedges (a hyperedge is an edge that can have multiple inputs/outputs), or simply edges, representing the biological interactions and processes in the database. The set of network nodes N and network edges E can be automatically generated from a simple textbased input file describing the biological interactions using our developed software package [19]. A detailed description on preparing the input file can be found in the Appendix section.
Each node in the network has a value, where the abstract term "value" is interpreted based on the characteristics of that biological entity. It may represent the concentration level of a protein/chemical or the expression level of a gene, indicating whether the gene is active or inactive. It does not necessarily indicate the relationship, such as concentration relationship, between different biological entities. The concentration of proteins, chemicals, and so on, in the cell are continuous variables; however, there are reasons why discrete variables are more suitable. First, accurate realtime concentration measurements are almost impossible. Second, a gene/protein typically participates in a biological process if its concentration is above a threshold level and such behavior is readily handled using discrete variables. Finally, from a control system engineering perspective, it is computationally and practically easier to deal with discretetime systems rather than continuoustime systems. Therefore, we assume that network nodes take nonnegative integer values. The state of a network with m nodes, namely N = {n_{1}, n_{2},..., n_{ m }}, is a vector x\in {\mathbb{Z}}_{\ge 0}^{m},{\mathbb{Z}}_{\ge 0} being the set of nonnegative integers, where each element of x is the value of a node n ∈ N. Let us also denote by y the vector of observable/measurable states, representing the subset of genes, proteins, transcripts, and so on, whose activities can be observed or measured and are of our interest. We represent the ith node in N by n_{ i }. The same notation will be used for network edges, for example, e_{ j } refers to the jth edge in E.
Each network edge represents a corresponding biological process or interaction in which, generally, the process reactants (inputs) interact with each other to form products (outputs) under a set of conditions. Thus, in general, each edge will have input and output nodes. In cases where a process is controlled, for instance, via activation or inhibition by a biological entity, a corresponding control node is associated to that edge. The input, output, and control nodes of the ith edge are denoted by e_{ i }.I, e_{ i }.O and e_{ i }.C, respectively. We assume that an edge (a process) can dynamically take place if all its input and activator nodes have nonzero values and its inhibitor nodes have zero values. We use the following rules to update the node values when the ith edge is being dynamically processed:
Arrows indicate the directions of processes/edges graphically. A dashed arrow is used to show an activation or inhibition in a network process. As an example, a process in which protein A converts to protein B if the activator protein C presents is shown as follows:
To generate a network model for the above process, we define the set of network nodes by N = {n_{1}, n_{2}, n_{3}} = {A, B, C}, and the set of network edges by E = {e_{1}}:
where n_{1}, n_{2} and n_{3} are the input, output and activator nodes of e_{1}, respectively. Supposing that the initial condition is given by x_{0} = [1, 0, 1]^{T}, since the input and activator nodes have nonzero values, the updated state vector, following the rules in (1), becomes x_{1} = [0, 1, 1]^{T}, and we write:
Generally, in conversion processes to which activator entities are associated, if the activator is removed, the conversion will be halted and even reversed. Also, as can be seen in real cellular systems, if the activator does not present then the new input species coming from other processes will not convert to outputs and will accumulate; thus, the outputs gradually degrade and will be depleted. Based on this observation, we make the following assumption.
Assumption 1 If in a conversion process, the activator entity has zero value and the outputs (righthand side entities) have nonzero values, then the process takes place in the reverse direction.
Considering the statement of Assumption 1 for any conversion process in the network under study to which an activator entity is associated, an additional network edge corresponding to the reverse direction of the original process will be automatically added to E. For instance, for the conversion process in (2), we present the network model in the following form:
Now, if x_{0} = [0, 1, 0]^{T}, then the process takes place in the reverse direction resulting in x_{1} = [1, 0, 0]^{T} .
For the set of biological processes in which a transcription takes place we follow the BioPAX format in which no input is assigned, but only an activator and an output are defined, these representing whether the transcription has occurred or not. Thus, for this subset of network edges we use the following logic to update the output node value:
where e_{ i }.C_{ act }represents the activator node associated to e_{ i }. Once we have updated the output node of a transcription process we will not update that node any further until the activator node value changes. For example, suppose that transcription B happens if protein A presents. We thus define N = {n_{1}, n_{2}} = {A, B} and E = {e_{1}}:
Now, for the initial condition x_{0} = [1, 0]^{T} , the updated state vector will be x_{1} = [1, 1]^{T} , implying that the corresponding transcription happens.
Dynamic trajectories and uncertainty in network dynamics
Once an interactionbased network model M is developed, a dynamical analysis can be performed. Given a network model M = (N, E), an initial condition vector x_{0}, and defining the parameter time step, the network state vector x can be updated at each time step based on our proposed algorithm, presented later in this section. The abstract term time step does not necessarily reflect the real clock time that biological processes take to complete; the real clock time for one time step can be substantially different from another time step. We denote by x_{ k } the network state vector at the kth time step. Generally, a complete knowledge of all parameters affecting the dynamics of a biological process is not available. Besides, in most practical cases, these parameters can not be even measured or observed. Therefore, in general, for a given network initial condition x_{0}, different sequences of processes can happen, these being referred to as dynamic trajectories and denoted by T. Each trajectory t ∈ T, indicates one possible sequence of processes that can take place. We describe the trajectory t ∈ T by t = {e_{ i }, e_{ j } ,...}, for appropriate values of i, j,..., which means that in the first time step process i (edge e_{ i }) happens, in the second time step process j (edge e_{ j }) takes place, and so on. The fact that, for a given x_{0}, multiple dynamic trajectories can be computed reflects uncertainty that we refer to as dynamics uncertainty. In cases where a possibly limited knowledge about the biological processes is available, it can be effectively used to reduce the dynamics uncertainty of the model. This can be accomplished by assigning appropriate properties to the set of network edges, as discussed next. We emphasize that the dynamic trajectories of a network depend on the properties assigned to the network edges as well as the network initial condition.
Network edge properties
To each network edge a set of properties, upon availability of corresponding information in the biological database under study, can be associated, these being critical in characterizing network dynamics. For each network edge, these properties will be stored in a vector associated with that specific edge and can be accessed in the dynamical modeling algorithm, to be discussed shortly. For instance, to access the property X of the ith edge we call e_{ i }.X. The dynamical modeling algorithm consists of conditional statements being functions of the edge properties; thus, the algorithm outputs depend on the edge properties. The properties, speed and priority, are assigned to each edge according to the nature or earlier measurements of that biological process.
Speed
The speed label describes the time scale needed for a process to complete [20, 21]. This is critical in dynamical network modeling because fast processes, such as complex assembly processes in which a group of proteins bind together to form a complex, happen almost instantaneously, while slow processes, such as transcription and apoptosis (cell death), take much longer times to complete. If such qualitative knowledge on the time scale of processes can be obtained from the biological database, then corresponding speed labels can be assigned to network edges, thereby reducing uncertainty in the dynamical model; otherwise, the dynamical model will embrace such uncertainty. In BioPAX3, processes are categorized into Conversion, Template Reaction, Control, Molecular Interaction, and Genetic Interaction. In processbased knowledge databases, almost all processes belong to the first three categories and every Control process is essentially associated with either a Conversion or a Template Reaction process. By default, to any process labeled as Conversion in the knowledge database, we assign a "fast" label; however, for those labeled as Template Reaction, such as processes that end in a phenotypical or pathway level event, a "slow" label is assigned, because the Control and Conversion interactions generally happen considerably faster than the Template Reaction processes. If knowledge about the time scale of a process is not available, then a "fast" label is assigned to the corresponding edge by default.
To show how the speed property can affect network dynamics, consider proteins A, B, and C interacting according to:
where the first two processes are of Conversion (complex assembly) type and the last one, resulting in apoptosis, is a Template Reaction process. We define N = {n_{1}, n_{2}, n_{3}, n_{4}, n_{5}, n_{6}} = {A, B, C, A.B, B.C, Apoptosis}, and E = {e_{1}, e_{2}, e_{3}}:
Also, suppose that the initial condition is given as x_{0} = [1, 1, 1, 0, 0, 0]^{T} . According to the above description of processes, e_{1} and e_{2} happen fast, while e_{3} takes more time to complete. If such timescale knowledge is not incorporated into the network model in the form of appropriate edge properties and we, by default, assign a "fast" label to all the edges, then the network state vector can be updated in two different ways, as detailed below, thereby introducing uncertainty to the dynamical model:
In the above updates of the network states, {x}_{k}^{p} indicates the pth possible update of the state vector at the kth time step. Now, if one applies the process timescale knowledge by assigning a "fast" label to the speed of e_{1} and e_{2}, and a "slow" label to that of e_{3}, then the state vector will be updated only according to the sequence in (10). Use of timescale information reduces uncertainty in the dynamical model.
Priority
The priority label helps when there exist two or more processes with nonzero input and activator nodes, and zero inhibitor nodes, meaning that they have equal probability to happen; however, a prior knowledge states that only one or few of them will happen in practice. Priority is usually not provided in the knowledge database; but in many cases we can set default priorities based on experience. For instance, if an entity is an input in process 1 and a control node in process 2, then process 1 has higher priority than process 2, because in process 1, the entity will be converted and not be available for control. For example, PIP3 is needed to phosphorylate AKT, yet PIP3 can also be converted to PIP2 by PTEN. Hence, if PTEN exists, PIP3 will be converted and not be able to phosphorylate AKT. Therefore, upon availability of such prior knowledge, one can assign priority labels to the network edges and reduce the uncertainty in the dynamical model; if this is not the case, then no priority labels will be assigned and the dynamical model will contain all of the possibilities. Now, consider the following example in which proteins A, B, C, and D bind together to form complex assemblies:
We define N = {n_{1}, n_{2}, n_{3}, n_{4}, n_{5}, n_{6}, n_{7}} = {A, B, C, D, A.B, B.C, A.D} and E = {e_{1}, e_{2}, e_{3}}:
Since all the above processes are of Conversion type, we assign the "fast" label to the speed property of all the edges. Assuming that the initial condition is given by x_{0} = [1, 1, 1, 1, 0, 0, 0]^{T} , the network state vector can be updated in two different ways:
Now, suppose that a prior knowledge indicates that, in situations where e_{2} and e_{3} can mathematically happen at the same time owing to nonzero input nodes, e_{2} has higher priority than e_{3} to take place. Accommodating this priority information into our network model by assigning the label "{e}_{2}^{prio}>{e}_{3}^{prio}" to the priority of e_{2}, and "{e}_{3}^{prio}<{e}_{2}^{prio}" to that of e_{ 3 } , the state vector can now be updated based on the sequence of processes given in (14).
Dynamical modeling algorithm
An abstract form of the algorithm proposed to perform the dynamical analysis is provided below. A detailed algorithm can be found in the Additional file 1 to this manuscript. The algorithm is based on the breadthfirst search algorithm [22]. It takes the network model M = (N, E) and an initial condition x_{0} as inputs, and return the resulting dynamic trajectories T, and the corresponding updates of the state vector x_{ k } at each time step k. There are two approaches to update the network state vector: synchronous and asynchronous [23, 17]. With synchronicity, multiple network edges can be processed at the same time step; while with asynchronicity, only one network edge is processed at each time step. Our software package [19] has the capability to update the network state vector using either approach. Since, in general, the asynchronous approach yields a larger set of possible dynamic trajectories compared to the synchronous method [24], asynchronicity is the default method to update the network state vector. The abstract algorithm given below updates the state vector asynchronously.
Algorithm Dynamical Modeling Algorithm
Inputs: (N, E, x_{0} )
Outputs: (T, x)
1: k ← 1, T ← ∅, x_{ in } ← x_{0}, S_{ in }, x_{ k }, S_{ k } ← ∅
2: run States Update Subroutine (Inputs: (x_{ in }, S_{ in }, x_{ k }, S_{ k }, T), Outputs: (x_{ k }, S_{ k }, T ))
3: while x_{ k } ≠ ∅ do
4: new time step: k ← k + 1, x_{ k }, S_{ k } ← ∅
5: for each updated state vector {x}_{k\mathsf{\text{1}}}^{i} in x_{k 1 }do
6: if corresponding {S}_{k\mathsf{\text{1}}}^{i} forms a cycle then
7: if {S}_{k\mathsf{\text{1}}}^{i}\notin T, then add {S}_{k\mathsf{\text{1}}}^{i} to T
8: else
9: {x}_{in}\leftarrow {x}_{k1}^{i},{S}_{in}\leftarrow {S}_{k1}^{i}
10: run States Update Subroutine (Inputs: (x_{ in }, S_{ in }, x_{ k }, S_{ k }, T ), Outputs: (x_{ k }, S_{ k }, T ))
11: end if
12: end for
13: end while
States Update Subroutine
Inputs: (x_{ in }, S_{ in }, x_{ k }, S_{ k }, T)
Outputs: (x_{ k }, S_{ k }, T)
1: initialize all nodes by x_{ in }
2: E_{ v } ← find all edges that have nonzero input nodes, nonzero activator nodes and zero inhibitor nodes (if applicable)
3: if E_{ v } ≠ ∅ then
4: remove edges in E_{ v } having relatively slow/low priority label
5: for each edge in E_{ v } do
6: initialize all nodes by x_{ in }
7: {x}_{k}^{new} ← update the state vector using the given rules
8: x_{ k }← {x_{ k }, {x}_{k}^{new}}
9: {S}_{k}^{new} ← the sequence of edges yielded {x}_{k}^{new}
10: {S}_{k}\leftarrow \left\{{S}_{k},{S}_{k}^{new}\right\}
11: end for
12: else
13: if S_{ in }∉ T, then add S_{ in }to T
14: end if
To illustrate the algorithm, consider the network in Figure 1, where N = {n_{1}, n_{2},..., n_{12}} and E = {e_{1}, e_{2},..., e_{6}}. Suppose that the biological database provides the following information about the processes in this network: in process/edge e_{1} proteins n_{1} and n_{2} bind together to form the complex n_{8}, if the inhibitor gene n_{3} does not express; e_{2} is a transcription process (the only slow process in this example) in which n_{4} is the activator and n_{9} accounts for the status of the transcription; in e_{3} proteins n_{4} and n_{5} convert to complex n_{10} ; in e_{4} proteins n_{6} and n_{10} form compound n_{11} and release n_{4}, if the activator n_{8} presents; and finally, in e_{5} proteins n_{7} and n_{10} convert to compound n_{12}, but this process is assumed to have less priority compared to e_{4} . Based on the above description and according to the statement of Assumption 1, our method automatically adds edge e_{6} to the set of network edges to account for the reverse direction of e_{4}. We incorporate the above biological information into the network model by assigning appropriate labels to the properties of network edges (see Table 1). Suppose that the initial condition is:
{x}_{0}={\left[1,1,0,1,1,1,1,0,0,0,0,0\right]}^{T}
Following the proposed algorithm to calculate the dynamic trajectories and state vector updates, we get (the sign ▷ is used for comments, and "SU" indicates the States Update Subroutine):

line 1: k ← 1, T ← ∅, x_{ in } ← x_{0}, S_{ in }, x_{1}, S_{1} ← ∅

line 2: run States Update Subroutine

line SU1: initialize all nodes by x_{ in }

line SU2: E_{ υ }= {e_{1}, e_{2}, e_{3}}

line SU4: E_{ υ }= {e_{1}, e_{3}} ⊳ e_{2} is removed because it is slower than other edges

line SU6: initialize all nodes by x_{ in }

line SU78: {x}_{1}^{new}={x}_{1}^{1}={\left[0,0,0,1,1,1,1,1,0,0,0,0\right]}^{T},{x}_{1}\leftarrow {x}_{1}^{new}

line SU910: {S}_{1}^{new}={S}_{1}^{1}=\left\{{e}_{1}\right\},{S}_{1}\leftarrow {S}_{1}^{new}

line SU6: initialize all nodes by x_{ in }

line SU78: {x}_{1}^{new}={x}_{1}^{2}={\left[\mathsf{\text{1}},\mathsf{\text{1}},0,0,0,\mathsf{\text{1}},\mathsf{\text{1}},0,0,\mathsf{\text{1}},0,0\right]}^{T},{x}_{1}\leftarrow \left\{{x}_{1},{x}_{1}^{new}\right\}

line SU910: {S}_{1}^{new}={S}_{1}^{2}=\left\{{e}_{3}\right\},{S}_{1}\leftarrow \left\{{S}_{1},{S}_{1}^{new}\right\}

line 4: k ← 2, x_{2}, S_{2} ← ∅

line 9: {x}_{in}\leftarrow {x}_{1}^{1},{S}_{in}\leftarrow {S}_{1}^{1}

line 10: run States Update Subroutine

line SU1: initialize all nodes by x_{ in }

line SU2: E_{ υ }= {e_{2}, e_{3}}

line SU4: E_{ υ }= {e_{3}}

line SU6: initialize all nodes by x_{ in }

line SU78: {x}_{2}^{new}={x}_{2}^{1}={\left[0,0,0,0,0,1,1,1,0,1,0,0\right]}^{T},{x}_{2}\leftarrow {x}_{2}^{new}

line SU910: {S}_{2}^{new}={S}_{2}^{1}=\left\{{e}_{1},{e}_{3}\right\},{S}_{2}\leftarrow {S}_{2}^{new}

line 9: {x}_{in}\leftarrow {x}_{1}^{2},{S}_{in}\leftarrow {S}_{1}^{2}

line 10: run States Update Subroutine

line SU1: initialize all nodes by x_{ in }

line SU2: E_{ υ } = {e_{1}, e_{5}} ⊳ since e_{4}, which has higher priority than e_{5}, is not in E_{ υ }, then e_{5} will not be removed once line SU4 is processed.

line SU6: initialize all nodes by x_{ in }

line SU78: {x}_{2}^{new}={x}_{2}^{2}={\left[0,0,0,0,0,1,1,1,0,1,0,0\right]}^{T},{x}_{2}\leftarrow \left\{{x}_{2},{x}_{2}^{new}\right\}

line SU910: {S}_{2}^{new}={S}_{2}^{2}=\left\{{e}_{3},{e}_{1}\right\},{S}_{2}\leftarrow \left\{{S}_{2},{S}_{2}^{new}\right\}

line SU6: initialize all nodes by x_{ in }

line SU78: {x}_{2}^{new}={x}_{2}^{3}={\left[\mathsf{\text{1}},\mathsf{\text{1}},0,0,0,\mathsf{\text{1}},0,0,0,0,0,\mathsf{\text{1}}\right]}^{T},{x}_{2}\leftarrow \left\{{x}_{2},{x}_{2}^{new}\right\}

line SU910: {S}_{2}^{new}={S}_{2}^{3}=\left\{{e}_{3},{e}_{5}\right\},{S}_{2}\leftarrow \left\{{S}_{2},{S}_{2}^{new}\right\}

line 4: k ← 3, x_{3}, S_{3} ← ∅

The next time steps can be followed easily.
Finally, the algorithm outputs are:
Dynamic trajectory t_{1} ∈ T : {t}_{\mathsf{\text{1}}}:={S}_{3}^{3}=\left\{{e}_{\mathsf{\text{3}}},{e}_{\mathsf{\text{5}}},{e}_{\mathsf{\text{1}}}\right\}
State updates for t_{1} ∈ T: {x}_{1}^{2},{x}_{2}^{3},{x}_{3}^{3}
Dynamic trajectory t_{2} ∈ T: {t}_{2}:={S}_{4}^{1}=\left\{{e}_{1},{e}_{3},{e}_{4},{e}_{2}\right\}
State updates for t_{2} ∈ T: {x}_{1}^{1},{x}_{2}^{1},{x}_{3}^{1},{x}_{4}^{1}
Results and discussion
In this section we study the dynamics of 3 networks: BAK1MCL1 interaction, the MAP Kinase pathway, and the ErbB2/ErbB3 pathway. For each network we calculate the dynamic trajectories for a given initial condition vector. For illustration, these networks are graphically plotted in Microsoft Visio software and provided here. The computer used to perform dynamical analysis and compute dynamic trajectories is a Sony VAIO laptop with Intel Core 2 Duo 1.67 GHz CPU and 2 GB RAM.
BAK1MCL1 interaction
Biological database and network model
Consider the network of BAK1MCL1 interaction shown in Figure 2. In cells, once synthesized, protein BAK1 is localized at the mitochondria. Following the apoptotic stimulus, BAK1 will dimerize to form homodimers. These dimers will further oligomerize and trigger mitochondrial outer membrane permeabilization (MOMP) and induce apoptosis [25]. To counter such apoptosis stimulus and keep the the cell alive, an antiapoptotic protein like MCL1 will bind to BAK1 with higher affinity and block BAK1 from forming homodimers, thereby stopping apoptosis [26]. Thus, in cancer cells, especially melanoma, MCL1 is commonly found to be overexpressed [27].
In this network X.Y indicates a binder complex produced by X and Y. We define the set of network nodes and edges by N = {n_{1}, n_{2},..., n_{5}} = {BAK1, MCL1, BAK1.BAK1, Apoptosis, MCL1.BAK1} and E = {e_{1}, e_{2}, e_{3}, e_{4}}. Process e_{2} is the only slow process in this network resulting in apoptosis; thus, we assign a "slow" label to the speed property of e_{2} and, by default, a "fast" label to e_{1}, e_{3}, and e_{4} . Also, as discussed, MCL1 binds to BAK1 with higher affinity than BAK1 binding to itself to form BAK1.BAK1 [26]; therefore, e_{3} has higher priority than e_{1}, and we accordingly assign the priority label "{e}_{1}^{prio}<{e}_{3}^{prio}" to e_{1}, and "{e}_{3}^{prio}<{e}_{1}^{prio} " to e_{3} . In this network the observable state is y = n_{4} = Apoptosis, since it can be checked under a microscope.
Dynamic trajectories
Suppose that the network initial state vector is given by x_{0} = [2, 1, 0, 0, 0]^{T} , biologically meaning that the concentration of BAK1 is considerably higher than MCL1, the concentration of MCL1 is enough to participate in the processes in Figure 2, and other entities do not exist or have considerably low concentrations. For this initial condition and the assigned edge properties, our algorithm returns the following possible updates of the network state vector (for notational simplicity we drop the superscripts used for the updated state vectors):
The dynamic trajectory in (16) is
The dynamic trajectory obtained here yields the final value of the observable state y, the apoptosis status, to be 1, which biologically means that apoptosis takes place. This result agrees with our expectation from a biological point of view because initially there was insufficient amount of MCL1 to bind to the higher amount of BAK1 to avoid apoptosis. The CPU time to calculate the dynamic trajectories for this network using the computer specifications described above was 0.1 sec.
MAP Kinase pathway
Biological database and network model
The dynamical study of pathways containing feedback processes has practical importance for experimental design and therapy. In this example we considered the set of biological processes shown in Figure 3, which is based on the wellstudied MAK Kinase pathway. Under the stimulus of a growth factor, GRB2.SOS complex will be recruited to the cell membrane, which will transform the GDPbound RAS into its active form, GTPbound RAS. The activated RAS will in turn activate the Raf/MEK/ERK pathway [28]. However, it has been shown that the activation of RAS is often short and transient, due to a negative feedback loop through ERK, which dissociates the GRB2.SOS complex, thereby inducing an oscillatory pattern in ERK activity [29].
The set of network nodes is N = {n_{1}, n_{2},..., n_{ 9 }} = {RAS.GDP, RAS.GTP, GRB2.SOS, RAF, pRAF, ERK, pERK, GRB2, SOS} and the network edges are E = {e_{1}, e_{2},..., e_{8} }. The database provides information about the processes e_{1}, e_{2}, e_{3}, and e_{4} . Owing to Assumption 1, our algorithm automatically generates the corresponding edges e_{5}, e_{6}, e_{7}, and e_{8} to account for the reversible direction of the original processes. By default we assign a "fast" label to all edges. Since the database gives no information about the priority between the processes, no priority labels are assigned.
Dynamic trajectories
Given the initial condition x_{0} = [1, 0, 1, 1, 0, 1, 0, 0, 0]^{T} , our algorithm yields the following dynamic trajectory:
and this sequence repeats. The effect of the feedback process can be seen in the dynamic trajectory as the sequence of edges is repeating over time. The network state vector evolves as follows:
The CPU time for this network was 0.3 sec.
ErbB2/ErbB3 pathway
Biological database and network model
In this example (see Figure 4) we studied the dynamics of the ErbB2/ErbB3 pathway downloaded from the NCI pathway interaction database (http://pid.nci.nih.gov). The roles and connections of the ErbB2, ErbB3 and Neuregulin proteins and peptides in supporting both the MEK and PI3K pathways were first mapped out around the year 2000 [30], and form the basis for the pathway displayed in Figure 4. The dependencies of RAF/MEK pathways [31], and PI3K/AKT pathways [32], on the levels of activated ErbB3 have been thoroughly demonstrated to show a very high level of reliance. In this network, the edges e_{31}, e_{32}, e_{33}, e_{36} are transcription processes and are the only slow processes in the network; thus, they are assigned a "slow" label. A "fast" label, by default, is assigned to all other edges. Since no priority information is available in the database, no priority labels are assigned. The vector of observable states for this pathway is y = [FOS[n], CHRNA1[n], JUN[n], CHRNE[n]]^{T} , which are the output nodes of the transcription processes e_{31}, e_{32}, e_{33}, and e_{36}, respectively, whose activities can be measured by a microarray.
Dynamic trajectories
Assuming that the initial condition of the nodes marked with (*) is 1 and the others are 0, our algorithm gives 7 dynamic trajectories:
where the dynamic trajectories t_{6} and t_{7} contain transcription processes giving the final value of vector y, as y = [1, 1, 1, 0]^{T}. The CPU time for computing these trajectories was 25.8 sec.
Drugs such as lapatinib can have a very strong inhibitory effect on levels of ErbB2/ErbB3 activity, though the drug's effects are themselves highly dependent on the presence of only low levels of ErbB2/ErbB3 [33]. The immunoblot figure in [33] shows that by suppressing activated ErbB2/ErbB3, both AKT and ERK phosphorylation will be turned off. To see how suppressing the activity of ErbB2/ErbB3 affects the network dynamics we consider another initial condition vector. Suppose that the nodes marked with (†) are initially set to 1 and the others are 0. For this initial condition, our algorithm yields no dynamic trajectory, T = ∅. This implies that none of the AKT and ERK phosphorylation processes occur, as we expected from the results in [33]. A similar dynamical analysis can be performed for other initial conditions, thereby giving the set of all possible outcomes.
Current research
Each network edge in a dynamic trajectory can be mathematically expressed by a state transition matrix A. For instance, the transition from x_{0} to x_{1} in (4) can be described by
where
A state transition matrix facilitates writing the network dynamics in terms of state space equations suitable for control system studies.
Given a network model M = (N, E), an initial condition x_{0}, the calculated dynamic trajectories T, and recalling the notion of observable state vector y, one can mathematically describe the network dynamics corresponding to trajectory t ∈ T by the following set of state space equations:
which is a linear discretetime timevarying system. We alternatively refer to y_{ k }as the system output at the kth time step.
From a biological viewpoint, not all dynamic trajectories are primarily of interest for design purposes, but those including observable states are important because the result of changes to network inputs or initial condition can be measured. Thereafter, a suitable experimental design can be proposed to attain a desired performance. We are currently conducting research on developing a new methodology to find logical relationships between network state variables and a set of desirable values for network observable states. This problem, referred to as a "backward inference problem", is a key step toward experimental design because it characterizes the necessary and sufficient conditions on the network state variables for which a desirable network performance can be obtained.
Conclusions
We have proposed a new dynamical modeling methodology that can handle common biological knowledge without the need of exact parameters related to process dynamics. Our proposed approach has similarities in some aspects to other qualitative methods. The node update for conversion processes is similar to token passing in Petri nets, while the update for transcription processes functions like Boolean logic. These similarities occur because they are natural ways to describe the basic processes. However, to properly handle the pathway knowledge commonly available from a public database as a BioPAX file with minimal extra input, one needs to apply these canonical approaches in a practical way, which results in the kind of hybrid approach introduced in this paper. For instance, in the ErbB2/ErbB3 pathway example we were able to use different updating rules for the conversion and transcription processes that resulted in a dynamical model which captures the network dynamics more accurately. We would like to conclude by pointing out critical differences between our proposed method and the standard Petri net and Boolean network approaches.
First, for conversion type processes, a Petri net usually does not consider the role of activator by assuming they are always present, and the potential reversibility of the process, so that every physical entity ("place" in Petri net terminology) is either an input or an output. Although this can be partially solved by introducing a reading arc to indicate when a process can take place, it cannot be incorporated into the incidence matrix, which is critical for a fast update. As a result, a Petri net loses one of its most attractive properties. The use of an incidence matrix is further hampered by the synchronous update, which assumes that equal time elapses for each process.
For practical applications, a critical shortcoming of the standard approaches is the manner in which they handle uncertainty. A standard Petri net handles uncertainty in a random manner and Boolean networks demand that all uncertainties be resolved before simulation starts. In reality, a biological process often takes a deterministic fashion unknown to the modeler, which demands a comprehensive exploration of all the possibilities. Our choice of this hybrid approach is based on the purpose of utilizing public knowledge databases; however, the concept of exploiting all possible uncertainties is not limited to the specific procedure considered here. It can be applied to other qualitative methods, including both Boolean networks and Petri nets.
In conclusion, the entire procedure emulates how biologists actually utilize available pathway information. Our ultimate aim is to use predicted outcomes to design the most efficient experiments [34] to reduce uncertainty and find the best perturbation scheme to achieve desired network performance.
Appendix
The input and output file formats of our software package are simpletext files. The input file contains the set of interactions/processes in the network and the initial condition vector in the following form:
For instance, for the following process
the input file contains:
The initial condition vector is added to the input file (after the list of processes) as follows:
For the above process, if A, B and C are initially 1 and D is 0, then it will be added to the input file as:
The output file summarizes the given initial condition vector and the calculated dynamic trajectories as well as the corresponding updates of the state vector in the following form:
Dynamic Trajectories for Initial Condition: x_{0} = (given initial condition vector)
Trajectory t_{(traj#) }= {(edge #'s)}
State Vector Updates for t_{(traj #)}:
Time Step 1: (edge #)
x_{1} = (first update of the state vector)
Time Step 2: (edge #)
x_{2} = (second update of the state vector)
...and so on for the next time steps and other dynamic trajectories. The output file for the above process and initial condition is:
Dynamic Trajectories for Initial Condition: x_{0} = [1, 1, 1, 0]^{T}
Trajectory t_{1} = {e_{1}}
State Vector Updates for t_{1}:
Time Step 1: e_{1}
x_{1} = [0, 0, 1, 1]^{T}.
Availability
A MATLAB software package enabled with Graphical User Interface (GUI) is programmed based on our proposed approach to perform dynamical modeling of uncertain biological networks. This package along with the real data examples presented in this manuscript are available at http://gsp.tamu.edu/Publications/supplementary/mohsenizadeh15a.
References
Schaefer CF, et al: Pid: the pathway interaction database. Nucleic Acids Res. 2009, 37 (Database): 674679. 10.1093/nar/gkn653.
Milacic M, et al: Annotating cancer variants and anticancer therapeutics in reactome. Cancers (Basel). 2012, 4 (4): 11801211. 10.3390/cancers4041180.
Croft D: Building models using reactome pathways as templates. Methods Mol Biol. 2013, 1021: 273283. 10.1007/9781627034500_14.
Cerami EG, et al: Pathway commons, a web resource for biological pathway data. Nucleic Acids Res. 2011, 39 (Database): 685690. 10.1093/nar/gkq1039.
Demir E, et al: The biopax community standard for pathway data sharing. Nat Biotechnol. 2010, 28 (9): 935942. 10.1038/nbt.1666.
Hucka M, et al: The systems biology markup language (sbml): a medium for representation and exchange of biochemical network models. Bioinformatics. 2003, 19 (4): 52431. 10.1093/bioinformatics/btg015.
Bornstein BJ, Keating SM, Jouraku A, Hucka M: Libsbml: An api library for sbml. Bioinformatics. 2008, 24 (6): 880881. 10.1093/bioinformatics/btn051.
Funahashi A, Matsuoka Y, Jouraku A, Morohashi M, Kikuchi N, Kitano H: Celldesigner 3.5: A versatile modeling tool for biochemical networks. Proceedings of the IEEE. 2008, 96 (8): 12541265.
Loew LM, Schaff JC: The virtual cell: a software environment for computational cell biology. TRENDS Biotechnol. 2001, 19: 401406. 10.1016/S01677799(01)017401.
Azeloglu EU, Iyengar R: Good practices for building dynamical models in systems biology. Sci Signal. 2015, 8 (371): 810.1126/scisignal.aab0880.
Weng G, Bhalla US, Iyengar R: Complexity in biological signaling systems. Science. 1999, 284 (5411): 9296. 10.1126/science.284.5411.92.
Chaouiya C, et al: Sbml qualitative models: a model representation format and infrastructure to foster interactions between qualitative modelling formalisms and tools. BMC Syst Biol. 2013, 7: 13510.1186/175205097135.
Koch I, Reisig W, Schreiber F: Modeling in Systems Biology: the Petri Net Approach. 2011, Springer, London
Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M: Kegg for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 2010, 38 (Database): 355360. 10.1093/nar/gkp896.
Klamt S, SaezRodriguez J, Lindquist JA, Simeoni L, Gilles ED: A methodology for the structural and functional analysis of signaling and regulatory networks. BMC Bioinformatics. 2006, 7 (56):
Terfve C, Cokelaer T, MacNamara A, Henriques D, Goncalves E, Morris MK, van Iersel M, Lauffenburger DA, SaezRodriguez J: Cellnoptr: a flexible toolkit to train protein signaling networks to data using multiple logic formalisms. BMC Systems Biology. 2012, 6 (133):
Albert I, Thakar J, Li S, Zhang R, Albert R: Boolean network simulations for life scientist. Source Code Biol Med. 2008, 3 (16):
Ratzer AV, Wells L, Laursen HM, Qvortrup JF, Stissing MS, Westergaard M, Christensen S, Jensen K: Cpn tools for editing, simulating, and analysing coloured petri nets. Proceedings of the 24th International Conference on Applications and Theory of Petri Nets. ICATPN'03. 2003, 450462.
Mohsenizadeh DN: DMUN: Dynamical Modeling of Uncertain Networks (version 2.0) [software]. 2014, Available at: http://gsp.tamu.edu/Publications/supplementary/mohsenizadeh15a
Alon U: An Introduction to Systems Biology: Design Principles of Biological Circuits. 2006, Taylor & Francis, Boca Raton, FL
de Leon SBT, Davidson EH: Modeling the dynamics of transcriptional gene regulatory networks for animal development. Dev Biol. 2009, 325 (2): 31728. 10.1016/j.ydbio.2008.10.043.
Cormen TH, Leiserson CE, Rivest RL, Stein C: Introduction to Algorithms. 2009, The MIT Press, Cambridge, MA, 3
Garg A, Cara AD, Xenarios I, Mendoza L, Micheli GD: Synchronous versus asynchronous modeling of gene regulatory networks. Bioinformatics. 2008, 24 (17): 19171925. 10.1093/bioinformatics/btn336.
Wynn ML, Consul N, Merajver SD, Schnell S: Logicbased models in systems biology: a predictive and parameterfree network analysis method. Integr Biol (Camb). 2012, 4 (11): 132337. 10.1039/c2ib20193c.
Wei MC, et al: Proapoptotic bax and bak: a requisite gateway to mitochondrial dysfunction and death. Science. 2001, 292 (5517): 72730. 10.1126/science.1059108.
Minet E, Cosse JP, Demazy C, Raes M, Michiels C: Accumulation of the proapoptotic factor bak is controlled by antagonist factor mcl1 availability. Apoptosis. 2006, 11 (6): 103947. 10.1007/s1049500666505.
Placzek WJ, Wei J, Kitada S, Zhai D, Reed JC, Pellecchia M: A survey of the antiapoptotic bcl2 subfamily expression in cancer types provides a platform to predict the efficacy of bcl2 antagonists in cancer therapy. Cell Death Dis. 2010, 1 (5): 4010.1038/cddis.2010.18.
Qi M, Elion EA: Map kinase pathways. J Cell Sci. 2005, 118: 356972. 10.1242/jcs.02470.
Langlois WJ, Sasaoka T, Saltiel AR, Olefsky JM: Negative feedback regulation and desensitization of insulinand epidermal growth factorstimulated p21ras activation. J Biol Chem. 1995, 270 (43): 253203. 10.1074/jbc.270.43.25320.
Le XF, Vadlamudi R, McWatters A, Bae DS, Mills GB, Kumar R, Bast RCJ: Differential signaling by an antip185(her2) antibody and heregulin. Cancer Research. 2000, 60 (13): 352231.
Abel EV, et al: Melanoma adapts to RAF/MEK inhibitors through FOXD3mediated upregulation of ERBB3. J Clin Invest. 2013, 123 (5): 215568. 10.1172/JCI65780.
Cook RS, et al: ErbB3 ablation impairs PI3K/Aktdependent mammary tumorigenesis. Cancer Research. 2011, 71 (11): 394151. 10.1158/00085472.CAN103775.
Amin DN, Sergina N, Ahuja D, McMahon M, Blair AJ, Wang D, Hann B, Koch KM, Shokat KM, Moasser MM: Resiliency and vulnerability in the HER2HER3 tumorigenic driver. Sci Transl Med. 2010, 2 (16): 167.
Dehghannasiri R, Yoon B, Dougherty ER: Optimal experimental design for gene regulatory networks in the presence of uncertainty. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2014
Acknowledgements
This work and its publication were supported by the National Institutes of Health (NIH) grant number 5R25CA090301.
This article has been published as part of BMC Bioinformatics Volume 16 Supplement 13, 2015: Proceedings of the 12th Annual MCBIOS Conference. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/16/S13.
Author information
Authors and Affiliations
Corresponding author
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
D.N.M., J.H. and E.R.D. conceived the method. D.N.M. developed the dynamical modeling methodology and programmed the software package. D.N.M., J.H., M.B. and E.R.D. selected the real data examples and provided insights on the interpretation of the results. All authors wrote the manuscript.
Electronic supplementary material
12859_2015_7154_MOESM1_ESM.pdf
Additional file 1: Dynamical Modeling Algorithm. This file contains the detailed algorithm proposed for dynamical modeling of uncertain interactionbased biological networks. (PDF 98 KB)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Mohsenizadeh, D.N., Hua, J., Bittner, M. et al. Dynamical modeling of uncertain interactionbased genomic networks. BMC Bioinformatics 16 (Suppl 13), S3 (2015). https://doi.org/10.1186/1471210516S13S3
Published:
DOI: https://doi.org/10.1186/1471210516S13S3
Keywords
 Dynamical model
 Uncertain networks
 Algorithm design