Skip to main content

Dynamical modeling of uncertain interaction-based genomic networks

Abstract

Background

Most dynamical models for genomic networks are built upon two current methodologies, one process-based and the other based on Boolean-type networks. Both are problematic when it comes to experimental design purposes in the laboratory. The first approach requires a comprehensive knowledge of the parameters involved in all biological processes a priori, whereas the results from the second method may not have a biological correspondence and thus cannot be tested in the laboratory. Moreover, the current methods cannot readily utilize existing curated knowledge databases and do not consider uncertainty in the knowledge. Therefore, a new methodology is needed that can generate a dynamical model based on available biological data, assuming uncertainty, while the results from experimental design can be examined in the laboratory.

Results

We propose a new methodology for dynamical modeling of genomic networks that can utilize the interaction knowledge provided in public databases. The model assigns discrete states for physical entities, sets priorities among interactions based on information provided in the database, and updates each interaction based on associated node states. Whenever uncertainty in dynamics arises, it explores all possible outcomes. By using the proposed model, biologists can study regulation networks that are too complex for manual analysis.

Conclusions

The proposed approach can be effectively used for constructing dynamical models of interaction-based genomic networks without requiring a complete knowledge of all parameters affecting the network dynamics, and thus based on a small set of available data.

Background

Cellular regulatory networks consist of thousands of molecular processes/reactions collected over years from many studies. Based on their relationships and functions, these interactions and their associated molecular entities have been categorized into various pathways. Biologists have long used pathway knowledge to help derive the dynamics of cellular interactions, generate plausible hypotheses, and design promising experiments; however, our knowledge of regulatory networks is growing quickly and manual exploration is becoming increasingly difficult. The challenge of understanding network dynamics has become overwhelming in cancer drug development because multiple pathways are commonly involved and the direct targets of drug intervention are often multiple steps away from the key mechanism to be controlled, which could be in a different compartment and different pathway. Thus, a salient goal of computational biology is to simulate the dynamics of genetic/molecular regulatory networks based on available biological knowledge and in doing so help biologists generate meaningful hypotheses and improve experimental design.

A considerable amount of pathway knowledge centered around biological interactions is currently being curated into well organized public databases, such as PID [1], Reactome [2, 3], and Pathway Commons [4], where the interactions and their associated physical entities are defined and shared via a standard language such as BioPAX [5]. The popular standard, Systems Biology Markup Language (SBML), has adopted systems of Ordinary Differential Equations (ODEs) or Partial Differential Equations (PDEs) as a natural way to describe the fine-grained interactions inside the pathways [6]. Along with the emergence of SBML, there are softwares that can run ODE and/or PDE-based cell models, e.g. LibSBML [7], CellDesigner [8], and the Virtual Cell [9]. However, such an approach demands complete information on the rate equations and their parameters in all reactions to be determined before any simulation can be conducted [10]. This requirement is commonly not available in translational research, for instance, in drug discovery [11].

Taking a coarse perspective, qualitative methods such as Boolean networks for regulatory networks [12] or Petri net for metabolic networks [13], focus on the physical entities rather than interactions. In these methods genes and gene products are linked by logical or algebraic relationships. While there are some popular knowledge databases, such as KEGG [14], where direct relationships are readily available, these cannot be directly adapted to interaction-centered databases (KEGG allows both relation and reaction, although relations are commonly provided.). Although it is possible to manually convert such databases into a form fit for Boolean modeling, the translation is not unique and can be unstable in relation to small interaction alterations. There are also software tools that can be used to run such simulations, including CellNetAnalyzer [15], CellNOpt [16], BooleanNet [17], and CPN tools [18].

Furthermore, all the existing approaches, differential equations, Boolean, or Petri net, require a complete and exact knowledge of the network, including the networks readily available from the public databases, whereas such knowledge associated with the biological processes is often incomplete and admits multiple possibilities. Although biologists routinely take these uncertainties into their own consideration, current computational approaches do not. Hence, a huge amount of curated pathway knowledge cannot be effectively utilized by the computer.

To help remedy this situation, we proposed a general scheme in which uncertainty is considered as part of the system and all possible outcomes that are attainable due to different possible ways in which pathway steps can be carried out will be considered accordingly. Via this approach, one can apply knowledge retrieved from interaction-centered knowledge databases directly to perform systematic simulations. Ultimately, with more meaningful predictions of network dynamics, a better guide to experimentation can be achieved.

This paper is organized as follows. In the Methods section, we provide some definitions and introduce an interaction-based network model which can be generated from an input biological database. We discuss about the dynamic trajectories of a network and the concept of uncertainty in network dynamics. An abstract form of the dynamical modeling algorithm for computing the dynamic trajectories characterizing the dynamical behavior of a network for a given initial condition is also provided in this section. A study on the dynamics of a number of real data networks using our dynamical modeling methodology is given in the Results and discussion section. We also describe our current research activities. Finally, we provide our concluding remarks in the Conclusions section.

Methods

To produce our network model, we employ knowledge retrieved from public knowledge databases such as PID and Reactome. To simplify the discussion, we assume that knowledge is shared via the standard language BioPAX, specifically, BioPAX3. To facilitate efficient simulation, we first represent the interactions and associated physical entities shared in BioPAX using a set of nodes and edges.

An interaction-based network model

The model M, of the biological database under study, is a 2-tuple M = (N, E), where N is the set of network nodes representing the biological entities in the database, including genes, proteins, complexes, molecules, transcripts, metabolites and chemicals, interacting with each other, and E is the set of network hyper-edges (a hyper-edge is an edge that can have multiple inputs/outputs), or simply edges, representing the biological interactions and processes in the database. The set of network nodes N and network edges E can be automatically generated from a simple text-based input file describing the biological interactions using our developed software package [19]. A detailed description on preparing the input file can be found in the Appendix section.

Each node in the network has a value, where the abstract term "value" is interpreted based on the characteristics of that biological entity. It may represent the concentration level of a protein/chemical or the expression level of a gene, indicating whether the gene is active or inactive. It does not necessarily indicate the relationship, such as concentration relationship, between different biological entities. The concentration of proteins, chemicals, and so on, in the cell are continuous variables; however, there are reasons why discrete variables are more suitable. First, accurate real-time concentration measurements are almost impossible. Second, a gene/protein typically participates in a biological process if its concentration is above a threshold level and such behavior is readily handled using discrete variables. Finally, from a control system engineering perspective, it is computationally and practically easier to deal with discrete-time systems rather than continuous-time systems. Therefore, we assume that network nodes take non-negative integer values. The state of a network with m nodes, namely N = {n1, n2,..., n m }, is a vector x 0 m , 0 being the set of non-negative integers, where each element of x is the value of a node n N. Let us also denote by y the vector of observable/measurable states, representing the subset of genes, proteins, transcripts, and so on, whose activities can be observed or measured and are of our interest. We represent the i-th node in N by n i . The same notation will be used for network edges, for example, e j refers to the j-th edge in E.

Each network edge represents a corresponding biological process or interaction in which, generally, the process reactants (inputs) interact with each other to form products (outputs) under a set of conditions. Thus, in general, each edge will have input and output nodes. In cases where a process is controlled, for instance, via activation or inhibition by a biological entity, a corresponding control node is associated to that edge. The input, output, and control nodes of the i-th edge are denoted by e i .I, e i .O and e i .C, respectively. We assume that an edge (a process) can dynamically take place if all its input and activator nodes have nonzero values and its inhibitor nodes have zero values. We use the following rules to update the node values when the i-th edge is being dynamically processed:

e i I | new = e i I | old - 1 e i O | new = e i O | old + 1 e i C | new = e i C | old
(1)

Arrows indicate the directions of processes/edges graphically. A dashed arrow is used to show an activation or inhibition in a network process. As an example, a process in which protein A converts to protein B if the activator protein C presents is shown as follows:

C A B
(2)

To generate a network model for the above process, we define the set of network nodes by N = {n1, n2, n3} = {A, B, C}, and the set of network edges by E = {e1}:

n 3 n 1 n 2 e 1
(3)

where n1, n2 and n3 are the input, output and activator nodes of e1, respectively. Supposing that the initial condition is given by x0 = [1, 0, 1]T, since the input and activator nodes have nonzero values, the updated state vector, following the rules in (1), becomes x1 = [0, 1, 1]T, and we write:

1 0 1 x 0 e 1 0 1 1 x 1
(4)

Generally, in conversion processes to which activator entities are associated, if the activator is removed, the conversion will be halted and even reversed. Also, as can be seen in real cellular systems, if the activator does not present then the new input species coming from other processes will not convert to outputs and will accumulate; thus, the outputs gradually degrade and will be depleted. Based on this observation, we make the following assumption.

Assumption 1 If in a conversion process, the activator entity has zero value and the outputs (right-hand side entities) have nonzero values, then the process takes place in the reverse direction.

Considering the statement of Assumption 1 for any conversion process in the network under study to which an activator entity is associated, an additional network edge corresponding to the reverse direction of the original process will be automatically added to E. For instance, for the conversion process in (2), we present the network model in the following form:

n 3 n 1 e 1 ( rev : e 2 ) n 2
(5)

Now, if x0 = [0, 1, 0]T, then the process takes place in the reverse direction resulting in x1 = [1, 0, 0]T .

For the set of biological processes in which a transcription takes place we follow the BioPAX format in which no input is assigned, but only an activator and an output are defined, these representing whether the transcription has occurred or not. Thus, for this subset of network edges we use the following logic to update the output node value:

  if e i C a c t > 0 then e i O | new = 1 else if e i C a c t = 0 then e i O | new = 0
(6)

where e i .C act represents the activator node associated to e i . Once we have updated the output node of a transcription process we will not update that node any further until the activator node value changes. For example, suppose that transcription B happens if protein A presents. We thus define N = {n1, n2} = {A, B} and E = {e1}:

e 1 : n 1 n 2
(7)

Now, for the initial condition x0 = [1, 0]T , the updated state vector will be x1 = [1, 1]T , implying that the corresponding transcription happens.

Dynamic trajectories and uncertainty in network dynamics

Once an interaction-based network model M is developed, a dynamical analysis can be performed. Given a network model M = (N, E), an initial condition vector x0, and defining the parameter time step, the network state vector x can be updated at each time step based on our proposed algorithm, presented later in this section. The abstract term time step does not necessarily reflect the real clock time that biological processes take to complete; the real clock time for one time step can be substantially different from another time step. We denote by x k the network state vector at the k-th time step. Generally, a complete knowledge of all parameters affecting the dynamics of a biological process is not available. Besides, in most practical cases, these parameters can not be even measured or observed. Therefore, in general, for a given network initial condition x0, different sequences of processes can happen, these being referred to as dynamic trajectories and denoted by T. Each trajectory t T, indicates one possible sequence of processes that can take place. We describe the trajectory t T by t = {e i , e j ,...}, for appropriate values of i, j,..., which means that in the first time step process i (edge e i ) happens, in the second time step process j (edge e j ) takes place, and so on. The fact that, for a given x0, multiple dynamic trajectories can be computed reflects uncertainty that we refer to as dynamics uncertainty. In cases where a possibly limited knowledge about the biological processes is available, it can be effectively used to reduce the dynamics uncertainty of the model. This can be accomplished by assigning appropriate properties to the set of network edges, as discussed next. We emphasize that the dynamic trajectories of a network depend on the properties assigned to the network edges as well as the network initial condition.

Network edge properties

To each network edge a set of properties, upon availability of corresponding information in the biological database under study, can be associated, these being critical in characterizing network dynamics. For each network edge, these properties will be stored in a vector associated with that specific edge and can be accessed in the dynamical modeling algorithm, to be discussed shortly. For instance, to access the property X of the i-th edge we call e i .X. The dynamical modeling algorithm consists of conditional statements being functions of the edge properties; thus, the algorithm outputs depend on the edge properties. The properties, speed and priority, are assigned to each edge according to the nature or earlier measurements of that biological process.

Speed

The speed label describes the time scale needed for a process to complete [20, 21]. This is critical in dynamical network modeling because fast processes, such as complex assembly processes in which a group of proteins bind together to form a complex, happen almost instantaneously, while slow processes, such as transcription and apoptosis (cell death), take much longer times to complete. If such qualitative knowledge on the time scale of processes can be obtained from the biological database, then corresponding speed labels can be assigned to network edges, thereby reducing uncertainty in the dynamical model; otherwise, the dynamical model will embrace such uncertainty. In BioPAX3, processes are categorized into Conversion, Template Reaction, Control, Molecular Interaction, and Genetic Interaction. In process-based knowledge databases, almost all processes belong to the first three categories and every Control process is essentially associated with either a Conversion or a Template Reaction process. By default, to any process labeled as Conversion in the knowledge database, we assign a "fast" label; however, for those labeled as Template Reaction, such as processes that end in a phenotypical or pathway level event, a "slow" label is assigned, because the Control and Conversion interactions generally happen considerably faster than the Template Reaction processes. If knowledge about the time scale of a process is not available, then a "fast" label is assigned to the corresponding edge by default.

To show how the speed property can affect network dynamics, consider proteins A, B, and C interacting according to:

A + B A . B C + A . B A + B . C A . B Apoptosis
(8)

where the first two processes are of Conversion (complex assembly) type and the last one, resulting in apoptosis, is a Template Reaction process. We define N = {n1, n2, n3, n4, n5, n6} = {A, B, C, A.B, B.C, Apoptosis}, and E = {e1, e2, e3}:

n 1 + n 2 e 1 n 4 n 3 + n 4 e 2 n 1 + n 5 n 4 e 3 n 6
(9)

Also, suppose that the initial condition is given as x0 = [1, 1, 1, 0, 0, 0]T . According to the above description of processes, e1 and e2 happen fast, while e3 takes more time to complete. If such time-scale knowledge is not incorporated into the network model in the form of appropriate edge properties and we, by default, assign a "fast" label to all the edges, then the network state vector can be updated in two different ways, as detailed below, thereby introducing uncertainty to the dynamical model:

1 1 1 0 0 0 x 0 e 1 0 0 1 1 0 0 x 1 1 e 2 1 0 0 0 1 0 x 2 1
(10)
1 1 1 0 0 0 x 0 e 1 0 0 1 1 0 0 x 1 1 e 3 0 0 1 0 0 1 x 2 2
(11)

In the above updates of the network states, x k p indicates the p-th possible update of the state vector at the k-th time step. Now, if one applies the process timescale knowledge by assigning a "fast" label to the speed of e1 and e2, and a "slow" label to that of e3, then the state vector will be updated only according to the sequence in (10). Use of time-scale information reduces uncertainty in the dynamical model.

Priority

The priority label helps when there exist two or more processes with nonzero input and activator nodes, and zero inhibitor nodes, meaning that they have equal probability to happen; however, a prior knowledge states that only one or few of them will happen in practice. Priority is usually not provided in the knowledge database; but in many cases we can set default priorities based on experience. For instance, if an entity is an input in process 1 and a control node in process 2, then process 1 has higher priority than process 2, because in process 1, the entity will be converted and not be available for control. For example, PIP3 is needed to phosphorylate AKT, yet PIP3 can also be converted to PIP2 by PTEN. Hence, if PTEN exists, PIP3 will be converted and not be able to phosphorylate AKT. Therefore, upon availability of such prior knowledge, one can assign priority labels to the network edges and reduce the uncertainty in the dynamical model; if this is not the case, then no priority labels will be assigned and the dynamical model will contain all of the possibilities. Now, consider the following example in which proteins A, B, C, and D bind together to form complex assemblies:

A + B A . B C + A . B A + B . C D + A . B B + A . D
(12)

We define N = {n1, n2, n3, n4, n5, n6, n7} = {A, B, C, D, A.B, B.C, A.D} and E = {e1, e2, e3}:

n 1  +  n 2 e 1 n 5 n 3  +  n 5 e 2 n 1  +  n 6 n 4  +  n 5 e 3 n 2  +  n 7
(13)

Since all the above processes are of Conversion type, we assign the "fast" label to the speed property of all the edges. Assuming that the initial condition is given by x0 = [1, 1, 1, 1, 0, 0, 0]T , the network state vector can be updated in two different ways:

1 1 1 1 0 0 0 x 0 e 1 0 0 1 1 1 0 0 x 1 1 e 2 1 0 0 1 0 1 0 x 2 1
(14)
1 1 1 1 0 0 0 x 0 e 1 0 0 1 1 1 0 0 x 1 1 e 3 0 1 1 0 0 0 1 x 2 2
(15)

Now, suppose that a prior knowledge indicates that, in situations where e2 and e3 can mathematically happen at the same time owing to nonzero input nodes, e2 has higher priority than e3 to take place. Accommodating this priority information into our network model by assigning the label " e 2 p r i o > e 3 p r i o " to the priority of e2, and " e 3 p r i o < e 2 p r i o " to that of e 3 , the state vector can now be updated based on the sequence of processes given in (14).

Dynamical modeling algorithm

An abstract form of the algorithm proposed to perform the dynamical analysis is provided below. A detailed algorithm can be found in the Additional file 1 to this manuscript. The algorithm is based on the breadthfirst search algorithm [22]. It takes the network model M = (N, E) and an initial condition x0 as inputs, and return the resulting dynamic trajectories T, and the corresponding updates of the state vector x k at each time step k. There are two approaches to update the network state vector: synchronous and asynchronous [23, 17]. With synchronicity, multiple network edges can be processed at the same time step; while with asynchronicity, only one network edge is processed at each time step. Our software package [19] has the capability to update the network state vector using either approach. Since, in general, the asynchronous approach yields a larger set of possible dynamic trajectories compared to the synchronous method [24], asynchronicity is the default method to update the network state vector. The abstract algorithm given below updates the state vector asynchronously.

Algorithm Dynamical Modeling Algorithm

Inputs: (N, E, x0 )

Outputs: (T, x)

1: k ← 1, T, x in x0, S in , x k , S k

2: run States Update Subroutine (Inputs: (x in , S in , x k , S k , T), Outputs: (x k , S k , T ))

3: while x k do

4:   new time step: kk + 1, x k , S k

5:   for each updated state vector x k - 1 i in xk- 1 do

6:      if corresponding S k - 1 i forms a cycle then

7:         if S k - 1 i T, then add S k - 1 i to T

8:      else

9:          x i n x k - 1 i , S i n S k - 1 i

10:         run States Update Subroutine (Inputs: (x in , S in , x k , S k , T ), Outputs: (x k , S k , T ))

11:      end if

12:   end for

13: end while

States Update Subroutine

Inputs: (x in , S in , x k , S k , T)

Outputs: (x k , S k , T)

1: initialize all nodes by x in

2: E v ← find all edges that have nonzero input nodes, nonzero activator nodes and zero inhibitor nodes (if applicable)

3: if E v then

4:   remove edges in E v having relatively slow/low priority label

5:   for each edge in E v do

6:      initialize all nodes by x in

7:       x k n e w ← update the state vector using the given rules

8:      x k ← {x k , x k n e w }

9:       S k n e w ← the sequence of edges yielded x k n e w

10:       S k S k , S k n e w

11:   end for

12: else

13:   if S in T, then add S in to T

14: end if

To illustrate the algorithm, consider the network in Figure 1, where N = {n1, n2,..., n12} and E = {e1, e2,..., e6}. Suppose that the biological database provides the following information about the processes in this network: in process/edge e1 proteins n1 and n2 bind together to form the complex n8, if the inhibitor gene n3 does not express; e2 is a transcription process (the only slow process in this example) in which n4 is the activator and n9 accounts for the status of the transcription; in e3 proteins n4 and n5 convert to complex n10 ; in e4 proteins n6 and n10 form compound n11 and release n4, if the activator n8 presents; and finally, in e5 proteins n7 and n10 convert to compound n12, but this process is assumed to have less priority compared to e4 . Based on the above description and according to the statement of Assumption 1, our method automatically adds edge e6 to the set of network edges to account for the reverse direction of e4. We incorporate the above biological information into the network model by assigning appropriate labels to the properties of network edges (see Table 1). Suppose that the initial condition is:

Figure 1
figure 1

An illustrative network.

Table 1 Edge properties for the illustrative network.

x 0 = [ 1 , 1 , 0 , 1 , 1 , 1 , 1 , 0 , 0 , 0 , 0 , 0 ] T

Following the proposed algorithm to calculate the dynamic trajectories and state vector updates, we get (the sign is used for comments, and "SU" indicates the States Update Subroutine):

  • line 1: k ← 1, T, x in x0, S in , x1, S1

  • line 2: run States Update Subroutine

  • line SU1: initialize all nodes by x in

  • line SU2: E υ = {e1, e2, e3}

  • line SU4: E υ = {e1, e3} e2 is removed because it is slower than other edges

  • line SU6: initialize all nodes by x in

  • line SU7-8: x 1 n e w = x 1 1 = 0 , 0 , 0 , 1 , 1 , 1 , 1 , 1 , 0 , 0 , 0 , 0 T , x 1 x 1 n e w

  • line SU9-10: S 1 n e w = S 1 1 = { e 1 } , S 1 S 1 n e w

  • line SU6: initialize all nodes by x in

  • line SU7-8: x 1 n e w = x 1 2 = 1 , 1 , 0 , 0 , 0 , 1 , 1 , 0 , 0 , 1 , 0 , 0 T , x 1 { x 1 , x 1 n e w }

  • line SU9-10: S 1 n e w = S 1 2 = { e 3 } , S 1 { S 1 , S 1 n e w }

  • line 4: k ← 2, x2, S2

  • line 9: x i n x 1 1 , S i n S 1 1

  • line 10: run States Update Subroutine

  • line SU1: initialize all nodes by x in

  • line SU2: E υ = {e2, e3}

  • line SU4: E υ = {e3}

  • line SU6: initialize all nodes by x in

  • line SU7-8: x 2 n e w = x 2 1 = 0 , 0 , 0 , 0 , 0 , 1 , 1 , 1 , 0 , 1 , 0 , 0 T , x 2 x 2 n e w

  • line SU9-10: S 2 n e w = S 2 1 = { e 1 , e 3 } , S 2 S 2 n e w

  • line 9: x i n x 1 2 , S i n S 1 2

  • line 10: run States Update Subroutine

  • line SU1: initialize all nodes by x in

  • line SU2: E υ = {e1, e5} since e4, which has higher priority than e5, is not in E υ , then e5 will not be removed once line SU4 is processed.

  • line SU6: initialize all nodes by x in

  • line SU7-8: x 2 n e w = x 2 2 = 0 , 0 , 0 , 0 , 0 , 1 , 1 , 1 , 0 , 1 , 0 , 0 T , x 2 { x 2 , x 2 n e w }

  • line SU9-10: S 2 n e w = S 2 2 = { e 3 , e 1 } , S 2 { S 2 , S 2 n e w }

  • line SU6: initialize all nodes by x in

  • line SU7-8: x 2 n e w = x 2 3 = 1 , 1 , 0 , 0 , 0 , 1 , 0 , 0 , 0 , 0 , 0 , 1 T , x 2 { x 2 , x 2 n e w }

  • line SU9-10: S 2 n e w = S 2 3 = { e 3 , e 5 } , S 2 { S 2 , S 2 n e w }

  • line 4: k ← 3, x3, S3

  • The next time steps can be followed easily.

Finally, the algorithm outputs are:

Dynamic trajectory t1 T : t 1 : = S 3 3 = { e 3 , e 5 , e 1 }

State updates for t1 T: x 1 2 , x 2 3 , x 3 3

Dynamic trajectory t2 T: t 2 : = S 4 1 = { e 1 , e 3 , e 4 , e 2 }

State updates for t2 T: x 1 1 , x 2 1 , x 3 1 , x 4 1

Results and discussion

In this section we study the dynamics of 3 networks: BAK1-MCL1 interaction, the MAP Kinase pathway, and the ErbB2/ErbB3 pathway. For each network we calculate the dynamic trajectories for a given initial condition vector. For illustration, these networks are graphically plotted in Microsoft Visio software and provided here. The computer used to perform dynamical analysis and compute dynamic trajectories is a Sony VAIO laptop with Intel Core 2 Duo 1.67 GHz CPU and 2 GB RAM.

BAK1-MCL1 interaction

Biological database and network model

Consider the network of BAK1-MCL1 interaction shown in Figure 2. In cells, once synthesized, protein BAK1 is localized at the mitochondria. Following the apoptotic stimulus, BAK1 will dimerize to form homodimers. These dimers will further oligomerize and trigger mitochondrial outer membrane permeabilization (MOMP) and induce apoptosis [25]. To counter such apoptosis stimulus and keep the the cell alive, an anti-apoptotic protein like MCL1 will bind to BAK1 with higher affinity and block BAK1 from forming homodimers, thereby stopping apoptosis [26]. Thus, in cancer cells, especially melanoma, MCL1 is commonly found to be overexpressed [27].

Figure 2
figure 2

BAK1-MCL1 interaction.

In this network X.Y indicates a binder complex produced by X and Y. We define the set of network nodes and edges by N = {n1, n2,..., n5} = {BAK1, MCL1, BAK1.BAK1, Apoptosis, MCL1.BAK1} and E = {e1, e2, e3, e4}. Process e2 is the only slow process in this network resulting in apoptosis; thus, we assign a "slow" label to the speed property of e2 and, by default, a "fast" label to e1, e3, and e4 . Also, as discussed, MCL1 binds to BAK1 with higher affinity than BAK1 binding to itself to form BAK1.BAK1 [26]; therefore, e3 has higher priority than e1, and we accordingly assign the priority label " e 1 p r i o < e 3 p r i o " to e1, and " e 3 p r i o < e 1 p r i o " to e3 . In this network the observable state is y = n4 = Apoptosis, since it can be checked under a microscope.

Dynamic trajectories

Suppose that the network initial state vector is given by x0 = [2, 1, 0, 0, 0]T , biologically meaning that the concentration of BAK1 is considerably higher than MCL1, the concentration of MCL1 is enough to participate in the processes in Figure 2, and other entities do not exist or have considerably low concentrations. For this initial condition and the assigned edge properties, our algorithm returns the following possible updates of the network state vector (for notational simplicity we drop the superscripts used for the updated state vectors):

2 1 0 0 0 x 0 e 3 1 0 0 0 1 x 1 e 1 0 0 1 0 1 x 2 e 2 0 0 0 1 1 x 3
(16)

The dynamic trajectory in (16) is

T = { t 1 } = { e 3 , e 1 , e 2 }
(17)

The dynamic trajectory obtained here yields the final value of the observable state y, the apoptosis status, to be 1, which biologically means that apoptosis takes place. This result agrees with our expectation from a biological point of view because initially there was insufficient amount of MCL1 to bind to the higher amount of BAK1 to avoid apoptosis. The CPU time to calculate the dynamic trajectories for this network using the computer specifications described above was 0.1 sec.

MAP Kinase pathway

Biological database and network model

The dynamical study of pathways containing feedback processes has practical importance for experimental design and therapy. In this example we considered the set of biological processes shown in Figure 3, which is based on the well-studied MAK Kinase pathway. Under the stimulus of a growth factor, GRB2.SOS complex will be recruited to the cell membrane, which will transform the GDP-bound RAS into its active form, GTP-bound RAS. The activated RAS will in turn activate the Raf/MEK/ERK pathway [28]. However, it has been shown that the activation of RAS is often short and transient, due to a negative feedback loop through ERK, which dissociates the GRB2.SOS complex, thereby inducing an oscillatory pattern in ERK activity [29].

Figure 3
figure 3

MAP Kinase pathway.

The set of network nodes is N = {n1, n2,..., n 9 } = {RAS.GDP, RAS.GTP, GRB2.SOS, RAF, pRAF, ERK, pERK, GRB2, SOS} and the network edges are E = {e1, e2,..., e8 }. The database provides information about the processes e1, e2, e3, and e4 . Owing to Assumption 1, our algorithm automatically generates the corresponding edges e5, e6, e7, and e8 to account for the reversible direction of the original processes. By default we assign a "fast" label to all edges. Since the database gives no information about the priority between the processes, no priority labels are assigned.

Dynamic trajectories

Given the initial condition x0 = [1, 0, 1, 1, 0, 1, 0, 0, 0]T , our algorithm yields the following dynamic trajectory:

T = { t 1 } = { e 1 , e 2 , e 3 , e 4 , e 5 , e 6 , e 7 , e 8 , . . . }
(18)

and this sequence repeats. The effect of the feedback process can be seen in the dynamic trajectory as the sequence of edges is repeating over time. The network state vector evolves as follows:

1 0 1 1 0 1 0 0 0 x 0 e 1 0 1 1 1 0 1 0 0 0 x 1 e 2 0 1 1 0 1 1 0 0 0 x 2 e 3 0 1 1 0 1 0 1 0 0 x 3 e 4 0 1 0 0 1 0 1 1 1 x 4 e 5 1 0 0 0 1 0 1 1 1 x 5 e 6 1 0 0 1 0 0 1 1 1 x 6 e 7 1 0 0 1 0 1 0 1 1 x 7 e 8 1 0 1 1 0 1 0 0 0 x 0 e 1 . . . . .
(19)

The CPU time for this network was 0.3 sec.

ErbB2/ErbB3 pathway

Biological database and network model

In this example (see Figure 4) we studied the dynamics of the ErbB2/ErbB3 pathway downloaded from the NCI pathway interaction database (http://pid.nci.nih.gov). The roles and connections of the ErbB2, ErbB3 and Neuregulin proteins and peptides in supporting both the MEK and PI3K pathways were first mapped out around the year 2000 [30], and form the basis for the pathway displayed in Figure 4. The dependencies of RAF/MEK pathways [31], and PI3K/AKT pathways [32], on the levels of activated ErbB3 have been thoroughly demonstrated to show a very high level of reliance. In this network, the edges e31, e32, e33, e36 are transcription processes and are the only slow processes in the network; thus, they are assigned a "slow" label. A "fast" label, by default, is assigned to all other edges. Since no priority information is available in the database, no priority labels are assigned. The vector of observable states for this pathway is y = [FOS[n], CHRNA1[n], JUN[n], CHRNE[n]]T , which are the output nodes of the transcription processes e31, e32, e33, and e36, respectively, whose activities can be measured by a microarray.

Figure 4
figure 4

ErbB2/ErbB3 pathway.

Dynamic trajectories

Assuming that the initial condition of the nodes marked with (*) is 1 and the others are 0, our algorithm gives 7 dynamic trajectories:

t 1 = { e 19 , e 20 , e 21 , e 47 } T t 2 = { e 19 , e 20 , e 21 , e 12 , e 47 , e 48 } T t 3 = { e 19 , e 20 , e 21 , e 12 , e 13 , e 47 , e 48 , e 49 } T t 4 = { e 19 , e 20 , e 21 , e 12 , e 13 , e 14 , e 47 , e 48 , e 49 , e 50 } T t 5 = { e 19 , e 20 , e 21 , e 12 , e 13 , e 14 , e 47 , e 48 , e 49 , e 50 , e 51 } T t 6 = { e 19 , e 24 , e 25 , e 26 , e 27 , e 28 , e 29 , e 30 , e 31 , e 32 , e 33 } T t 7 = { e 1 , e 24 , e 25 , e 26 , e 27 , e 28 , e 29 , e 30 , e 31 , e 32 , e 33 , e 40 , e 19 } T
(20)

where the dynamic trajectories t6 and t7 contain transcription processes giving the final value of vector y, as y = [1, 1, 1, 0]T. The CPU time for computing these trajectories was 25.8 sec.

Drugs such as lapatinib can have a very strong inhibitory effect on levels of ErbB2/ErbB3 activity, though the drug's effects are themselves highly dependent on the presence of only low levels of ErbB2/ErbB3 [33]. The immunoblot figure in [33] shows that by suppressing activated ErbB2/ErbB3, both AKT and ERK phosphorylation will be turned off. To see how suppressing the activity of ErbB2/ErbB3 affects the network dynamics we consider another initial condition vector. Suppose that the nodes marked with (†) are initially set to 1 and the others are 0. For this initial condition, our algorithm yields no dynamic trajectory, T = . This implies that none of the AKT and ERK phosphorylation processes occur, as we expected from the results in [33]. A similar dynamical analysis can be performed for other initial conditions, thereby giving the set of all possible outcomes.

Current research

Each network edge in a dynamic trajectory can be mathematically expressed by a state transition matrix A. For instance, the transition from x0 to x1 in (4) can be described by

x 1 = A 1 x 0
(21)

where

A 1 = 0 0 0 1 / 2 0 1 / 2 1 / 2 0 1 / 2
(22)

A state transition matrix facilitates writing the network dynamics in terms of state space equations suitable for control system studies.

Given a network model M = (N, E), an initial condition x0, the calculated dynamic trajectories T, and recalling the notion of observable state vector y, one can mathematically describe the network dynamics corresponding to trajectory t T by the following set of state space equations:

x k + 1 = A k + 1 x k y k = C x k
(23)

which is a linear discrete-time time-varying system. We alternatively refer to y k as the system output at the k-th time step.

From a biological viewpoint, not all dynamic trajectories are primarily of interest for design purposes, but those including observable states are important because the result of changes to network inputs or initial condition can be measured. Thereafter, a suitable experimental design can be proposed to attain a desired performance. We are currently conducting research on developing a new methodology to find logical relationships between network state variables and a set of desirable values for network observable states. This problem, referred to as a "backward inference problem", is a key step toward experimental design because it characterizes the necessary and sufficient conditions on the network state variables for which a desirable network performance can be obtained.

Conclusions

We have proposed a new dynamical modeling methodology that can handle common biological knowledge without the need of exact parameters related to process dynamics. Our proposed approach has similarities in some aspects to other qualitative methods. The node update for conversion processes is similar to token passing in Petri nets, while the update for transcription processes functions like Boolean logic. These similarities occur because they are natural ways to describe the basic processes. However, to properly handle the pathway knowledge commonly available from a public database as a BioPAX file with minimal extra input, one needs to apply these canonical approaches in a practical way, which results in the kind of hybrid approach introduced in this paper. For instance, in the ErbB2/ErbB3 pathway example we were able to use different updating rules for the conversion and transcription processes that resulted in a dynamical model which captures the network dynamics more accurately. We would like to conclude by pointing out critical differences between our proposed method and the standard Petri net and Boolean network approaches.

First, for conversion type processes, a Petri net usually does not consider the role of activator by assuming they are always present, and the potential reversibility of the process, so that every physical entity ("place" in Petri net terminology) is either an input or an output. Although this can be partially solved by introducing a reading arc to indicate when a process can take place, it cannot be incorporated into the incidence matrix, which is critical for a fast update. As a result, a Petri net loses one of its most attractive properties. The use of an incidence matrix is further hampered by the synchronous update, which assumes that equal time elapses for each process.

For practical applications, a critical shortcoming of the standard approaches is the manner in which they handle uncertainty. A standard Petri net handles uncertainty in a random manner and Boolean networks demand that all uncertainties be resolved before simulation starts. In reality, a biological process often takes a deterministic fashion unknown to the modeler, which demands a comprehensive exploration of all the possibilities. Our choice of this hybrid approach is based on the purpose of utilizing public knowledge databases; however, the concept of exploiting all possible uncertainties is not limited to the specific procedure considered here. It can be applied to other qualitative methods, including both Boolean networks and Petri nets.

In conclusion, the entire procedure emulates how biologists actually utilize available pathway information. Our ultimate aim is to use predicted outcomes to design the most efficient experiments [34] to reduce uncertainty and find the best perturbation scheme to achieve desired network performance.

Appendix

The input and output file formats of our software package are simple-text files. The input file contains the set of interactions/processes in the network and the initial condition vector in the following form:

Process ( p r o c e s s   # ) Input ( i n p u t n o d e s ) Output ( o u t p u t n o d e s ) Activator ( a c t i v a t o r n o d e s ) Inhibitor ( i n h i b i t o r n o d e s )

For instance, for the following process

C A + B D

the input file contains:

Process 1 Input A B Output D Activator C

The initial condition vector is added to the input file (after the list of processes) as follows:

Entity Value ( n o d e ) ( i n i t i a l c o n d i t i o n )

For the above process, if A, B and C are initially 1 and D is 0, then it will be added to the input file as:

Entity Value A 1 B 1 C 1 D 0

The output file summarizes the given initial condition vector and the calculated dynamic trajectories as well as the corresponding updates of the state vector in the following form:

Dynamic Trajectories for Initial Condition: x0 = (given initial condition vector)

Trajectory t(traj#) = {(edge #'s)}

State Vector Updates for t(traj #):

Time Step 1: (edge #)

x1 = (first update of the state vector)

Time Step 2: (edge #)

x2 = (second update of the state vector)

...and so on for the next time steps and other dynamic trajectories. The output file for the above process and initial condition is:

Dynamic Trajectories for Initial Condition: x0 = [1, 1, 1, 0]T

Trajectory t1 = {e1}

State Vector Updates for t1:

Time Step 1: e1

x1 = [0, 0, 1, 1]T.

Availability

A MATLAB software package enabled with Graphical User Interface (GUI) is programmed based on our proposed approach to perform dynamical modeling of uncertain biological networks. This package along with the real data examples presented in this manuscript are available at http://gsp.tamu.edu/Publications/supplementary/mohsenizadeh15a.

References

  1. Schaefer CF, et al: Pid: the pathway interaction database. Nucleic Acids Res. 2009, 37 (Database): 674-679. 10.1093/nar/gkn653.

    Article  Google Scholar 

  2. Milacic M, et al: Annotating cancer variants and anti-cancer therapeutics in reactome. Cancers (Basel). 2012, 4 (4): 1180-1211. 10.3390/cancers4041180.

    Article  CAS  Google Scholar 

  3. Croft D: Building models using reactome pathways as templates. Methods Mol Biol. 2013, 1021: 273-283. 10.1007/978-1-62703-450-0_14.

    Article  CAS  PubMed  Google Scholar 

  4. Cerami EG, et al: Pathway commons, a web resource for biological pathway data. Nucleic Acids Res. 2011, 39 (Database): 685-690. 10.1093/nar/gkq1039.

    Article  Google Scholar 

  5. Demir E, et al: The biopax community standard for pathway data sharing. Nat Biotechnol. 2010, 28 (9): 935-942. 10.1038/nbt.1666.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Hucka M, et al: The systems biology markup language (sbml): a medium for representation and exchange of biochemical network models. Bioinformatics. 2003, 19 (4): 524-31. 10.1093/bioinformatics/btg015.

    Article  CAS  PubMed  Google Scholar 

  7. Bornstein BJ, Keating SM, Jouraku A, Hucka M: Libsbml: An api library for sbml. Bioinformatics. 2008, 24 (6): 880-881. 10.1093/bioinformatics/btn051.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Funahashi A, Matsuoka Y, Jouraku A, Morohashi M, Kikuchi N, Kitano H: Celldesigner 3.5: A versatile modeling tool for biochemical networks. Proceedings of the IEEE. 2008, 96 (8): 1254-1265.

    Article  Google Scholar 

  9. Loew LM, Schaff JC: The virtual cell: a software environment for computational cell biology. TRENDS Biotechnol. 2001, 19: 401-406. 10.1016/S0167-7799(01)01740-1.

    Article  CAS  PubMed  Google Scholar 

  10. Azeloglu EU, Iyengar R: Good practices for building dynamical models in systems biology. Sci Signal. 2015, 8 (371): 8-10.1126/scisignal.aab0880.

    Article  Google Scholar 

  11. Weng G, Bhalla US, Iyengar R: Complexity in biological signaling systems. Science. 1999, 284 (5411): 92-96. 10.1126/science.284.5411.92.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Chaouiya C, et al: Sbml qualitative models: a model representation format and infrastructure to foster interactions between qualitative modelling formalisms and tools. BMC Syst Biol. 2013, 7: 135-10.1186/1752-0509-7-135.

    Article  PubMed  PubMed Central  Google Scholar 

  13. Koch I, Reisig W, Schreiber F: Modeling in Systems Biology: the Petri Net Approach. 2011, Springer, London

    Chapter  Google Scholar 

  14. Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M: Kegg for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 2010, 38 (Database): 355-360. 10.1093/nar/gkp896.

    Article  Google Scholar 

  15. Klamt S, Saez-Rodriguez J, Lindquist JA, Simeoni L, Gilles ED: A methodology for the structural and functional analysis of signaling and regulatory networks. BMC Bioinformatics. 2006, 7 (56):

  16. Terfve C, Cokelaer T, MacNamara A, Henriques D, Goncalves E, Morris MK, van Iersel M, Lauffenburger DA, Saez-Rodriguez J: Cellnoptr: a flexible toolkit to train protein signaling networks to data using multiple logic formalisms. BMC Systems Biology. 2012, 6 (133):

  17. Albert I, Thakar J, Li S, Zhang R, Albert R: Boolean network simulations for life scientist. Source Code Biol Med. 2008, 3 (16):

  18. Ratzer AV, Wells L, Laursen HM, Qvortrup JF, Stissing MS, Westergaard M, Christensen S, Jensen K: Cpn tools for editing, simulating, and analysing coloured petri nets. Proceedings of the 24th International Conference on Applications and Theory of Petri Nets. ICATPN'03. 2003, 450-462.

    Google Scholar 

  19. Mohsenizadeh DN: DMUN: Dynamical Modeling of Uncertain Networks (version 2.0) [software]. 2014, Available at: http://gsp.tamu.edu/Publications/supplementary/mohsenizadeh15a

    Google Scholar 

  20. Alon U: An Introduction to Systems Biology: Design Principles of Biological Circuits. 2006, Taylor & Francis, Boca Raton, FL

    Google Scholar 

  21. de Leon SB-T, Davidson EH: Modeling the dynamics of transcriptional gene regulatory networks for animal development. Dev Biol. 2009, 325 (2): 317-28. 10.1016/j.ydbio.2008.10.043.

    Article  Google Scholar 

  22. Cormen TH, Leiserson CE, Rivest RL, Stein C: Introduction to Algorithms. 2009, The MIT Press, Cambridge, MA, 3

    Google Scholar 

  23. Garg A, Cara AD, Xenarios I, Mendoza L, Micheli GD: Synchronous versus asynchronous modeling of gene regulatory networks. Bioinformatics. 2008, 24 (17): 1917-1925. 10.1093/bioinformatics/btn336.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Wynn ML, Consul N, Merajver SD, Schnell S: Logic-based models in systems biology: a predictive and parameter-free network analysis method. Integr Biol (Camb). 2012, 4 (11): 1323-37. 10.1039/c2ib20193c.

    Article  Google Scholar 

  25. Wei MC, et al: Proapoptotic bax and bak: a requisite gateway to mitochondrial dysfunction and death. Science. 2001, 292 (5517): 727-30. 10.1126/science.1059108.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Minet E, Cosse J-P, Demazy C, Raes M, Michiels C: Accumulation of the pro-apoptotic factor bak is controlled by antagonist factor mcl-1 availability. Apoptosis. 2006, 11 (6): 1039-47. 10.1007/s10495-006-6650-5.

    Article  CAS  PubMed  Google Scholar 

  27. Placzek WJ, Wei J, Kitada S, Zhai D, Reed JC, Pellecchia M: A survey of the anti-apoptotic bcl-2 subfamily expression in cancer types provides a platform to predict the efficacy of bcl-2 antagonists in cancer therapy. Cell Death Dis. 2010, 1 (5): 40-10.1038/cddis.2010.18.

    Article  Google Scholar 

  28. Qi M, Elion EA: Map kinase pathways. J Cell Sci. 2005, 118: 3569-72. 10.1242/jcs.02470.

    Article  CAS  PubMed  Google Scholar 

  29. Langlois WJ, Sasaoka T, Saltiel AR, Olefsky JM: Negative feedback regulation and desensitization of insulinand epidermal growth factor-stimulated p21ras activation. J Biol Chem. 1995, 270 (43): 25320-3. 10.1074/jbc.270.43.25320.

    Article  CAS  PubMed  Google Scholar 

  30. Le XF, Vadlamudi R, McWatters A, Bae DS, Mills GB, Kumar R, Bast RCJ: Differential signaling by an anti-p185(her2) antibody and heregulin. Cancer Research. 2000, 60 (13): 3522-31.

    CAS  PubMed  Google Scholar 

  31. Abel EV, et al: Melanoma adapts to RAF/MEK inhibitors through FOXD3-mediated upregulation of ERBB3. J Clin Invest. 2013, 123 (5): 2155-68. 10.1172/JCI65780.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Cook RS, et al: ErbB3 ablation impairs PI3K/Akt-dependent mammary tumorigenesis. Cancer Research. 2011, 71 (11): 3941-51. 10.1158/0008-5472.CAN-10-3775.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Amin DN, Sergina N, Ahuja D, McMahon M, Blair AJ, Wang D, Hann B, Koch KM, Shokat KM, Moasser MM: Resiliency and vulnerability in the HER2-HER3 tumorigenic driver. Sci Transl Med. 2010, 2 (16): 16-7.

    Article  Google Scholar 

  34. Dehghannasiri R, Yoon B, Dougherty ER: Optimal experimental design for gene regulatory networks in the presence of uncertainty. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2014

    Google Scholar 

Download references

Acknowledgements

This work and its publication were supported by the National Institutes of Health (NIH) grant number 5R25CA090301.

This article has been published as part of BMC Bioinformatics Volume 16 Supplement 13, 2015: Proceedings of the 12th Annual MCBIOS Conference. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/16/S13.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Edward R Dougherty.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

D.N.M., J.H. and E.R.D. conceived the method. D.N.M. developed the dynamical modeling methodology and programmed the software package. D.N.M., J.H., M.B. and E.R.D. selected the real data examples and provided insights on the interpretation of the results. All authors wrote the manuscript.

Electronic supplementary material

12859_2015_7154_MOESM1_ESM.pdf

Additional file 1: Dynamical Modeling Algorithm. This file contains the detailed algorithm proposed for dynamical modeling of uncertain interaction-based biological networks. (PDF 98 KB)

Rights and permissions

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Mohsenizadeh, D.N., Hua, J., Bittner, M. et al. Dynamical modeling of uncertain interaction-based genomic networks. BMC Bioinformatics 16 (Suppl 13), S3 (2015). https://doi.org/10.1186/1471-2105-16-S13-S3

Download citation

  • Published:

  • DOI: https://doi.org/10.1186/1471-2105-16-S13-S3

Keywords