A genetic algorithm-based boolean delay model of intracellular signal transduction in inflammation

Background Signal transduction is the major mechanism through which cells transmit external stimuli to evoke intracellular biochemical responses. Understanding relationship between external stimuli and corresponding cellular responses, as well as the subsequent effects on downstream genes, is a major challenge in systems biology. Thus, a systematic approach to integrate experimental data and qualitative knowledge to identify the physiological consequences of environmental stimuli is needed. Results In present study, we employed a genetic algorithm-based Boolean model to represent NF-κB signaling pathway. We were able to capture feedback and crosstalk characteristics to enhance our understanding on the acute and chronic inflammatory response. Key network components affecting the response dynamics were identified. Conclusions We designed an effective algorithm to elucidate the process of immune response using comprehensive knowledge about network structure and limited experimental data on dynamic responses. This approach can potentially be implemented for large-scale analysis on cellular processes and organism behaviors.


Background
Resolving the complex cellular signal transduction is a grand challenge in systems biology. Signal transduction involves cascade of protein-protein interaction and complex feedback loops [1] across proteomic and genomic levels. Models of the dynamics of the combined regulatory networks provide in-depth analysis temporal characteristics of targeted biological process. Furthermore, in silico knockout experiments by these models could help biologists to prioritize target genes of interest and reduce time and cost of real experiments.
Types of dynamic network models include kinetic models [2,3], hidden Markov models [4], and logicbased models [5,6]. Kinetic models based on differential equations have been used to elaborate dynamics on numerous systems [7]. However, they need detailed information about network structure, reaction mechanism and the respective kinetic parameters; which, unfortunately, are not easily obtainable. Hidden Markov model (HMM) is a statistical model in acyclic pathway [8]. The state is hidden, but the outcome dependent on the state is visible. Hence, HMMs are usually used to model known results with unknown process mechanism. Boolean network is a qualitative logic-based model that was introduced in the 1960s [9]. In the past few decades, scientists have frequently used Boolean network to model gene regulatory networks (GRN), apoptosis, metabolic network, immune response and signaling pathways [5,[10][11][12]. Since logic-based or qualitative knowledge of interaction is abundant, the network structure can be easily established. Moreover, only minimal information is required to describe the dynamics of Boolean transfer function, they can be obtained using limited experimental data. Therefore Boolean model is an effective and extendable way of modeling the dynamics of signal transduction.
The transcription factor NF-B controls various inflammation mediators to orchestra interwoven cellular responses to inflammatory stimuli such as TNF, IL-1 and TLR4 etc. In this study, Boolean model with timedelay was used to described the NF-B signaling. The objective is to integrate qualitative information on network interactions from published datasets and dynamic response data in literatures to reveal the regulatory mechanism of infection and inflammation.

Results and discussion
Model Figure 1 shows the workflow of building our Boolean model with time delay. First, we generated the Boolean transfer function of our model. Oda et al [16] provided a comprehensive intracellular molecular interaction map. The information was integrated with pathways from KEGG database to obtain a comprehensive network. We focus on the network between three receptors: Interleukin 1 (IL-1), Toll-like receptor 4 (TLR4) and Tumor necrosis factor (TNF) and observable outputs are IKK, IkBa, TNFa. External stimuli considered are TNF and LPS. The network contains a lot of sequential relations which can be compressed using the method shown in Figure 2. In the compression, an intermediate node with only one input and one output is removed; Any branching or meeting nodes are preserved. Figure 3 illustrates the simplified cascade and feedback of signals. There are three feedback routes involving TNF, A20, and IL1. A kernel pathway involving IKK, IkBa and TNFa can be identified through which all the signals have to go through.
Boolean transfer functions were used for each edge. Each transfer function has two dynamic parameters. The delayed activation θ denotes the duration that the input to a node must turned on before the reception node is turned on. The sustained response r is the time that the output of a node can be sustained once it is turned on. These parameters were obtained by fitting a training data set published in [2,17,18] using GA.
Even with a simplified network and limited number of dynamic parameters, convergence to a set of reasonable parameters was not easy. In order to improve the modeling process, we trained model parameters in kernel pathway first with parameters of the rest of the edges set equal to 1, using wild type data containing measurements of both IKK and NFkB measurements. When kernel's parameters were decided, the parameters of the remaining edges were determined by adding additional data involving A20 knockout, stimulus of various strengths and measurements of either IKK or NFkB only. Figure 4 shows the comparison between our model outcome and the experimental data in the learning sets. The upper boxes are each active pattern with specific treatment (the description was written on the graph) and compared with real data (the western blot data under corresponded box). The MSE value between our model and data is 0.0919.

Dynamics implications
The parameters obtained for the Boolean model were shown in Table 1. The delay of transcription factor NFkB induced components, such as A20, IkBa, IL1 and TNF are longer than other reactions in Table 1. It reveals transcription costs more time than phosphorylation. TNF related pathways will response immediately in our model. In contrast, LPS costs longer delay time to start immune response. The activation of IKK by LPS through TAB1/TAB2/TAK1 is much slower than activation through NIK by TNF.

Negative regulator
Because NFkB regulates pro-inflammatory cytokines, such as IL1, TNF, and negative regulator A20 [19], these effectors can generate positive and negative feedback control. In our model, IL1 and TNF were positive feedback. Excessive cytokine production is harmful to the host due to its effects on blood circulation system. Thus, "Endotoxin tolerance" is a critical negative feedback mechanism to protect host from endotoxic shock [20]. In our model, negative regulator A20 and IkBa could suppress NFkB's activity. Specifically, NFkB induced IkBa (nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha) will supress NFkB activity in a self-regulatory cycle. As shown in Figure 5, inflammation response will be prolonged by IkBa deletion in TNF and LPS stimulation.
Zinc finger protein A20, also called TNFAIT3 (Tumor necrosis factor, alpha-induced protein 3), can also produce RIP-or TRAF2-mediated signal to indirectly block the NFkB activity. We have learned from the literatures that the negative regulator A20 blocked the NFkB activation while protecting the host cell from TNF-mediated apoptosis. To mimic an A20 knockout assay done in the wet-bench experiments, we set the output of the A20 nod to 0 and keep it in off state during simulation as the deletion. The resulting signaling pattern of wild type and A20 mutation in our model were shown in Figure 4. TNF induced IKK(left side of Figure 4.A) active from 5 min to 60 min in wild type and persist with A20 deletion. NF-B(right side of  Figure 4.B) and NFkB (right side of Figure 4.B), there is no difference between active patterns obtained with wild-type and A20 deletion. This is because in a LPS induced response, the secondary TNF response will be triggered after the transcription of NFkB. Hence A20 is apparently a key component in TNF-induced pathway but with no significant influence in LPS-induced pathway.  Figure 6 can help us understand acute and chronic inflammatory response. In acute infection, IKK will active for short span of time to initiate the immune response and return the system to steady state quickly. On the other hand, under chronic inflammatory response Figure 1 The workflow of our strategy. The flowchart of developing our model: First collect network structure information and experimental time profile data. Simplify the network, identify the kernel and create Boolean transfer functions for the simplified network. Experimental data are normalized, filter and converted to binary form. The parameters of the kernel pathways are trained first using genetic algorithm. Parameters of remainin pathway are then determined. Figure 2 The network simplification procedure. In A, the orange node, which has only one input and one output is removed. In B, C, The green nodes, which has more than one input or more than one output are preserved.

Clinical implication
(shown in Figure 6), IKK has an oscillatory profile that can generate greater cytokines production to protect the host cell. With our model, it is hence possible to separate the time course into two phases: the pro-inflammatory and anti-inflammatory phases. As mentioned before, NFkB is a key regulator for inflammation. It up-regulates pro-    inflammatory cytokines such as TNF and IL-1 as well as trigger negative regulator such as A20 and IkBa to suppress the IKK activities. Over a short span of 60 minutes, IKK's activity will decrease by the repression of IkBa, this process can thus be defined as the anti-inflammatory phase. Further, the time profile of TNF-induced IKK can cause secondary activation around 6 hour.

Conclusions
In this work, a dynamic Boolean model was generated by integrating and comprehensive qualitative knowledge about network structure and fitting a minimal amount of dynamic response data. The model is capable of capturing feedback and crosstalk dynamics between diverse signaling pathways. Using this model, mechanisms of and factors affecting periodic pro-inflammatory and anti-inflammatory responses can be elucidated. The proposed approach integrated intracellular and intercellular process. Hence it is possible for us to use this approach to develop system models for host defense against the shock from environmental or pathogen stimuli and predict the inflammatory response. Such a model will potentially be able to provide insight to a feedback treatment scheme for clinical therapy.   Logical gates to biological processes. Boolean network include combinations of logic operator (AND, OR, NOT) were developed from the knowledge of components directly upstream of each target node in the network. The logic gate also called transfer function which modified with information from the literature. We use OR operator when either of upstream nodes could activate the target component. AND operator is for synergy, which means two or more upstream nodes are necessary to activate the target component. In the other case, the NOT operator represents inhibition or competition.

Boolean transfer function
The Boolean model with time delay can be described mathematically by the graph G={V,E}, in which V = {x 1 ... x N } is a set of nodes and E={e ij } is a set of edges, with e ij equals 1 if there is an linking edge starting from the j th to the i th node and e ij equals 0 otherwise [21][22][23]. The transfer function that determine activation of the node x i at time t is given by: The effect of accumulated activation is given by In other words, the activation of the i th node by j th node is on only if the j th node has been on continuously for a period of θ ij . The effect of sustained and delayed response can be described by the following pseudocode: Figure 8 Experimental data processing normalization. The data processing by ImageJ software. The original data is from [2]. We subtracted background and separated profile bipartitely by Max entropy threshold approach. The binary data profile comes after data processing.

Figure 9
Flowchart of the Genetic algorithm workflow. The first population was generated randomly. The fitness criterion is defined as minimizing mean squared error (MSE) between model predictions and experimental results. Roulette wheel selection is used to select candidates for parents in crossover. A fixed mutation rate of 2% was used to prevent premature convergence. The Elitism algorithm was used to determine the survival of parents and children in the new generation.
τ ij = τ ij -1 end end Thus if r ij ≥ 1, the activation of x i by x j will be sustained; alternatively the activation of x i by x j will be delayed.
The function f 1 is a series of logical gates connecting input nodes to output nodes. The relations of these logical gates to biological processes are shown in figure 7.

Data processing
To generate on-off response of the observed nodes, we collected the Western blot experimental data in [2,17,18] and employed processing steps in ImageJ as shown in Figure 8. First the background is subtracted. Then the maximum entropy threshold approach was used to filter the data. Finally, a binary data profile is obtained.

Model fitting by genetic algorithm
We implemented genetic algorithm to optimize model by MATLAB. Figure 9 shows work flow of the genetic algorithm utilized in this study. We generated first population randomly. The fitness is defined as mean squared error (MSE) between model predictions and experimental results: The solution will be achieved by minimizing the fitness function through genetic operations. First the parameters in the Boolean transfer function θ ij and r ij of the active links are transformed into chromosomal representation. Roulette wheel selection is used to select candidates for parents in crossover. A fixed mutation rate of 2% was used to prevent premature convergence. The Elitism algorithm [24] was used to determine the survival of parents and children in the new generation. A population size of 1000 was used. The GA is terminated at 800 generations. Parameter settings of the GA algorithm were in Table 2.