Inferring robust gene networks from expression data by a sensitivity-based incremental evolution method
- Yu-Ting Hsiao^{1} and
- Wei-Po Lee^{1}Email author
https://doi.org/10.1186/1471-2105-13-S7-S8
© Hsiao and Lee; licensee BioMed Central Ltd. 2012
Published: 8 May 2012
Abstract
Background
Reconstructing gene regulatory networks (GRNs) from expression data is one of the most important challenges in systems biology research. Many computational models and methods have been proposed to automate the process of network reconstruction. Inferring robust networks with desired behaviours remains challenging, however. This problem is related to network dynamics but has yet to be investigated using network modeling.
Results
We propose an incremental evolution approach for inferring GRNs that takes network robustness into consideration and can deal with a large number of network parameters. Our approach includes a sensitivity analysis procedure to iteratively select the most influential network parameters, and it uses a swarm intelligence procedure to perform parameter optimization. We have conducted a series of experiments to evaluate the external behaviors and internal robustness of the networks inferred by the proposed approach. The results and analyses have verified the effectiveness of our approach.
Conclusions
Sensitivity analysis is crucial to identifying the most sensitive parameters that govern the network dynamics. It can further be used to derive constraints for network parameters in the network reconstruction process. The experimental results show that the proposed approach can successfully infer robust GRNs with desired system behaviors.
Background
Gene regulatory networks (GRNs) are essential for controlling cellular metabolism and the organismal development. Under the command of transcription factors (TFs), each gene influences the activity of the cell by generating messenger RNA (mRNA) that guides the synthesis of proteins by ribosomes in the cytoplasm (the location in the cell where biochemical reactions and molecular events take place). Gene network modeling uses gene expression data to characterize the phenotypic behavior of a system under study. With the reconstructed networks, biologists can generate and test hypotheses to further understand the complex phenomena that occur in nature systems and to explore the dynamics of those systems.
Here, x_{ i } is the expression level of gene i, and N is the number of genes in a genetic network. The non-negative parameters α_{ i } and β_{ i } are rate constants that indicate the direction of mass flow. The real-number exponents g_{ i,j } and h_{ i,j } are kinetic orders that reflect the strength of interactions from genes j to i. The above set of parameters defines an S-system model. To infer an S-system model is, thus, to estimate all of the 2N (N+1) parameters simultaneously.
Although non-linear ODEs can more accurately model the system dynamics of gene networks, they are difficult to solve by traditional optimization techniques [2, 3]. To reduce the complexity of such models, Di Bernardo et. al developed an approach that involves taking a series of steady-state gene expression measurements following transcriptional perturbations and reducing the linear ODEs to a linear regression problem that can be solved with relevant techniques [5]. In addition, some researchers employed the strategies of gene clustering for dimension reduction and gene classification for identification of expression patterns to reduce task complexity [6–10]. As evolutionary algorithms (EAs) have been widely used to solve many difficult optimization tasks with good results, they have been suggested as a way to infer gene networks [3, 4]. Among many studies on this topic, the most relevant are those involving the use of EAs to infer S-system models (e.g, [3, 4, 11]). In these studies, the network parameters of the S-system were arranged as a string of floating numbers (i.e., the chromosome in EAs) that could be evolved by genetic operators.
In recent studies of EA-based parameter inference, one critical problem, network robustness [12–14], has not been addressed. It is important to investigate the effect of network parameter perturbations on the overall system. These parameters not only are the numerical components of the model, but also represent the activities or interactions among proteins, transcription factors, and mRNAs in the gene network. Each network parameter has an important role in determining the behavior of a biological system. Parameters that are very sensitive to variation can introduce fragility into the system. In fact, recent research has found that parameters can define the dynamics of a model, and more important, studies have shown that enforcing constraints on the parameters can limit (control) model dynamics [15]. Specifically, by measuring and analyzing the variations of network parameters and their effects on relevant genes, the gene-gene interactions of an inferred model can be interpreted. To ensure the robustness of the inferred network and to further investigate gene interactions, it is important to derive an acceptable value range for each parameter and to restrict the parameter's value to the specified range during the network reconstruction process [13]. Therefore, we take parameter sensitivity into consideration in the network reconstruction procedure so that robust results can be obtained.
Here, P represents the parameters that are varied in a given range, M is the mathematical function describing the system behavior, and ∂M means the change in M due to the value changed in ∂P with respect to P.
In a dynamic system, there are two types of methods often used to conduct SA: local SA and global SA. Local SA can measure the sensitivity of a parameter from a given range even if the system structure is unknown, but it only considers one parameter at a time and ignores the interactions between parameters [20, 21]. Alternatively, global SA can examine parameter interactions with different parameter magnitudes simultaneously [22, 23], but this can only be done if the system structure is known in advance. The above two methods have been compared extensively [24, 25]. It should be noted that neither local SA nor global SA is superior: the choice of method depends on the specific application and the information available. In the biological cases that involve the use of SA techniques, the sensitivity of parameters should be investigated from a biological point of view in order to validate the results.
From the perspective of parameter dynamics, we propose a new approach to infer robust networks. Our approach includes two major parts. The first part is a sensitivity analysis procedure that selects sensitive network parameters, determines value ranges for them, and then sends the parameters with their constraints to the inference mechanism. The second part is an evolutionary method responsible for inferring networks from the results obtained in the first part. This approach, to our knowledge, is the first work in network modeling that integrates sensitivity analysis into an inference algorithm to consider both internal (robustness) and external (behavior) characteristics of the inferred network. To validate the proposed approach, a series of experiments has been conducted, in which the proposed SA procedure was coupled with different EAs. The results have been analyzed, and they show that this approach can be used to infer robust networks successfully from artificial and real gene expression profiles.
Methods
Parameter sensitivity analysis
In the process of inferring a gene network from expression data the genes interact with each other, and the network structure is generally unclear during the modeling process. Most studies of global SA use known pathways as examples and can therefore easily select the critical parameters and control the total number of parameters (usually few than 100) for further analysis [23–25]. In the type of network inference problem addressed here, it is not possible to identify and select the most important parameters in order to perform global SA, because no prior information about network structure is available.
In this expression, x_{ i }^{ d }(t) is the desired expression level of gene i at time t, x_{ i }^{ a }(t) is the actual value obtained from the inferred model, N is the number of genes in the network, and T is the number of time points used to measure gene expression data during the period.
Our modified method, m-MPSA, inherits qualities from MPSA but has several advantages. First, MPSA is easy to implement and modify. It allows users to select one parameter at a time or to consider multiple parameters at the same time to perform sensitivity analysis. This characteristic is crucial because, when the network structure is unknown or uncertain, selecting one parameter at a time is a good way to reduce the complexity of the search space. Researchers can then concentrate on each parameter and temporarily neglect the combination sets at the selection phase. Second, compared with most of the global SA methods, the mathematical equations of MPSA are more manifest and succinct, and the parameter sensitivity is thus easy to calculate. Finally, the value ranges of parameters used in MPSA are the same as in an ODE model. This means that the sensitivity of each parameter can be directly mapped into an ODE model without any extra computation.
In previous studies, researchers have shown that the most influential (sensitive) parameters play key roles in modeling a GRN; that is, a model's behavior varies largely because of its sensitive parameters [13]. Therefore, our current study draws on the characteristics of sensitive parameters, and it incorporates the sensitivity analysis method into the incremental evolution approach to infer a GRN. The proposed m-MPSA method is described in the Sensitivity analysis algorithm below. It uses an iterative process to calculate the sensitivity of each parameter, and it then ranks the sensitivities of all parameters. The input of this algorithm is the set of network parameters to be determined; and the output, a list of parameters ranked by their sensitivity values. With the parameter sensitivities ranked, the evolutionary algorithm can then infer robust solutions by exploiting the sensitive parameters that are likely to significantly influence the genetic model as a whole.
Sensitivity analysis algorithm()
Step 1: Select one parameter, i, at a time. (i = 1, 2, .., μ, μ is the number of network parameters).
Step 2: Set the parameter range R_{ i } for each S-system parameter, i. Initially, the commonly used search regions [-3.0..3.0] and [0.0..10.0] are taken as the initial settings for the parameters of kinetic orders and rate constants respectively, as suggested in [26]. These settings are used to define the parameter range R_{ i }, by adding/subtracting an amount of one third of the initial settings to/from the current value of parameter i.
Step 3: For each R_{ i }, a set of random values are created with a uniform distribution within the specified range. Each random value for parameter i, together with the other (μ-1) parameters, constitutes a candidate solution (In these experiments, 500 random values were generated, as suggested in [19].
Step 4: Construct the mathematical model (i.e., S-system model) and calculate the objective function value for each of the random values generated in Step 3.
Step 5: Determine whether the objective function value of each random point obtained in Step 4 is acceptable or unacceptable by a given threshold C_{ r } (guided by the literature or experience; it is defined as the triple of the best available objective function value in the whole population). The parameter value of a random point is classified as unacceptable if its corresponding objective function value is greater than C_{ r }; otherwise, the parameter value is acceptable. Go back to Step 1 until all parameters have been dealt with.
Step 6: Calculate the sensitivity for parameter i by using its cumulative frequency (CF_{ i }). CF_{ i } measures the similarity of two statistical distributions that are formed by the acceptable and unacceptable values produced from Steps 3-5 for parameter i. The parameters are ranked according to their CF, and the ones with relatively low CF are considered to be sensitive.
Step 7: Output a parameter list in which the parameters are sorted by their CF values.
A similar procedure was performed for the other parameter, P_{145}, and a sensitivity value of 0.7695 was obtained. Because P_{145} had a sensitivity value lower than that of P_{125}, it was considered to be more sensitive than P_{125}. In this way, after sensitivity values of all parameters were calculated, they were ranked and sent to the incremental evolution algorithm.
As mentioned above, more sensitive parameters are more likely to have a significant influence on the entire genetic model. Therefore, if no prior knowledge can be applied to the reconstruction of a large GRN, researchers can conduct sensitivity analysis for parameter selection to guide the network inference method, and observe how numerical quantities within different intervals can formulate candidate results (to determine the model dynamics). Taking P_{145} as an example, the acceptable values for this parameter fell in the range [-0.63, 0.17] that consisted of 167 acceptable network models (dynamics) in total. We can exploit this result to infer P_{145} and explore the relationships of this parameter with other genetic reactions (or parameters). In fact, researchers have shown that the most influential (sensitive) parameters play important roles in modeling a GRN. That is, the behavior variation of a network model largely depends on its sensitive parameters [13].
In Figure 2, we can see that the variation of the parameter value range influenced the system dynamics. That is, the system robustness was significantly affected by the value range specified for a parameter. For instance, the two distributions for P_{145} in Figure 2 (c) show that, if the value range of P_{145} changed from [-0.43, -0.23] to [0.37, 0.57], then the acceptable values for P_{145} changed from 56 to 0. This means that the inference algorithm can maintain the system robustness if it can locate P_{145} within the interval [-0.43, -0.23]. The above analysis concludes that the proposed SA method can be used to find sensitive parameters and to derive suitable value ranges as search constraints, which can then be used in the inference algorithm to construct robust networks.
Evolutionary algorithm for parameter optimization
To derive the network parameters of S-system model, we also implement an evolutionary algorithm (EA) for parameter optimization to work with the above SA method. EA is a population-based approach that evaluates many solutions simultaneously in the search space, and it is likely to find a global solution for a given problem. Recently, a new population-based optimization technique, particle swarm optimization (PSO, [27]), was proposed as an alternative to the traditional EAs. PSO tries to mimic the goal-seeking behavior of biological swarms. The standard PSO algorithm contains a set of particles and operates in an iterative manner. Each particle is characterized by its position and velocity, and it moves in the solution space. The position of each particle represents a potential solution that is evaluated by a predefined fitness function. PSO has some attractive characteristics. In particular, it has memory, so that knowledge of a good solution can be retained by all particles (solutions). During the iterative search process, each particle remembers its previous best position and the best position of any particle in the swarm. Then the particle uses the position information to modify its position and velocity, and continues its movement in the search space. The details of the PSO algorithm can be found in [27]. Some performance comparisons between PSO and the most popular EA method, the genetic algorithm (GA), have been made, underscoring the reliability and the convergence speed of both methods. However, the result tends to be inconclusive: each of the algorithms has shown better performance than the other for some particular applications. Consequently, hybrid techniques were proposed to effectively exploit the qualities and the uniqueness of the two methods. It is now commonly agreed that the hybrid methods can lead to further performance improvements [28–30]. Therefore, in this work, we develop a hybrid method to exploit the solution memory of PSO and the genetic operations of GA for network inference.
As has been mentioned, our primary goal is to investigate how parameter sensitivity can be used to guide an EA and derive robust networks. To concentrate on this issue, we have chosen to implement a popular GA-PSO method (to avoid the deviations caused by different computational methods) for parameter optimization, and we have built our SA method upon it. Breeding Swarms [30] is a frequently used GA-PSO hybrid method that has demonstrated its good performance in many benchmark test functions. Therefore, we adopt this method for our network inference application.
In our implementation, we take a direct encoding scheme to represent solutions for both the GA and PSO parts, in which the network parameters (i.e., α_{ i }, β_{ i, } g_{ i,j }, and h_{ i,j } in the S-system model) are arranged as a linear string chromosome of floating-point numbers. The goal here is to minimize the accumulated discrepancy between the gene expression data recorded in the dataset and the values produced by the inferred model. Therefore, the error (objective or fitness) function defined previously is used directly for performance measurement. For the PSO part, the equations for updating the particle's velocity and position are the same as the ones listed in the original PSO work [27], and for the GA part, the operations of crossover and mutation described in [31] are used.
Sensitivity-based incremental evolution method for network inference
As described in the first section, to infer the S-system model is to determine the 2N (N+1) parameters simultaneously. Solving this high-dimensional problem is difficult, especially when the complexity of regulation increases along with the number of genes involved. One promising approach to this problem is to adopt the concept of incremental evolution to infer the large number of parameters. The underlying principle of incremental evolution [32, 33] is that a population is first evolved to solve an easier version T' of the original complex task T, in which the solution region of T is more accessible from region T'. More task versions with incremental complexity can be arranged so that the original task can be achieved progressively. Evidently, the main task involved in implementing incremental evolution is the formulation of a scheme to transfer the goal task into another more evolvable task. In the process of task transformation, the structure of the environment and the overall goal must be preserved. This can usually be achieved by arranging the task sequence manually, or, alternatively, it can be done using an automated procedure. In this work we modify the cutting plane mechanism used in the high-dimension function optimization problem to develop an adaptive strategy to automatically perform incremental evolution.
In network inference, solving an easier version of the original task means evolving partial solutions (i.e., subsets of all network parameters) that can provide some guidance for the search and move toward the target solution gradually. In the proposed method, parameter sensitivities are calculated and used to determine the priorities of the parameters to be evolved (optimized). The sensitive parameters are selected iteratively, by the proposed SA method and the evolutionary method is employed to search the parameter dimensions accordingly.
Sensitivity-based incremental evolution algorithm()
Step 1: Initialize the population and start the evolution. Note that Steps 2 and 3 are performed only at certain generations (SA interval and exploration interval, respectively).
- (a)
Use a threshold (i.e., the correlation coefficient ratio of the parameter CF values obtained from the SA procedure) to select the most sensitive parameters. Then update the threshold in order to consider more parameters in the next sensitivity analysis iteration.
- (b)
Set new value ranges for all network parameters. Sensitive parameters are given tight bounds, and insensitive parameters are given loose bounds.
Step 3: Start the exploration procedure, again only if the evolutionary cycle reaches a pre-defined generation number (every 1000 generations in this work). In this phase, a random value is generated for each parameter from its new bounds, replacing each parameter's current value.
Step 4: Continue the evolutionary computation, using the bounds obtained in Step 2 to restrict the corresponding parameters.
In the above algorithm, the most critical feature for enabling incremental evolution is found in Step 2. The strategy is to perform the sensitivity analysis for network parameters and to specify constraints on them. A network parameter (in other word, a search dimension) is added to the sensitive list and given a pair of tight constraints (specified by the upper and lower bounds) if its sensitivity value is less than (or equal to) the threshold; otherwise, a parameter is added to the insensitive list and given loose constraints. In the above procedure, the threshold used in Step 2(a) to identify the sensitive parameters has an initial value of 0.84 (chosen from the preliminary study), and this value is increased gradually with a step value of 0.1. The upper and lower bounds of a sensitive parameter in Step 2(b) are determined from the best value of this parameter available so far by adding and subtracting an amount equal to twice the value of a small constant (which is equal to the velocity bound generally used in a PSO-based method – 0.2 in this work). Bounds for the insensitive parameters are defined in a similar way, except that a value of five times the same constant is used.
The dimensions in the sensitive list have higher priorities to be searched, and the corresponding parameter values need to be determined at an earlier evolutionary stage. As the algorithm describes, the search is accomplished by setting small-range bounds for each sensitive parameter in accordance with the best value found so far, and the evolutionary procedure is performed next to infer a suitable parameter value within the specified bounds. The creation of constraints for parameters is important in the search for robust solutions, especially when a non-deterministic search method (such as the evolutionary algorithm) is employed to derive parameter solutions (because many feasible but fragile solutions could be evolved). If the parameter value exceeds the specified boundary value, the system dynamics will be influenced, and consequently, the network behaviors will change. To generate tight constraints for sensitive parameters in the search process is in fact to restrict search regions for the integrated algorithm, which can thus obtain robust networks with desired behaviors. As indicated in [34], by observing how parameters with varied boundaries direct a system to move toward different system dynamics, we can obtain new insights into a complex biological system and understand the principle of biological design.
As the evolutionary method proceeds, parameters in the insensitive list will be moved to the sensitive list incrementally as the threshold is gradually increased. Therefore, more parameters will be considered sensitive and their values will be determined. Eventually, all the parameter values will be obtained. Because sensitive parameters are more influential to network behavior, enforcing tight constraints on these parameters maintains the robustness of the entire genetic network. At the same time, it allows other parameters to evolve during the search process. In this way, the proposed method can infer robust network models with the desired system behavior.
Results and discussion
To verify the proposed approach, we conducted a series of experiments on several datasets collected from the literature. As mentioned, our main goal was to explore the use of sensitivity analysis to infer robust networks, not to compare different computational methods. Therefore, without losing generality, we chose to implement three computational methods, including the traditional PSO method, a GA-PSO hybrid method, and a differential evolution (DE) method [35], and to build the proposed m-MPSA on them. The main reason to select the above three methods is that they are the most representative and popular evolutionary algorithms used for optimization tasks. The same SA strategy can also be embedded into other methods if their implementation details are available.
The first and second phases of the experiments involved comparing the external behavior and internal robustness of the networks inferred by different methods. Four datasets were used. For each dataset, the above three algorithms were performed with three different settings: (1) using only the original algorithms; (2) using our m-MPSA to select sensitive parameters and to derive parameter constraints in the incremental evolution process (i.e., the proposed approach); and (3) using m-MPSA to select sensitive parameters, but with random value bounds during the incremental evolution process. The third setting was mainly included to demonstrate the importance of using appropriate bounds in the evolutionary process. In addition, one real dataset was used in the third phase, which made it possible to investigate how the proposed approach can be applied to the study of real gene networks. This dataset came from a study of gene expression in the SOS DNA repair system of E. coli. For this dataset, network models were inferred and the critical parameters were analyzed and discussed.
Performance of sensitivity analysis in network modeling
Fitness values for dataset 1
PSO | GA-PSO | DE | |||||||
---|---|---|---|---|---|---|---|---|---|
Setting | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 |
Avg | 0.2845 | 0.0839 | 0.2502 | 0.0170 | 0.0054 | 0.0144 | 0.3150 | 0.1440 | 0.2856 |
Best | 0.1801 | 0.0451 | 0.0997 | 0.0078 | 0.0020 | 0.0079 | 0.2485 | 0.1079 | 0.2452 |
Worst | 0.4497 | 0.1442 | 0.3880 | 0.0236 | 0.0088 | 0.0215 | 0.3733 | 0.1820 | 0.3723 |
SD | 0.1059 | 0.0327 | 0.0819 | 0.0048 | 0.0023 | 0.0053 | 0.0452 | 0.0237 | 0.0420 |
Fitness values for dataset 2
PSO | GA-PSO | DE | |||||||
---|---|---|---|---|---|---|---|---|---|
Setting | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 |
Avg | 0.5718 | 0.3647 | 1.5170 | 0.0589 | 0.0192 | 0.0769 | 2.7558 | 0.6857 | 1.8766 |
Best | 0.3172 | 0.1315 | 1.1044 | 0.0310 | 0.0098 | 0.0530 | 2.2035 | 0.5169 | 0.9808 |
Worst | 0.8492 | 0.4928 | 1.9613 | 0.1034 | 0.0314 | 0.1040 | 3.0571 | 0.9289 | 2.6480 |
SD | 0.1729 | 0.1286 | 0.2743 | 0.0229 | 0.0062 | 0.0174 | 0.3933 | 0.1159 | 0.7201 |
Fitness values for dataset 3
PSO | GA-PSO | DE | |||||||
---|---|---|---|---|---|---|---|---|---|
Setting | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 |
Avg | 1.8992 | 1.0233 | 1.9141 | 0.3586 | 0.1404 | 0.2973 | 3.6997 | 1.8465 | 3.6311 |
Best | 0.8746 | 0.7241 | 1.3902 | 0.2325 | 0.0799 | 0.2162 | 2.4568 | 1.6471 | 2.2708 |
Worst | 3.7303 | 1.3419 | 2.2926 | 0.5930 | 0.1941 | 0.3671 | 4.7652 | 2.0209 | 4.3125 |
SD | 1.0149 | 0.1925 | 0.2829 | 0.1035 | 0.0378 | 0.0465 | 0.7628 | 0.1328 | 0.6616 |
Fitness values for dataset 4
PSO | GA-PSO | DE | |||||||
---|---|---|---|---|---|---|---|---|---|
Setting | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 |
Avg | 0.6657 | 0.4732 | 1.1542 | 0.1661 | 0.1128 | 0.2280 | 1.7835 | 0.7052 | 1.8918 |
Best | 0.4486 | 0.3224 | 0.8873 | 0.1373 | 0.0691 | 0.1320 | 1.1684 | 0.5572 | 1.3348 |
Worst | 0.7960 | 0.5952 | 1.3679 | 0.2020 | 0.1376 | 0.3642 | 2.2143 | 0.8420 | 2.5387 |
SD | 0.1341 | 0.0915 | 0.1487 | 0.0225 | 0.0199 | 0.0813 | 0.3024 | 0.1073 | 0.3962 |
As shown in Tables 1, 2, 3, 4, SA (setting 2) consistently outperformed the other two settings for inferring gene networks. It produced the best results for the average, best, and worst fitness values when used in conjunction with any of the three inference methods. Compared with other methods, the GA-PSO method produces smaller standard deviations, which indicates that it is more stable than the other methods. It should also be noted that the performance when using a m-MPSA method with random bounds (setting 3) was not as good as when using m-MPSA to select parameters and derive bounds for them. These results confirm the effectiveness of the proposed bound-setting strategy.
Evaluation of network robustness
Sensitivity values by PSO
Setting 1 | Setting 2 | Setting 3 | ||||
---|---|---|---|---|---|---|
Fitness | Sensitivity | Fitness | Sensitivity | Fitness | Sensitivity | |
Dataset 1 | 0.2845 | 0.7622 | 0.0839 | 0.7203 | 0.2502 | 0.7554 |
Dataset 2 | 0.5718 | 0.7334 | 0.3647 | 0.7002 | 1.5170 | 0.7309 |
Dataset 3 | 1.8992 | 0.6865 | 1.0233 | 0.6463 | 1.9141 | 0.6620 |
Dataset 4 | 0.6657 | 0.7330 | 0.4732 | 0.7072 | 1.1542 | 0.7205 |
Sensitivity values by GA-PSO
Setting 1 | Setting 2 | Setting 3 | ||||
---|---|---|---|---|---|---|
Fitness | Sensitivity | Fitness | Sensitivity | Fitness | Sensitivity | |
Dataset 1 | 0.0117 | 0.7885 | 0.0054 | 0.7363 | 0.0144 | 0.7678 |
Dataset 2 | 0.0598 | 0.8249 | 0.0192 | 0.7581 | 0.0769 | 0.7867 |
Dataset 3 | 0.3586 | 0.8432 | 0.1404 | 0.7502 | 0.2973 | 0.7856 |
Dataset 4 | 0.1661 | 0.8084 | 0.1128 | 0.7691 | 0.2280 | 0.7802 |
Sensitivity values by DE
Setting 1 | Setting 2 | Setting 3 | ||||
---|---|---|---|---|---|---|
Fitness | Sensitivity | Fitness | Sensitivity | fitness | Sensitivity | |
Dataset 1 | 0.3150 | 0.7901 | 0.1440 | 0.7438 | 0.2856 | 0.7791 |
Dataset 2 | 2.7558 | 0.7722 | 0.6857 | 0.7149 | 1.8766 | 0.7654 |
Dataset 3 | 3.6997 | 0.7709 | 1.8465 | 0.7027 | 3.6311 | 0.7509 |
Dataset 4 | 1.7835 | 0.7701 | 0.7052 | 0.7273 | 1.8918 | 0.7609 |
Following the comparisons of the inferred networks, we used an example to further illustrate the effectiveness and importance of using sensitivity analysis to derive appropriate constraints for network parameters in the inference process. Here, we chose the third dataset (which was the largest network considered in the experiments) and investigated the corresponding parameter correlations. In this example of eight-gene network, there were 144 parameters (P_{1}~P_{144}) to be determined, of which the three most sensitive parameters, P_{14}, P_{64}, and P_{72} (identified as sensitive in at least fifteen out of twenty runs), were analyzed. Because the GA-PSO algorithm can give the best performance (as shown in Tables 5, 6, 7), data were collected from the runs using this method without (setting 1) and with (setting 2) the use of the SA method to derive value ranges for network parameters.
Parameter patterns
Setting 1 | run-id |
---|---|
pattern-ns1: P_{64} >P_{72} >P_{14} | 9, 13, 14, 15, 16, 19 |
pattern-ns2: P_{64} >P_{14} >P_{72} | 1, 4, 7, 8, 12 |
Setting 2 | run-id |
pattern-s1: P_{64} >P_{14} >P_{72} | 1, 2, 3, 5, 6, 9, 11, 12, 14, 15, 20 |
pattern-s2: P_{14} >P_{64} >P_{72} | 4, 8, 10, 13, 18 |
Evaluation of the proposed approach on a real dataset
After evaluating the performance of the proposed SA-based approach in network inference, we conducted a set of experiments to investigate how our approach can be applied to the study of a real gene network. Because the GA-PSO algorithm has been shown to outperform other methods in the above experiments, we used this method with two settings (with and without SA) to conduct experiments on a real dataset.
Inference of expression profiles with the decomposed S-system model
In the experiments, six genes were selected from the original experimental data reported in [39]. The genes selected were uvrD, umuD, lexA, uvrA, recA, and polB. These were selected because the interactions of the 6-gene network have been well studied and commonly used in related studies, and the corresponding network has been inferred successfully [11, 40–43]. Using the regulatory relationships that have been described, we can validate our proposed network inference method by comparing our results to those reported previously.
Though there have been several studies on inferring the SOS repair system, none of them used a tightly coupled S-system model (the general S-system) to represent the gene network, and none of them inferred such a model from expression data. To the best of our knowledge, the most relevant work is [44], in which the authors utilized a decoupled S-system model to infer the SOS repair system. In a decoupled S-system, a tightly coupled system of non-linear differential equations is decomposed and analyzed as several differential equations [44, 45], each of which can describe a specific gene and can then be separately inferred (by considering one gene at a time). Motivated by this research, we thus adopted decoupled differentials to describe gene profiles, and we employed the proposed approach to infer gene interactions. In this way, we can not only examine the inferred network behaviors, but also compare the inferred gene regulations to the ones obtained in the other studies mentioned above.
Results by GA-PSO
lexA | uvrA | uvrD | ||||
---|---|---|---|---|---|---|
Setting | 1 | 2 | 1 | 2 | 1 | 2 |
Avg | 1.5778 | 0.5360 | 1.8404 | 0.7447 | 4.0382 | 1.3453 |
Best | 0.6302 | 0.4084 | 1.1121 | 0.5333 | 1.6084 | 1.2191 |
Worst | 1.9917 | 0.9554 | 2.1375 | 1.2190 | 7.4100 | 1.5640 |
SD | 0.4145 | 0.1974 | 0.3572 | 0.2084 | 2.1226 | 0.1130 |
Sensitivity | 0.8355 | 0.7977 | 0.8417 | 0.7838 | 0.7890 | 0.7890 |
recA | umuD | polB | ||||
Setting | 1 | 2 | 1 | 2 | 1 | 2 |
Avg | 3.0527 | 1.8589 | 6.0470 | 4.1333 | 21.1221 | 14.6093 |
Best | 2.4037 | 0.9570 | 3.9336 | 3.9694 | 16.4665 | 10.7421 |
Worst | 3.8859 | 2.2015 | 7.4803 | 4.5852 | 25.3341 | 16.6921 |
SD | 0.5531 | 0.4623 | 1.0923 | 0.1756 | 2.7867 | 2.0940 |
Sensitivity | 0.8669 | 0.7943 | 0.8315 | 0.7841 | 0.8507 | 0.7787 |
Analysis of critical parameters of the SOS repair system
The most sensitive parameters
Gene-id (i) | Gene name | Num of related gene | g_{ i }_{,1}/h_{ i }_{,1} | g_{ i }_{,2}/h_{ i }_{,2} | g_{ i }_{,3}/h_{ i }_{,3} | g_{ i }_{,4}/h_{ i }_{,4} | g_{ i }_{,5}/h_{ i }_{,5} | g_{ i }_{,6}/h_{ i }_{,6} | g_{ i }_{,7}/h_{ i }_{,7} | g _{ i } _{,8} /h _{ i } _{,8} |
---|---|---|---|---|---|---|---|---|---|---|
lexA | uvrA | uvrD | recA | uvrY | ruvA | umuD | polB | |||
1 | lexA | 2 | H* | H* | ||||||
2 | uvrA | 2 | H* | H* | ||||||
3 | uvrD | 4 | H* | H* | G | H | ||||
4 | recA | 1 | G* | |||||||
5 | umuD | 3 | G* | G | H* | |||||
6 | polB | 1 | G* |
Critical gene regulatory relationships
After identifying the most sensitive parameters of the SOS repair system, we again analyzed the gene regulation relationships obtained from the runs to observe the correlation of parameters. The importance of investigating the relationships among parameter values of genes has been emphasized in the study of systems biology, especially when the boundary of a parameter value is taken into consideration [15, 34]. As presented and discussed in the second experimental phase, the qualitative relationships among genes can be regarded as special patterns (describing parameter correlations) that determined the system dynamics of a candidate model. By exploiting these patterns, the inference algorithm can refine the values of kinetic orders to obtain better solutions.
Gene regulations for lexA
Parameter patterns for gene lexA | |
---|---|
Setting 1 | run-id |
pattern-ns1: uvrA -| lexA > lexA -| lexA | 1, 3, 8, 9, 10, 12, 13, 14, 15, 17, 18 |
pattern-ns2: lexA -| lexA > uvrA -| lexA | 2, 4, 5, 6, 7, 11, 16, 19, 20 |
Setting 2 | run-id |
pattern-s1: uvrA -| lexA > lexA -| lexA | 1, 3, 4, 5, 6, 8, 9, 12, 13, 14, 15, 16, 17, 18, 19 |
Gene regulations for uvrA
Parameter patterns for gene uvrA | |
---|---|
Setting 1 | run-id |
pattern-ns1: recA→uvrA > lexA→uvrA | all runs except for 10 and 17 |
Setting 2 | run-id |
pattern-s1: recA→uvrA > lexA→uvrA | all twenty runs |
Gene regulations for uvrD
Parameter patterns for gene uvrD | |
---|---|
Setting 1 | run-id |
pattern-ns1: recA -| uvrD > lexA -| uvrD > uvrD-| uvrD > uvrA -| uvrD | 3, 4, 11, 13, 14, 16 |
Setting 2 | run-id |
pattern-s1: recA -| uvrD > lexA -| uvrD > uvrD-| uvrD > uvrA -| uvrD | 2, 3, 5, 10, 11, 14, 15, 16, 17, 19 |
pattern-s2: lexA -| uvrD > recA -| uvrD > uvrD-| uvrD > uvrA -| uvrD | 4, 7, 12, 13, 18, 20 |
Gene regulations for umuD
Parameter patterns for gene umuD | |
---|---|
Setting 1 | run-id |
pattern-ns1: recA -| umuD > uvrD -| umuD > uvrA -| umuD | 2, 6, 7, 14, 16, 19 |
pattern-ns2: recA -| umuD > uvrA -| umuD > uvrD -| umuD | 1, 11, 17, 18, 20 |
Setting 2 | run-id |
pattern-s1: recA -| umuD > uvrA -| umuD > uvrD -| umuD | 3, 5, 6, 8, 9, 12, 13, 14, 15, 18 |
pattern-s2: recA -| umuD > uvrD -| umuD > uvrA -| umuD | 1, 2, 10, 11, 16, 20 |
Conclusion
In this work, we developed a sensitivity-based incremental evolution approach to cope with one important issue, network robustness, which has not been addressed in gene network inference. Our approach included two parts. The first part was a sensitivity analysis method that was used to select sensitive parameters for deriving value bounds of these parameters. The second part was an evolutionary algorithm that took the bounds as constraints to perform parameter optimization. This process leads to inferred networks that are robust and produce the desired behaviors. To validate the proposed approach, a series of experiments were conducted to evaluate the external behaviors and internal robustness of the networks inferred by different methods. The results show that the proposed SA-based approach outperformed other methods in all datasets. Moreover, we analyzed in detail the results obtained from real time-series expression data for the SOS repair system. The analyses indicate that our approach can identify the critical parameters and use them to establish regulatory relationships among genes. By enforcing these relationships in the repetitive search process, our approach can successfully infer robust networks.
Declarations
Acknowledgements
This work has been supported by National Science Council of Taiwan (Grant Nos. NSC 99-2221-E-110-077 and NSC 100-2221-E-110-086).
This article has been published as part of BMC Bioinformatics Volume 13 Supplement 7, 2012: Advanced intelligent computing theories and their applications in bioinformatics. Proceedings of the 2011 International Conference on Intelligent Computing (ICIC 2011). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S7.
Authors’ Affiliations
References
- Ingolia NT, Weissman JS: Systems biology: reverse engineering the cell. Nature 2008, 454: 1059–1062. 10.1038/4541059aView ArticlePubMedGoogle Scholar
- Lee WP, Tzou WS: Computational methods for discovering gene networks from expression data. Brief Bioinform 2009, 10: 408–423.PubMedGoogle Scholar
- Kikuchi S, Tominaga D, Arita M, Takahashi K, Tomita M: Dynamic modeling of genetic networks using genetic algorithm and S-system. Bioinformatics 2003, 19: 643–650. 10.1093/bioinformatics/btg027View ArticlePubMedGoogle Scholar
- Ho SY, Hsieh CH, Yu FC, Huang HL: An intelligent two-stage evolutionary algorithm for dynamic pathway identification from gene expression profiles. IEEE/ACM Trans Comput Biol Bioinform 2007, 4: 648–704.View ArticlePubMedGoogle Scholar
- Di Bernardo D, Gardner TS, Collins JJ: Robust identification of large genetic networks. Pac Symp Biocomput 2004, 486–497.Google Scholar
- Lee WP, Yang KC: A clustering-based approach for inferring recurrent neural networks as gene regulatory networks. Neurocomputing 2008, 71: 600–610. 10.1016/j.neucom.2007.07.023View ArticleGoogle Scholar
- Datta S, Datta S: Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes. BMC Bioinformatics 2006, 7: 397. 10.1186/1471-2105-7-397PubMed CentralView ArticlePubMedGoogle Scholar
- Zheng CH, Zhang L, Ng TY, Shiu CK, Huang DS: Metasample-based sparse representation for tumor classification. IEEE/ACM Trans Comput Biol Bioinform 2011, 8: 1273–1282.View ArticlePubMedGoogle Scholar
- Huang DS, Zheng CH: Independent component analysis-based penalized discriminant method for tumor classification using gene expression data. Bioinformatics 2006, 22: 1855–1862. 10.1093/bioinformatics/btl190View ArticlePubMedGoogle Scholar
- Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 1999, 96: 6745–6750. 10.1073/pnas.96.12.6745PubMed CentralView ArticlePubMedGoogle Scholar
- Kabir M, Noman N, Iba H: Reversely engineering gene regulatory network from microarray data using linear time-variant model. BMC Bioinformatics 2010, 11: S56. 10.1186/1471-2105-11-S1-S56PubMed CentralView ArticlePubMedGoogle Scholar
- van Riel NAW: Dynamic modeling and analysis of biochemical networks: mechanism-based models and model-based experiments. Brief Bioinform 2006, 7: 364–374. 10.1093/bib/bbl040View ArticlePubMedGoogle Scholar
- Fomekong-Nanfack Y, Postma M, Kaandorp J: Inferring Drosophila gap gene regulatory network: a parameter sensitivity and perturbation analysis. BMC Syst Biol 2009, 3: 94. 10.1186/1752-0509-3-94PubMed CentralView ArticlePubMedGoogle Scholar
- MacNeil L, Walhout AJM: Gene regulatory networks and the role of robustness and stochasticity in the control of gene expression. Genome Res 2011, 21: 645–657. 10.1101/gr.097378.109PubMed CentralView ArticlePubMedGoogle Scholar
- Gunawardena J: Models in systems biology: the parameter problem and the meanings of robustness. In Elements of Computational Systems Biology. Edited by: Lodhi HM, Muggleton SH. Hoboken, New Jersey: Wiley; 2010:21–47.Google Scholar
- Feng X, Hooshangi S, Chen D, Li G, Weiss R, Rabitz H: Optimizing genetic circuits by global sensitivity analysis. Biophys J 2004, 87: 2195–2202. 10.1529/biophysj.104.044131PubMed CentralView ArticlePubMedGoogle Scholar
- Leloup J, Goldbeter A: Modeling the mammalian circadian clock: sensitivity analysis and multiplicity of oscillatory mechanisms. J Theor Biol 2004, 230: 541–562. 10.1016/j.jtbi.2004.04.040View ArticlePubMedGoogle Scholar
- Mahdavi A, Davey R, Bhola P, Yin T, Zandstra PW: Sensitivity analysis of intracellular signaling pathway kinetics predicts targets for stem cell fate control. PLoS Comput Biol 2007, 3: e130. 10.1371/journal.pcbi.0030130PubMed CentralView ArticlePubMedGoogle Scholar
- Cho K, Shin S, Kolch W, Wolkenhauer O: Experimental design in systems biology, based on parameter sensitivity analysis using a Monte Carlo method: a case study for the TNFα-mediated NF- k B signal transduction pathway. Simulation 2003, 79: 726–729. 10.1177/0037549703040943View ArticleGoogle Scholar
- Degenring D, Froemel C, Dikta G, Takors R: Sensitivity analysis for the reduction of complex metabolism models. J Process Control 2004, 14: 729–745. 10.1016/j.jprocont.2003.12.008View ArticleGoogle Scholar
- Radhakrishnan K, Edwards JS, Lidke DS, Jovin TM, Wilson BS, Oliver JM: Sensitivity analysis predicts that the ERK-pMEK interaction regulates ERK nuclear translocation. IET Syst Biol 2009, 3: 329–341. 10.1049/iet-syb.2009.0010PubMed CentralView ArticlePubMedGoogle Scholar
- Bentele M, Lavrik I, Ulrich M, Stößer S, Heermann DW, Kalthoff H, Krammer PH, Eils R: Mathematical modeling reveals threshold mechanism in CD95-induced apoptosis. J Cell Biol 2004, 166: 839. 10.1083/jcb.200404158PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang HX, Dempsey WP, Goutsias J: Probabilistic sensitivity analysis of biochemical reaction systems. J Chem Phys 2009, 131: 094101. 10.1063/1.3205092View ArticlePubMedGoogle Scholar
- Marino S, Hogue IB, Ray CJ, Kirschner DE: A methodology for performing global uncertainty and sensitivity analysis in systems biology. J Theor Biol 2008, 254: 178–196. 10.1016/j.jtbi.2008.04.011PubMed CentralView ArticlePubMedGoogle Scholar
- Zheng Y, Rundell A: Comparative study of parameter sensitivity analyses of the TCR-activated Erk-MAPK signalling pathway. Syst Biol (Stevenage) 2006, 153: 201–211. 10.1049/ip-syb:20050088View ArticleGoogle Scholar
- Kimura S, Ide K, Kashihara A, Kano M, Hatakeyama M, Masui R, Nakagawa N, Yokoyama S, Kuramitsu S, Konagaya A: Inference of S-system models of genetic networks using a cooperative coevolutionary algorithm. Bioinformatics 2005, 21: 1154–1163. 10.1093/bioinformatics/bti071View ArticlePubMedGoogle Scholar
- Kennedy J, Eberhart R: Swarm Intelligence. San Francisco: Morgan Kaufmann Publishers; 2001.Google Scholar
- Grimaccia F, Mussetta M, Zich R: Genetical swarm optimization: self-adaptive hybrid evolutionary algorithm for electromagnetics. IEEE Trans Antennas Propag 2006, 55: 781–785.View ArticleGoogle Scholar
- Elhossini A, Areibi S, Dony R: Strength pareto particle swarm optimization and hybrid EA-PSO for multiobjective optimization. Evol Comput 2010, 18: 127–156. 10.1162/evco.2010.18.1.18105View ArticlePubMedGoogle Scholar
- Settles M, Soule T: Breeding swarms: a GA/PSO hybrid. Proceedings of Genetic and Evolutionary Computation Conference: 25–29 June 2005; Washington 2005, 161–168.View ArticleGoogle Scholar
- Michalewicz Z: Genetic Algorithms + Data Structures = Evolution Programs. Springer; 1999.Google Scholar
- Guan SU, Zhang S: Incremental evolution of cellular automata for random number generation. Int J Mod Phys C 2003, 14: 881–896. 10.1142/S0129183103005017View ArticleGoogle Scholar
- Stanley K, Miikkulainen R: Evolving neural networks through augmenting topologies. Evol Comput 2002, 10: 99–127. 10.1162/106365602320169811View ArticlePubMedGoogle Scholar
- Kitano H: Biological robustness. Nat Rev Genet 2004, 5: 826–837.View ArticlePubMedGoogle Scholar
- Storn R, Price K: Differential evolution -- simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 1997, 11: 341–359. 10.1023/A:1008202821328View ArticleGoogle Scholar
- Cao H, Kang L, Chen Y: Evolutionary modeling of systems of ordinary differential equations with genetic programming. Genetic Programming and Evolvable Machines 2000, 1: 309–337. 10.1023/A:1010013106294View ArticleGoogle Scholar
- Wen X, Fuhrman S, Michaels GS, Carr DB, Smith S, Barker JL, Somogyi R: Large-scale temporal gene expression mapping of central nervous system development. Proc Natl Acad Sci USA 1998, 95: 334–339. 10.1073/pnas.95.1.334PubMed CentralView ArticlePubMedGoogle Scholar
- Sutton MD, Smith BT, Godoy VG, Walker GC: The SOS response: recent insights into umuDC-dependent mutagenesis and DNA damage tolerance. Ann Rev Genet 2000, 34: 479–497. 10.1146/annurev.genet.34.1.479View ArticlePubMedGoogle Scholar
- Ronen M, Rosenberg R, Shraiman BI, Alon U: Assigning numbers to the arrows: parameterizing a gene regulation network by using accurate expression kinetics. Proc Natl Acad Sci USA 2002, 99: 10555–10560. 10.1073/pnas.152046799PubMed CentralView ArticlePubMedGoogle Scholar
- Bansal M, Gatta GD, di Bernardo D: Inference of gene regulatory networks and compound mode of action from time course gene expression profiles. Bioinformatics 2006, 22: 815–822. 10.1093/bioinformatics/btl003View ArticlePubMedGoogle Scholar
- Cho DY, Cho KH, Zhang BT: Identification of biochemical networks by S-tree based genetic programming. Bioinformatics 2006, 22: 1631–1640. 10.1093/bioinformatics/btl122View ArticlePubMedGoogle Scholar
- Kimura S, Nakayama S, Hatakeyama M: Genetic network inference as a series of discrimination tasks. Bioinformatics 2009, 25: 918–925. 10.1093/bioinformatics/btp072View ArticlePubMedGoogle Scholar
- Kimura S, Sonoda K, Yamane S, Maeda H, Matsumura K, Hatakeyama M: Function approximation approach to the inference of reduced NGnet models of genetic networks. BMC Bioinformatics 2008, 9: 23. 10.1186/1471-2105-9-23PubMed CentralView ArticlePubMedGoogle Scholar
- Huang HL, Chen KW, Ho SJ, Ho SY: Inferring S-system models of genetic networks from a time-series real data set of gene expression profiles. Proceedings of IEEE Congress on Evolutionary Computation: 1–6 June 2008; Hong Kong 2008, 2788–2793.View ArticleGoogle Scholar
- Voit EO, Almeida J: Decoupling dynamical systems for pathway identification from metabolic profiles. Bioinformatics 2004, 20: 1670–1681. 10.1093/bioinformatics/bth140View ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.