Inferring robust gene networks from expression data by a sensitivity-based incremental evolution method
BMC Bioinformatics volume 13, Article number: S8 (2012)
Reconstructing gene regulatory networks (GRNs) from expression data is one of the most important challenges in systems biology research. Many computational models and methods have been proposed to automate the process of network reconstruction. Inferring robust networks with desired behaviours remains challenging, however. This problem is related to network dynamics but has yet to be investigated using network modeling.
We propose an incremental evolution approach for inferring GRNs that takes network robustness into consideration and can deal with a large number of network parameters. Our approach includes a sensitivity analysis procedure to iteratively select the most influential network parameters, and it uses a swarm intelligence procedure to perform parameter optimization. We have conducted a series of experiments to evaluate the external behaviors and internal robustness of the networks inferred by the proposed approach. The results and analyses have verified the effectiveness of our approach.
Sensitivity analysis is crucial to identifying the most sensitive parameters that govern the network dynamics. It can further be used to derive constraints for network parameters in the network reconstruction process. The experimental results show that the proposed approach can successfully infer robust GRNs with desired system behaviors.
Gene regulatory networks (GRNs) are essential for controlling cellular metabolism and the organismal development. Under the command of transcription factors (TFs), each gene influences the activity of the cell by generating messenger RNA (mRNA) that guides the synthesis of proteins by ribosomes in the cytoplasm (the location in the cell where biochemical reactions and molecular events take place). Gene network modeling uses gene expression data to characterize the phenotypic behavior of a system under study. With the reconstructed networks, biologists can generate and test hypotheses to further understand the complex phenomena that occur in nature systems and to explore the dynamics of those systems.
Modeling GRNs manually on the basis of the experimentally measured time-series data takes a considerable amount of time. Therefore, an automated reverse-engineering procedure is recommended [1, 2]. This procedure involves altering the gene network in some way, observing the outcome, and using computational methods to infer the underlying principles of the network. To derive a realistic model, available domain knowledge (including functional and structural information) can be integrated into the computational methods. Figure 1 illustrates the general procedure of inferring GRNs from quantitative expression data. The inferring/modeling block of this figure indicates the computational procedure used to derive network parameters for a given model, to build and simulate the model, and to evaluate it by comparing the behavior of the inferred model with the original data set. In addition to making use of expression data, one recently developed strategy combines information from various sources to narrow down the search space in the network. This strategy shortens the time and effort required for the validation and discovery of networks. For example, if some gene names are known then they can be mapped into knowledge bases (e.g., the Gene Ontology) to extract biological knowledge (e.g., gene function) that can be used to determine the network structures. Our goal in this study is to establish a methodology for network inference and to investigate aspects of network dynamics that have not been addressed previously.
To infer a network with desired system behavior, the most important steps are to select a network model and then to fit the network's structural parameters to the available expression data. Many models have been proposed to address different levels of biological details, ranging from the very abstract (involving Boolean values only) to the very concrete (including full biochemical interactions with stochastic kinetics). Abstract models are easy to simulate and therefore less computationally taxing, but it has been proven that they are not able to capture certain system behaviors. In contrast, concrete models are more suitable for simulating biochemical process realistically, but, because of their computational complexities, these models can only be applied to small systems. To capture the underlying physical phenomena of a gene network we took one of the most popular and well-studied concrete models, the S-system model, as an example of a biological network, and we used both simulated and collected real gene expression data to reconstruct the model. The S-system model is a type of ordinary differential equation (ODE) model. It consists of a particular set of tightly coupled ODEs in which the component processes are characterized by power law functions [3, 4]. The system structure of an S-system is described by the following equation:
Here, x i is the expression level of gene i, and N is the number of genes in a genetic network. The non-negative parameters α i and β i are rate constants that indicate the direction of mass flow. The real-number exponents g i,j and h i,j are kinetic orders that reflect the strength of interactions from genes j to i. The above set of parameters defines an S-system model. To infer an S-system model is, thus, to estimate all of the 2N (N+1) parameters simultaneously.
Although non-linear ODEs can more accurately model the system dynamics of gene networks, they are difficult to solve by traditional optimization techniques [2, 3]. To reduce the complexity of such models, Di Bernardo et. al developed an approach that involves taking a series of steady-state gene expression measurements following transcriptional perturbations and reducing the linear ODEs to a linear regression problem that can be solved with relevant techniques . In addition, some researchers employed the strategies of gene clustering for dimension reduction and gene classification for identification of expression patterns to reduce task complexity [6–10]. As evolutionary algorithms (EAs) have been widely used to solve many difficult optimization tasks with good results, they have been suggested as a way to infer gene networks [3, 4]. Among many studies on this topic, the most relevant are those involving the use of EAs to infer S-system models (e.g, [3, 4, 11]). In these studies, the network parameters of the S-system were arranged as a string of floating numbers (i.e., the chromosome in EAs) that could be evolved by genetic operators.
In recent studies of EA-based parameter inference, one critical problem, network robustness [12–14], has not been addressed. It is important to investigate the effect of network parameter perturbations on the overall system. These parameters not only are the numerical components of the model, but also represent the activities or interactions among proteins, transcription factors, and mRNAs in the gene network. Each network parameter has an important role in determining the behavior of a biological system. Parameters that are very sensitive to variation can introduce fragility into the system. In fact, recent research has found that parameters can define the dynamics of a model, and more important, studies have shown that enforcing constraints on the parameters can limit (control) model dynamics . Specifically, by measuring and analyzing the variations of network parameters and their effects on relevant genes, the gene-gene interactions of an inferred model can be interpreted. To ensure the robustness of the inferred network and to further investigate gene interactions, it is important to derive an acceptable value range for each parameter and to restrict the parameter's value to the specified range during the network reconstruction process . Therefore, we take parameter sensitivity into consideration in the network reconstruction procedure so that robust results can be obtained.
Sensitivity analysis (SA) is an indispensable technique that can help researchers to investigate the parameter robustness properties of an inferred network. By varying the parameter values within a certain range and performing statistical calculations to measure the system instability (or fragility), researchers can identify the critical parameters or discover seemingly unimportant parameters that may have a positive or negative influence on a network. SA techniques have been used in many biological studies, such as genetic circuit design , mammalian circadian clock modeling , and target prediction in signaling pathways . In general, the sensitivity of a parameter is defined as :
Here, P represents the parameters that are varied in a given range, M is the mathematical function describing the system behavior, and ∂M means the change in M due to the value changed in ∂P with respect to P.
In a dynamic system, there are two types of methods often used to conduct SA: local SA and global SA. Local SA can measure the sensitivity of a parameter from a given range even if the system structure is unknown, but it only considers one parameter at a time and ignores the interactions between parameters [20, 21]. Alternatively, global SA can examine parameter interactions with different parameter magnitudes simultaneously [22, 23], but this can only be done if the system structure is known in advance. The above two methods have been compared extensively [24, 25]. It should be noted that neither local SA nor global SA is superior: the choice of method depends on the specific application and the information available. In the biological cases that involve the use of SA techniques, the sensitivity of parameters should be investigated from a biological point of view in order to validate the results.
From the perspective of parameter dynamics, we propose a new approach to infer robust networks. Our approach includes two major parts. The first part is a sensitivity analysis procedure that selects sensitive network parameters, determines value ranges for them, and then sends the parameters with their constraints to the inference mechanism. The second part is an evolutionary method responsible for inferring networks from the results obtained in the first part. This approach, to our knowledge, is the first work in network modeling that integrates sensitivity analysis into an inference algorithm to consider both internal (robustness) and external (behavior) characteristics of the inferred network. To validate the proposed approach, a series of experiments has been conducted, in which the proposed SA procedure was coupled with different EAs. The results have been analyzed, and they show that this approach can be used to infer robust networks successfully from artificial and real gene expression profiles.
Parameter sensitivity analysis
In the process of inferring a gene network from expression data the genes interact with each other, and the network structure is generally unclear during the modeling process. Most studies of global SA use known pathways as examples and can therefore easily select the critical parameters and control the total number of parameters (usually few than 100) for further analysis [23–25]. In the type of network inference problem addressed here, it is not possible to identify and select the most important parameters in order to perform global SA, because no prior information about network structure is available.
To consider multiple network parameters simultaneously in such situations, we present a new method that is a modification of a widely used global SA technique, called multi-parameter sensitivity analysis (MPSA) [12, 19]. MPSA is based on Monte Carlo simulations and on quantitative comparisons of cumulative frequency via corresponding Pearson correlation coefficients among parameters to identify those that are most sensitive. It can be used to explore critical genetic reactions and to examine the robustness of a gene network as other global SA methods do. In the method described here, the objective function of the SA method is the fitness function that is to be optimized in the evolutionary algorithm, which is the mean squared error over the time course:
In this expression, x i d(t) is the desired expression level of gene i at time t, x i a(t) is the actual value obtained from the inferred model, N is the number of genes in the network, and T is the number of time points used to measure gene expression data during the period.
Our modified method, m-MPSA, inherits qualities from MPSA but has several advantages. First, MPSA is easy to implement and modify. It allows users to select one parameter at a time or to consider multiple parameters at the same time to perform sensitivity analysis. This characteristic is crucial because, when the network structure is unknown or uncertain, selecting one parameter at a time is a good way to reduce the complexity of the search space. Researchers can then concentrate on each parameter and temporarily neglect the combination sets at the selection phase. Second, compared with most of the global SA methods, the mathematical equations of MPSA are more manifest and succinct, and the parameter sensitivity is thus easy to calculate. Finally, the value ranges of parameters used in MPSA are the same as in an ODE model. This means that the sensitivity of each parameter can be directly mapped into an ODE model without any extra computation.
In previous studies, researchers have shown that the most influential (sensitive) parameters play key roles in modeling a GRN; that is, a model's behavior varies largely because of its sensitive parameters . Therefore, our current study draws on the characteristics of sensitive parameters, and it incorporates the sensitivity analysis method into the incremental evolution approach to infer a GRN. The proposed m-MPSA method is described in the Sensitivity analysis algorithm below. It uses an iterative process to calculate the sensitivity of each parameter, and it then ranks the sensitivities of all parameters. The input of this algorithm is the set of network parameters to be determined; and the output, a list of parameters ranked by their sensitivity values. With the parameter sensitivities ranked, the evolutionary algorithm can then infer robust solutions by exploiting the sensitive parameters that are likely to significantly influence the genetic model as a whole.
Sensitivity analysis algorithm()
Step 1: Select one parameter, i, at a time. (i = 1, 2, .., μ, μ is the number of network parameters).
Step 2: Set the parameter range R i for each S-system parameter, i. Initially, the commonly used search regions [-3.0..3.0] and [0.0..10.0] are taken as the initial settings for the parameters of kinetic orders and rate constants respectively, as suggested in . These settings are used to define the parameter range R i , by adding/subtracting an amount of one third of the initial settings to/from the current value of parameter i.
Step 3: For each R i , a set of random values are created with a uniform distribution within the specified range. Each random value for parameter i, together with the other (μ-1) parameters, constitutes a candidate solution (In these experiments, 500 random values were generated, as suggested in .
Step 4: Construct the mathematical model (i.e., S-system model) and calculate the objective function value for each of the random values generated in Step 3.
Step 5: Determine whether the objective function value of each random point obtained in Step 4 is acceptable or unacceptable by a given threshold C r (guided by the literature or experience; it is defined as the triple of the best available objective function value in the whole population). The parameter value of a random point is classified as unacceptable if its corresponding objective function value is greater than C r ; otherwise, the parameter value is acceptable. Go back to Step 1 until all parameters have been dealt with.
Step 6: Calculate the sensitivity for parameter i by using its cumulative frequency (CF i ). CF i measures the similarity of two statistical distributions that are formed by the acceptable and unacceptable values produced from Steps 3-5 for parameter i. The parameters are ranked according to their CF, and the ones with relatively low CF are considered to be sensitive.
Step 7: Output a parameter list in which the parameters are sorted by their CF values.
In addition to presenting the m-MPSA algorithm, we provide here a walkthrough example to illustrate how it operates in practice. In the example, we randomly choose two parameters from one of the datasets (the largest one) used in the experimental section, and we calculate the sensitivity values of these parameters. The dataset consisted of ten nodes, and there were altogether 220 parameters to be estimated. Without losing generality, we assume that the algorithm selected the parameter P125 (Step 1), and then set the range R125 for P125 (Step 2). As mentioned above, we used one third of the original search region (which was [-3.0, 3.0] in the current study) to set up the bounds for each parameter. Hence, if P125 had a value of 0.59, then the lower bound for this parameter was -0.41 (i.e., 0.59+1/3×(3.0-(-3.0))), and the upper bound was 1.59. Then, 500 random values were generated within the range [-0.41, 1.59] for P125 (Step 3), and their corresponding objective (fitness) function values were calculated (Step 4). Next, a threshold C r was determined to categorize the above 500 evaluation results as acceptable or unacceptable. If the current best result (out of the 500 evaluations) was 0.52, then a threshold of 1.56 (i.e., 0.52×3) was used to determine whether a result was acceptable or not (Step 5). After that, two statistical distributions were constructed for the acceptable and unacceptable results identified above. For statistical purposes, the value range [-0.41, 1.59] was divided into ten intervals, and for each interval the numbers of acceptable and unacceptable values were counted. Figure 2 (a) shows the distributions for P125, in which acceptable and unacceptable numbers were marked. The cumulative frequency distributions were then constructed accordingly, as presented in Figure 2 (b). Finally, the sensitivity value of P125 was obtained by measuring the correlation coefficients of the two (acceptable and unacceptable) cumulative frequency distributions (Step 6). In this case, the sensitivity value of P125 was 0.8912.
A similar procedure was performed for the other parameter, P145, and a sensitivity value of 0.7695 was obtained. Because P145 had a sensitivity value lower than that of P125, it was considered to be more sensitive than P125. In this way, after sensitivity values of all parameters were calculated, they were ranked and sent to the incremental evolution algorithm.
As mentioned above, more sensitive parameters are more likely to have a significant influence on the entire genetic model. Therefore, if no prior knowledge can be applied to the reconstruction of a large GRN, researchers can conduct sensitivity analysis for parameter selection to guide the network inference method, and observe how numerical quantities within different intervals can formulate candidate results (to determine the model dynamics). Taking P145 as an example, the acceptable values for this parameter fell in the range [-0.63, 0.17] that consisted of 167 acceptable network models (dynamics) in total. We can exploit this result to infer P145 and explore the relationships of this parameter with other genetic reactions (or parameters). In fact, researchers have shown that the most influential (sensitive) parameters play important roles in modeling a GRN. That is, the behavior variation of a network model largely depends on its sensitive parameters .
In Figure 2, we can see that the variation of the parameter value range influenced the system dynamics. That is, the system robustness was significantly affected by the value range specified for a parameter. For instance, the two distributions for P145 in Figure 2 (c) show that, if the value range of P145 changed from [-0.43, -0.23] to [0.37, 0.57], then the acceptable values for P145 changed from 56 to 0. This means that the inference algorithm can maintain the system robustness if it can locate P145 within the interval [-0.43, -0.23]. The above analysis concludes that the proposed SA method can be used to find sensitive parameters and to derive suitable value ranges as search constraints, which can then be used in the inference algorithm to construct robust networks.
Evolutionary algorithm for parameter optimization
To derive the network parameters of S-system model, we also implement an evolutionary algorithm (EA) for parameter optimization to work with the above SA method. EA is a population-based approach that evaluates many solutions simultaneously in the search space, and it is likely to find a global solution for a given problem. Recently, a new population-based optimization technique, particle swarm optimization (PSO, ), was proposed as an alternative to the traditional EAs. PSO tries to mimic the goal-seeking behavior of biological swarms. The standard PSO algorithm contains a set of particles and operates in an iterative manner. Each particle is characterized by its position and velocity, and it moves in the solution space. The position of each particle represents a potential solution that is evaluated by a predefined fitness function. PSO has some attractive characteristics. In particular, it has memory, so that knowledge of a good solution can be retained by all particles (solutions). During the iterative search process, each particle remembers its previous best position and the best position of any particle in the swarm. Then the particle uses the position information to modify its position and velocity, and continues its movement in the search space. The details of the PSO algorithm can be found in . Some performance comparisons between PSO and the most popular EA method, the genetic algorithm (GA), have been made, underscoring the reliability and the convergence speed of both methods. However, the result tends to be inconclusive: each of the algorithms has shown better performance than the other for some particular applications. Consequently, hybrid techniques were proposed to effectively exploit the qualities and the uniqueness of the two methods. It is now commonly agreed that the hybrid methods can lead to further performance improvements [28–30]. Therefore, in this work, we develop a hybrid method to exploit the solution memory of PSO and the genetic operations of GA for network inference.
As has been mentioned, our primary goal is to investigate how parameter sensitivity can be used to guide an EA and derive robust networks. To concentrate on this issue, we have chosen to implement a popular GA-PSO method (to avoid the deviations caused by different computational methods) for parameter optimization, and we have built our SA method upon it. Breeding Swarms  is a frequently used GA-PSO hybrid method that has demonstrated its good performance in many benchmark test functions. Therefore, we adopt this method for our network inference application.
The hybrid GA-PSO method used in this work is illustrated in Figure 3. Initially, a population is randomly generated and evaluated. The individual solutions are ranked according to their fitness values, and then they are divided into two parts for running the PSO and GA methods separately. As shown in Figure 3, the best individuals (including (1-r)% of the whole population) are preserved and enhanced by the PSO procedure. They are then sent to the next generation. Meanwhile, on the right-hand side of the figure, the remaining individuals (i.e., r% of the population) are discarded. To replace the removed individuals, a tournament selection scheme is used to choose the same number of individuals from the best ones (before they are updated by PSO), and the selected individuals are used to create new individuals by the GA procedure.
In our implementation, we take a direct encoding scheme to represent solutions for both the GA and PSO parts, in which the network parameters (i.e., α i , β i, g i,j , and h i,j in the S-system model) are arranged as a linear string chromosome of floating-point numbers. The goal here is to minimize the accumulated discrepancy between the gene expression data recorded in the dataset and the values produced by the inferred model. Therefore, the error (objective or fitness) function defined previously is used directly for performance measurement. For the PSO part, the equations for updating the particle's velocity and position are the same as the ones listed in the original PSO work , and for the GA part, the operations of crossover and mutation described in  are used.
Sensitivity-based incremental evolution method for network inference
As described in the first section, to infer the S-system model is to determine the 2N (N+1) parameters simultaneously. Solving this high-dimensional problem is difficult, especially when the complexity of regulation increases along with the number of genes involved. One promising approach to this problem is to adopt the concept of incremental evolution to infer the large number of parameters. The underlying principle of incremental evolution [32, 33] is that a population is first evolved to solve an easier version T' of the original complex task T, in which the solution region of T is more accessible from region T'. More task versions with incremental complexity can be arranged so that the original task can be achieved progressively. Evidently, the main task involved in implementing incremental evolution is the formulation of a scheme to transfer the goal task into another more evolvable task. In the process of task transformation, the structure of the environment and the overall goal must be preserved. This can usually be achieved by arranging the task sequence manually, or, alternatively, it can be done using an automated procedure. In this work we modify the cutting plane mechanism used in the high-dimension function optimization problem to develop an adaptive strategy to automatically perform incremental evolution.
In network inference, solving an easier version of the original task means evolving partial solutions (i.e., subsets of all network parameters) that can provide some guidance for the search and move toward the target solution gradually. In the proposed method, parameter sensitivities are calculated and used to determine the priorities of the parameters to be evolved (optimized). The sensitive parameters are selected iteratively, by the proposed SA method and the evolutionary method is employed to search the parameter dimensions accordingly.
An exploration phase is also included in the algorithm for the injection of random effects. This maintains population diversity and avoids a situation in which individuals move near locally optimal solutions. The SA and exploration procedures are both performed periodically with specific generation intervals. In this way, the overall solutions can be derived gradually, and the inferred networks are more robust to internal variations. The flow of the proposed SA-based incremental evolution approach is illustrated in Figure 4, and the details of the approach are described below.
Sensitivity-based incremental evolution algorithm()
Step 1: Initialize the population and start the evolution. Note that Steps 2 and 3 are performed only at certain generations (SA interval and exploration interval, respectively).
Step 2: Perform sensitivity analysis (call the Sensitivity analysis algorithm described above) to obtain a list of parameters ranked by their sensitivity. This step is performed only if the evolutionary cycle reaches a pre-defined generation number (every 500 generations in this work).
Use a threshold (i.e., the correlation coefficient ratio of the parameter CF values obtained from the SA procedure) to select the most sensitive parameters. Then update the threshold in order to consider more parameters in the next sensitivity analysis iteration.
Set new value ranges for all network parameters. Sensitive parameters are given tight bounds, and insensitive parameters are given loose bounds.
Step 3: Start the exploration procedure, again only if the evolutionary cycle reaches a pre-defined generation number (every 1000 generations in this work). In this phase, a random value is generated for each parameter from its new bounds, replacing each parameter's current value.
Step 4: Continue the evolutionary computation, using the bounds obtained in Step 2 to restrict the corresponding parameters.
In the above algorithm, the most critical feature for enabling incremental evolution is found in Step 2. The strategy is to perform the sensitivity analysis for network parameters and to specify constraints on them. A network parameter (in other word, a search dimension) is added to the sensitive list and given a pair of tight constraints (specified by the upper and lower bounds) if its sensitivity value is less than (or equal to) the threshold; otherwise, a parameter is added to the insensitive list and given loose constraints. In the above procedure, the threshold used in Step 2(a) to identify the sensitive parameters has an initial value of 0.84 (chosen from the preliminary study), and this value is increased gradually with a step value of 0.1. The upper and lower bounds of a sensitive parameter in Step 2(b) are determined from the best value of this parameter available so far by adding and subtracting an amount equal to twice the value of a small constant (which is equal to the velocity bound generally used in a PSO-based method – 0.2 in this work). Bounds for the insensitive parameters are defined in a similar way, except that a value of five times the same constant is used.
The dimensions in the sensitive list have higher priorities to be searched, and the corresponding parameter values need to be determined at an earlier evolutionary stage. As the algorithm describes, the search is accomplished by setting small-range bounds for each sensitive parameter in accordance with the best value found so far, and the evolutionary procedure is performed next to infer a suitable parameter value within the specified bounds. The creation of constraints for parameters is important in the search for robust solutions, especially when a non-deterministic search method (such as the evolutionary algorithm) is employed to derive parameter solutions (because many feasible but fragile solutions could be evolved). If the parameter value exceeds the specified boundary value, the system dynamics will be influenced, and consequently, the network behaviors will change. To generate tight constraints for sensitive parameters in the search process is in fact to restrict search regions for the integrated algorithm, which can thus obtain robust networks with desired behaviors. As indicated in , by observing how parameters with varied boundaries direct a system to move toward different system dynamics, we can obtain new insights into a complex biological system and understand the principle of biological design.
As the evolutionary method proceeds, parameters in the insensitive list will be moved to the sensitive list incrementally as the threshold is gradually increased. Therefore, more parameters will be considered sensitive and their values will be determined. Eventually, all the parameter values will be obtained. Because sensitive parameters are more influential to network behavior, enforcing tight constraints on these parameters maintains the robustness of the entire genetic network. At the same time, it allows other parameters to evolve during the search process. In this way, the proposed method can infer robust network models with the desired system behavior.
Results and discussion
To verify the proposed approach, we conducted a series of experiments on several datasets collected from the literature. As mentioned, our main goal was to explore the use of sensitivity analysis to infer robust networks, not to compare different computational methods. Therefore, without losing generality, we chose to implement three computational methods, including the traditional PSO method, a GA-PSO hybrid method, and a differential evolution (DE) method , and to build the proposed m-MPSA on them. The main reason to select the above three methods is that they are the most representative and popular evolutionary algorithms used for optimization tasks. The same SA strategy can also be embedded into other methods if their implementation details are available.
The first and second phases of the experiments involved comparing the external behavior and internal robustness of the networks inferred by different methods. Four datasets were used. For each dataset, the above three algorithms were performed with three different settings: (1) using only the original algorithms; (2) using our m-MPSA to select sensitive parameters and to derive parameter constraints in the incremental evolution process (i.e., the proposed approach); and (3) using m-MPSA to select sensitive parameters, but with random value bounds during the incremental evolution process. The third setting was mainly included to demonstrate the importance of using appropriate bounds in the evolutionary process. In addition, one real dataset was used in the third phase, which made it possible to investigate how the proposed approach can be applied to the study of real gene networks. This dataset came from a study of gene expression in the SOS DNA repair system of E. coli. For this dataset, network models were inferred and the critical parameters were analyzed and discussed.
Performance of sensitivity analysis in network modeling
In this set of experiments, four datasets (with more genes than most examples encountered in the literature) were used to evaluate the proposed approach. The first dataset was a five-node artificial network that has become a popular model for comparing different methods in recent studies on network inference . To collect time series data, we started network operations and continued operations for thirty simulation time steps. The nodes had the following relationships:
The second dataset was taken from . It was an eight-gene network created manually by the popular GRN simulation software tool Genexp. The third dataset was a ten-node network previously described in , given by the following equations:
The fourth dataset was part of a real experimental dataset from a study of the rat central nervous system (CNS) . The original dataset included expression data for 112 genes collected at 9 time points (including the embryonic, postnatal, and adult stages). In this experiment, we selected a group of eight nodes representing the largest sub-network of the representative cluster reported in  as the target network. For the above four datasets, three inference algorithms (PSO, GA-PSO, and DE) with three different settings were arranged, and twenty independent runs were conducted for each arrangement. The population sizes for the four datasets were 800, 1000, 1600, and 1000, respectively. Tables 1, 2, 3, 4 show the results, in which the mean, standard deviation, and best and worst performance of the runs are listed for each arrangement.
As shown in Tables 1, 2, 3, 4, SA (setting 2) consistently outperformed the other two settings for inferring gene networks. It produced the best results for the average, best, and worst fitness values when used in conjunction with any of the three inference methods. Compared with other methods, the GA-PSO method produces smaller standard deviations, which indicates that it is more stable than the other methods. It should also be noted that the performance when using a m-MPSA method with random bounds (setting 3) was not as good as when using m-MPSA to select parameters and derive bounds for them. These results confirm the effectiveness of the proposed bound-setting strategy.
Evaluation of network robustness
To evaluate the robustness of the networks inferred by three different settings for each algorithm, we compared fitness and the sensitivity values (averaged over all parameters for each network model) of the best solutions recorded from the final generations in all runs. Tables 5, 6, 7 list the results of using different settings for the three algorithms. The values recorded in the tables are the averaged results of the twenty runs performed for each setting. We can see that results for the four datasets presented in Tables 5, 6, 7 are consistent for all three inference algorithms. They confirm that the proposed SA-based approach (setting 2) is able to infer networks with lower fitness/error values (better system behavior) and lower sensitivity values (more robust S-system models) simultaneously. It should be noted that, because the three algorithms have different computational features (e.g., convergence rate, search strategy, and selection approach), the thresholds (relative values determined from the candidate solutions of each run) used for constructing the sensitive parameter list in each algorithm were not the same during the evolutionary process. Therefore, the sensitivities obtained from different algorithms are not directly comparable.
Following the comparisons of the inferred networks, we used an example to further illustrate the effectiveness and importance of using sensitivity analysis to derive appropriate constraints for network parameters in the inference process. Here, we chose the third dataset (which was the largest network considered in the experiments) and investigated the corresponding parameter correlations. In this example of eight-gene network, there were 144 parameters (P1~P144) to be determined, of which the three most sensitive parameters, P14, P64, and P72 (identified as sensitive in at least fifteen out of twenty runs), were analyzed. Because the GA-PSO algorithm can give the best performance (as shown in Tables 5, 6, 7), data were collected from the runs using this method without (setting 1) and with (setting 2) the use of the SA method to derive value ranges for network parameters.
The values of the above three parameters recorded from the runs are shown in Figure 5, in which the upper and lower parts are the results of the twenty runs conducted for settings 1 and 2, respectively. In this figure, the x-axis indicates the identity of a run, and the y-axis indicates the parameter values. In the results, we noticed two parameter correlations (or patterns) in the evolutionary process for each of the experimental settings. Table 8 lists the patterns most often obtained from the runs for the two settings (these patterns appeared in at least five runs). For setting 1, the two patterns (pattern-ns1 and pattern-ns2) describe two qualitative relationships: P64 >P72 >P14 and P64 >P14 >P72. For setting 2, the patterns are P64 >P14 >P72 (pattern-s1) and P14 >P64 >P72 (pattern-s2). The runs in which the patterns were observed are also indicated in the tables.
In Table 8, we can see that the qualitative relationship P64 >P14 >P72 appeared in the results for both settings (pattern-ns2 and pattern-s1). After further investigation, we found that this relationship among the three parameters was important for producing good models (models with low fitness/error and high robustness). Using our specially designed SA method and parameter constraints, this relationship can always be derived and kept in the evolutionary process (in eleven out of twenty runs), but it is not often obtained in the runs with setting 1 (only five out of twenty runs with setting 1 showed the relationship). Similarly, another parameter relationship, P14 >P64 >P72 (pattern-s2), obtained from setting 2, was useful for inferring good models. To observe how the three parameters varied during the runs, see Figures 6, 7. These figures illustrate typical examples of the patterns presented in Table 8. In the figures, the x-axis indicates the generation number (each unit represents 100 generations), and the y-axis indicates the parameter value. As can be clearly seen in Figures 6, 7 the parameter values changed actively to move toward the appropriate positions to obtain a successful model in the runs using setting 2, whereas the values remained static in the runs using setting 1. These results verify the effectiveness of using SA in network inference.
Evaluation of the proposed approach on a real dataset
After evaluating the performance of the proposed SA-based approach in network inference, we conducted a set of experiments to investigate how our approach can be applied to the study of a real gene network. Because the GA-PSO algorithm has been shown to outperform other methods in the above experiments, we used this method with two settings (with and without SA) to conduct experiments on a real dataset.
The data set used in this experimental phase comes from a study of the SOS DNA repair system in E. coli. Figure 8 illustrates the gene regulation that occurs in this system . In this system, the LexA protein (a repressor) maintains its expression level in a normal state and is bound to the promoter regions of SOS genes, including uvrD, umuD, lexA, uvrA, recA, and polB. When DNA damage occurs, the RecA protein senses the damage and mediates LexA autocleavage. The decrease in LexA relieves the repression of the SOS genes. The expression of these genes then activates the SOS repair system. Once the damage has been repaired, the concentration of recA drops and this gene stops mediating LexA autocleavage. The LexA level increases and begins to repress the SOS genes again.
Inference of expression profiles with the decomposed S-system model
In the experiments, six genes were selected from the original experimental data reported in . The genes selected were uvrD, umuD, lexA, uvrA, recA, and polB. These were selected because the interactions of the 6-gene network have been well studied and commonly used in related studies, and the corresponding network has been inferred successfully [11, 40–43]. Using the regulatory relationships that have been described, we can validate our proposed network inference method by comparing our results to those reported previously.
Though there have been several studies on inferring the SOS repair system, none of them used a tightly coupled S-system model (the general S-system) to represent the gene network, and none of them inferred such a model from expression data. To the best of our knowledge, the most relevant work is , in which the authors utilized a decoupled S-system model to infer the SOS repair system. In a decoupled S-system, a tightly coupled system of non-linear differential equations is decomposed and analyzed as several differential equations [44, 45], each of which can describe a specific gene and can then be separately inferred (by considering one gene at a time). Motivated by this research, we thus adopted decoupled differentials to describe gene profiles, and we employed the proposed approach to infer gene interactions. In this way, we can not only examine the inferred network behaviors, but also compare the inferred gene regulations to the ones obtained in the other studies mentioned above.
The SOS repair system dataset included 50 sampling points for each gene. The six major genes (lexA, uvrA, uvrD, recA, umuD, and polB) were inferred through the decoupled S-system model. In the experiments, twenty independent runs were conducted for each of the two settings. Table 9 presents the results for the six genes of the SOS repair system dataset obtained by using settings 1 and 2. The mean, variance, best and worst fitness, and average parameter sensitivity values of the twenty runs are listed. Again, the results show that the SA-based approach (setting 2) outperformed the original algorithm in terms of both fitting the external behavior (with lower fitness/error) and exploring the internal robustness (with lower sensitivity) of a gene network. In addition, to observe the inferred network behaviors, Figures 9, 10 compare the inferred and target behaviors. In these figures, the x-axis represents time, and the y-axis represents the concentrations of particular gene products. As shown in the figures, network behaviors very similar to those of the real system can be inferred by our approach.
Analysis of critical parameters of the SOS repair system
To investigate the critical parameters that have significant influences on the system dynamics of the SOS repair system, we summarized the most sensitive parameters identified by the proposed approach in Table 10. There were 13 parameters selected and marked as crucial interactions, each of which represented a regulatory relationship between two genes (e.g., lexA -| uvrA). The letters G and H in the table indicate which of the kinetic orders g i,j and h i,j of gene i (as listed in the first column) is selected. As mentioned previously, the identified parameters (or gene regulations) can also be used to determine the network structure of the system to be inferred. By exploiting the structural information, the evolutionary algorithm (here, GA-PSO) can infer better models with robust system dynamics during the search process. In the SOS case, the 13 parameters indicated in Table 10 can be used to derive the scaffold of the SOS network, as shown in Figure 11.
To compare the gene regulation events identified by our approach to those found in the literature, we highlighted the gene interactions in Table 10 (the asterisks in the table are the ones collected from the literature) and summarized the results in Table 11. The results show that, of the thirteen gene regulation events found in our experiments, ten matched regulatory relationships known from other studies. For example, in Figure 11, the negative regulation of lexA, uvrA, and uvrD by lexA has been successfully identified as the most crucial interaction in determining system dynamics. Still, it is notable that the known negative regulation of lexA by recA was not recognized. The main reason for this is that our approach chose sensitive parameters from the kinetic orders of the S-system model on the gene level, but the above relationship was in fact interpreted as a regulation of protein LexA by protein RecA [11, 43]. The concentration of protein RecA depends on the sensing of DNA damage, and the interaction between recA and lexA therefore depends on the events occurring between DNA damage and DNA repair. Because the on or off state of DNA damage sensing was not directly described in the S-system model, the regulation of lexA by recA on the protein level was not observable on the basis of kinetic orders. In other words, the absence of this regulation was due to the model representation, rather than the inference approach. This shows that, to infer the SOS repair system accurately on both the protein and gene levels, a more comprehensive mathematical model is needed to describe the regulatory details. With only one exception (mentioned above) that was caused by limitations of the model itself, our approach has shown its strength in inferring real gene system that show the expected network behavior and internal network robustness.
After identifying the most sensitive parameters of the SOS repair system, we again analyzed the gene regulation relationships obtained from the runs to observe the correlation of parameters. The importance of investigating the relationships among parameter values of genes has been emphasized in the study of systems biology, especially when the boundary of a parameter value is taken into consideration [15, 34]. As presented and discussed in the second experimental phase, the qualitative relationships among genes can be regarded as special patterns (describing parameter correlations) that determined the system dynamics of a candidate model. By exploiting these patterns, the inference algorithm can refine the values of kinetic orders to obtain better solutions.
In the case of the SOS repair system, the patterns of regulatory interactions can be derived as in other cases, by recording and analyzing the values of the most sensitive parameters. Taking gene lexA as an example (see Table 12), the algorithm categorized genes lexA and uvrA as the critical gene regulators for lexA and two regulatory relationships, lexA -| lexA and uvrA -| lexA can be established on the basis of this information. Figure 12 illustrates the parameter values related to the patterns derived from the twenty runs. In this figure, the x-axis indicates the run number, and the y-axis represents the parameter values of the specific pattern (gene regulation) obtained in each run. In this figure, the upper and lower parts present the results of inferring the network without (setting 1) and with (setting 2) using the SA method, respectively. The regulation patterns for gene lexA are summarized in Table 12. As shown in Table 12, for experimental runs using setting 2, fifteen out of twenty runs consistently produced the same pattern uvrA -| lexA > lexA -| lexA (pattern-s1). This pattern can guide the inference algorithm to find better solutions. In contrast, the qualitative relationships between the two relationships uvrA -| lexA and lexA -| lexA obtained from the runs using setting 1 are inconsistent and unstable (pattern-ns1 and pattern-ns2 are in fact complementary). The above results show that the proposed approach can derive parameter bounds (which form useful patterns) to restrict the variation of lexA, and it can therefore infer models with better fitness and more stable (robust) system dynamics. Without using the SA method, the inference algorithm alone cannot guarantee the preservation of useful patterns during the iterative inference process.
Tables 13, 14, 15 list the regulation patterns analyzed for four other target genes in the SOS system: lexA, uvrA, uvrD, and umuD. Figures 13, 14, 15 show the parameter values involved. The genes recA and polB were ignored because there was only one gene regulatory relationship found for each of them (uvrD→recA for recA, and uvrD→polB for polB, as shown in Table 11). This means that no qualitative relationship can be established for these genes. The results in Tables 13, 14, 15 are consistent with Table 14, which again shows the success of using SA in network inference.
In this work, we developed a sensitivity-based incremental evolution approach to cope with one important issue, network robustness, which has not been addressed in gene network inference. Our approach included two parts. The first part was a sensitivity analysis method that was used to select sensitive parameters for deriving value bounds of these parameters. The second part was an evolutionary algorithm that took the bounds as constraints to perform parameter optimization. This process leads to inferred networks that are robust and produce the desired behaviors. To validate the proposed approach, a series of experiments were conducted to evaluate the external behaviors and internal robustness of the networks inferred by different methods. The results show that the proposed SA-based approach outperformed other methods in all datasets. Moreover, we analyzed in detail the results obtained from real time-series expression data for the SOS repair system. The analyses indicate that our approach can identify the critical parameters and use them to establish regulatory relationships among genes. By enforcing these relationships in the repetitive search process, our approach can successfully infer robust networks.
Ingolia NT, Weissman JS: Systems biology: reverse engineering the cell. Nature 2008, 454: 1059–1062. 10.1038/4541059a
Lee WP, Tzou WS: Computational methods for discovering gene networks from expression data. Brief Bioinform 2009, 10: 408–423.
Kikuchi S, Tominaga D, Arita M, Takahashi K, Tomita M: Dynamic modeling of genetic networks using genetic algorithm and S-system. Bioinformatics 2003, 19: 643–650. 10.1093/bioinformatics/btg027
Ho SY, Hsieh CH, Yu FC, Huang HL: An intelligent two-stage evolutionary algorithm for dynamic pathway identification from gene expression profiles. IEEE/ACM Trans Comput Biol Bioinform 2007, 4: 648–704.
Di Bernardo D, Gardner TS, Collins JJ: Robust identification of large genetic networks. Pac Symp Biocomput 2004, 486–497.
Lee WP, Yang KC: A clustering-based approach for inferring recurrent neural networks as gene regulatory networks. Neurocomputing 2008, 71: 600–610. 10.1016/j.neucom.2007.07.023
Datta S, Datta S: Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes. BMC Bioinformatics 2006, 7: 397. 10.1186/1471-2105-7-397
Zheng CH, Zhang L, Ng TY, Shiu CK, Huang DS: Metasample-based sparse representation for tumor classification. IEEE/ACM Trans Comput Biol Bioinform 2011, 8: 1273–1282.
Huang DS, Zheng CH: Independent component analysis-based penalized discriminant method for tumor classification using gene expression data. Bioinformatics 2006, 22: 1855–1862. 10.1093/bioinformatics/btl190
Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 1999, 96: 6745–6750. 10.1073/pnas.96.12.6745
Kabir M, Noman N, Iba H: Reversely engineering gene regulatory network from microarray data using linear time-variant model. BMC Bioinformatics 2010, 11: S56. 10.1186/1471-2105-11-S1-S56
van Riel NAW: Dynamic modeling and analysis of biochemical networks: mechanism-based models and model-based experiments. Brief Bioinform 2006, 7: 364–374. 10.1093/bib/bbl040
Fomekong-Nanfack Y, Postma M, Kaandorp J: Inferring Drosophila gap gene regulatory network: a parameter sensitivity and perturbation analysis. BMC Syst Biol 2009, 3: 94. 10.1186/1752-0509-3-94
MacNeil L, Walhout AJM: Gene regulatory networks and the role of robustness and stochasticity in the control of gene expression. Genome Res 2011, 21: 645–657. 10.1101/gr.097378.109
Gunawardena J: Models in systems biology: the parameter problem and the meanings of robustness. In Elements of Computational Systems Biology. Edited by: Lodhi HM, Muggleton SH. Hoboken, New Jersey: Wiley; 2010:21–47.
Feng X, Hooshangi S, Chen D, Li G, Weiss R, Rabitz H: Optimizing genetic circuits by global sensitivity analysis. Biophys J 2004, 87: 2195–2202. 10.1529/biophysj.104.044131
Leloup J, Goldbeter A: Modeling the mammalian circadian clock: sensitivity analysis and multiplicity of oscillatory mechanisms. J Theor Biol 2004, 230: 541–562. 10.1016/j.jtbi.2004.04.040
Mahdavi A, Davey R, Bhola P, Yin T, Zandstra PW: Sensitivity analysis of intracellular signaling pathway kinetics predicts targets for stem cell fate control. PLoS Comput Biol 2007, 3: e130. 10.1371/journal.pcbi.0030130
Cho K, Shin S, Kolch W, Wolkenhauer O: Experimental design in systems biology, based on parameter sensitivity analysis using a Monte Carlo method: a case study for the TNFα-mediated NF- k B signal transduction pathway. Simulation 2003, 79: 726–729. 10.1177/0037549703040943
Degenring D, Froemel C, Dikta G, Takors R: Sensitivity analysis for the reduction of complex metabolism models. J Process Control 2004, 14: 729–745. 10.1016/j.jprocont.2003.12.008
Radhakrishnan K, Edwards JS, Lidke DS, Jovin TM, Wilson BS, Oliver JM: Sensitivity analysis predicts that the ERK-pMEK interaction regulates ERK nuclear translocation. IET Syst Biol 2009, 3: 329–341. 10.1049/iet-syb.2009.0010
Bentele M, Lavrik I, Ulrich M, Stößer S, Heermann DW, Kalthoff H, Krammer PH, Eils R: Mathematical modeling reveals threshold mechanism in CD95-induced apoptosis. J Cell Biol 2004, 166: 839. 10.1083/jcb.200404158
Zhang HX, Dempsey WP, Goutsias J: Probabilistic sensitivity analysis of biochemical reaction systems. J Chem Phys 2009, 131: 094101. 10.1063/1.3205092
Marino S, Hogue IB, Ray CJ, Kirschner DE: A methodology for performing global uncertainty and sensitivity analysis in systems biology. J Theor Biol 2008, 254: 178–196. 10.1016/j.jtbi.2008.04.011
Zheng Y, Rundell A: Comparative study of parameter sensitivity analyses of the TCR-activated Erk-MAPK signalling pathway. Syst Biol (Stevenage) 2006, 153: 201–211. 10.1049/ip-syb:20050088
Kimura S, Ide K, Kashihara A, Kano M, Hatakeyama M, Masui R, Nakagawa N, Yokoyama S, Kuramitsu S, Konagaya A: Inference of S-system models of genetic networks using a cooperative coevolutionary algorithm. Bioinformatics 2005, 21: 1154–1163. 10.1093/bioinformatics/bti071
Kennedy J, Eberhart R: Swarm Intelligence. San Francisco: Morgan Kaufmann Publishers; 2001.
Grimaccia F, Mussetta M, Zich R: Genetical swarm optimization: self-adaptive hybrid evolutionary algorithm for electromagnetics. IEEE Trans Antennas Propag 2006, 55: 781–785.
Elhossini A, Areibi S, Dony R: Strength pareto particle swarm optimization and hybrid EA-PSO for multiobjective optimization. Evol Comput 2010, 18: 127–156. 10.1162/evco.2010.18.1.18105
Settles M, Soule T: Breeding swarms: a GA/PSO hybrid. Proceedings of Genetic and Evolutionary Computation Conference: 25–29 June 2005; Washington 2005, 161–168.
Michalewicz Z: Genetic Algorithms + Data Structures = Evolution Programs. Springer; 1999.
Guan SU, Zhang S: Incremental evolution of cellular automata for random number generation. Int J Mod Phys C 2003, 14: 881–896. 10.1142/S0129183103005017
Stanley K, Miikkulainen R: Evolving neural networks through augmenting topologies. Evol Comput 2002, 10: 99–127. 10.1162/106365602320169811
Kitano H: Biological robustness. Nat Rev Genet 2004, 5: 826–837.
Storn R, Price K: Differential evolution -- simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 1997, 11: 341–359. 10.1023/A:1008202821328
Cao H, Kang L, Chen Y: Evolutionary modeling of systems of ordinary differential equations with genetic programming. Genetic Programming and Evolvable Machines 2000, 1: 309–337. 10.1023/A:1010013106294
Wen X, Fuhrman S, Michaels GS, Carr DB, Smith S, Barker JL, Somogyi R: Large-scale temporal gene expression mapping of central nervous system development. Proc Natl Acad Sci USA 1998, 95: 334–339. 10.1073/pnas.95.1.334
Sutton MD, Smith BT, Godoy VG, Walker GC: The SOS response: recent insights into umuDC-dependent mutagenesis and DNA damage tolerance. Ann Rev Genet 2000, 34: 479–497. 10.1146/annurev.genet.34.1.479
Ronen M, Rosenberg R, Shraiman BI, Alon U: Assigning numbers to the arrows: parameterizing a gene regulation network by using accurate expression kinetics. Proc Natl Acad Sci USA 2002, 99: 10555–10560. 10.1073/pnas.152046799
Bansal M, Gatta GD, di Bernardo D: Inference of gene regulatory networks and compound mode of action from time course gene expression profiles. Bioinformatics 2006, 22: 815–822. 10.1093/bioinformatics/btl003
Cho DY, Cho KH, Zhang BT: Identification of biochemical networks by S-tree based genetic programming. Bioinformatics 2006, 22: 1631–1640. 10.1093/bioinformatics/btl122
Kimura S, Nakayama S, Hatakeyama M: Genetic network inference as a series of discrimination tasks. Bioinformatics 2009, 25: 918–925. 10.1093/bioinformatics/btp072
Kimura S, Sonoda K, Yamane S, Maeda H, Matsumura K, Hatakeyama M: Function approximation approach to the inference of reduced NGnet models of genetic networks. BMC Bioinformatics 2008, 9: 23. 10.1186/1471-2105-9-23
Huang HL, Chen KW, Ho SJ, Ho SY: Inferring S-system models of genetic networks from a time-series real data set of gene expression profiles. Proceedings of IEEE Congress on Evolutionary Computation: 1–6 June 2008; Hong Kong 2008, 2788–2793.
Voit EO, Almeida J: Decoupling dynamical systems for pathway identification from metabolic profiles. Bioinformatics 2004, 20: 1670–1681. 10.1093/bioinformatics/bth140
This work has been supported by National Science Council of Taiwan (Grant Nos. NSC 99-2221-E-110-077 and NSC 100-2221-E-110-086).
This article has been published as part of BMC Bioinformatics Volume 13 Supplement 7, 2012: Advanced intelligent computing theories and their applications in bioinformatics. Proceedings of the 2011 International Conference on Intelligent Computing (ICIC 2011). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S7.
The authors declare that they have no competing interests.
YH undertook the experimental implementation, made statistical analysis and wrote a part of the manuscript. WL conceived the project, designed the algorithm, wrote a part of the manuscript, and made modifications as well as final revisions.
About this article
Cite this article
Hsiao, YT., Lee, WP. Inferring robust gene networks from expression data by a sensitivity-based incremental evolution method. BMC Bioinformatics 13 (Suppl 7), S8 (2012). https://doi.org/10.1186/1471-2105-13-S7-S8