Inferring robust gene networks from expression data by a sensitivity-based incremental evolution method

Hsiao, Yu-Ting; Lee, Wei-Po

doi:10.1186/1471-2105-13-S7-S8

Volume 13 Supplement 7

Advanced intelligent computing theories and their applications in bioinformatics. Proceedings of the 2011 International Conference on Intelligent Computing (ICIC 2011)

Proceedings
Open access
Published: 08 May 2012

Inferring robust gene networks from expression data by a sensitivity-based incremental evolution method

Yu-Ting Hsiao¹ &
Wei-Po Lee¹

BMC Bioinformatics volume 13, Article number: S8 (2012) Cite this article

3571 Accesses
12 Citations
Metrics details

Abstract

Background

Reconstructing gene regulatory networks (GRNs) from expression data is one of the most important challenges in systems biology research. Many computational models and methods have been proposed to automate the process of network reconstruction. Inferring robust networks with desired behaviours remains challenging, however. This problem is related to network dynamics but has yet to be investigated using network modeling.

Results

We propose an incremental evolution approach for inferring GRNs that takes network robustness into consideration and can deal with a large number of network parameters. Our approach includes a sensitivity analysis procedure to iteratively select the most influential network parameters, and it uses a swarm intelligence procedure to perform parameter optimization. We have conducted a series of experiments to evaluate the external behaviors and internal robustness of the networks inferred by the proposed approach. The results and analyses have verified the effectiveness of our approach.

Conclusions

Sensitivity analysis is crucial to identifying the most sensitive parameters that govern the network dynamics. It can further be used to derive constraints for network parameters in the network reconstruction process. The experimental results show that the proposed approach can successfully infer robust GRNs with desired system behaviors.

Background

Gene regulatory networks (GRNs) are essential for controlling cellular metabolism and the organismal development. Under the command of transcription factors (TFs), each gene influences the activity of the cell by generating messenger RNA (mRNA) that guides the synthesis of proteins by ribosomes in the cytoplasm (the location in the cell where biochemical reactions and molecular events take place). Gene network modeling uses gene expression data to characterize the phenotypic behavior of a system under study. With the reconstructed networks, biologists can generate and test hypotheses to further understand the complex phenomena that occur in nature systems and to explore the dynamics of those systems.

Modeling GRNs manually on the basis of the experimentally measured time-series data takes a considerable amount of time. Therefore, an automated reverse-engineering procedure is recommended [1, 2]. This procedure involves altering the gene network in some way, observing the outcome, and using computational methods to infer the underlying principles of the network. To derive a realistic model, available domain knowledge (including functional and structural information) can be integrated into the computational methods. Figure 1 illustrates the general procedure of inferring GRNs from quantitative expression data. The inferring/modeling block of this figure indicates the computational procedure used to derive network parameters for a given model, to build and simulate the model, and to evaluate it by comparing the behavior of the inferred model with the original data set. In addition to making use of expression data, one recently developed strategy combines information from various sources to narrow down the search space in the network. This strategy shortens the time and effort required for the validation and discovery of networks. For example, if some gene names are known then they can be mapped into knowledge bases (e.g., the Gene Ontology) to extract biological knowledge (e.g., gene function) that can be used to determine the network structures. Our goal in this study is to establish a methodology for network inference and to investigate aspects of network dynamics that have not been addressed previously.

To infer a network with desired system behavior, the most important steps are to select a network model and then to fit the network's structural parameters to the available expression data. Many models have been proposed to address different levels of biological details, ranging from the very abstract (involving Boolean values only) to the very concrete (including full biochemical interactions with stochastic kinetics). Abstract models are easy to simulate and therefore less computationally taxing, but it has been proven that they are not able to capture certain system behaviors. In contrast, concrete models are more suitable for simulating biochemical process realistically, but, because of their computational complexities, these models can only be applied to small systems. To capture the underlying physical phenomena of a gene network we took one of the most popular and well-studied concrete models, the S-system model, as an example of a biological network, and we used both simulated and collected real gene expression data to reconstruct the model. The S-system model is a type of ordinary differential equation (ODE) model. It consists of a particular set of tightly coupled ODEs in which the component processes are characterized by power law functions [3, 4]. The system structure of an S-system is described by the following equation:

\frac{d x_{i}}{d t} = α_{i} \prod_{j = 1}^{N} x_{j}^{g_{i, j}} - β_{i} \prod_{j = 1}^{N} x_{j}^{h_{i, j}}

Here, x_i is the expression level of gene i, and N is the number of genes in a genetic network. The non-negative parameters α_i and β_i are rate constants that indicate the direction of mass flow. The real-number exponents g_i,j and h_i,j are kinetic orders that reflect the strength of interactions from genes j to i. The above set of parameters defines an S-system model. To infer an S-system model is, thus, to estimate all of the 2N (N+1) parameters simultaneously.

Although non-linear ODEs can more accurately model the system dynamics of gene networks, they are difficult to solve by traditional optimization techniques [2, 3]. To reduce the complexity of such models, Di Bernardo et. al developed an approach that involves taking a series of steady-state gene expression measurements following transcriptional perturbations and reducing the linear ODEs to a linear regression problem that can be solved with relevant techniques [5]. In addition, some researchers employed the strategies of gene clustering for dimension reduction and gene classification for identification of expression patterns to reduce task complexity [6–10]. As evolutionary algorithms (EAs) have been widely used to solve many difficult optimization tasks with good results, they have been suggested as a way to infer gene networks [3, 4]. Among many studies on this topic, the most relevant are those involving the use of EAs to infer S-system models (e.g, [3, 4, 11]). In these studies, the network parameters of the S-system were arranged as a string of floating numbers (i.e., the chromosome in EAs) that could be evolved by genetic operators.

In recent studies of EA-based parameter inference, one critical problem, network robustness [12–14], has not been addressed. It is important to investigate the effect of network parameter perturbations on the overall system. These parameters not only are the numerical components of the model, but also represent the activities or interactions among proteins, transcription factors, and mRNAs in the gene network. Each network parameter has an important role in determining the behavior of a biological system. Parameters that are very sensitive to variation can introduce fragility into the system. In fact, recent research has found that parameters can define the dynamics of a model, and more important, studies have shown that enforcing constraints on the parameters can limit (control) model dynamics [15]. Specifically, by measuring and analyzing the variations of network parameters and their effects on relevant genes, the gene-gene interactions of an inferred model can be interpreted. To ensure the robustness of the inferred network and to further investigate gene interactions, it is important to derive an acceptable value range for each parameter and to restrict the parameter's value to the specified range during the network reconstruction process [13]. Therefore, we take parameter sensitivity into consideration in the network reconstruction procedure so that robust results can be obtained.

Sensitivity analysis (SA) is an indispensable technique that can help researchers to investigate the parameter robustness properties of an inferred network. By varying the parameter values within a certain range and performing statistical calculations to measure the system instability (or fragility), researchers can identify the critical parameters or discover seemingly unimportant parameters that may have a positive or negative influence on a network. SA techniques have been used in many biological studies, such as genetic circuit design [16], mammalian circadian clock modeling [17], and target prediction in signaling pathways [18]. In general, the sensitivity of a parameter is defined as [19]:

S_{P}^{M} = \frac{\partial M / M}{\partial P / P} = \frac{percentage change in M}{percentage change in P}

Here, P represents the parameters that are varied in a given range, M is the mathematical function describing the system behavior, and ∂M means the change in M due to the value changed in ∂P with respect to P.

In a dynamic system, there are two types of methods often used to conduct SA: local SA and global SA. Local SA can measure the sensitivity of a parameter from a given range even if the system structure is unknown, but it only considers one parameter at a time and ignores the interactions between parameters [20, 21]. Alternatively, global SA can examine parameter interactions with different parameter magnitudes simultaneously [22, 23], but this can only be done if the system structure is known in advance. The above two methods have been compared extensively [24, 25]. It should be noted that neither local SA nor global SA is superior: the choice of method depends on the specific application and the information available. In the biological cases that involve the use of SA techniques, the sensitivity of parameters should be investigated from a biological point of view in order to validate the results.

From the perspective of parameter dynamics, we propose a new approach to infer robust networks. Our approach includes two major parts. The first part is a sensitivity analysis procedure that selects sensitive network parameters, determines value ranges for them, and then sends the parameters with their constraints to the inference mechanism. The second part is an evolutionary method responsible for inferring networks from the results obtained in the first part. This approach, to our knowledge, is the first work in network modeling that integrates sensitivity analysis into an inference algorithm to consider both internal (robustness) and external (behavior) characteristics of the inferred network. To validate the proposed approach, a series of experiments has been conducted, in which the proposed SA procedure was coupled with different EAs. The results have been analyzed, and they show that this approach can be used to infer robust networks successfully from artificial and real gene expression profiles.

Methods

Parameter sensitivity analysis

In the process of inferring a gene network from expression data the genes interact with each other, and the network structure is generally unclear during the modeling process. Most studies of global SA use known pathways as examples and can therefore easily select the critical parameters and control the total number of parameters (usually few than 100) for further analysis [23–25]. In the type of network inference problem addressed here, it is not possible to identify and select the most important parameters in order to perform global SA, because no prior information about network structure is available.

To consider multiple network parameters simultaneously in such situations, we present a new method that is a modification of a widely used global SA technique, called multi-parameter sensitivity analysis (MPSA) [12, 19]. MPSA is based on Monte Carlo simulations and on quantitative comparisons of cumulative frequency via corresponding Pearson correlation coefficients among parameters to identify those that are most sensitive. It can be used to explore critical genetic reactions and to examine the robustness of a gene network as other global SA methods do. In the method described here, the objective function of the SA method is the fitness function that is to be optimized in the evolutionary algorithm, which is the mean squared error over the time course:

\sum_{i = 1}^{N} \sum_{t = 1}^{T} {\{\frac{x_{i}^{a} (t) - x_{i}^{d} (t)}{x_{i}^{d} (t)}\}}^{2}

In this expression, x_i^d(t) is the desired expression level of gene i at time t, x_i^a(t) is the actual value obtained from the inferred model, N is the number of genes in the network, and T is the number of time points used to measure gene expression data during the period.

Our modified method, m-MPSA, inherits qualities from MPSA but has several advantages. First, MPSA is easy to implement and modify. It allows users to select one parameter at a time or to consider multiple parameters at the same time to perform sensitivity analysis. This characteristic is crucial because, when the network structure is unknown or uncertain, selecting one parameter at a time is a good way to reduce the complexity of the search space. Researchers can then concentrate on each parameter and temporarily neglect the combination sets at the selection phase. Second, compared with most of the global SA methods, the mathematical equations of MPSA are more manifest and succinct, and the parameter sensitivity is thus easy to calculate. Finally, the value ranges of parameters used in MPSA are the same as in an ODE model. This means that the sensitivity of each parameter can be directly mapped into an ODE model without any extra computation.

In previous studies, researchers have shown that the most influential (sensitive) parameters play key roles in modeling a GRN; that is, a model's behavior varies largely because of its sensitive parameters [13]. Therefore, our current study draws on the characteristics of sensitive parameters, and it incorporates the sensitivity analysis method into the incremental evolution approach to infer a GRN. The proposed m-MPSA method is described in the Sensitivity analysis algorithm below. It uses an iterative process to calculate the sensitivity of each parameter, and it then ranks the sensitivities of all parameters. The input of this algorithm is the set of network parameters to be determined; and the output, a list of parameters ranked by their sensitivity values. With the parameter sensitivities ranked, the evolutionary algorithm can then infer robust solutions by exploiting the sensitive parameters that are likely to significantly influence the genetic model as a whole.

Sensitivity analysis algorithm()

Step 1: Select one parameter, i, at a time. (i = 1, 2, .., μ, μ is the number of network parameters).

Step 2: Set the parameter range R_i for each S-system parameter, i. Initially, the commonly used search regions [-3.0..3.0] and [0.0..10.0] are taken as the initial settings for the parameters of kinetic orders and rate constants respectively, as suggested in [26]. These settings are used to define the parameter range R_i, by adding/subtracting an amount of one third of the initial settings to/from the current value of parameter i.

Step 3: For each R_i, a set of random values are created with a uniform distribution within the specified range. Each random value for parameter i, together with the other (μ-1) parameters, constitutes a candidate solution (In these experiments, 500 random values were generated, as suggested in [19].

Step 4: Construct the mathematical model (i.e., S-system model) and calculate the objective function value for each of the random values generated in Step 3.

Step 5: Determine whether the objective function value of each random point obtained in Step 4 is acceptable or unacceptable by a given threshold C_r (guided by the literature or experience; it is defined as the triple of the best available objective function value in the whole population). The parameter value of a random point is classified as unacceptable if its corresponding objective function value is greater than C_r; otherwise, the parameter value is acceptable. Go back to Step 1 until all parameters have been dealt with.

Step 6: Calculate the sensitivity for parameter i by using its cumulative frequency (CF_i). CF_i measures the similarity of two statistical distributions that are formed by the acceptable and unacceptable values produced from Steps 3-5 for parameter i. The parameters are ranked according to their CF, and the ones with relatively low CF are considered to be sensitive.

Step 7: Output a parameter list in which the parameters are sorted by their CF values.

In addition to presenting the m-MPSA algorithm, we provide here a walkthrough example to illustrate how it operates in practice. In the example, we randomly choose two parameters from one of the datasets (the largest one) used in the experimental section, and we calculate the sensitivity values of these parameters. The dataset consisted of ten nodes, and there were altogether 220 parameters to be estimated. Without losing generality, we assume that the algorithm selected the parameter P₁₂₅ (Step 1), and then set the range R₁₂₅ for P₁₂₅ (Step 2). As mentioned above, we used one third of the original search region (which was [-3.0, 3.0] in the current study) to set up the bounds for each parameter. Hence, if P₁₂₅ had a value of 0.59, then the lower bound for this parameter was -0.41 (i.e., 0.59+1/3×(3.0-(-3.0))), and the upper bound was 1.59. Then, 500 random values were generated within the range [-0.41, 1.59] for P₁₂₅ (Step 3), and their corresponding objective (fitness) function values were calculated (Step 4). Next, a threshold C_r was determined to categorize the above 500 evaluation results as acceptable or unacceptable. If the current best result (out of the 500 evaluations) was 0.52, then a threshold of 1.56 (i.e., 0.52×3) was used to determine whether a result was acceptable or not (Step 5). After that, two statistical distributions were constructed for the acceptable and unacceptable results identified above. For statistical purposes, the value range [-0.41, 1.59] was divided into ten intervals, and for each interval the numbers of acceptable and unacceptable values were counted. Figure 2 (a) shows the distributions for P₁₂₅, in which acceptable and unacceptable numbers were marked. The cumulative frequency distributions were then constructed accordingly, as presented in Figure 2 (b). Finally, the sensitivity value of P₁₂₅ was obtained by measuring the correlation coefficients of the two (acceptable and unacceptable) cumulative frequency distributions (Step 6). In this case, the sensitivity value of P₁₂₅ was 0.8912.

A similar procedure was performed for the other parameter, P₁₄₅, and a sensitivity value of 0.7695 was obtained. Because P₁₄₅ had a sensitivity value lower than that of P₁₂₅, it was considered to be more sensitive than P₁₂₅. In this way, after sensitivity values of all parameters were calculated, they were ranked and sent to the incremental evolution algorithm.

As mentioned above, more sensitive parameters are more likely to have a significant influence on the entire genetic model. Therefore, if no prior knowledge can be applied to the reconstruction of a large GRN, researchers can conduct sensitivity analysis for parameter selection to guide the network inference method, and observe how numerical quantities within different intervals can formulate candidate results (to determine the model dynamics). Taking P₁₄₅ as an example, the acceptable values for this parameter fell in the range [-0.63, 0.17] that consisted of 167 acceptable network models (dynamics) in total. We can exploit this result to infer P₁₄₅ and explore the relationships of this parameter with other genetic reactions (or parameters). In fact, researchers have shown that the most influential (sensitive) parameters play important roles in modeling a GRN. That is, the behavior variation of a network model largely depends on its sensitive parameters [13].

In Figure 2, we can see that the variation of the parameter value range influenced the system dynamics. That is, the system robustness was significantly affected by the value range specified for a parameter. For instance, the two distributions for P₁₄₅ in Figure 2 (c) show that, if the value range of P₁₄₅ changed from [-0.43, -0.23] to [0.37, 0.57], then the acceptable values for P₁₄₅ changed from 56 to 0. This means that the inference algorithm can maintain the system robustness if it can locate P₁₄₅ within the interval [-0.43, -0.23]. The above analysis concludes that the proposed SA method can be used to find sensitive parameters and to derive suitable value ranges as search constraints, which can then be used in the inference algorithm to construct robust networks.

Evolutionary algorithm for parameter optimization

To derive the network parameters of S-system model, we also implement an evolutionary algorithm (EA) for parameter optimization to work with the above SA method. EA is a population-based approach that evaluates many solutions simultaneously in the search space, and it is likely to find a global solution for a given problem. Recently, a new population-based optimization technique, particle swarm optimization (PSO, [27]), was proposed as an alternative to the traditional EAs. PSO tries to mimic the goal-seeking behavior of biological swarms. The standard PSO algorithm contains a set of particles and operates in an iterative manner. Each particle is characterized by its position and velocity, and it moves in the solution space. The position of each particle represents a potential solution that is evaluated by a predefined fitness function. PSO has some attractive characteristics. In particular, it has memory, so that knowledge of a good solution can be retained by all particles (solutions). During the iterative search process, each particle remembers its previous best position and the best position of any particle in the swarm. Then the particle uses the position information to modify its position and velocity, and continues its movement in the search space. The details of the PSO algorithm can be found in [27]. Some performance comparisons between PSO and the most popular EA method, the genetic algorithm (GA), have been made, underscoring the reliability and the convergence speed of both methods. However, the result tends to be inconclusive: each of the algorithms has shown better performance than the other for some particular applications. Consequently, hybrid techniques were proposed to effectively exploit the qualities and the uniqueness of the two methods. It is now commonly agreed that the hybrid methods can lead to further performance improvements [28–30]. Therefore, in this work, we develop a hybrid method to exploit the solution memory of PSO and the genetic operations of GA for network inference.

As has been mentioned, our primary goal is to investigate how parameter sensitivity can be used to guide an EA and derive robust networks. To concentrate on this issue, we have chosen to implement a popular GA-PSO method (to avoid the deviations caused by different computational methods) for parameter optimization, and we have built our SA method upon it. Breeding Swarms [30] is a frequently used GA-PSO hybrid method that has demonstrated its good performance in many benchmark test functions. Therefore, we adopt this method for our network inference application.

The hybrid GA-PSO method used in this work is illustrated in Figure 3. Initially, a population is randomly generated and evaluated. The individual solutions are ranked according to their fitness values, and then they are divided into two parts for running the PSO and GA methods separately. As shown in Figure 3, the best individuals (including (1-r)% of the whole population) are preserved and enhanced by the PSO procedure. They are then sent to the next generation. Meanwhile, on the right-hand side of the figure, the remaining individuals (i.e., r% of the population) are discarded. To replace the removed individuals, a tournament selection scheme is used to choose the same number of individuals from the best ones (before they are updated by PSO), and the selected individuals are used to create new individuals by the GA procedure.

In our implementation, we take a direct encoding scheme to represent solutions for both the GA and PSO parts, in which the network parameters (i.e., α_i, β_i, g_i,j, and h_i,j in the S-system model) are arranged as a linear string chromosome of floating-point numbers. The goal here is to minimize the accumulated discrepancy between the gene expression data recorded in the dataset and the values produced by the inferred model. Therefore, the error (objective or fitness) function defined previously is used directly for performance measurement. For the PSO part, the equations for updating the particle's velocity and position are the same as the ones listed in the original PSO work [27], and for the GA part, the operations of crossover and mutation described in [31] are used.

Sensitivity-based incremental evolution method for network inference

As described in the first section, to infer the S-system model is to determine the 2N (N+1) parameters simultaneously. Solving this high-dimensional problem is difficult, especially when the complexity of regulation increases along with the number of genes involved. One promising approach to this problem is to adopt the concept of incremental evolution to infer the large number of parameters. The underlying principle of incremental evolution [32, 33] is that a population is first evolved to solve an easier version T' of the original complex task T, in which the solution region of T is more accessible from region T'. More task versions with incremental complexity can be arranged so that the original task can be achieved progressively. Evidently, the main task involved in implementing incremental evolution is the formulation of a scheme to transfer the goal task into another more evolvable task. In the process of task transformation, the structure of the environment and the overall goal must be preserved. This can usually be achieved by arranging the task sequence manually, or, alternatively, it can be done using an automated procedure. In this work we modify the cutting plane mechanism used in the high-dimension function optimization problem to develop an adaptive strategy to automatically perform incremental evolution.

In network inference, solving an easier version of the original task means evolving partial solutions (i.e., subsets of all network parameters) that can provide some guidance for the search and move toward the target solution gradually. In the proposed method, parameter sensitivities are calculated and used to determine the priorities of the parameters to be evolved (optimized). The sensitive parameters are selected iteratively, by the proposed SA method and the evolutionary method is employed to search the parameter dimensions accordingly.

An exploration phase is also included in the algorithm for the injection of random effects. This maintains population diversity and avoids a situation in which individuals move near locally optimal solutions. The SA and exploration procedures are both performed periodically with specific generation intervals. In this way, the overall solutions can be derived gradually, and the inferred networks are more robust to internal variations. The flow of the proposed SA-based incremental evolution approach is illustrated in Figure 4, and the details of the approach are described below.

Sensitivity-based incremental evolution algorithm()

Step 1: Initialize the population and start the evolution. Note that Steps 2 and 3 are performed only at certain generations (SA interval and exploration interval, respectively).

Step 2: Perform sensitivity analysis (call the Sensitivity analysis algorithm described above) to obtain a list of parameters ranked by their sensitivity. This step is performed only if the evolutionary cycle reaches a pre-defined generation number (every 500 generations in this work).

(a)
Use a threshold (i.e., the correlation coefficient ratio of the parameter CF values obtained from the SA procedure) to select the most sensitive parameters. Then update the threshold in order to consider more parameters in the next sensitivity analysis iteration.
(b)
Set new value ranges for all network parameters. Sensitive parameters are given tight bounds, and insensitive parameters are given loose bounds.

Step 3: Start the exploration procedure, again only if the evolutionary cycle reaches a pre-defined generation number (every 1000 generations in this work). In this phase, a random value is generated for each parameter from its new bounds, replacing each parameter's current value.

Step 4: Continue the evolutionary computation, using the bounds obtained in Step 2 to restrict the corresponding parameters.

In the above algorithm, the most critical feature for enabling incremental evolution is found in Step 2. The strategy is to perform the sensitivity analysis for network parameters and to specify constraints on them. A network parameter (in other word, a search dimension) is added to the sensitive list and given a pair of tight constraints (specified by the upper and lower bounds) if its sensitivity value is less than (or equal to) the threshold; otherwise, a parameter is added to the insensitive list and given loose constraints. In the above procedure, the threshold used in Step 2(a) to identify the sensitive parameters has an initial value of 0.84 (chosen from the preliminary study), and this value is increased gradually with a step value of 0.1. The upper and lower bounds of a sensitive parameter in Step 2(b) are determined from the best value of this parameter available so far by adding and subtracting an amount equal to twice the value of a small constant (which is equal to the velocity bound generally used in a PSO-based method – 0.2 in this work). Bounds for the insensitive parameters are defined in a similar way, except that a value of five times the same constant is used.

The dimensions in the sensitive list have higher priorities to be searched, and the corresponding parameter values need to be determined at an earlier evolutionary stage. As the algorithm describes, the search is accomplished by setting small-range bounds for each sensitive parameter in accordance with the best value found so far, and the evolutionary procedure is performed next to infer a suitable parameter value within the specified bounds. The creation of constraints for parameters is important in the search for robust solutions, especially when a non-deterministic search method (such as the evolutionary algorithm) is employed to derive parameter solutions (because many feasible but fragile solutions could be evolved). If the parameter value exceeds the specified boundary value, the system dynamics will be influenced, and consequently, the network behaviors will change. To generate tight constraints for sensitive parameters in the search process is in fact to restrict search regions for the integrated algorithm, which can thus obtain robust networks with desired behaviors. As indicated in [34], by observing how parameters with varied boundaries direct a system to move toward different system dynamics, we can obtain new insights into a complex biological system and understand the principle of biological design.

As the evolutionary method proceeds, parameters in the insensitive list will be moved to the sensitive list incrementally as the threshold is gradually increased. Therefore, more parameters will be considered sensitive and their values will be determined. Eventually, all the parameter values will be obtained. Because sensitive parameters are more influential to network behavior, enforcing tight constraints on these parameters maintains the robustness of the entire genetic network. At the same time, it allows other parameters to evolve during the search process. In this way, the proposed method can infer robust network models with the desired system behavior.

Results and discussion

To verify the proposed approach, we conducted a series of experiments on several datasets collected from the literature. As mentioned, our main goal was to explore the use of sensitivity analysis to infer robust networks, not to compare different computational methods. Therefore, without losing generality, we chose to implement three computational methods, including the traditional PSO method, a GA-PSO hybrid method, and a differential evolution (DE) method [35], and to build the proposed m-MPSA on them. The main reason to select the above three methods is that they are the most representative and popular evolutionary algorithms used for optimization tasks. The same SA strategy can also be embedded into other methods if their implementation details are available.

The first and second phases of the experiments involved comparing the external behavior and internal robustness of the networks inferred by different methods. Four datasets were used. For each dataset, the above three algorithms were performed with three different settings: (1) using only the original algorithms; (2) using our m-MPSA to select sensitive parameters and to derive parameter constraints in the incremental evolution process (i.e., the proposed approach); and (3) using m-MPSA to select sensitive parameters, but with random value bounds during the incremental evolution process. The third setting was mainly included to demonstrate the importance of using appropriate bounds in the evolutionary process. In addition, one real dataset was used in the third phase, which made it possible to investigate how the proposed approach can be applied to the study of real gene networks. This dataset came from a study of gene expression in the SOS DNA repair system of E. coli. For this dataset, network models were inferred and the critical parameters were analyzed and discussed.

Performance of sensitivity analysis in network modeling

In this set of experiments, four datasets (with more genes than most examples encountered in the literature) were used to evaluate the proposed approach. The first dataset was a five-node artificial network that has become a popular model for comparing different methods in recent studies on network inference [36]. To collect time series data, we started network operations and continued operations for thirty simulation time steps. The nodes had the following relationships:

\begin{gathered} Ẋ_{1} = 15.0 X_{3} X_{5}^{- 0.1} - 10.0 X_{1}^{2.0} \\ Ẋ_{2} = 10.0 X_{1}^{2.0} - 10.0 X_{2}^{2.0} \\ Ẋ_{3} = 10.0 X_{2}^{- 0.1} - 10.0 X_{2}^{- 0.1} X_{3}^{2.0} \\ Ẋ_{4} = 8.0 X_{1}^{2.0} X_{5}^{- 0.1} - 10.0 X_{4}^{2.0} \\ Ẋ_{5} = 10.0 X_{4}^{2.0} - 10.0 X_{5}^{2.0} \end{gathered}

The second dataset was taken from [6]. It was an eight-gene network created manually by the popular GRN simulation software tool Genexp. The third dataset was a ten-node network previously described in [4], given by the following equations:

\begin{gathered} Ẋ_{1} = 5.0 X_{4} X_{6}^{- 2.0} - 10.0 X_{1}^{2.0} \\ Ẋ_{2} = 10.0 X_{3} X_{8}^{1.0} - 10.0 X_{2}^{2.0} \\ Ẋ_{3} = 8.0 X_{1}^{- 1.0} X_{4}^{- 1.0} - 10.0 X_{3}^{2.0} \\ Ẋ_{4} = 10.0 X_{5}^{2.0} X_{9} - 10.0 X_{4}^{2.0} \\ Ẋ_{5} = 10.0 X_{2}^{2.0} X_{6}^{- 1.0} - 10.0 X_{5}^{2.0} \\ Ẋ_{6} = 5.0 X_{9}^{2.0} X_{10}^{- 2.0} - 10.0 X_{6}^{2.0} \\ Ẋ_{7} = 10.0 X_{6} X_{10}^{- 1.0} - 10.0 X_{7}^{2.0} \\ Ẋ_{8} = 5.0 X_{1} X_{2}^{- 2.0} X_{7} - 10.0 X_{8}^{2.0} \\ Ẋ_{9} = 10.0 X_{3} X_{8}^{- 2.0} - 10.0 X_{9}^{2.0} \\ Ẋ_{10} = 8.0 X_{1}^{2.0} X_{7}^{- 1.0} - 10.0 X_{10}^{2.0} \end{gathered}

The fourth dataset was part of a real experimental dataset from a study of the rat central nervous system (CNS) [37]. The original dataset included expression data for 112 genes collected at 9 time points (including the embryonic, postnatal, and adult stages). In this experiment, we selected a group of eight nodes representing the largest sub-network of the representative cluster reported in [6] as the target network. For the above four datasets, three inference algorithms (PSO, GA-PSO, and DE) with three different settings were arranged, and twenty independent runs were conducted for each arrangement. The population sizes for the four datasets were 800, 1000, 1600, and 1000, respectively. Tables 1, 2, 3, 4 show the results, in which the mean, standard deviation, and best and worst performance of the runs are listed for each arrangement.

Table 1 Fitness values for dataset 1

Advanced intelligent computing theories and their applications in bioinformatics. Proceedings of the 2011 International Conference on Intelligent Computing (ICIC 2011)

Inferring robust gene networks from expression data by a sensitivity-based incremental evolution method

Abstract

Background

Results

Conclusions

Background

Methods

Parameter sensitivity analysis

Sensitivity analysis algorithm()

Evolutionary algorithm for parameter optimization

Sensitivity-based incremental evolution method for network inference

Sensitivity-based incremental evolution algorithm()

Results and discussion

Performance of sensitivity analysis in network modeling

Evaluation of network robustness

Evaluation of the proposed approach on a real dataset

Inference of expression profiles with the decomposed S-system model

Analysis of critical parameters of the SOS repair system

Conclusion

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Competing interests

Authors' contributions

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Bioinformatics

Contact us