 Research
 Open access
 Published:
Reconstructing gene regulatory networks of biological function using differential equations of multilayer perceptrons
BMC Bioinformatics volumeÂ 23, ArticleÂ number:Â 503 (2022)
Abstract
Background
Building biological networks with a certain function is a challenge in systems biology. For the functionality of small (less than ten nodes) biological networks, most methods are implemented by exhausting all possible network topological spaces. This exhaustive approach is difficult to scale to largescale biological networks. And regulatory relationships are complex and often nonlinear or nonmonotonic, which makes inference using linear models challenging.
Results
In this paper, we propose a multilayer perceptronbased differential equation method, which operates by training a fully connected neural network (NN) to simulate the transcription rate of genes in traditional differential equations. We verify whether the regulatory network constructed by the NN method can continue to achieve the expected biological function by verifying the degree of overlap between the regulatory network discovered by NN and the regulatory network constructed by the Hill function. And we validate our approach by adapting to noise signals, regulator knockout, and constructing largescale gene regulatory networks using linkknockout techniques. We apply a real dataset (the mesoderm inducer Xenopus Brachyury expression) to construct the core topology of the gene regulatory network and find that Xbra is only strongly expressed at moderate levels of activin signaling.
Conclusion
We have demonstrated from the results that this method has the ability to identify the underlying network topology and functional mechanisms, and can also be applied to larger and more complex gene network topologies.
Background
The growth and development of organisms and their responses to internal and external stimuli are controlled by complex internal regulatory mechanisms, including the gene level. Gene regulation network is the mapping of complex regulation mechanism in organism at gene level. At the molecular level and in the microscopic domain, the function of genes is understood as the interaction behavior of complex networks. Cell function is controlled by the interconnections between gene expression mechanisms and gene regulation. The mapping between gene interactions and functions is one of the main research topics in systems biology [1].
Cellular networks undergo steadystate or oscillatory stimulation signals, which provide a way to reconstruct network topology. To understand how the interrelationships of genes in living organisms respond accurately to external signals and perform their functions robustly. For example, the adaptive function of cells [2] refers to the ability of the system to respond to signal changes and return to the prestimulated level, which is the key for living systems to perceive largescale changes [3]. The transient nature of this stimulus response is important to prevent cells from experiencing uncontrolled proliferation or apoptosis [4]. For example, nuclear enrichment of MAP kinase Hog1 completely adaptes to changes in external osmotic pressure and is robust to very low signal fidelity and operating noise [5].
In the construction of small networks, enumeration search [6] has obvious effect on listing all possible network topology modules, but in larger and more complex networks, enumeration method is difficult to calculate. At present, the models used for gene regulation network modeling mainly include the following: Boolean network, Bayesian network, differential equation, etc [7]. Boolean network is a relatively simple model, and the simulation of the system is fixed and relatively rough; Bayesian network is a probabilistic model that can quantitatively and randomly describe the control network; Differential equations can quantitatively and accurately predict the system behavior; Modeling and reconstructing gene regulatory networks from time series data, most of the existing methods [8,9,10,11,12] are based on ordinary differential equations (ODE).
Ordinary differential equation models include linear differential equations and nonlinear differential equations. Linear differential equation models have been used to infer largescale gene regulatory networks due to their simple structure and few parameters and expression data. For example, Matsumoto et al. [13] proposed the SCODE algorithm based on linear ordinary differential equations to study gene regulatory network information related to the process of cell differentiation. They first performed singlecell sequencing on individual cells, and then used the algorithm to assess differences in expression patterns between individual cells. Aubin et al. [14] proposed the GRISLI method that infers a velocity vector fields in the space of scRNAseq data from profiles of individual cells, and models the dynamics of cell trajectories with a linear ordinary differential equation to reconstruct the underlying GRN with a sparse regression procedure. Although linear regulatory functions can describe network regulatory system, gene regulatory networks are mostly nonlinear. Many classical nonlinear differential equations that conform to the laws of biochemistry have been proposed to infer GRN, such as Ssystem model [15], Hill function. In recent years, the Ssystem model has been widely utilized to infer GRN and biochemical reactions, which follows the theory of a biological system [16], Since the structure of the Ssystem model is fixed, heuristic search algorithms have been used to search for the optimal parameters of the Ssystem model. , such as differential evolution (DE) [17], cooperative coevolutionary algorithm [18], sensitivitybased incremental evolution method [19], bat algorithm (BA) [20], immune algorithm (IA) [21], firefly algorithm [22], dissipative particle swarm optimization (DPSO) [23], cockroach genetic algorithm (CGA) [24], hybrid algorithm based on genetic algorithm (GA) and PSO [25].
Hidde De Jong [7] proposed the Hill function as a regulatory function. Hill functions are considered suitable for building GRN models with ODEs [7, 26]. They can quantify activation and inhibition effects of genes. The regulating function can also be sigmoid function [27] commonly used in neural networks, referred to as Stype function, whose input and output characteristics are usually expressed by logarithmic curve or tangent curve. It introduces the necessary nonlinearity and defines an upper bound on the rate of change in molecular concentration. The advantage of this neural networkbased differential equation model is that a large number of effective learning algorithms have been developed for the learning of parameters in the regulatory network. For example, Mattias Wahde [28] provided a differential equation system based on feedback neural network, the regulation function is the commonly used logarithmic Sigmoid activation function, and the parameter estimation adopts genetic algorithm. Shen et al. [29] believed that deep learning could be used to search network topology more effectively and train deep neural network to find satisfactory network topology by relying on trajectory and error. The idea is to learn differential equations in data, i.e. use a neural network to train an accurate potential unknown dynamical system [30, 31], the model is difficult to be extended to a large number of genes, so it is difficult to describe the complex behavior of the system.
Inspired by the above methods, we propose a multilayer perceptronbased differential equation method, which specifically transforms the gene regulation network(GRN) system into an inputoutput regression problem, where the input is gene expression data and the output is the derivative estimated from the expression data. Our method utilizes timeseries gene expression data to train a regulatory function that simulates the transcription rate of a gene, which is a fully connected neural network(NN) with a fourlayer structure. The fully connected neural network is trained by using the gene expression of the previous moment to predict the gene expression of the next moment, and using the loss function between the obtained prediction result and the real gene expression for feedback training. After training the model, the link knockout technique is used to set the expression value of a gene to 0 and determine the regulatory relationship between genes by looking at the influence of the gene on the synthesis rate (see Materials, Methods and Results for a detailed description). Figure 1a illustrates the detailed work of the overall framework. Figure 1b is used as an example to fully understand the composition of the regulatory relationship between the three genes. The control variable method is used to obtain the relationship between the synthesis rate and the gene over time. When the synthesis rate of gene 1 is restricted to 0, That is, taking gene 1 as the stimulus signal, looking at the changes of the three genes and their corresponding synthesis rates over time, and obtaining the final regulatory relationship between individual genes through the crosssectional view of the fully connected neural network.
In this paper, we verify whether the regulatory network constructed by the NN method can continue to achieve the expected biological function by verifying the degree of overlap between the regulatory network discovered by NN and the regulatory network constructed by the Hill function (HF). Moreover, our method is verified by three cases: adaptive noise signal, link knockout, and using link knockout technology to build largescale gene regulatory network. And apply the real dataset (the mesoderm inducer Xenopus Brachyury (XBra) expression) to construct the core topology of gene regulatory network. The resulting network topology can be intuitively explained by the concentration changes between genes, and many target functions can be achieved by comparing the resulting network with existing biological networks.
Results
Simulation studies
In this section, we conduct some simulation studies to empirically evaluate our proposed framework under different settings. In what follows, we demonstrate our proposed ability to construct gene regulatory networks in three scenarios: One is a regulatory network that can adapt to the influence of Gaussian white noise, and the other is the simulation of link knockout. The regulatory network obtained by training NN is redundant, and the core gene regulatory network can be obtained by link knockout. The last is to use linked knockouts to construct largescale gene regulatory networks.
Case one: adaptation
Since adaptive systems often operate in noisy environments, exploring the adaptive properties and noise immunity of the network is the main goal of this section. It can be seen from Fig.Â 1 that the fully connected neural network (NN) transfers hidden information from the previous time point to the next time point, and we ask the output node \(g_3\) to perform the adaptation function (Fig.Â 1b). The input node \(g_2\) has no functional limitation, but can play an adjustment role. The input node \(g_1\) is used as the input signal I.
Therefore, as shown in Fig.Â 2(1), the time evolution of the input signal (I) after adding noise, the expression level of \(g_3\) (blue line) is basically consistent with its target time course value (blue line) Dotted line), with a fast response phase and a slower recovery phase. And it can be intuitively seen in Fig.Â 2 that \(f_3\) and \(f_2\) change with the increase or decrease of the input \(g_3\), \(g_2\), I, it is easy to know the adjustment logic hidden in NN, and directly construct Gene regulatory network (Fig. 2(3)). Our method achieves a reliable response to Gaussian white noise.
In the context of biological regulatory networks, we need to verify whether the resulting network is biologically feasible. Here, in order to check whether the network obtained by the NN model (Fig.Â 1b) is reliable, the f term in Equation 2 is represented by the hill function, which is widely used in biological regulation modeling. The obtained gene regulatory network (Fig.Â 1b(3)) can be successfully transferred to the hill function model (Table 1), and the expected adaptive function can be achieved. Similarly, The obtained gene regulatory network (Fig.Â 2(3)) can be successfully transferred to the hill function model (Table 2).
Case two: linkknockout knockout
According to Formula 5, the regulation logic of the gene regulatory network can be obtained by knocking out the difference before and after gene expression obtained by a certain edge, such as knocking out the regulatory link from \(g_3\) to \(g_2\), by setting \(g_3\) to 0 when calculating \(f_2\). Four examples of link knockout with \(\lambda\) set to 0 are shown in Fig. 3(1). The first panel shows that \(g_3 = 0\) is entered into NN to obtain the value of \(f_2\). \(g_2\) expression level increases from the lighter orange line to the brighter orange line, indicating that \(g_3\) inhibits \(g_2\). The second panel shows that \(g_2\) is set to 0 and input into NN to obtain \(f_3\). In the figure, the expression level of \(g_3\) decreases from lighter blue to darker blue, indicating that \(g_2\) stimulates \(g_3\). In the third panel, \(g_3\) was set to 0 and input into NN to obtain \(f_3\). The expression level of \(g_3\) increased from lighter blue to darker blue, indicating \(g_3\) selfinhibition, and in the same way, the fourth panel showed \(g_2\) selfactivation.
As above mentioned in this paper, through the order to remove unnecessary link to adjust repeatedly sparse network. The Fig.Â 3(2) depicts the sensitivity(response peak) and adaptive error(difference between the prestimulus and the fully adapted \(g_3\) levels) of the network sequence. The network shown in Panel #1 in Fig.Â 3(2) includes a basic adaptive function: incoherent feedforward loops. #1 is the regulatory network learned by NN without any constraints, which has redundancy. By linking knockout technology, applications to the existing links knockout, find the smallest change after deleting network, links to knock out after retraining within NN can get effective gene regulation network of sparse #4 (in sensitivity and adaptation error is equal). #4 is the adaptive function with the least links to achieve the minimum incoherent feedforward network.
Case three: Largescale gene regulation networks are constructed using link knockout techniques
To apply to largescale data, we evaluate the performance of our model using two datasets, each containing timeseries expression profiles. Timeseries data reflects how the network responds to perturbations and how it recovers after the perturbations are removed. The first one is the simulation data, we choose the InSilico_Size100 dataset from the DREAM4 In Silico Network Challenge [32]. The second one is from the real dataset, we select a largescale E. coli dataset (GSE20305) from the Gene Expression Omnibus (GEO) database [33]. The gold standard benchmark for E.coli consists of part from DREAM5 challenge [34] and other experimentally verified part from RegulonDB [35]. GSE20305 [33] provides real gene timeseries data of E. coli under different experimental environments. We choose the data under the three conditions (cold stress, heat stress and oxidative stress) to make up our experimental dataset. The specific information of each dataset is shown in TableÂ 3.
To evaluate the effectiveness of our method on the datasets in TableÂ 3, Serveral methods are chosen as baselines as follows:

GENIE3 [36]: an approach to infer gene regulatory networks from gene expression data. It trains a random forest model that predicts the expression of each gene in the dataset and uses the expression of transcription factors (TFs) as input.

BiXGBoost [37]: it is a bidirectionalbased method by considering both candidate regulatory genes and target genes for a specific gene. Moreover, BiXGBoost utilizes time information efficiently and integrates XGBoost to evaluate the feature importance.

SIGNET [38]: a deep learningbased framework for capturing complex regulatory relationships between genes under the assumption that the expression levels of transcription factors participating in gene regulation are strong predictors of the expression of their target genes.

GNIPLP [39]: an approach to infer GRNs from timeseries or nontimeseries gene expression data. GNIPLR projected gene data twice using the LASSO projection (LSP) algorithm and the linear projection (LP) approximation to produce a linear and monotonous pseudotime series, and then determined the direction of regulation in combination with lagged regression analyses.

PoLoBag [40]: it is an ensemble regression algorithm in a bagging framework where Lasso weights estimated on bootstrap samples are averaged. These bootstrap samples incorporate polynomial features to capture higherorder interactions.
For a fair comparison of the above methods in this experiment, we always use the default parameters when running the program. We systematically evaluate the model using seven evaluation metrics, namely True Positive Rate(TPR), False positive rate(FPR), Matthews correlation coefficient (MCC), Accuracy(ACC), Fmeasure(F1), Area Under the Receiver Operating Characteristic curve(AUROC), Area under the precisionrecall curve (AUPR). Experimental results for each method provide all predicted edges and their corresponding weights. The higher the weight, the higher the credibility of the regulatory relationship. Since different thresholds construct different GRN, the FPR, TPR, MCC, ACC and F1 measures are also correspondingly different.
The experiments are first performed on the DREAM4 InSilico_Size100 five networks. The edge weights predicted by all methods are sorted, and the first 250 predicted values are set to 1, and the other predicted values are set to 0. The following five indicators are calculated as shown in TableÂ 4. The results in TableÂ 4 show that our method outperforms the comparative methods, which indicates that our method can construct regulatory networks of timeseries gene expression data by linking knockout techniques. In order to comprehensively consider the experimental results under different thresholds, we choose AUROC and AUPR values as evaluation criteria. Due to the randomness of the NN, the results will be different from run to run. In our experiments, these methods are ran 10 times and the results are presented in Fig.Â 4. For Network 4, the GENIE3 method outperforms the rest on AUROC. In InSilico_Size100 Networks 1â€“3 and 5, our method has higher average AUROC, and the average AUPR number on Networks 1â€“5 is better than other methods.
For the E.coli dataset, we sorted the edges predicted by all methods and set the predicted value of the first 3080 edges to 1, and the value of the other predicted edges to 0. The FPR, TPR, MCC, ACC, and F1 measures are calculated between the predicted labels and the ground truth labels. The results are shown in TableÂ 5. As shown in TableÂ 5, Our methods perform the best. In order to consider the case of different thresholds, we show the results of ten average runs of all methods in Fig.Â 5. In particular, all available methods obtain worse results with less than 0.05 AUPR values on E.coli network (Fig.Â 5). This is due to the fact that AUPR tends to present smaller values on largescale networks. Compared with the other five methods, our method achieves the best AUROC and the best AUPR. To test the efficiency of our method, we compare the running time of the six methods on a 32GB RAM, Intel(R) Xeon(R) CPU E52630 computer. The comparison results on the DREAM4 InSilico_Size100 and E.coli datasets are shown in TableÂ 6. The table shows the average running time values of the six algorithms executed 10 times. Our method is relatively faster than other stateoftheart methods.
Application to the Xenopus Brachyury (XBra)
In this section, real data are used to demonstrate the effectiveness of our method in a complex situationâ€”the Activin/GSC/Xbra System. This is a wellresearched system, including experiments and modeling [41]. Here, the fully connected data network model is used to solve the inverse problem, finding the core gene regulatory network given the observed gene expression of the Xenopus Brachyury as the desired output. The results obtained by NN were compared with known biological networks.
Gene regulation in Fig. 6 follows the NN modeling in Fig. 1a, where g and f are threedimensional vectors (Activin, Goosecold, Xbra) and input signal \(g_1\)(Bcd). The expression of three genes is taken from the study of Green et al. [41], and the morphogenetic gradient (Bcd) is regarded as static. The results of 40 repetitions of NN training were all overlapped with the target graph (as shown in Fig.Â 6). When using the link knockout method, the gene regulatory network obtained is consistent with the known network structure. In TablesÂ 7, the frequency with which the link is activated, nonexistent, and inhibited by 40 repetitions of training is listed, and in accordance with the majority coloring (orange activated, blue inhibited), the majority network (Fig.Â 6 on the right) has a very similar structure to the known biological network revealed in experiment [41]. This experiment helps demonstrate the effectiveness of our method on real data.
Discussion
In this paper, we propose a multilayer perceptronbased differential equation method, which operates by training a fully connected neural network (NN) to simulate the transcription rate of genes in traditional differential equations. From the dataset validation results, our algorithm is superior to other methods, and its good performance is attributed to the use of neural networks to simulate unknown dynamical systems. This has many advantages. First, there is no detailed mathematical equation format for using the inputoutput function of a multilayer perceptron. Training a neural network is to establish the necessary logical connections between input and output nodes, without specific constraints. Second, fully connected neural networks can speed up model training and scale to largescale complex gene regulatory networks. Finally, neural networks are well suited for building gene regulatory networks on timeseries gene expression data due to their limited shortterm memory advantage.
Our goal is to visually explain how gene regulatory networks (GRNs) achieve concentrationdependent responses. However, the number of different mechanisms that may exist in cells, such as feedback or local cellcell communication, is unclear. Some welldefined biological functions may have broad kinetic interpretations (even for relatively simple threegene networks and limited forms of modeling). There are more complex cellular processes that cannot simply be attributed to activating or repressive regulation. The structure of the regulatory network itself should be reexplored in a more comprehensive context. The method developed here can provide ideas for further exploration of reconstructed gene regulatory networks in the future, and interesting future research topics can apply our method to different realworld biological and biomedical data problems.
Conclusions
A longstanding question in biology is how complex biological networks perform complex regulatory functions. One strategy is to exhaustively search all possible biological networks for single or multiple functions, which is only suitable for small gene networks. For a biological network of four genes, the computational complexity of the exhaustive search method is enormous. In this study, we propose a multilayer perceptronbased differential equation method. Figure 1a illustrates the specific work of the whole framework. Our method utilizes timeseries gene expression data to train a regulatory function that simulates the transcription rate of a gene, which is a fully connected neural network (NN) with a fourlayer structure. The fully connected neural network is trained by using the gene expression of the previous moment to predict the gene expression of the next moment, and using the loss function between the obtained prediction result and the real gene expression for feedback training. After the model is obtained after training, the link knockout technique is used to set the expression value of a gene to 0, and the regulatory relationship between genes can be judged by looking at the effect of the gene on the synthesis rate.
First we verify the adaptation function of our method. The adaptive function is performed by training a NN, and our method also performs well in the presence of Gaussian white noise on the internal and external stimulus signals. Then, through the link knockout technique, redundant links are eliminated from the gene regulatory network trained by NN, and an effective core gene regulatory network is finally obtained. Finally, to validate our approach on largescale datasets, we use InSilico_Size100 time series simulation data and E.coli real datasets. Our model is compared with three stateoftheart regression models on these two datasets. Experiments show that our method performs well in all six networks, which proves the good scalability and adaptability of our method. In addition to validating on a largescale real dataset, we also validate our method on a real dataset (Xenopus laevis) with five genes to demonstrate its effectiveness. Our method can help discover the regulatory logic and network topology of complex tasks. For the resulting network topologies, it is possible to intuitively explain how their structures generate their functions, thus linking network topology to function.
Methods
Nonlinear ordinary differential equation models
In the gene regulation system, the time effect variable xi is used to represent the expression level of the ith gene at time t, and the value of this variable is nonnegative. Then, the regulatory relationship between n genes in the system can be expressed by ordinary differential equations: One is a regulatory network that can adapt to the influence of gaussian white noise, and the other is the simulation of link knockout. The regulatory network obtained by training NN has redundancy, and the most core gene regulatory network can be obtained by link knockout. The last is the use of linked knockouts to build largescale gene regulatory networks.
The above equations are also called kinetic equations. where \(\frac{dx_i}{dt}\) indicates the rate of change of the expression level of the ith gene at time t, \({x_1},{x_2},\ldots ,{x_n}\) represents the expression level of each gene. Therefore, the expression change rate of the ith gene at time t depends on the expression levels of other genes, including its own expression level \(x_i\). The structure of the function \({f_i}({x_1},{x_2},\ldots ,{x_n})\) on the righthand side of Equation 1 indicates the internal regulatory mechanism between genes, that is, the structure of the regulatory network.
In most cases, the interactions between genes exhibit complex nonlinear relationships. At this time, the nonlinear regulation function \(f_i(x)\) can better explain the real situation in the organism, it is usually considered that the function f is a continuously differentiable and monotonically increasing bounded function. Here, we use the hill function to model the complex GRN structure. The dynamics of this GRN can be modeled as:
Here \(h_{ij}^ + = \frac{{{b_{ij}}g_j^n}}{{K_{ij}^n + g_j^n}}\) represents the activation item, and \(h_{il}^  = \frac{{K_{il}^n}}{{K_{il}^n + g_l^n}}\) represents the inhibitory item. For simplicity, we set the Hill coefficient \(n = 2\) in the enumeration study. Each activation link \(h_{ij}^ +\) has two parameters K and b, while the inhibitory link \(h_{il}^ \) has only one parameter K. For each network topology, the network topology is considered â€™successfulâ€™ when the parameters (K and b) are sampled independently of the exponential distribution \(p(x) = {e^{  x}}\)(refer to the study of Ehsan et al. [42]), 100,000 groups of random parameters are sampled and no less than 2 groups of parameters are obtained. The exhaustive search of hill function model used in this paper is only a verification step, and some false positives do not affect our conclusions.
Fully connected neural network model
In this paper, multilayer perceptron is used to obtain the gene synthesis rate f in time series gene expression data. In biological cells, the regulatory relationship between genes may be timelag. Therefore, the input layer of multilayer perceptron in our algorithm is the expression level of all genes at the t time point, and the output layer is the synthesis rate f of corresponding genes at the t time point. In this paper, the activation function of the hidden layer shown in Fig.Â 1a is ReLU, and the activation function of the output layer is sigmoid, so the value of the synthesis rate f is between 0 and 1. They are respectively expressed as:
As shown in Fig.Â 1b using three genes as an example, the ordinary differential equation of the corresponding gene regulatory network is expressed as:
where the \(f_i(g_1,g_2,g_3)\) function contains the regulatory relationship between genes. The Euclidean distance between \({\hat{g}}(t + dt)\) calculated by the ordinary differential equation formula in Fig.Â 1a and the expression level of gene \(g(t + dt)\)(time step is 1) at the next moment used as the training loss function of NN, \(Loss = \root 2 \of { {\textstyle \sum _{t}}({\hat{g}}(t + dt)  g(t+dt))^{2} }\). \(\lambda {g_i}\) represents the degradation term, we simply set \(\lambda =1\). In reality the degradation term can be represented by the diagonal term of the synthetic term f.
In Fig.Â 1b, three genes are used as an example to demonstrate that our method can intuitively construct gene regulatory networks. Taking \(g_1\) as the stimulus signal, the neural network training principle is shown in Fig.Â 1b(1), and Fig.Â 1b(2) represents the crosssectional information obtained by training the NN. When \(f_1=0\) is satisfied, what is shown in Fig.Â 1b(2) is that \(f_2\) (blue dotted line) and \(f_3\) (orange dotted line) increase with the increase of \(g_1\), that is, \(g_1\) activates \(g_2\), and \(g_1\) activates \(g_3\), inhibited by \(g_3\) after \(g_3\) reaches a steady state. The regulatory network extracted from the information of Fig.Â 1b(2) is composed of an incoherent feedforward loop, and b(3) is the regulatory network obtained from the crosssectional information of the neural network of Fig.Â 1b(2).
Link knockout technique
For regulatory networks with many more genes, direct visualization of the ffunction is difficult. Once we have a predictive model between all the genes and the synthetic term f, we question which genes in the gene pool have a strong influence on the synthetic term f. Therefore, we introduced the linked knockout technique, which passes raw data to the data for gene knockout, i.e., sets the expression of one gene at a time to 0, and uses the expression of the remaining genes as input to predict a specific synthetic term expression status. Therefore, this method can effectively improve the ability of constructing the regulatory network without reading the weight of NN. A disadvantage of this method is that when the synthesis rate of \(g_j\) is strongly inhibited by the highly expressed \(g_i\) gene. that is, when the expression of gene \(g_i\) is set to 0, the value of synthesis rate \(f_j\) obtained through neural network training will be very large. Therefore, a more accurate measure of the change in \(f_j\) with a fold change in \(g_i\) is:
This formula represents the link knockout experiment. \(\mu\) represents discount factor, \(\mu = 0\) represents link knockout. We truncated the domain where transcription factor i binds to gene j. With the regulatory link from node i to j being knocked down by a factor \(\mu\), the NN output (synthesis term \(f_j\) ) changes accordingly. \(\Delta _{ij}\) reflect the regulation effect of \(g_i\) on \(g_j\). A more intuitive example in Fig.Â 3 depicting the mutational trajectory where the regulatory link from \(g_3\) to \(g_2\) is knocked out is given by:
FigureÂ 3(1) first panel shows that the increase of \(g_3\) level means that \(g_2\) negatively regulates or inhibits \(g_3\), whereas the decrease of \(g_3\) level means that \(g_2\) positively regulates or promotes \(g_3\).
Availability of data and materials
Our source codes and data can be found in URL:https://github.com/lhfkd/NNGRN. InSilico_Size100 time series data is automatically generated by GeneNetWeaver 2.0 (http://gnw.sourceforge.net/).
References
Zhang W, Fang JA, Tang Y. Robust stability for genetic regulatory networks with linear fractional uncertainties. Commun Nonlinear Sci Numer Simul. 2012;17(4):1753â€“65. https://doi.org/10.1016/j.cnsns.2011.09.026.
Ma W, Trusina A, ElSamad H, Lim WA, Tang C. Defining network topologies that can achieve biochemical adaptation. Cell. 2009;138(4):760â€“73. https://doi.org/10.1016/j.cell.2009.06.013.
Gardner ST, Cantor RC, Collins JJ. Construction of a genetic toggle switch in Escherichia coli. Nature. 2000;403:339â€“42. https://doi.org/10.1038/35002131.
Ferrell JE. Perfect and nearperfect adaptation in cell signaling. Cell Syst. 2016;2(2):62â€“7. https://doi.org/10.1016/j.cels.2016.02.006.
Muzzey D, GÃ³mezUribe CA, Mettetal JT, van Oudenaarden A. A systemslevel analysis of perfect adaptation in yeast osmoregulation. Cell. 2009;403:160â€“71. https://doi.org/10.1016/j.cell.2009.04.047.
Qiao L, Zhao W, Tang C, Nie Q, Zhang L. Network topologies that can achieve dual function of adaptation and noise attenuation. Cell Syst. 2019;17(9):271â€“85. https://doi.org/10.1016/j.cels.2019.08.006.
de Jong H. Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol. 2002;9(1):67â€“103. https://doi.org/10.1089/10665270252833208.
Oates CJ, Dondelinger F, Bayani N, Korkola J, Gray JW, Mukherjee S. Causal network inference using biochemical kinetics. Bioinformatics. 2014;30(17):468â€“74. https://doi.org/10.1093/bioinformatics/btu452.
Andrejr A, Dirk H, Marco G. Approximate bayesian inference in semimechanistic models. Stat Comput. 2017;27:1003â€“40. https://doi.org/10.1007/s1122201696688.
Mangan NM, Brunton SL, Proctor JL, Kutz JN. Inferring biological networks by sparse identification of nonlinear dynamics. IEEE Trans Mol Biol MultiScale Commun. 2016;2(1):52â€“63. https://doi.org/10.1109/TMBMC.2016.2633265.
Brunton SL, Proctor JL, Kutz JN. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc Natl Acad Sci. 2016;113(15):3932â€“7. https://doi.org/10.1073/pnas.1517384113.
Penfold CA, Shifaz A, Brown PE, Nicholson A, Wild DL. CSI: a nonparametric bayesian approach to network inference from multiple perturbed time series gene expression data. Stat Appl Genet Mol Biol. 2015;14(3):307â€“10. https://doi.org/10.1515/sagmb20140082.
Matsumoto H, Kiryu H, Furusawa C, Ko MSH, Ko SBH, Gouda N, Hayashi T, Nikaido I. SCODE: an efficient regulatory network inference algorithm from singlecell RNASeq during differentiation. Bioinformatics. 2017;33(15):2314â€“21. https://doi.org/10.1093/bioinformatics/btx194.
AubinFrankowski PC, Vert JP. Gene regulation inference from singlecell RNAseq data with linear differential equations and velocity inference. Bioinformatics. 2020;36(18):4774â€“80. https://doi.org/10.1093/bioinformatics/btaa576.
Ren HP, Huang XN, Hao JX. Finding robust adaptation gene regulatory networks using multiobjective genetic algorithm. IEEE/ACM Trans Comput Biol Bioinf. 2016;13(3):571â€“7. https://doi.org/10.1109/TCBB.2015.2430321.
Savageau MA. Finding multiple roots of nonlinear algebraic equations using ssystem methodology. Appl Math Comput. 1993;55(2):187â€“99. https://doi.org/10.1016/00963003(93)90020F.
Koduru P, Das S, Welch S, Roe JL, LopezDee ZP. A coevolutionary hybrid algorithm for multiobjective optimization of gene regulatory network models. In: Proceedings of the 7th Annual Conference on Genetic and Evolutionary Computation. GECCO â€™05, pp. 393â€“399. Association for Computing Machinery, New York, NY, USA 2005. https://doi.org/10.1145/1068009.1068073.
Kimura S, Ide K, Kashihara A, Kano M, Hatakeyama M, Masui R, Nakagawa N, Yokoyama S, Kuramitsu S, Konagaya A. Inference of Ssystem models of genetic networks using a cooperative coevolutionary algorithm. Bioinformatics. 2004;21(7):1154â€“63. https://doi.org/10.1093/bioinformatics/bti071.
Hsiao YT, Lee WP. Inferring robust gene networks from expression data by a sensitivitybased incremental evolution method. BMC Bioinform. 2012;13(7):8. https://doi.org/10.1186/1471210513S7S8.
Mandal S, Khan A, Saha G, Pal RK. Reverse engineering of gene regulatory networks based on ssystems and bat algorithm. J Bioinform Comput Biol. 2016;14(03):1650010. https://doi.org/10.1142/S0219720016500104 (PMID: 26932274).
Nakayama T, Seno S, Takenaka Y, Matsuda H. Inference of ssystem models of gene regulatory networks using immune algorithm. J Bioinform Comput Biol. 2011;09(supp01):75â€“86. https://doi.org/10.1142/S0219720011005768.
Mandal S, Saha G, Pal RK. Ssystem based gene regulatory network reconstruction using firefly algorithm. In: Proceedings of the 2015 Third International Conference on Computer, Communication, Control and Information Technology (C3IT), 2015;1â€“5. https://doi.org/10.1109/C3IT.2015.7060217
Palafox L, Noman N, Iba H. Reverse engineering of gene regulatory networks using dissipative particle swarm optimization. IEEE Trans Evol Comput. 2013;17(4):577â€“87. https://doi.org/10.1109/TEVC.2012.2218610.
Wu SJ, Wu CT. Computational optimization for stype biological systems: Cockroach genetic algorithm. Math Biosci. 2013;245(2):299â€“313. https://doi.org/10.1016/j.mbs.2013.07.019.
Hsiao YT, Lee WP. Reverse engineering gene regulatory networks: coupling an optimization algorithm with a parameter identification technique. BMC Bioinform. 2014;15(15):8. https://doi.org/10.1186/1471210515S15S8.
Karlebach G, Shamir R. Modelling and analysis of gene regulatory networks. Nat Rev Mol Cell Biol. 2008;9(01):67â€“103. https://doi.org/10.1038/nrm2503.
Dâ€™Haeseleer P. Reconstructing gene networks from large scale gene expression data. PhD thesis 2000. AAI9993496
Wahde M, Hertz J. Coarsegrained reverse engineering of genetic regulatory networks. Biosystems. 2000;55(1):129â€“36. https://doi.org/10.1016/S03032647(99)000908.
Jingxiang S, Feng L, Yuhai T, Chao T. Finding gene network topologies for given biological function with recurrent neural network. Nat Commun. 2021;12(1):3125â€“37. https://doi.org/10.1038/s41467021234205.
Raissi M, Perdikaris P, Karniadakis GE. Multistep neural networks for datadriven discovery of nonlinear dynamical systems. arXiv preprint 2018. arXiv:1801.01236
Rackauckas C, Ma Y, Martensen J, Warner C, Zubov K, Supekar R, Skinner D, Ramadhan A, Edelman A. Universal differential equations for scientific machine learning. arXiv preprint 2020. arXiv:2001.04385
Marbach D, Prill RJ, Schaffter T, Mattiussi C, Floreano D, Stolovitzky G. Revealing strengths and weaknesses of methods for gene network inference. Proc Natl Acad Sci. 2010;107(14):6286â€“91. https://doi.org/10.1073/pnas.0913357107.
Jozefczuk S, Klie S, Catchpole G, Szymanski J, CuadrosInostroza A, Steinhauser D, Selbig J, Willmitzer L. Metabolomic and transcriptomic stress response of Escherichia coli. Mol Syst Biol. 2010;6(1):364. https://doi.org/10.1038/msb.2010.18.
Marbach D, Costello JC, RobertÂ KÃ¼ffner NMV, Prill RJ, Camacho DM, Allison KR, Consortium TD, Kellis M, Collins JJ, Stolovitzky G. Wisdom of crowds for robust gene network inference. Nature Methods 2012;9(8):796â€“804. https://doi.org/10.1038/nmeth.2016
GamaCastro S, Salgado H, SantosZavaleta A, LedezmaTejeida D, MuÃ±izRascado L, GarcÃaSotelo JS, AlquiciraHernÃ¡ndez K, MartÃnezFlores I, Pannier L, CastroMondragÃ³n JA, MedinaRivera A, SolanoLira H, BonavidesMartÃnez C, PÃ©rezRueda E, AlquiciraHernÃ¡ndez S, PorrÃ³nSotelo L, LÃ³pezFuentes A, HernÃ¡ndezKoutoucheva A, MoralChÃ¡vez VD, Rinaldi F, ColladoVides J. RegulonDB version 9.0: highlevel integration of gene regulation, coexpression, motif clustering and beyond. Nucleic Acids Res 2015;44(D1):133â€“143. https://doi.org/10.1093/nar/gkv1156.
HuynhThu VA, Irrthum A, Wehenkel L, Geurts P. Inferring regulatory networks from expression data using treebased methods. PLoS ONE. 2010;5(9):12776. https://doi.org/10.1371/journal.pone.0012776.
Zheng R, Li M, Chen X, Wu FX, Pan Y, Wang J. BiXGBoost: a scalable, flexible boostingbased method for reconstructing gene regulatory networks. Bioinformatics. 2018;35(11):1893â€“900. https://doi.org/10.1093/bioinformatics/bty908.
Luo Q, Yu Y, Lan X. SIGNET: singlecell RNAseqbased gene regulatory network prediction using multiplelayer perceptron bagging. Brief Bioinform. 2021. https://doi.org/10.1093/bib/bbab547.
Zhang Y, Chang X, Liu X. Inference of gene regulatory networks using pseudotime series data. Bioinformatics. 2021;37(16):2423â€“31. https://doi.org/10.1093/bioinformatics/btab099.
Ghosh Roy G, Geard N, Verspoor K, He S. PoLoBag: polynomial Lasso Bagging for signed gene regulatory network inference from expression data. Bioinformatics. 2020;36(21):5187â€“93. https://doi.org/10.1093/bioinformatics/btaa651.
Green J. Morphogen gradients, positional information, and xenopus: interplay of theory and experiment. Dev Dyn. 2002;225(4):392â€“408. https://doi.org/10.1002/dvdy.10170.
Ehsan Elahi F, Hasan A. A method for estimating hill functionbased dynamic models of gene regulatory networks. Royal Society Open Sci. 2018;5(2): 171226. https://doi.org/10.1098/rsos.171226.
Acknowledgements
The authors would like to express thanks to Xinhai Chen, Jintao Peng and Ruigeng Zeng. And we would like to thank all authors of the cited references.
Funding
This research work was supported in part by the National Key Research and Development Program of China (2021YFB0300101).
Author information
Authors and Affiliations
Contributions
GM, RZ, JP, and KZ contributed the idea, designed the study and revised the manuscript. GM and ZP implemented and performed most of the experiments. GM and JL wrote the manuscript. GM contributed to this work. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
Authors have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Mao, G., Zeng, R., Peng, J. et al. Reconstructing gene regulatory networks of biological function using differential equations of multilayer perceptrons. BMC Bioinformatics 23, 503 (2022). https://doi.org/10.1186/s12859022050555
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12859022050555