 Research
 Open access
 Published:
Periodic synchronization of isolated network elements facilitates simulating and inferring gene regulatory networks including stochastic molecular kinetics
BMC Bioinformatics volume 23, Article number: 13 (2022)
Abstract
Background
The temporal progression of many fundamental processes in cells and organisms, including homeostasis, differentiation and development, are governed by gene regulatory networks (GRNs). GRNs balance fluctuations in the output of their genes, which trace back to the stochasticity of molecular interactions. Although highly desirable to understand life processes, predicting the temporal progression of gene products within a GRN is challenging when considering stochastic events such as transcription factor–DNA interactions or protein production and degradation.
Results
We report a method to simulate and infer GRNs including genes and biochemical reactions at molecular detail. In our approach, we consider each network element to be isolated from other elements during small time intervals, after which we synchronize molecule numbers across all network elements. Thereby, the temporal behaviour of network elements is decoupled and can be treated by local stochastic or deterministic solutions. We demonstrate the working principle of this modular approach with a repressive gene cascade comprising four genes. By considering a deterministic time evolution within each time interval for all elements, our method approaches the solution of the system of deterministic differential equations associated with the GRN. By allowing genes to stochastically switch between on and off states or by considering stochastic production of gene outputs, we are able to include increasing levels of stochastic detail and approximate the solution of a Gillespie simulation. Thereby, CaiNet is able to reproduce noiseinduced bistability and oscillations in dynamically complex GRNs. Notably, our modular approach further allows for a simple consideration of deterministic delays. We further infer relevant regulatory connections and steadystate parameters of a GRN of up to ten genes from steadystate measurements by identifying each gene of the network with a single perceptron in an artificial neuronal network and using a gradient decent method originally designed to train recurrent neural networks. To facilitate setting up GRNs and using our simulation and inference method, we provide a fast computeraided interactive network simulation environment, CaiNet.
Conclusion
We developed a method to simulate GRNs at molecular detail and to infer the topology and steadystate parameters of GRNs. Our method and associated userfriendly framework CaiNet should prove helpful to analyze or predict the temporal progression of reaction networks or GRNs in cellular and organismic biology. CaiNet is freely available at https://gitlab.com/GebhardtLab/CaiNet.
Background
Dynamics and progression of many fundamental processes in cells and organisms, including metabolism [1, 2], the cell cycle [3], the circadian clock [4], differentiation [5, 6] and development [7] are governed by GRNs. These networks generally control the activity of genes by regulatory motifs such as feedback or feed forward loops [8, 9], which ensure spatially and temporally controlled gene expression. A striking property of gene transcription is its distinct stochastic behavior. Rather than producing their product continuously, most genes stochastically transit from an inactive state into an active state where they produce the gene product [10, 11]. During the ontime, which often is short compared to the offtime, a burst of expression occurs that might significantly increases the expression level. In combination with stochastic production (birth) and degradation (death) processes, this leads to complex stochastic trajectories of gene products. This noise of gene expression was shown to play an important role in decision making within GRNs [12,13,14,15]. Furthermore, theoretical work suggests that this stochastic behavior can influence the functionality of networks such as the circadian clock [16, 17] and the pluripotency network [18, 19].
To model and simulate the average temporal behavior of GRNs, ordinary differential equations (ODEs) and corresponding solvers are well established [20, 21]. To add stochasticity, the Langevin method adds normal distributed gene product noise to the system, assuming that all reactions in the system went through multiple discrete reaction events during a single simulation time step [22]. While this method achieves a reasonable description of systems with large molecule numbers, it does not explicitly account for gene on/off switching. An exact approach to simulate stochastic processes is the Gillespie direct method [23]. In this method, two random numbers drawn from a probability distribution comprising all possible reactions in the network determine the time point and the type of the next reaction event. In this approach, the duration of a timestep is limited by the fastest occurring reaction in the network. Thus, for networks including large molecule numbers or reactions with fast rates that occur on a timescale much shorter than the total simulation time, the Gillespie direct method is computationally expensive. To mitigate this problem, the tauleaping method has been developed [24]. There, a simulation time step may span over several reaction events. The number of reaction events that occurred during a time step is approximated by an average number. This method requires additional modifications to prevent negative particle numbers [25] and to calculate step sizes [26]. To further speed up simulations, hybrid approaches that combine all of the aforementioned approaches have been developed [27,28,29,30,31]. These approaches treat specific reactions in the network with different algorithms. To do so, complex considerations to decouple a generic network have to be performed [27]. Considering special cases of networks, i.e. networks with genes stochastically switching between on and off states, led to more effective dedicated hybrid stochastic deterministic simulation approaches [32]. However, a dedicated hybrid approach capable of modeling gene product synthesis in detail and providing biochemical reactions to simulate signaling pathways is still missing.
The inverse problem to simulating expression levels of GRNs with given parameters is to infer parameters and connections between genes from a set of given gene product levels, knockout data or time trajectories. A prominent approach to infer the topology of networks is based on Boolean interactions [33, 34]. Another approach looks for correlation coefficients [35] between steady state networks. Furthermore, linearized differential equations have been used to fit experimental data [36]. Using knockout data, a confidence matrix for network interactions was obtained [37]. Recent progress to the inference problem has been made using time trajectories of gene product levels [38,39,40,41]. Approaches such as WASABI [42] infer how perturbations of gene product levels propagate through a network. However, many inference approaches identify the relative importance of network parameters or connections, instead of yielding parameters that can directly be entered into physiological simulations.
Since genes may combine the actions of several transcription factors for their specific output behavior, they have been identified with perceptrons or neurons [43,44,45,46]. This enabled optimizing sophisticated algorithms originally designed for artificial neural networks, such as gradient decent methods, to the inference problem [47,48,49]. Using such an approach, a neural network was trained to reproduce a timeseries of gene product levels [49]. During training, the neural network had to learn both how to simulate the ODE system corresponding to the gene network as well as the network topology. Thus, it remained a challenge to disentangle network parameters and topology from the coefficients of the neural network.
Here, we report a computer aided interactive gene network simulation tool (CaiNet), dedicated for simulation of GRNs including molecular kinetics and for inference of steadystate network parameters. We simplified the simulations by determining the temporal evolution of each network element isolated from all other network elements between fixed time steps, after which the gene product levels are synchronized within the GRN. This modular approach enabled us to include local analytical solutions of the temporal evolution of a network element such as a gene, thereby speeding up simulation time. For genes, we included local analytical solutions to the time behavior for various different regulatory promoter motifs and variable numbers of ratelimiting steps in gene product synthesis. In addition, we provided local analytical solutions for elementary biochemical reactions such as dimerization, that can be combined to model complex biochemical reaction networks. We tested CaiNet by comparing its simulation of a repressive gene cascade of four genes to the solution of the global, networkwide ODEs by an ODE solver and to a full stochastic Gillespie simulation. Further, we verified that CaiNet is able to reproduce noiseinduced bistability and oscillations occurring in a positive and a negative autoregulatory feedback loop including enzymemediated degradation. For parameter inference, CaiNet uses a recurrent network approach to directly infer steadystate parameters. We tested CaiNet for up to 10 densely connected genes and find that CaiNet is able to recover the network topology and the network parameters well. The combination of a simulation algorithm and an inference algorithm directly yielding physiological simulation parameters will remove hurdles in understanding experimental results and investigating associated GRNs.
Results
Setup of GRNs with the graphical user interface of CaiNet
GRNs commonly comprise several elements: external signaling inputs, biochemical reactions and regulatory motifs including several genes (Fig. 1a, upper panel). In CaiNet, we characterized each element by certain structural and kinetic parameters (Fig. 1b). Inputs may be any kind of molecule. Each molecule input can be assigned a time course of molecule abundance, which might be for example sinusoidal or rectangular (Fig. 1c). Complex biochemical reactions can be set up by combining a certain set of elementary reactions [23]. In CaiNet, this set comprises homodimerisation, heterodimerisation and a transformation of a species by an enzyme. For example, formation of a homotetramer can be implemented by combining the homodimerization of a monomer and a subsequent homodimerization of the homodimer. We modeled genes by two states, an onstate and an offstate [50], and a promoter with a certain number of binding sites for transcription activators or repressors (Fig. 1b, d). The gene switches to the onstate with the effective rate λ_{eff} upon association of activating transcription factors according to the regulatory logic of the promoter, and switches to the offstate with the effective rate µ_{eff} upon their release (Fig. 1d and “Methods” section). Transcription repressors keep the gene in the offstate (“Methods” section). In the onstate, the gene product is produced with the production rate ν. The gene product can either be understood as mRNA or as protein. In the latter case, we initially simplified the process of protein production by combining transcription and translation of mRNA into a common ratelimiting step with one production rate. However, we introduced the possibility for an additional delay to account for production processes such as splicing and translation (see below). Moreover, gene products are associated with a degradation rate.
We designed a graphical user interface (GUI) for CaiNet to facilitate setting up complex GRNs (Fig. 1e), inspired by an effort to find general and understandable representations of biological networks [51]. With this GUI, icons representing the network elements ‘input’, ‘bimolecular reaction’ and ‘gene’ can be connected intuitively using activating or inhibiting links represented by wires. For each network element, relevant structural and kinetic parameters can be defined (Fig. 1e and Table 1). In addition, we implemented means to manipulate a genetic network by knocking down one or more genes (Fig. 1f). For a knocked down gene, the transcription rate is adjusted to a low value or zero.
Computeraided implementation of network simulations
To simulate GRNs with CaiNet, we developed a modular approach of solving the system of differential equations associated with the GRN or of simulating the stochastic behaviour of genes and of the numbers of gene products. At the heart of this modularisation lies the assumption that changes in molecule numbers within the GRN are small during a time step Δt and that it is sufficient to synchronize the numbers of molecules between all elements of the network only after each time step Δt. According to this assumption, each network element can be treated as if it was isolated from other network elements during Δt. Thus, for each Δt, CaiNet iterates through all network elements and calculates the local changes in molecule abundance over time for each element using elementspecific functions (“Methods” section, pseudo code) and the molecule numbers present in the GRN at the beginning of this time step Δt. We derived various elementspecific functions for bimolecular reactions and different promoter structures of genes (“Methods” section). To synchronize molecule numbers after each Δt, CaiNet calculates the total change in molecule numbers for all molecular species by summing over the local changes in molecule numbers of each network element. These new networkwide numbers are then used as starting values for elementspecific calculations during the next time step. Importantly, our modular approach enabled simulating different scenarios with increasing level of stochastic detail, ranging from an approximate solution of the system of deterministic differential equations associated with the GRN to an approximate solution of the corresponding stochastic chemical master equations.
To demonstrate the working principle of our modular approach, we chose a cascade of negative regulation comprising four genes (Fig. 2a). In the cascade, each gene encodes a repressive transcription factor, which represses the subsequent gene. Such a cascade has been shown to exhibit pathologic behaviour, i.e. large fluctuations in gene product levels, in presence of stochasticity [52, 53]. We set the number of the first repressive transcription factor, which acts as initial input, to a constant value of 100 molecules (high). We then determined the numbers of subsequent transcription factors during the simulation. In the following, we discuss the different scenarios considering increasing levels of stochastic detail.
Scenario 1 (Fig. 2b, c): In this scenario, we implemented the approximate solution of CaiNet to the system of deterministic differential equations associated with the GRN of the repressive gene cascade (“Methods” section, Eq. (59)). We modelled the protein output of each gene by
where n is the number of proteins, ν is the production rate and δ is the degradation rate. Gene elements directly use an analytical solution to calculate the expression level. We calculated the probability function of activated expression, a(q Δt), using the expression levels from the last synchronization time point, q Δt, and the equilibrium constants of transcription factors of the appropriate promoter structure (“Methods” section, Eqs. (13) or (16)). According to our central assumption, the probability function a is constant during the time interval Δt (Fig. 2b). Thus, we could find an individual local analytical solution for each gene element in the network. Using CaiNet, we simulated the gene cascade with a fixed set of kinetic rates for the transcription factors (Additional file 1: Table S1) and for three different time steps Δt (Δt = 10 s, 100 s and 1000 s) (Fig. 2c). If Δt was too large, the synchronization of transcription factor abundance and thus the change in gene activation was delayed. Hence, CaiNet’s solution for the transient behavior did not match the global, networkwide solution of the ODE solver (“Methods” section, Eq. (59)) anymore. However, the steady state level was still reproduced correctly. If Δt was sufficiently small, the approximation of CaiNet approached the global, networkwide solution of the system of ODEs associated with the GRN.
Scenario 2 (Fig. 2d, e): We included that genes can stochastically switch between on and off states (Fig. 2d). For simplicity, we assumed that genes were switched on with rate λ(n_{TF}) upon binding of a transcription factor, and switched off with rate µ upon unbinding of the transcription factor. Thus, the switching events are Poisson processes. For every time step Δt, we calculated the effective on/offrates, λ, µ, of the gene using the appropriate promoter structures (“Methods” section Eqs. (14) or (17)). According to our central assumption, λ, µ, were constant within a time step Δt. In contrast, the gene was able to switch states within a time step. To obtain the time points of switching, t_{i}, we drew the uniformly distributed random number X and used the current switching rate constant
Dependent on the state of the gene, we distinguished between the two equations
Gene elements directly used the corresponding analytical solutions to calculate the expression level. Using CaiNet, we simulated the gene cascade with a fixed set of kinetic rates for the transcription factors (Additional file 1: Table S1) and allowed for gene switching (Fig. 2e). We observed that protein abundance increased during the gene onstate and decreased while the gene was off. In combination with stochastic switching between these two states, the variance of the expressionlevel was significantly increased compared to the global, networkwide solution of the ODEsolver (Fig. 2e). Correspondingly, while the autocorrelation of the protein output of the gene cascade was constant for the solution of the ODEsolver, the autocorrelation for the solution including switching of genes exhibited a decay since protein levels varied over time (Fig. 2e).
Scenario 3 (Fig. 2f, g): In addition to the switching of genes, we included discretized and stochastic production and degradation events (birth and death events) of proteins (Fig. 2f). We initially determined the time points of gene switching according to Eq. (2) using constant protein numbers for a given Δt. For each onperiod of the gene with duration τ_{on}, we determined the number of birth events n_{B} of gene products by drawing a random number from the probability distribution corresponding to protein production:
while the number of degraded gene product is determined deterministically according to Eq. (3). We chose this simplification to circumvent a complex probability distribution for gene product numbers. For small changes in gene product levels, it should not deviate much from a full stochastic treatment (see below). In a period without production, we determined the number of gene products k by drawing a random number from the probability distribution to find k gene products after the time interval τ_{off}
assuming there were n proteins at the beginning of the time interval. Using CaiNet, we simulated the gene cascade with a fixed set of kinetic rates for the transcription factors (Additional file 1: Table S1), allowed for gene switching, and accounted for birth/death events (Fig. 2g). Compared to the case including gene switching alone (scenario 2), the variance further increased and approached the variance of a Gillespie simulation of the gene cascade considering the promoter structure explicitly and including stochastic birth and death events of gene products (“Methods” section).
We further tested the behavior of the gene cascade in a situation where the rates of gene switching were much faster than those for production (birth) and degradation (death) events (Fig. 2h, i and Additional file 1: Table S1). For scenario 2 with deterministic birth/death events of proteins, fast rates of gene switching underestimated the variance in protein levels compared to a global, networkwide Gillespie simulation including both stochastic gene switching and stochastic birth/death events (Fig. 2i). This indicates that the main contribution to variance in protein levels in this situation is due to stochasticity in birth/death events. Accordingly, when we included both stochastic gene switching and stochastic birth/death events in CaiNet (scenario 3), the variance in protein levels well approached the variance of the global, networkwide Gillespie simulation, despite the simplified treatment of birth and death events in CaiNet. Similarly, the autocorrelation of protein output, which characterizes the temporal variation of protein levels, was similar for the protein levels obtained by CaiNet with scenario 3 and Gillespie.
At small molecule numbers, the simplification for the onperiod of a gene of treating only birth events stochastically but degradation of gene products deterministically might produce artifacts. We therefore tested the behavior of the gene cascade for small numbers of transcription factors by setting the number of the first repressive transcription factor to the constant value of 1 (Fig. 2j). Again, scenario 2 without birth/death events of proteins deviated from a global, networkwide Gillepsie simulation. In contrast, the CaiNet simulation of the low protein situation with scenario 3 including both gene switching and birth/death events yielded very good agreement of the gene product levels and the temporal variation of these levels with the global Gillespie simulation. Thus, our semistochastic treatment of birth/death events constitutes a very good approximation.
To ensure sound consideration of gene product numbers in all three scenarios, we implemented that CaiNet monitors the synchronization time step and returns a warning if the change of a gene product, Δn, during one interval violates the criterion
For the transcription rate ν, for example, this leads to the condition
Thus, if ν was the fastest rate constant in the system, choosing the synchronization time step Δt < ν^{−1} ensures small changes in gene elements.
Consideration of delays
The processes in a cell may give rise to temporal delays. For example, the process of elongation may be considered as a constant delay in gene transcription. We implemented the possibility to account for deterministic, constant delays in the response of a network element in CaiNet as a shift of the element’s output by a fixed number of synchronization steps Δt (Fig. 3a). For example, if the output of a network element is delayed by 10 Δt, the element internally stores the value from ten synchronization steps and reports the delayed value during synchronization. A limitation of this approach is that delays can only be given as multiples of the synchronization time step Δt.
Moreover, we integrated the possibility to account for several ratelimiting steps in gene product synthesis. Such delays may occur due to transcription termination, mRNA splicing or translation. We modeled each ratelimiting step as a Poisson process with rate β_{i}, i.e. with exponentially distributed waiting times. Thus, changes in gene activity are blurred if average waiting times are on the order of or slower than gene on/off switching processes (Fig. 3b). We only simulated ratelimiting steps deterministically. Therefore, Eq. (1) was replaced by an analogous equation including the appropriate number of ratelimiting steps (“Methods” section, Eqs. (24) to (29)). At the synchronization time step Δt, the gene element returns the number of gene products of the corresponding solution.
Consideration of biochemical reactions
We further implemented the possibility to include and combine network elements of bimolecular reactions in CaiNet (Fig. 4). To demonstrate the working principle of simulating enzymatic reactions, we chose an example of two enzymes that mutually take each others product and catalyze it to each others substrate (Fig. 4a). The differential equations for the substrate N and the enzyme F that transforms the substrate N into the product M are
where δ is the degradation rate and ν_{1} and ν_{2} are the catalytic rates of the enzymes F and G. The association and dissociation rates of the substrate to the enzymes are denoted by λ_{1} and µ_{1} for N and F and by λ_{2} and µ_{2} for M and G. The differential equations for the enzyme G that transforms M back into N are
In Eqs. (8) and (9), the molecule numbers and fluxes that are synchronized at the end of each synchronization time step Δt are indicated with a circumflex accent. These parameters stay constant during each synchronization time step. With this assumption, we were able to decouple the differential equations of both bimolecular elements and solve them separately (“Methods” section). The resulting product numbers are reported as output of the bimolecular elements to all other network elements after each synchronization step Δt. We tested how the duration of the synchronization time step Δt influenced the accuracy of the results using a fixed set of reaction rates (Additional file 2: Table S2) and Δt = 1 s or 10 s (Fig. 4b). Similar to the case of gene elements, if Δt was too large, synchronization of reaction products was delayed compared to a global ODE solution and approximation of the transient behavior of the global ODE solution was poor. However, the steady state level was still reproduced correctly independent of the size of the time step, since the correct system of differential equations was used to calculate the outputs of the bimolecular reaction elements. If Δt was sufficiently small, the approximation of CaiNet approached the global solution of the associated system of ODEs using a numerical solver.
To ensure a sound consideration of molecule abundance using the flux among elements, we implemented that CaiNet returns a warning once the difference in flux Δj between start and end time of the synchronization interval violates
CaiNet reproduces noiseinduced bistability and noiseinduced oscillations in GRNs
GRNs often combine both gene regulatory motives and biochemical reactions. To further test the performance of CaiNet, we thus applied it to two heterogeneous GRNs. Both exhibited complex dynamic behavior in presence of stochasticity, as before with the repressive gene cascade.
As first example of a heterogeneous GRN, we chose a network including a positive autoregulatory feedback loop combined with enzymecoupled degradation (Fig. 5a, Eq. (60) in “Methods” section and Additional file 3: Table S3). Recently, noiseinduced bistability of such a network has been studied using a new mathematical method, linear mapping approximation, which is able to reproduce even critical dynamic characteristics of GRNs [54]. We verified presence of noiseinduced bistability in our network with a global Gillespie simulation (Fig. 5b). A CaiNet simulation of scenario 2, where only gene switching is treated stochastically but transcription of RNA and protein birth/death events are treated deterministically, did not reproduce this bistability. In contrast, when we included stochastic gene switching and stochastic transcription and birth/death events in the CaiNet simulation (scenario 3), the fluctuations in protein levels agreed to the fluctuations of the global, networkwide Gillespie simulation (Fig. 5b). Small deviations in the distribution of protein numbers remained for this heterogeneous GRN, since enzymatic reactions are only considered deterministically in CaiNet (Fig. 5b). Overall, CaiNet well recovered noiseinduced bistability of the GRN.
We further compared the time necessary to simulate this positive autofeedback GRN. One day of simulation time took 0.07 s for deterministic CaiNet simulations (scenario 1), 0.08 s for CaiNet simulations including stochastic gene switching (scenario 2) and 0.18 s for CaiNet simulations including both stochastic gene switching and stochastic transcription and birth/death events of the gene product (scenario 3). A networkwide Gillespie simulation implemented in Matlab took 2 s. We note however, that the GillespieSimulation was specifically written and optimized for the positive autofeedback GRN while CaiNet is a framework for general GRNs.
Second, we applied CaiNet to a heterogeneous GRN including a negative autoregulatory feedback loop combined with enzymecoupled degradation (Fig. 5c, Eq. (61) in “Methods” section and Additional file 3: Table S3). A similar network has been shown to exhibit noiseinduced oscillations by using slowscale linear noise approximation [55]. The CaiNet simulation of our network using only stochastic treatment of gene switching (scenario 2) settled at a constant steady state, after quickly decaying oscillatory transient behavior (Fig. 5d). In contrast, a global Gillespie simulation considering full molecular stochasticity revealed continuous, noiseinduced oscillations, as expected. Similarly, when we considered both stochastic gene switching and stochastic transcription and birth/death events (scenario 3), the CaiNet simulation exhibited continuous noiseinduced oscillations (Fig. 5d). As before, the CaiNet simulation did not fully reproduce the protein levels of the Gillepsie simulation due to deterministic treatment of the enzyme reaction. However, CaiNet well recovered the periodicity of the oscillations as revealed by a Fourier transformation of the simulated time trace (Fig. 5d). Overall, CaiNet simulations considering both stochastic gene switching as well as stochastic birth/death events of the gene product well recovered complex dynamic behavior of heterogeneous GRNs including both gene regulatory motives and biochemical reactions.
Inference of GRNs with a neural network approach
We took advantage of the modular approach of CaiNet to facilitate inference of kinetic rates of network elements and the topology of a GRN by using a gradient method originally designed to train recurrent neural networks. The analogy between GRNs and neural networks (NNs) has been pointed out before, by discussing genes as information processing units [43,44,45,46]. Accordingly, we directly identified each network element and its divers physiological steadystate input and output parameters with a unique perceptron. Thus, we could directly interpret the GRN laid out in CaiNet as a NN. This identification enabled us to apply a training method designed for recurrent NNs to the GRN, and the process of training corresponded to the process of inference of network parameters. As experimental ground truth, several known steady state measurements of gene product levels including different input conditions of the unknown network or knockout experiments of one or several genes may be taken. During the inference process, we minimized the difference between the current network behavior and the experimental ground truth by optimizing steadystate parameters of the GRN. In particular, we used a backpropagation algorithm introduced for recurrent NNs, which does not perform a global gradient descent step for all nodes in the network, but rather performs a local gradient decent and propagates remaining errors to other nodes [56] (“Methods” section). The modular layout of the GRN in CaiNet resulting from our central assumption facilitated calculating this gradient.
For an initial proofofconcept, we applied the process of inference to a network of three genes, for which we assigned a set of reaction rates (Additional file 4: Table S4) and activating or repressing connections between the genes (Fig. 6a). We then used CaiNet to simulate the ground truth behavior of the GRN for two different inputs. The resulting expression levels served as measurements for the proofofconcept inference. For simplicity, we only inferred the equilibrium constants of transcription factor binding, i.e. of on and offswitching of the genes, while leaving production and degradation rates fixed. We assigned false starting values for the equilibrium constant of each gene (Fig. 6a), simulated the resulting network behavior and calculated the difference to the two measurements of the ground truth network. Next, we iteratively applied the adapted gradient method until the difference between simulated and groundtruth values dropped below a certain threshold (“Methods” section). During each iteration, the parameters in the guessed network were changed such that the difference between simulated and groundtruth values became smaller. These changes of parameters propagate through the network. Due to feedback loops in the network structure, such a change may cause worse performance of the guessed network. Therefore, the distance between ground truth and guessed network may oscillate before reaching a stable value (Fig. 6b–d). The resulting values for the equilibrium constants and thus the performance of the network were well regained during the inference process (Fig. 6a).
Next, we applied our inference approach to multiple different networks with N = 5 to 10 genes. To generate these networks, we set up fully connected networks with N genes, i.e. that have N·(N − 1) connections (Fig. 6e). To obtain a specific ground truth network, we randomly modified the fully connected network. For each connection, we determined whether it was deleted by comparing a random number with a deletion probability p_{delete}, such that on average N·(N − 1)(1 − p_{delete}) connections remained. For each equilibrium constant of transcription factor binding, i.e. of on and offswitching of a gene, we drew a random value out of a probability distribution spanning over two orders of magnitude between 1e−3 and 1e−2. As a result of these two steps, we obtained a series of networks with N genes and randomly chosen topology and gene switching kinetics. Each series consisted of 12 networks, which served as ground truth networks.
To generate measurements for the inference process, we used CaiNet to simulate the expression levels of each gene of a network under different conditions. Here, we knocked out each gene in a given ground truth network one at a time and simulated the resulting expression levels of the other genes. This resulted in a set of N^{2} measurements, which we used as training data set in the inference process. Such large training data sets enabled inferring the actually implemented connections of the ground truth network when starting the inference process from the fully connected network. We repeated this process for each ground truth network in the series of networks with N genes.
We then trained the fully connected, nonrandomized, starting network with the gradient descent method on a measurement dataset of a specific network (“Methods” section). During the inference process, CaiNet did not explicitly delete connections. Rather, the equilibrium constants of unneeded or contradictory connections approached zero. We interpreted a connection as deleted, if its equilibrium constant was smaller than 1e−4. With equilibrium constants smaller or equal to this value, the affinity of the corresponding transcription factor is too weak to regulate the expression level of the respective gene. We repeated this process of inference for each ground truth network in the series of networks with N genes.
To quantify the performance of the inference approach, we counted the number of true positive and false positive connections of a trained network, as compared to the ground truth network topology (Fig. 6c and Additional file 5: Table S5). For networks of N = 5 to 10 genes, the false positive fraction was on average ≈ 20%, and the true positive fraction was ≈ 90%. We further quantified the inference of the equilibrium constants by calculating the difference between ground truth and inferred value and normalized this with the ground truth value (Fig. 5c, Inset). On average, the relative error of equilibrium constants was ≈ 1%.
Discussion
CaiNet is designed to assist users in setting up and simulating complex GRNs at molecular detail. Simplifying the process of setting up networks, we designed gene elements to assume a twostate behaviour with off and on state of the gene and with a variable number of successive production and processing rates and a degradation rate of the gene product. After assigning the promoter structure and number of successive gene synthesis rates, CaiNet assigns the corresponding local analytical solutions or stochastic models to the gene elements without further input by the user. This is possible since we treat each gene element isolated from other network elements between two synchronization time steps.
Our modular algorithm enables adding more and more realistic kinetic behaviour and noise in molecule abundance to initial deterministic solutions, since the local analytical treatment of the twostate model and gene product synthesis of a gene element can be readily replaced by a stochastic treatment of gene on/off switching and stochastic product synthesis for each gene. Thus, it is possible to quickly assess the effect of stochasticity on the behaviour of a network. Another great advantage of the modular approach of CaiNet is that new GRNs that follow the implemented regulatory structure can be set up and simulated very quickly.
Simulations of the modular stochasticdeterministic elements in CaiNet may be fast compared to the Gillespie method. The time increment of a Gillespie simulation depends on the fastest rate in the system, which often scales with the abundance of a reactive species. In contrast, the synchronization time step of CaiNet has to be adjusted such that changes in the abundance of the species are small within this time step, which allows for longer time steps. In addition, since gene elements are decoupled in CaiNet, it is possible to distribute the calculations to multiple processor cores. Moreover, the CaiNet algorithm enables fast transitions between deterministic and stochastic simulations. This in principle allows approximating parts of a large GRN deterministically, while treating others stochastically.
The precalculated analytical solutions applied in CaiNet set a limit for the generality of modelling biological networks. For now, we implemented modelling of genes only with two states and a few promoter structures. We further assumed a direct relationship between transcription factor binding and activation of a gene. Epigenetic alterations to the gene locus or further steps such as recruitment of the transcription machinery are not modelled explicitly. In principle, new network elements with further promoter structures or more complex multistate models can be implemented in CaiNet as specialized effective twostate models. It has been found that such effective models are suitable to well describe the histogram of mRNA levels of a cell population [57]. Nevertheless, singlecell experiments revealed an effect of cell size, cell division and DNA content on the kinetics of gene transcription and mRNA content, for example in dosage compensation after DNA replication [58, 59]. Accordingly, the twostate model has been extended to include such effects [58]. Recently, a more comprehensive and analytically tractable stochastic model of mRNA dynamics has been devised [60]. In the future, incorporating such complex models will enable enhancing CaiNet to arrive at more and more realistic simulations of GRNs.
While CaiNet is optimized for setting up GRNs and simulating stochastic processes in gene product synthesis, biochemical reactions can also be implemented to account for molecular signalling pathways and reaction cascades that are oftentimes connected to GRNs. For these pathways, we did not implement stochastic fluctuations in the expression levels, since biochemical reactions typically are fast compared to the kinetics of gene on/off switching. Due to this simplification, some deviations in gene product levels compared to a Gillespie simulation may occur. However, even complex dynamic behaviour such as noiseinduced bistability or oscillations were well recovered by a CaiNet simulation that considered both stochastic gene switching and stochastic birth/death events of the gene product. For biochemical reactions, CaiNet’s elements are limited to dimerization reactions and enzymemediated transformations of biomolecules. These basic bimolecular reactions can be combined to more complex biochemical reactions. However, biochemical reactions with more complex response functions or hill coefficients larger than two are not implemented and require calculating new CaiNet elements.
CaiNet is able to infer GRNs from steady state expression levels. The inferred parameters are therefore limited to equilibrium constants. Since it is challenging to define a goldstandard for inference methods [42, 61], we refrained from comparing CaiNet with other inference methods, such as previously attempted in the DREAM challenge. Instead, we used a comprehensive randomized approach to evaluate the performance of CaiNet. Our simulations indicate that if each gene in a network is knocked out individually, the resulting gene product levels provide sufficient data to infer the topology and parameters of the network. A limit for the size of the inferred network is given by the computation time of the recurrent network algorithm and by the amount of known features of the GRN. Importantly, the inferred equilibrium constants directly correspond to physiological parameters of the GRN and therefore directly allow for subsequent forward simulation of the inferred GRN.
Conclusion
We provide a userfriendly framework, CaiNet, to simulate GRNs at molecular detail and to infer the topology and steadystate parameters of GRNs. In CaiNet, biochemical reactions and genes with various regulatory logics can be combined to complex GRNs, for which the temporal progression of gene product levels is simulated. Inversely, experimental steadystate measurements of expression levels can be used to infer the underlying GRN. The combination of a forward simulation tool and inverse inference tool in the single package CaiNet may facilitate the process of evaluating biologically relevant GRNs and interpreting associated measurement results.
Methods
Pseudo code of network simulations by CaiNet
Effective rates given by the promoter structure
Activation by a single transcription factor
For the most simplified case of activation of a gene we assume that a promoter is activated once a transcription factor (TF) binds. This means that the onrate is given by the arrival rate λ_{eff} of the TF at the promoter. Once a TF has arrived at the Promoter the gene is ‘on’ for the time 1/µ_{eff}. This time may vary depending on the TF and the promoter of the gene. If not otherwise stated we assume that the binding time of the TF, 1/µ, corresponds to this ontime.
where n_{TF} is the number of transcription factors and λ_{0} is the arrival rate of a single transcription factor.
Promoter with ANDlogic
The ANDlogic refers to a promoter that is only activated if TF_{1} and TF_{2} up to TF_{N} are bound (Fig. 2d). Once a single TF leaves, the activation criterion is immediately violated and the promoter is off. Therefore, the offrate µ_{eff} of the promoter is the sum over all offrates µ_{TFi} of the TFs.
Combinatorically, we can also write down the probability of the promoter to be on as the product of the probability of all TFs to be bound
where λ_{TFi} = λ_{TFi,0} n_{TFi} is the arrival rate of a TF at the promoter. From p_{on,eff} and µ_{eff}, we can calculate the onrate of the promoter
Promoter with OR logic
When a promoter is active if TF_{1} or TF_{2} up to TF_{N} or a combination of all is bound, we refer to this promoter as ORlogic (Fig. 2d). We start the calculation of effective rates with the onrates of individual TFs, λ_{TFi}. Since the arrival of any TF is enough to activate the promoter, the onrate is
The probability to be on can be calculated combinatorically with p_{on,TFi} implicitly defined in (13)
With (14) we find the effective off rate of the promoter
Competitive repression
We now enhance all promoters developed above by an additional feature, which is the blocking of the promoter by a repressor. We assume that the promoter cannot be activated once a repressor is bound. Once a gene is activated however, the repressor does not abort expression. Therefore, the repressor directly affects the effective onrate of an arbitrary promoter while the offrate remains unchanged
To proof this, we write down the differential equations describing the change in the blocked and free promoter populations. We denote blocked promoters with b, free but inactive promoters with f and active promoters with a
Assuming that the changes in the repressorpopulation are small, we find
If we now calculate the sum of blocked and unblocked promoters, we obtain
We add up Eq. (19), plug the result in (21) and obtain
From this result we conclude for the new effective rate modified by the repressor
This is equivalent to Eq. (18) and the proof is complete.
Gene expression including delays
In a simplified approach, we considered one ratelimiting step with rate constant ν for synthesis of the gene product. This assumption neglects the fact that certain processes in gene product synthesis may introduce a delay between initiation of product synthesis and availability of the synthesized product, e.g. elongation of RNA, termination of transcription, initiation of translation, elongation of the protein and termination of translation. In addition, the mRNA needs time to be transported to ribosomes and might be subject to posttranslational modifications. To account for such processes in gene product synthesis, we included additional ratelimiting steps corresponding to Poisson processes with rates β_{1}…β_{N}.
To determine the resulting function n(t), we solve the general system of ODEs for an arbitrary number of delay processes. Using the switching function a(t) of a single gene, we arrive at the system of differential equations
where c_{N} represents the output expression level of the gene. Our first step to solve the system for c_{N}(t) is to calculate the Laplace transform of all equations in (24). The result for the first and the ith equation is
where p the transform variable and ã is the Laplace transform of a. With these results we can easily find an equation for the Laplacetransform of the final product c_{N}
Before we calculate the inverse Laplace transform to obtain c_{N}(t) we simplify the denominators in (26) by partial fraction composition with the coefficients α_{i}
For few delays this simplification can most easily be verified by plugging in α_{i} and rewriting the right hand side of (27). For a high number of delays the result can be derived by the wellknown technique of partial fraction decomposition. We plug this result in (26) and obtain
We next use the multiplication theorem for the Laplace transform and find
Training of recurrent regulatory networks
We assume that steady state expression levels for a subset of species of a GRN are given. Our goal is to infer parameters characterizing the steady state behaviour of the network from this information. Importantly, since we work with steady state information, our approach is limited to the inference of equilibrium constants rather than kinetic rate constants. To infer these constants, we apply a gradient method particularly design for recurrent network topologies [56]. Once these equilibrium parameters are inferred, the temporal evolution of the network can still be simulated.
We start by writing down the minimization problem for the objective function W(ϕ) of the difference between the current values X and the target expression levels T of the network. The objective function shall be minimized by varying the parameters of the GRN, ϕ = {λ, µ, ν, δ}.
where we used the l^{2}norm of the distance between target expression level and corresponding simulated values as a lossfunction for the ith element of the training data set.
To obtain the gradient for a parameter ϕ_{j} ∈ ϕ we calculate the derivative of the objective function with respect to ϕ_{j} and obtain
Using the delta rule introduced by Pineda et al. [56], we find a step size for the parameter ϕ_{j}
where we use the empirical parameter η to scale the step size. In the above equation, the derivatives of x_{i} with respect to ϕ_{j} are unknown. To derive a formula for Δϕ_{j}, we in the following formally work on a generic system of ODEs corresponding to a GRN.
The time evolution of all expression levels X = {x_{1}, …, x_{i}}, is given by the set of differential equations \({\dot{\mathbf{x}}} = {\mathbf{F}}({\mathbf{x}})\). To obtain the missing derivatives we take the total derivative with respect to ϕ_{j} of the steady state case 0 = F(X). We obtain
In this equation we recognize the partial derivative of x_{k} with respect to ϕ_{j}. Importantly, the system of equations is linear with respect to this derivative. We introduce the abbreviation
With this abbreviation, we can rewrite (33) as a linear system of equations for the derivatives of x_{k} with respect to ϕ_{j}.
By introducing a new variable z, we can make the step of solving this system of ODEs independent from the parameter ϕ_{j}, such that we only need to solve the system once in order to obtain the gradients for all parameters in ϕ. This variable z is characterized by the equation
We plug this equation in (32) and identify z by introducing the identity matrix. We obtain
To calculate this gradient using a network set up in CaiNet, we need to calculate the derivatives of the differential equations \(\partial {\mathbf{F}}/\partial \varphi_{j}\) and the derivatives \(\partial x_{k} /\partial \varphi_{j}\). Since the parameter \(\varphi_{j}\) and the species \(x_{i}\) only occur in a single network element, each network element can return these derivatives independently of other elements in the network.
For a promoter with orlogic, the differential equation is
where \(A_{j}\) is the set of all transcription factors that activate the element \(j\). We now take the derivative of the steady state equation with respect to \(\lambda_{k}\) and obtain
where \(\Delta_{ik}\) is the Kronecker delta. From this equation we can identify the variables that the network element returns to calculate a gradient decent step:
For a complete gradient descent scheme, we first obtain the steady state expression levels by simulating the network with CaiNet until all elements have reached a steady state. Next we use Eq. (37) to calculate the change for all parameters in ϕ based on the derivatives above and the difference between ground truth expression levels and simulated expression levels. Using the new parameters, we again simulate the steady state expression levels. We proceed in this manner until an abortion criterium is met. This abortion is \(W < b\), i.e. that the sum of all differences between ground truth and expression levels of the trained network are smaller than an upper bound \(b\).
Chemical reactions
In the following we derive analytical solutions for the elementary reactions ‘homodimerization’, ‘heterodimerization’ and ‘transformation of a species by an enzyme’. We then use these analytical solutions to couple multiple elementary reactions to obtain more complex biochemical reaction networks.
Heterodimerization
We start with the heterodimerization of the two species \(n_{1}\) and \(n_{2}\) that forms the species \(f\). For a closed system of these three species the change in \(f\) over time is
where \(\lambda\) is the association rate of the two dimerizing species and \(\mu\) is the dissociation rate of the dimer. The rates \(\delta_{1}\) and \(\delta_{2}\) correspond to the degradation of the monomers \(n_{1}\) and \(n_{2}\) respectively. The law of mass conservation yields equations for the species \(n_{1} ,n_{2}\)
Plugging in the law of conservation we obtain
To account for coupling to other elementary systems, we introduce the flow \(j_{in}\) that represents flux of the species \(f\) from other elements into the element at hand. The new differential equation for \(f\) is
Next, we solve this equation for \(f(t)\) for an initial value of \(f_{0}\). We start by calculating the fixed points (defined by \(\dot{f} = 0\)) of \(f\). The corresponding quadratic equation yields the fixed points \(f_{1} ,f_{2}\)
where \(K = \lambda \cdot \left( {\mu + \delta_{1} + \delta_{2} } \right)^{  1}\). Using these fixed points, we can rewrite Eq. (44) as
Integration of this ODE yields
Since the reaction consumes the species \(n_{1} ,n_{2}\), we need to calculate the flux of the respective elements into the species \(f\). This flux is calculated by
During each synchronization timestep, these fluxes are reported to the network element representing the corresponding species.
Homodimerization
The differential equations for a homodimerization are
where \(\lambda\) is the association rate of the two dimerizing monomers and \(\mu\) is the dissociation rate of the dimer. The rate \(\delta_{1}\) corresponds to the degradation of the monomers \(N\). The shape of this equation is similar to the heterodimer case. Thus, by setting
we can reuse our solution (47) to calculate the number of dimers and (48) to calculate the flux out of the monomer into the homodimer. During each synchronization timestep, this flux is reported to the monomer element.
Enzyme kinetics
In case of a dimerization with an enzyme, the substrate can be subject to a reaction catalysed by the enzyme. Here, the species \(m\) is produced.
where \(\lambda\) is the association rate of substrate to enzyme and \(\mu\) is the dissociation rate of the substrateenzyme complex. The rates \(\delta_{1}\), \(\delta_{2}\) and \(\delta_{3}\) correspond to the degradation of the substrate \(n_{1}\), degradation of the enzyme \(n_{2}\) and degradation of the product respectively. The rate \(\nu\) is the transformation rate of substrate to product. The shape of this equation is similar to the heterodimer case. Thus, by setting
we can reuse our solution (47) to calculate the number of enzyme–substrate–complexes and (48) to calculate the flux out of substrate and enzyme into the enzyme–substrate–complex. To obtain the amount of product, we integrate (52) and obtain
where
We arrived at this result by reusing our solution (47). During each synchronization timestep, m(Δt) is reported to all other elements in the network.
Simulation of the negative feedback cascade
Gillespie simulation
All genes in the cascade have three states, since the promoter structure is not modelled with an effective twostate model as in the case of the CaiNet simulation. The promoter is empty in the second state. By binding of a transcription repressor the promoter enters the first state. This state can only be exited upon unbinding of the transcription repressor. The third state is entered upon binding of a transcription activator
This state model leads to the following system of stochastic reactions. We consider the nth link of the chain. \(P_{n}\) then is transcriptionally repressed by the gene product of the (n − 1)th link. The nth link of the chain produces the transcription repressor \(R_{n}\)
We implemented this set of reactions for \(n = 1,2,3,4\).
System of ODEs
For the system of ODEs we do not explicitly simulate the association and dissociation of transcription factors to the promoters. Rather, we give the onprobability of the gene. The state model (56) corresponds to a competitive repression model of a promoter and we can give the onprobability of the nth link of the chain using (18)
With this onprobability we can give the ODE for the gene product
We implemented this set of reactions for \(n = 1,2,3,4\). For solving the system of differential equations we used the ode45 solver of Matlab 2019a.
Noiseinduced bistability and oscillations
The positive feedback loop
We simulated a gene that is activated by its own gene product A to transcribe RNA. The RNA is translated to yield the transcription factor, which can be degraded spontaneously or by an enzyme E.
The simulation used the rate constants given in Additional file 3: Table S3.
The negative feedback loop
We simulated a gene that is repressed by its own gene product R. If R is absent, RNA is transcribed and further translated to yield the transcription repressor. The repressor can be degraded spontaneously or by an enzyme E.
The simulation used the rate constants given in Additional file 3: Table S3.
Availability of data and materials
The CaiNet software is freely available at https://gitlab.com/GebhardtLab/CaiNet.
Abbreviations
 GRN:

Gene regulatory network
 NN:

Neural networks
 CaiNet:

Computer Aided Interactive Gene Regulatory Network Simulation Tool
References
SchmidtHeck W, et al. Fuzzy modeling reveals a dynamic selfsustaining network of the GLI transcription factors controlling important metabolic regulators in adult mouse hepatocytes. Mol Biosyst. 2015;11:2190–7.
Edwards JS, Ibarra RU, Palsson BO. In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nat Biotechnol. 2001;19:125–30.
Orlando DA, et al. Global control of cellcycle transcription by coupled CDK and network oscillators. Nature. 2008;453:944U78.
Dibner C, Schibler U, Albrecht U. The mammalian circadian timing system: organization and coordination of central and peripheral clocks. Annu Rev Physiol. 2010;72:517–49.
Li M, Belmonte JCI. Ground rules of the pluripotency gene regulatory network. Nat Rev Genet. 2017;18:180–91.
Chickarmane V, Troein C, Nuber UA, Sauro HM, Peterson C. Transcriptional dynamics of the embryonic stem cell switch. PLoS Comput Biol. 2006;2:1080–92.
Davidson EH, et al. A genomic regulatory network for development. Science (80). 2002;295:1669–78.
Jacob F, Monod J. Genetic regulatory mechanisms in synthesis of proteins. J Mol Biol. 1961;3:318.
Alon U. Network motifs: theory and experimental approaches. Nat Rev Genet. 2007;8:450–61.
Larsson AJM, et al. Genomic encoding of transcriptional burst kinetics. Nature. 2019;565:251.
Suter DM, et al. Mammalian genes are transcribed with widely different bursting kinetics. Science (80). 2011;332:472–4.
Acar M, Mettetal JT, Van Oudenaarden A. Stochastic switching as a survival strategy in fluctuating environments. Nat Genet. 2008;40:471–5.
Acar M, Becskei A, Van Oudenaarden A. Enhancement of cellular memory by reducing stochastic transitions. Nature. 2005;435:228–32.
Arkin A, Ross J, McAdams HH. Stochastic kinetic analysis of developmental pathway bifurcation in phage λinfected Escherichia coli cells. Genetics. 1998;149:1633–48.
Chang HH, Hemberg M, Barahona M, Ingber DE, Huang S. Transcriptomewide noise controls lineage choice in mammalian progenitor cells. Nature. 2008;453:544–7.
Leloup JC, Goldbeter A. Toward a detailed computational model for the mammalian circadian clock. Proc Natl Acad Sci. 2003;100:7051–6.
Forger DB, Peskin CS. Stochastic simulation of the mammalian circadian clock. Proc Natl Acad Sci. 2005;102:321–4.
Chickarmane V, Olariu V, Peterson C. Probing the role of stochasticity in a model of the embryonic stem cell—heterogeneous gene expression and reprogramming efficiency. Bmc Syst Biol. 2012;6:12.
Lin YT, Hufton PG, Lee EJ, Potoyan DA. A stochastic and dynamical view of pluripotency in mouse embryonic stem cells. PLoS Comput Biol. 2018;14:24.
Karlebach G, Shamir R. Modelling and analysis of gene regulatory networks. Nat Rev Mol Cell Biol. 2008;9:770–80.
Di Ventura B, Lemerle C, Michalodimitrakis K, Serrano L. From in vivo to in silico biology and back. Nature. 2006;443:527–33.
Gillespie DT, Physics, D. G.T. J. of C. & 2000, undefined. The chemical Langevin equation. aip.scitation.org. J Chem Phys. 2000;113:64114.
Gillespie DT. Exact stochastic simulation of coupled chemical reactions. J Phys Chem. 1977;81:2340–61.
Gillespie DT. Approximate accelerated stochastic simulation of chemically reacting systems. J Chem Phys. 2001;115:1716–33.
Cao Y, Gillespie DT, Petzold LR. Avoiding negative populations in explicit Poisson tauleaping. J Chem Phys. 2005;123:54104.
Cao Y, Gillespie DT, Petzold LR. Efficient step size selection for the tauleaping simulation method. J Chem Phys. 2006;124:44109.
Haseltine EL, Rawlings JB. Approximate simulation of coupled fast and slow reactions for stochastic chemical kinetics. J Chem Phys. 2002;117:6959–69.
Pahle J. Biochemical simulations: stochastic, approximate stochastic and hybrid approaches. Brief Bioinform. 2009;10:53–64.
Hoops S, et al. COPASI—a COmplex PAthway SImulator. Bioinformatics. 2006;22:3067–74.
Adalsteinsson D, McMillen D, Elston TC. Biochemical Network Stochastic Simulator (BioNetS): Software for stochastic modeling of biochemical networks. BMC Bioinform. 2004;5:1–21.
Harris LA, Clancy P. A ‘partitioned leaping’ approach for multiscale modeling of chemical reaction dynamics. J Chem Phys. 2006;125:144107.
Lin YT, Buchler NE. Efficient analysis of stochastic gene dynamics in the nonadiabatic regime using piecewise deterministic Markov processes. J R Soc Interface. 2018;15:20170804.
Akutsu T, Miyano S, Kuhara S. Identification of genetic networks from a small number of gene expression patterns under the Boolean network model. In: Pacific symposium on Biocomputing; 1999. p. 17–28. https://doi.org/10.1142/9789814447300_0003.
Liang S, Fuhrman S, Biocomputing, R. S.P. Symposium on & 1998, undefined. Reveal, a general reverse engineering algorithm for inference of genetic network architectures. Citeseer. 1998;3:18–29.
Yip KY, Alexander RP, Yan KK, Gerstein M. Improved reconstruction of in silico gene regulatory networks by integrating knockout and perturbation data. PLoS ONE. 2010;5:e8121.
Scheinine A, et al. The challenges of systems biology inferring gene networks: dream or nightmare? Part 2: challenges 4 and 5. Ann N Y Acad Sci. 2009;1158:287–301. https://doi.org/10.1111/j.17496632.2008.04100.x.
Pinna A, Soranzo N, de la Fuente A. From knockouts to networks: establishing direct causeeffect relationships through graph analysis. PLoS ONE. 2010;5:e12912.
Pratapa A, Jalihal AP, Law JN, Bharadwaj A, Murali TM. Benchmarking algorithms for gene regulatory network inference from singlecell transcriptomic data. Nat Methods. 2020;17:147–54.
Casadiego J, Nitzan M, Hallerberg S, Timme M. Modelfree inference of direct network interactions from nonlinear collective dynamics. Nat Commun. 2017;8:1–10.
Papili Gao N, UdDean SMM, Gandrillon O, Gunawan R. SINCERITIES: inferring gene regulatory networks from timestamped single cell transcriptional expression profiles. Bioinformatics. 2018;34:258–66.
Che D, Guo S, Jiang Q, Chen L. PFBNet: a priorifused boosting method for gene regulatory network inference. BMC Bioinform. 2020;21:1–13.
Bonnaffoux A, et al. WASABI: a dynamic iterative framework for gene regulatory network inference. BMC Bioinform. 2019;20:1–19.
Chu DF, Zabet NR, Hone ANW. Optimal parameter settings for information processing in gene regulatory networks. BioSystems. 2011;104:99–108.
Zabet NR, Chu DF. Computational limits to binary genes. J R Soc Interface. 2010;7:945–54.
BenHur A, Siegelmann HT. Computation in gene networks. Chaos. 2004;14:145–51.
Ziv E, Nemenman I, Wiggins CH. Optimal signal processing in small stochastic biochemical networks. PLoS ONE. 2007;2:e1077.
Hu X, Maglia A, Wunsch DC. A general recurrent neural network approach to model genetic regulatory networks. In: Annual international conference on IEEE engineering and medical biology—proceedings, vol 7; 2005. p. 4735–4738.
Raza K, Alam M. Recurrent neural network based hybrid model for reconstructing gene regulatory network. Comput Biol Chem. 2016;64:322–34.
Ling H, Samarasinghe S, Kulasiri D. Novel recurrent neural network for modelling biological networks: oscillatory p53 interaction dynamics. Biosystems. 2013;114:191–205.
Peccoud J, Ycart B. Markovian modeling of geneproduct synthesis. Theor Popul Biol. 1995;48:222–34.
Novère NL, et al. The systems biology graphical notation. Nat Biotechnol. 2009;27:735–41.
Blake WJ, Kaern M, Cantor CR, Collins JJ. Noise in eukaryotic gene expression. Nature. 2003;422:633–7.
Hooshangi S, Thiberge S, Weiss R. Ultrasensitivity and noise propagation in a synthetic transcriptional cascade. Proc Natl Acad Sci. 2005;102:3581–6.
Cao Z, Grima R. Linear mapping approximation of gene regulatory networks with stochastic dynamics. Nat Commun. 2018;9:1–15.
Thomas P, Straube AV, Grima R. The slowscale linear noise approximation: an accurate, reduced stochastic description of biochemical networks under timescale separation conditions. BMC Syst Biol. 2012;6:1–23.
Pineda FJ. Generalization of backpropagation to recurrent neural networks. Phys Rev Lett. 1987;59:2229.
Popp AP, Hettich J, Gebhardt JCM. Altering transcription factor binding reveals comprehensive transcriptional kinetics of a basic gene. Nucleic Acids Res. 2021;49:6249–66.
Skinner SO, et al. Singlecell analysis of transcription kinetics across the cell cycle. Elife. 2016;5:e12175.
PadovanMerhar O, et al. Single mammalian cells compensate for differences in cellular volume and DNA copy number through independent global transcriptional mechanisms. Mol Cell. 2015;58:339–52.
Cao Z, Grima R. Analytical distributions for detailed models of stochastic gene expression in eukaryotic cells. Proc Natl Acad Sci U S A. 2020;117:4682–92.
Stolovitzky G, Monroe D, Califano A. Dialogue on reverseengineering assessment and methods: the DREAM of highthroughput pathway inference. Ann N Y Acad Sci. 2007;1115:1–22.
Acknowledgements
We thank Jana Meles for writing a userfriendly manual explaining the application and functionalities of CaiNet.
Funding
Open Access funding enabled and organized by Projekt DEAL. This work was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Programme [637987 ChromArch] and the German Research Foundation (DFG) [GE 2631/21 and GE 2631/31].
Author information
Authors and Affiliations
Contributions
JH and JCMG designed the study; JH performed calculations and programmed CaiNet; JH and JCMG wrote the manuscript. Both authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Table S1.
Kinetic parameters of the repressive gene cascade (Figure 2 of the main text).
Additional file 2: Table S2.
Kinetic parameters of the exemplary biochemical reaction (Figure 4 of the main text).
Additional file 3: Table S3.
Kinetic parameters of the heterogeneous GRNs (Figure 5 of the main text).
Additional file 4: Table S4.
Kinetic parameters of the exemplary GRN used to demonstrate the network inference (Figure 6 of the main text).
Additional file 5: Table S5.
Number of inferred true and false positive and true and false negative connections for networks of N = 5 to 10 genes as result of the network inference process (Figure 6 of the main text).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Hettich, J., Gebhardt, J.C.M. Periodic synchronization of isolated network elements facilitates simulating and inferring gene regulatory networks including stochastic molecular kinetics. BMC Bioinformatics 23, 13 (2022). https://doi.org/10.1186/s12859021045416
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12859021045416