TopoDoE: a design of experiment strategy for selection and refinement in ensembles of executable gene regulatory networks

Bouvier, Matteo; Zreika, Souad; Vallin, Elodie; Fourneaux, Camille; Gonin-Giraud, Sandrine; Bonnaffoux, Arnaud; Gandrillon, Olivier

doi:10.1186/s12859-024-05855-x

Research
Open access
Published: 19 July 2024

TopoDoE: a design of experiment strategy for selection and refinement in ensembles of executable gene regulatory networks

Matteo Bouvier^1,2,3,
Souad Zreika¹,
Elodie Vallin¹,
Camille Fourneaux¹,
Sandrine Gonin-Giraud¹,
Arnaud Bonnaffoux² &
…
Olivier Gandrillon^1,3

BMC Bioinformatics volume 25, Article number: 245 (2024) Cite this article

66 Accesses
Metrics details

Abstract

Background

Inference of Gene Regulatory Networks (GRNs) is a difficult and long-standing question in Systems Biology. Numerous approaches have been proposed with the latest methods exploring the richness of single-cell data. One of the current difficulties lies in the fact that many methods of GRN inference do not result in one proposed GRN but in a collection of plausible networks that need to be further refined. In this work, we present a Design of Experiment strategy to use as a second stage after the inference process. It is specifically fitted for identifying the next most informative experiment to perform for deciding between multiple network topologies, in the case where proposed GRNs are executable models. This strategy first performs a topological analysis to reduce the number of perturbations that need to be tested, then predicts the outcome of the retained perturbations by simulation of the GRNs and finally compares predictions with novel experimental data.

Results

We apply this method to the results of our divide-and-conquer algorithm called WASABI, adapt its gene expression model to produce perturbations and compare our predictions with experimental results. We show that our networks were able to produce in silico predictions on the outcome of a gene knock-out, which were qualitatively validated for 48 out of 49 genes. Finally, we eliminate as many as two thirds of the candidate networks for which we could identify an incorrect topology, thus greatly improving the accuracy of our predictions.

Conclusion

These results both confirm the inference accuracy of WASABI and show how executable gene expression models can be leveraged to further refine the topology of inferred GRNs. We hope this strategy will help systems biologists further explore their data and encourage the development of more executable GRN models.

Peer Review reports

Background

For the last 60 years, it has been commonly admitted that a precise knowledge of gene regulatory interactions is required to fully understand the processes of cell decision making (differentiation, proliferation or death) in response to a stimulus [1, 2]. Therefore, for the last three decades, the Systems Biology field has dedicated a great deal of effort to infer the structure of Gene Regulatory Networks (GRNs). Initial attempts suffered from the imprecision of bulk RNA-seq, in which the expression data from millions of cells was averaged, masking cellular heterogeneity and stochastic phenomena. Algorithms developed in the last ten years benefited from the development of single-cell RNA-seq technologies, which now allows to access mRNA distributions in more details and investigate causal dependencies between genes. Indeed, the single-cell resolution was shown to contain a much richer information that the mean value alone [3,4,5].

However, the precise identification of biological parameters of a GRN and the taks of distinguishing between multiple possible topologies remain to this day challenging problems. Attempts at solving those problems were for example made in the context of the DREAM (Dialogue on Reverse Engineering Assessment and Methods) challenges [6] where GRN inference methods were developed. Current GRN inference algorithms can broadly be categorized into two classes: (i) those relying solely on dynamic gene expression data gathered after a stimulation [7,8,9,10] and (ii) those also exploiting the information of a perturbation design matrix (i.e. associations between stimuli and target genes) [11,12,13]. Additionally, experimental design strategies have been developed as standalone methods to select most informative perturbation experiments given an uncertain network, in a cost effective manner [14,15,16]. The general goal of those strategies is to decide under which perturbation (gene knock-out (KO), knock-down (KD) or over-expression), at which time point(s) and through which kind of data (bulk or single-cell RNA-seq, proteomics, ...) a process of interest should be observed to discard the largest amount of incorrect GRNs, therefore leading to a small number of most relevant GRNs (and ideally leaving only one). Those strategies must respond to the difficult question of maximizing the amount of newly acquired data while minimizing the costs (financial costs, time required), dealing with measurement uncertainties and accounting for the stochastic nature of gene expression.

Recently our team developed WASABI [17], a tool which allows to (1) infer GRNs from time-stamped scRNA-seq data and (2) simulate those GRNs. Simulations are made possible by a mechanistic model of gene expression, previously described in [18], where a stochastic process controls promoter activation and a set of ODEs determines RNA and protein synthesis, resulting in a Piecewise Deterministic Markov Process (PDMP). The algorithm works by iteratively building and simulating ensembles of candidate GRNs from which the best performing are selected.

This GRN inference algorithm was applied to a dataset of single-cell RTqPCR data obtained on differentiating chicken erythrocytic cells. As expected, it did not produce a single GRN but rather a collection of 364 candidate GRNs equally well suited for reproducing experimental data when simulated. It is therefore the ideal playground for the development of a Design of Experiment strategy able to efficiently reduce the number of candidate GRNs previously generated by a GRN inference algorithm.

To do this, we introduce TopoDoE, an iterative method for the in silico identification of the most informative perturbation—that is eliminating as many incorrect candidate GRNs as possible from the data gathered in one experiment. That method is a 4 step process in which:

1.
a topological analysis is performed on the set of candidate GRNs to identify the most promising gene targets. This is essential to avoid the heavily time-consuming simulation of all possible gene perturbations.
2.
in silico perturbation and simulation of the identified gene targets and ranking of those perturbations to identify the most informative one.
3.
in vitro execution of the selected perturbation and scRNA-seq data acquisition.
4.
selection of the subset of candidate GRNs which accurately predicted the novel experimental data.

This strategy led to the identification of the FNIP1 gene as a promising target, that was knocked-out in chicken erythrocytic progenitor cells. The in silico predictions of FNIP1 KO were verified for 48 out of 49 genes in our GRNs. The DoE strategy helped reduce the 364 candidates into 133 most relevant ones. The merging of those 133 GRNs led to one GRN with a much improved goodness of fit to experimental data than any other candidate.

Results

Initial setting

WASABI has previously been applied to the inference of the GRN governing the differentiation process of avian erythrocyte progenitor cells (T2ECs) into mature erythrocytes [17]. Bulk RNA-seq data was first obtained for both self-renewing and differentiating for 48 h T2EC [19]. 424 genes were initially selected based on differential expression and gene ontology analyses. This number was further reduced to only 92, so that single-cell data could be obtained using the scRTqPCR technology, using a K-means clustering to retain genes from multiple expression kinetics. Time-stamped single-cell RTqPCR data was acquired 0, 8, 24, 33, 48 and 72 h after differentiation stimulation. Finally, the waves analysis procedure described in [17] was applied to select the 49 genes most relevant to the differentiation process of T2ECs and supplied to WASABI for GRN inference. It generated 364 candidate GRNs, all made of the same 49 genes (S1 Table) and of a unique stimulus mimicking the change of culture medium which triggers the differentiation process [20]. As shown in S1 Fig, all 364 GRNs shared an overall close but always different topology. When comparing all GRNs two-by-two, we found on average a low number of different interaction values between pairs of genes: only 7.72 different values out of the 160 total existing interactions. Fig. 1A shows the graph of interactions of one such candidate GRN.

GRNs generated by WASABI were defined by a mechanistic model of gene expression based on coupled Piecewise-Deterministic Markov Processes (PDMPs) governing how the mRNA and Protein quantities change over time. In this model, the gene promoter activation (i.e. gene bursting frequency) is function of the expression level of all other genes. Gene A is said to regulate gene B when the interaction value $\theta _{A,B}$ is not null.

Because this model was executable, it allowed us to simulate the behavior of GRNs over some period of time by solving the underlying PDMPs. The result of a GRN simulation was a collection of matrices of cells $\times$ genes values of mRNA counts, one for each time point. When simulated, all 364 candidate GRNs produced similar count matrices: distances between simulated and experimental data were indeed all close (see S2 Fig), with distance variations explained purely by the randomness of the simulations and no GRN performing significantly better than others. Here, distances were computed using the Kantorovich distance [21] taken on marginals (i.e. computed one gene at a time). Our next objective was thus to identify a perturbation which would produce different GRN responses in the form of diverse count matrices.

Step 1: Topological analysis

Depending on the number of genes in the GRNs of interest, simulation of all possible perturbations on all genes might be very time consuming or even completely unfeasible. We thus sought to develop a preliminary step to our strategy, based on the topological analysis on the set of candidate GRNs, that would allow us to identify genes having the highest chance of producing informative perturbations. This analysis was motivated by the fact that, while some gene-to-gene (or stimulus-to-gene) interactions appeared in all GRNs, others were present in very few of them (for example, the regulation of FNIP1 by GAB1 only appears in one of the 364 candidates; see S3 Fig). In particular, the gene FNIP1 had many possible regulator genes: interactions with 47 out of the 49 genes could be found in the set of candidates, but only in at most 9 GRNs at a time (except for the regulation by MFSD2B which was found in all GRNs). This configuration is intuitively promising since the GRNs would produce many different regulatory dynamics upon perturbation of a single target gene, because of their many distinct gene-to-gene interactions.

To identify the genes with the most variable interactions with its descendants (the downstream genes it regulates), we proposed the Descendants Variance Index (DVI). Briefly, this index considers one gene at a time and measures how much interactions between that gene and its descendants qualitatively change in the whole set of candidate GRNs (change from activation to inhibition or to no interaction at all). High values for a gene on the DVI indicate that many different types of regulations can be found in the GRNs while low values show that most GRNs have the same regulations. Here, we focused only on downstream genes since we will only consider KO experiments in later steps, thus only affecting the expression of genes regulated by the KO target, as it is the case for most kinds of perturbations.

DVI values where highest for genes FNIP1 (DVI=0.4934), DHCR7 (DVI=0.2707), BATF (DVI=0.2687), FHL3 (DVI=0.2487) and MID2 (DVI=0.2255), as shown in Fig. 1B. Those 5 genes were thus selected for studying the effect of their in silico perturbation.

Step 2: In silico perturbations

We simulated the 364 candidate GRNs after KO of each of the genes identified in the previous analysis. Simulated data had to be compared to reference unperturbed data to obtain predictions on which genes would display significant expression variations upon perturbation. Finally, a measure of entropy was used to select the most informative perturbation.

Before any simulation could be run however, constant hyper-parameters of our model needed to be chosen so as to obtain balanced simulations. The goal here was to set the initial state of the simulations so that they would reproduce experimental data from unperturbed cells in a stable way in the absence of any perturbation. This was an essential preliminary step for correct interpretation of the results since incorrect simulation balancing would introduce simulation biases, either spontaneously drifting away from the initial state or remaining stuck on it and thus masking the effect of a perturbation.

Close inspection of the experimental data however revealed that cells from early timepoints were already spread out heterogeneously along the differentiation trajectory. This violated the assumptions of the hyper-parameter calibration algorithm originally implemented in WASABI, which computes values from a cell population average of RNA counts. We thus devised a novel initialization method based on a modified simulated annealing algorithm which simulated gene expression values for 40 h with different hyper-parameters’ values to identify optimal combinations even in an heterogeneous context.

Table 1 Effects of in-silico perturbations

Full size table

Then, in-silico data was obtained for all 6 conditions (reference data with no perturbation and each of the 5 selected genes knocked-out independently). For each, all 364 candidate GRNs were simulated for 100 h, so that they had enough time to reach a new stable state after perturbation. Data was recorded at the end of the 100 h to obtain mRNA count matrices of 200 cells each.

Counts of mRNA molecules after each perturbation were compared to the reference dataset using one-sided t-tests for both ‘greater‘ and ‘less‘ mean value hypothesis. The effects on downstream genes are shown in Table 1, where the number of GRNs for which we measured significant expression variation is reported (p-value < 0.01). Numbers followed by an asterisk indicate uninformative effects of a KO. Indeed, when the effect of a perturbation was the same in all of the 364 candidate GRNs or when no GRN responded to that perturbation, it provided no valuable information to discriminate GRNs. To assess the total amount of information given by a KO experiment, we measured for each gene the entropy on the proportions of GRNs with variations in expression levels, calculated from column GRNs (with variation) of Table 1. For gene FNIP1, the entropy was thus computed on the vector $(\frac{88}{364}, \frac{45}{364}, \frac{102}{364}, 0\ldots )$ with 46 trailing zeros for the 46 genes not predicted to show expression variation. Measured entropy values are given in Table 2. The KO of FNIP1 had the largest entropy (i.e. carried the most information) with a value of 1.0466. This perturbation was thus selected for the next steps.

Table 2 Measured values of entropy after single gene KO

Full size table

Predictions of gene expression variation

Since we had selected FNIP1 as target for a KO experiment, we sought to obtain in-silico predictions of the expected gene expression variations. To that end, the mRNA counts from all GRNs simulated under FNIP1’s KO were pooled together and compared to the reference condition to produce average predictions. As shown in Fig. 2 and as expected from the results in Table 1, genes SNX22 and SLC25A37 had overall decreased mRNA counts in the KO condition and ENSGALG00010025565 had increased mRNA counts (one-sided t-tests with p-value < 0.01). Also as expected, none of the 45 other genes had a significant expression variation (see S4 Fig).

Step 3: In vitro perturbations

To test experimentally our in-silico predictions, we devised a dedicated strategy to obtain single cell transcriptomics data on cells that had been validated for the KO of FNIP1 (Fig. 3A). One of the main challenges we had to face was that the T2ECs being primary cells, they have a finite lifetime of 30 days [20] during which the cells had to be transfected, cloned, amplified, molecularly validated, and seeded in 96 wells plate for subsequent scRTqPCR analysis.

Following FNIP1’s KO, we acquired single cell transcriptomics data for 61 KO cells and for 12 cells transfected with an empty plasmid which we used as control (Fig. 3B). We recovered expression data for 45 of the 49 genes in the GRNs. Genes ABCG2, LDHA and GAB1 displayed poor quality data and were removed from the dataset.

Using one-sided t-tests, we found that the expression of genes SNX22 and SLC25A37 dropped significantly in the KO condition when compared to the control, which matched our predictions (see S2 Table and S5 Fig). Surprisingly however, the expression also dropped for genes ENSGALG00010025565 and SLC6A9. This indicated a flaw in WASABI’s inference method where it overestimated the basal expression level or the auto-activation strength for gene. For example, ENSGALG00010025565’s expression level was often supported by it’s own expression alone, as much as to not be regulated by any other gene in some candidate GRNs.

Importantly however, as previously predicted, no significant expression level variation was measured for the 45 other genes. Altogether, these results allowed us to confirm that WASABI was able to infer mostly correct GRNs with a remarkable accuracy. Indeed, the probability of making at most 2 errors on the qualitative responses of the 49 studied genes was only $P(E \le 2) = 2.0071 \times 10^{-20}$ with E the number of errors and with a probability of $\frac{2}{3}$ of making an error for each gene.

Step 4: GRN selection and refinement

From the previous step, we had identified a KO target, generated predictions on the gene expression variations after perturbation and verified most of our predictions from experimental KO data. The last step of our strategy was to rule out incorrect GRN candidates. From the novel information gathered in the experiment, we were also able to build a GRN more accurately reproducing the data.

GRNs were selected by retaining only those with topologies coherent with the obtained experimental results. 45 GRNs with FNIP1 positively regulating SNX22 matched the decreased expression of SNX22 and 88 GRNs with FNIP1 positively regulating SLC25A37 matched the expression drop of that gene. We decided to select those 133 GRNs among the 364 candidates and rule out the 231 others.

Interestingly, no GRN had FNIP1 regulating both SNX22 and SLC25A37 simultaneously. This revealed a limitation in WASABI’s exploration of possible GRN topologies which was caused by the limited computational resources available at the time WASABI was developed. This limitation prevented WASABI from exploring more complex topologies in which FNIP1 regulated more that one gene at once. To validate the hypothesis that such a topology would better fit the experimental data, the 133 selected GRNs were merged into a single GRN by computing the average of interaction values for each stimulus-to-gene or gene-to-gene regulation.

To measure the performance of this new GRN, we computed the distance of simulated to experimental KO data for all 364 candidate GRNs. The distribution of such distances is shown as the blue histogram in Fig. 4. Similarly, we computed the distance of simulated data obtained with the merged GRN to the experimental data (green dotted line). This distance was much lower than those obtained with any other GRN, demonstrated the improved goodness of fit of the constructed GRN.

Finally, to verify the relevance of our selection, we merged all 364 candidate GRNs in the same way as before and again computed its distance to the experimental data (red dotted line in Fig. 4). Again, we obtained a better performance than any of the 364 GRNs, which could be explained by the fact that, by merging all of the GRNs, we recovered the simultaneous regulation of SNX22 and SLC25A37 by FNIP1. However, that merged GRN did not perform as well as the one obtained by merging the 133 selected GRNs, confirming the relevance of our selection.

Discussion

We have presented TopoDoE, a DoE strategy that was designed for selecting the most informative experiment to perform to significantly reduce the number of previously inferred GRNs. When applied as a follow-up step to WASABI’s GRN inference algorithm, the presented strategy of network selection allowed to first identify and remove incorrect GRN topologies and then to recover a new GRN better fitting experimental data than any other candidate.

Validation of the inference algorithm

Initial simulations of GRNs inferred by WASABI showed they all fitted equally well the experimental data used in the inference step. This motivated the generation of new experimental data able to distinguish between the candidate networks. The simulation and then the experimental completion of FNIP1’s KO further showed that the 364 candidate GRNs proposed by WASABI closely matched the “true” GRN. Indeed, among the 49 studied genes, the expression level variation after FNIP1’s KO was incorrectly predicted for only 1 gene (ENSGALG00010025565). Most importantly, the sole up-regulations of SNX22 and SLC25A37 were anticipated, indicating that no other interactions with FNIP1 as regulator were missed during the inference step. The probability of generating GRNs which would produce such accurate predictions purely by chance was extremely low. This finding provides us confidence on the quality of both WASABI’s algorithm and the inferred GRNs.

Identification of WASABI’s limitations

One interesting finding was the ability of TopoDoE to also identify limitations in the GRN inference algorithm. A closer inspection of the candidate GRNs indeed revealed two main issues in the initial implementation of WASABI: (i) the exploration of possible GRN topologies was incomplete and (ii) selected topologies has a bias towards strong auto-activation regulations.

When WASABI was run on a super computer, it required one entire CPU node per tested topology for its simulation. Combined with the simulation slowness, this lead to high needs for computation resources which in turn meant that too few GRN topologies could be explored per iteration of the algorithm. WASABI thus could not explore more complex alternatives in which FNIP1 would regulate more than one gene at a time. This is a common issue in the general scope of Machine Learning in which exploration—testing different solutions—and exploitation—evaluating a particular solution’s relevance—need to be correctly balanced (see for example [22] or [23]).

Also, WASABI allowed some genes to have high basal expression level supported by strong auto-activations. In some cases, genes such as ENSGALG00010025565 were in fact only regulated in that way and were thus completely disconnected from the rest of the GRN. This evidently prevented the prediction of the positive regulation of ENSGALG00010025565 by FNIP1 as shown in the experimental data. This behavior can easily be corrected by adding a penalisation term to auto-activations in the future use of WASABI.

Quality of the predictions

Even though SNX22’s and SLC25A37’s variation of expression levels after FNIP1’s KO were statistically significant in both simulation and experimental data, it must be noted that the predictions were only qualitatively accurate, but not quantitatively. Indeed, expression levels for both genes were very low in simulated data as compared with experimental data of FNIP1 Knock-Out. This observation can be explained by the variability in mRNA counts between experiments. Indeed, mRNA counts did not always perfectly coincide between the training data (used to infer the GRNs) and the experimental data obtained after FNIP1’s KO. Also, the small number of cells in the experimental data (12 wild type and 61 knocked-out cells) could have induced a bias.

GRN selection and refinement

In the final step of our strategy, we selected the candidate GRNs which qualitatively predicted the expression variation for at least one gene, after FNIP1’s KO. This resulted in a total of 133 selected GRNs, thus eliminating two thirds of the candidates.

Even though the “true GRN” was not in the initial set of candidates, it was possible to recover it at least partially by merging promising GRNs together. Here, when merging the 133 selected candidates, we built a new GRN which performed significantly better than all other candidates in reproducing the reference data.

Expanding to other perturbations

In this work, we only worked with KO perturbations since we had mastered the CRISPR-cas9 system in T2ECs, which allowed to perturb all genes downstream of the target. However, TopoDoE could easily be expanded to other types of perturbations such as knock-downs (where short interfering RNA fragments inhibit the translation of specific mRNAs [24, 25]) or over-expressions (obtained by introducing a dedicated plasmid in a cell [24]). In some cases, these approaches would be easier to apply than full knock-outs or allow to perturb genes that cannot be knocked-out because they are essential for the cell’s viability. Knock-downs and over-expressions can also be easily introduced in our expression model by increasing by some factor the d0 and s0 parameters respectively. Here we would focus on d0 and s0 to stay close to the biological processes since knock-downs increase the rate of mRNA degradation while over-expressions increase the amount of mRNA molecules synthesized. This flexibility in our method denotes with existing DoE strategies which often rely on full gene KO [15, 16].

Interestingly, we found that the variability in interaction values among the candidate GRNs was maximal not when considering interactions between FNIP1 and the genes it regulated, but when considering interactions with the genes that regulated FNIP1 (see S6 Fig). It is however difficult to devise a perturbation targeting at once all of the interactions between a gene and those upstream of it, thus producing the maximum amount of different GRN responses.

One possibility might be to use a reversion experiment, in which the differentiation stimulus is interrupted early. Depending on the exact regulation dynamics, the expression level of FNIP1 could change greatly. From previous work [26], we know that T2ECs definitely commit to the differentiation after a precise amount of time after stimulation, but remain able to fully revert to the progenitor state before that. The exact time at which the commitment happens heavily depends on the GRN’s topology and would thus allow to discriminate between different candidate topologies.

Iterating the DoE strategy

Our strategy can be repeated iteratively to further decrease the number of candidates: the set of selected GRNs could now be used as input for the topological analysis step. With a sustained rate of GRN selection of 50 to 70%, about 5 cycles of our strategy would be needed to reduce the original set of 364 GRNs to only 10 candidates. This would dramatically improve our confidence on the topology of the true GRN and thus greatly improve the precision of our predictions.

Finally, one should note that this strategy is not restricted to WASABI-generated GRNs, but is applicable whenever an ensemble of executable models of GRNs can be obtained.

Applicability

Common perturbation design approaches rely on perturbation matrices where numerous experiments that need to be carried out prior to the GRN inference task [27]. Importantly, TopoDoE does not rely on such matrices but instead performs an analysis of already inferred networks to identify a small set of informative perturbation experiments. This makes our strategy easier and less costly to put into practice and can be applied as a complement to existing inference techniques, whether they rely on perturbation matrices or not.

As discussed in this work, TopoDoE however heavily relies on the simulation of sets of candidate GRNs. To apply it in other settings as the one presented in this work, it is thus necessary to have produced ensembles of executable GRN models. To our knowledge, WASABI currently is the only algorithm able to produce both ensembles of GRNs and executable models, making it the ideal test case for our method but also the only current possibility. This shows that the inference of ensembles of executable GRNs is a very valuable characteristic of WASABI.

However, several executable models have been published in the last years, many coming from the field of Boolean networks in which GRNs are executable by nature [28, 29]. Additionally, some models use an internal model of reactions to simulate single cell data accurately [30, 31] as it is done in WASABI. Also, although ensemble inference algorithms are still very uncommon to our knowledge, notable exceptions are being developed, such as [32, 33]. Our DoE strategy still remains applicable by combining inference and simulation methods cited above and the growing interest for executable ensembles of GRNs makes us believe that more such algorithms will become available in the next years.

Conclusions

Inference of GRNs has been at the heart of the systems biology field for decades, with numerous algorithms having been proposed. Their success has however been only partial because of the extreme complexity of the problem. In recent years, important progress has been made by:

exploiting the richness of single-cell transcriptomics data
introducing executable models of gene expression
inferring ensembles of GRN topologies at once

Our strategy was specifically designed for such settings, where the goal—after some GRN inference process has produced and ensemble of executable candidate GRNs—is to identify where information is lacking for a more precise identification of the true GRN topology.

TopoDoE is divised into 4 simple steps, each aimed at 1) reducing the complexity of identifying informative perturbations, 2) generating predictions on the effects of a perturbation and selecting the best perturbation, 3) collection of experimental data after perturbation and finally 4) comparison with predictions for GRN selection.

In this work, we limited our demonstration to single gene knock-outs but our strategy can easily be expanded to any kind of perturbation that can be simulated with the leveraged gene expression model. This, together with its iterative nature, makes us believe that our strategy has the potential to be used by many biologists wishing to refine their knowledge on the GRN they are studying.

It is also important to note that our results confirmed the remarkable efficiency of WASABI. Average predictions from the full ensemble of candidate GRNs proved to be correct for 48 out of the 49 genes in the networks, after the gene knock-out. With corrections made to the identified limitations of the algorithm, this gives us great confidence in WASABI and TopoDoE’s ability to build high quality GRNs, providing us with a tool for efficiently exploring and understanding complex cellular processes and diseases.

Methods

Average number of differences between GRNs

To measure the number of differences between candidate GRNs topologies, we considered GRNs as directed graphs where nodes were the genes and edges were the $\theta$ interaction values. We computed for each pair of GRNs a and b the number $d_{a,b}$ of different $\theta$ values between all pairs of genes i and j :

$$\begin{aligned} d_{a,b}, = \sum _{i = 1}^{G} \sum _{j = 1}^{G} \mathbbm {1}_{\iota (\theta _{i,j}^a) \ne \iota (\theta _{i,j}^b)} \end{aligned}$$

(1)

with

$$\begin{aligned} \iota (\theta ) = {\left\{ \begin{array}{ll} 1, &{} \text {if } \theta > 0 \\ -~1, &{} \text {if } \theta < 0 \\ 0 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$

(2)

where G is the number of genes in the GRNs and $\theta _{i,j}^x$ is the interaction value between genes i and j in GRN x. All $d_{a,b}$ values were finally averaged to obtain the mean number of pairwise differences.

Mechanistic model of gene expression

Simulations were run using a mechanistic model of gene expression described in [18] and based on the two-state model. Briefly, a gene i is described by a promoter $E_i$ which can be in states on or off and randomly switches between those states at rates $k_{on,i}$ and $k_{off,i}$ respectively. When the promoter is active (on state), mRNA molecules ($M_i$) are synthesized at rate $s_0,i$. At any time, proteins ($P_i$) are produced at rate $s_1,i$ from mRNA molecules, mRNAs are degraded at rate $d_0,i$ and proteins are degraded at rate $d_1,i$. The following equations summarize the model:

$$\begin{aligned} {\left\{ \begin{array}{ll} E(t): 0 \xrightarrow {k_{on}} 1, 1 \xrightarrow {k_{off}} 0 \\ M'(t) = s_0 E(t) - d_0 M(t) \\ P'(t) = s_1 M(t) - d_1 P(t) \end{array}\right. } \end{aligned}$$

(3)

Here, the first line of the model describes a discrete, Markov random process. The second and third lines are ordinary differential equations describing the evolution of mRNA (M) and protein (P) levels.

Interactions between a set of S stimuli and G genes in a GRN are encoded by interaction parameters $\theta$ and by letting rates $k_{on,i}$ and $k_{off,i}$ be functions of proteins $P = (P_1,\ldots , P_G)$ and stimuli $Q = (Q_1,\ldots , Q_S)$ levels as described in equation 4 (see Fig. 5).

$$\begin{aligned} k_{on,i}(P, Q) = \frac{k_{on\_min,i} + k_{on\_max,i} \beta _{kon,i} \Phi _i(P,Q)}{1 + \beta _i \Phi _i(P,Q)} \end{aligned}$$

(4)

with

$$\begin{aligned} \Phi _i(P,Q) = \prod _{s=1}^{S} \frac{1 + e^{\theta _Q,i,s} \left( \frac{Q_s}{H_{Q,s}}\right) ^\gamma }{1 + \left( \frac{Q_s}{H_{Q,s}}\right) ^\gamma } \prod _{j=1}^{G} \frac{1 + e^{\theta _P,i,j} \left( \frac{P_j}{H_{P,j}}\right) ^\gamma }{1 + \left( \frac{P_j}{H_{P,j}}\right) ^\gamma } \end{aligned}$$

(5)

In equation 5, $H_{Q,s}$ and $H_{P,j}$ are interaction thresholds for stimuli and proteins respectively, defining the minimal stimulus (or protein) quantity required to have a significant effect on a target gene. The Hill exponent parameter $\gamma$ is set to 4 in all cases. Similarly, the interaction function for the rate $k_{off}$ is given by equation 4 when replacing occurrences of index $_{kon}$ by $_{koff}$. Equation 5 is a modified version of those introduced in [17, 18] to account for multiple stimuli and with the added stimulus threshold parameter $H_{Q,s}$.

For each gene, parameters $\beta _{kon,i}$ and $\beta _{koff,i}$ modify the gene’s basal expression level to account for the constant influence of genes outside of the modelled GRN. $\beta$ values are estimated from experimental single-cell mRNA distributions but their correct identification is challenging and the algorithm used to that end is described in section Simulation balancing.

Knock-Out perturbation implementation

Gene KOs were implemented in the simulation model by setting all $\theta$ interaction values between the perturbed gene and its neighbors to 0. $\beta _{Kon}$ and $\beta _{Koff}$ values were also set to 0 for that gene. During simulation, the probability of promoter activation (E being in state on) was forced to 0 so that the gene would not even be transcribed at the basal level.

During balancing and simulation, reference mRNA and protein counts were systematically set to 0 for the knocked-out gene so that the simulation would start completely devoid of such molecules.

Variance Indices

Per-gene variances in the gene-to-gene and stimulus-to-gene interactions in a set of GRNs (all sharing the same set of genes and stimuli) were computed using a variance index. First, all interaction values were categorized into activation (if the interaction value was greater than 0), inhibition (if the interaction value was lesser than 0) and no-interaction (if the interaction values was equal to 0). Those values were consequently replaced by 1, $-$ 1 and 0 respectively using the $\iota (\theta )$ function defined in equation 2.

The Ancestors Variance Index (AVI) only considers interactions between a given gene and its parents (i.e. the genes regulating that given gene), while the Descendants Variance Index (DVI) only considers interactions with its children (i.e. genes regulated by that gene). Those two indices make it easy to identify if a gene’s interactions vary mostly because of the interactions with genes upstream or downstream. In particular, a gene KO is expected to have an effect on downstream genes.

For each gene i in a collection of GRNs, the indices were defined as:

$$\begin{aligned} AVI_i = \sum _{g=1}^G Var(P_g) \end{aligned}$$

(6)

$$\begin{aligned} DVI_i = \sum _{g=1}^G Var(C_g) \end{aligned}$$

(7)

where $P_g$ is the vectors of interaction values between gene g and its parents and $C_g$ the vector of interactions values with its children. G is the number of genes in the GRNs.

Measure of distance between multivariate distributions

Distances between multivariate distributions (simulated or experimental data) were measured using the Kantorovich distance [21] (also referred to as Wassertein or EMD distance). Because of the high number of variables (i.e. genes), the number of sample points (i.e. cells) was however too low to correctly estimate the multivariate Kantorovich distance. We thus devised a modified distance which computes the sum of Kantorovich distances on marginals (i.e. one variable at the time), making it practically usable.

This distance, named $Kantorovich_{1D}$, has the following form:

$$\begin{aligned} Kantorovich_{1D}\big (D^{(1)}, D^{(2)}\big ) = \sum _{g=1}^G W_2\left( D^{(1)}_g, D^{(2)}_g\right) \end{aligned}$$

(8)

where $D^{(1)}$ and $D^{(2)}$ are 2 multivariate distributions (both with the same G variables) and $W_2$ is the regular Kantorovich distance. $D^{(1)}_g$ and $D^{(2)}_g$ refer to the vector of values in $D^{(1)}$ and $D^{(2)}$ for variable g.

Simulation balancing

Before any simulation could be executed, it was essential to correctly balance it, i.e. constant hyper-parameters had to be chosen such that the initial state produced by the simulation was a desired stable state. Here, parameters $\beta _{Kon}$ and $\beta _{Koff}$ act as adjustment variables which can force genes into high or low expression regimes by increasing or decreasing the value of the interaction function $\phi$.

Finding the correct $\beta$ values is a non trivial task since 2 parameters ($\beta _{Kon}$ and $\beta _{Koff}$) need to be fitted for each gene of the G genes in the GRN. In our case, $49 \times 2 = 98$ parameters needed to be fitted per GRN. Such high dimensional optimization problems suffer from the curse of dimensionality and either converge to a solution in very long times or don’t allow a solution to be found at all.

Fortunately, each gene could be considered independent from the other since the goal of the balancing process was to find $\beta$ values such that the expression level of all genes remained totally unchanged. In this case, all mRNA and protein distributions (apart from that of the gene we are trying to balance) can be considered constant through time, thus transforming a $G \times 2$ dimensional optimization problem into G 2 dimensional problems.

Resolution of these problems was however made difficult by the stochastic nature of the simulation outputs. To that end, we adapted a simulated annealing algorithm to the noisy cost function case as described in [34, 35]. We designed a cost function taking as input a tuple of $\beta _{Kon}$ and $\beta _{Koff}$ values, which executes the simulation of a single gene with those $\beta$ values for 20 h and returns the Kantorovich distance between the simulated data (at t=20 h) and the initial data (at t=0 h). The simulated annealing method is detailed in Additional file 1.

Simulation initialization

After balancing, simulations were initialized by setting mRNA counts for each gene from the distribution of single-cell RT-qPCR data which was used in the GRN inference task. Only data at the initial time point was used here.

Measure of information gained after perturbation

Simulations of perturbations on the set of candidate GRNs predicted effects on varying numbers of genes and significant gene level variations were observed for different proportions of the 364 GRNs. To determine which perturbation carried the most information, we computed the entropy of the proportion of GRNs with significant expression variation for each of the 49 genes. As an example, FNIP1 had significant expression variation for genes SNX22 (in 88 out of the 364 GRNs), SLC25A37 (in 45 GRNs) and ENSGALG00010025565 (in 102 GRNs). We thus computed the entropy using equation 9, where p was the vector $(\frac{88}{364}, \frac{45}{364}, \frac{102}{364}, 0,\ldots , 0)$ (with 46 trailing zeros for the 46 genes with no significant variation) encoding the proportion of GRNs with expression variation.

$$\begin{aligned} -\sum p_k \times log(p_k) \end{aligned}$$

(9)

Statistical tests

Statistical analyses were performed in Python 3.10 [36]. Difference in expression levels across conditions were tested using a one-sided T-test implemented by the ttest_ind function of the scipy package [37]. p-values were corrected using the multipletests function of the statsmodels package [38] using the Benjamini-Hochberg algorithm and default family-wise error rate of 0.05. Entropy values were computed using the entropy function from the scipy package. Box plots were produced with the plotly package [39].

Cell culture

Fertlized SPAFAS white leghorn chicken eggs provided by INRAE (Tours, France) were incubated in our facilities. Embryos were euthanized by decapitation in accordance with the DIRECTIVE 2010/63/EU OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 22 September 2010 on the protection of animals used for scientific purposes, a procedure approved by our local animal ethics committee. After embryo sacrifice, bone marrow cells were harvested by perfusion from tibias of 19-days old embryos [40], seeded into culture and maintained in self-renewal state in LM1 medium [20].

CRISPR plasmids construction

A guide RNA against FNIP1’s sequence (ENSGALG00000017462) was designed using the CRISPOR design tool [41] to target the exon number 5. Oligonucleotides were purchased from Eurogentec (Table 3). The guide was cloned after hU6 promoter into BbsI-digested pCRISPR-P2A-tRNA vector [42]. The efficiency of our CRISPR vector in T2EC cells was confirmed by analyzing mutations after sequencing.

Table 3 Oligonucleotides sequences used for CRISPR plasmid construction

Full size table

Cells transfection

After 12 days in culture, $30 \times 10^5$ cells were resuspended in 100µL of transfection medium (Cell Line Nucleofector Kit V—Amaxa) and transfected with 6,5µg of pCRISPR-P2A-tRNA empty vector or pcrFNIP#9 vector using the T16 K652 program. 500µL of RPMI (RPMI 1640 Medium no phenol red—Gibco) were added to the cell solution for a recovery step of 8 min. Then cells were transferred at $1,25.10^6$ cells/ml in LM1 medium without penicillin and streptomycin and grown in standard culture conditions.

Single cells sorting

24 H after transfection, cells were harvested and resuspended in LM1 medium. Sorting was performed at room temperature using BD FACS Aria 1 flow cytometer. Living GFP-expressing cells were sorted in 96 wells U-shape culture plates containing 50µL of regular LM1. Non-transfected cells were also sorted to be used as a negative control. Plates were then placed back in incubator at 37 °C, with 5% CO₂ (Fig. 3).

Identification of FNIP1 KO clones by sequencing

30 clones were selected 7 days post-sorting and half of the culture was collected for DNA/RNA extraction with Quick-DNA/RNA™ Microprep Plus Kit (Zymo) according to the manufacturer’s protocol. The amplified DNA fragments (with FNIP1-S1/FNIP1-R3 primers (Table 3)) were cloned into the pCR™4-TOPO® TA vector (TOPO™ TA Cloning™ Kit for Sequencing without competent cells—Invitrogen). We selected a clone presenting a frame shift leading to an early stop codon on both alleles for subsequent transcriptomics analysis.

Single-cell RT-qPCR analysis

Individual cells from clones transfected with the pcrFNIP#9 vector (FNIP1 KO cells) or the empty vector clones (wild type cells) were sorted into 96-well plates using BD FACS Aria 1 flow cytometer. All the manipulations related to the high-throughput scRT-qPCR experiments in microfluidics were performed according to the protocol recommended by the Fluidigm company (PN 68000088 K1, p.157–172). All steps from single-cell isolation, gene selection, data generation by scRT-qPCR are described in details in [19].

Availability of data and materials

The datasets analysed during the current study are available in the OSF repository, https://osf.io/r3ujs/. Additionally, TopoDoE is made available as a Python library which can be downloaded from github at https://github.com/Vidium/topodoe. The simulated annealing algorithm modified for noisy cost functions that we used for balancing our simulations is also available as a Python library. Source code can be downloaded from github at https://github.com/Vidium/josiann. The library can also be installed with pip by running ‘pip install josiann‘.

References

Britten RJ, Davidson EH. Gene Regulation for higher cells: a theory—new facts regarding the organization of the genome provide clues to the nature of gene regulation. Science. 1969;165(3891):349–57.
Article CAS PubMed Google Scholar
Jacob F, Monod J. On the regulation of gene activity. In: Cold spring harbor symposia on quantitative biology. vol. 26. Cold Spring Harbor Laboratory Press; 1961. p. 193–211.
Mar JC. The rise of the distributions: why non-normality is important for understanding the transcriptome and beyond. Biophys Rev. 2019. https://doi.org/10.1007/s12551-018-0494-4.
Article PubMed PubMed Central Google Scholar
Munsky B, Neuert G, van Oudenaarden A. Using gene expression noise to understand gene regulation. Science. 2012;336(6078):183–7. https://doi.org/10.1126/science.1216379.
Article CAS PubMed PubMed Central Google Scholar
Nicolas D, Phillips NE, Naef F. What shapes eukaryotic transcriptional bursting? Mol BioSyst. 2017;13(7):1280–90. https://doi.org/10.1039/c7mb00154a.
Article CAS PubMed Google Scholar
Meyer P, Cokelaer T, Chandran D, Kim KH, Loh PR, Tucker G, et al. Network topology and parameter estimation: from experimental design methods to gene regulatory network kinetics using a community based approach. BMC Syst Biol. 2014;8(1):1–18.
Article Google Scholar
Haury AC, Mordelet F, Vera-Licona P, Vert JP. TIGRESS: trustful inference of gene regulation using stability selection. BMC Syst Biol. 2012;6:1–17.
Article Google Scholar
Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P. Inferring regulatory networks from expression data using tree-based methods. PLoS One. 2010;5(9): e12776.
Article PubMed PubMed Central Google Scholar
Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Favera RD, et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. In: BMC bioinformatics. vol. 7. Springer; 2006. p. 1–15.
Singh N, Vidyasagar M. bLARS: an algorithm to infer gene regulatory networks. IEEE/ACM Trans Comput Biol Bioinf. 2015;13(2):301–14.
Article Google Scholar
Küffner R, Petri T, Tavakkolkhah P, Windhager L, Zimmer R. Inferring gene regulatory networks by ANOVA. Bioinformatics. 2012;28(10):1376–82.
Article PubMed Google Scholar
Xiong J, Zhou T. Gene regulatory network inference from multifactorial perturbation data using both regression and correlation analyses. 2012.
Morgan D, Studham M, Tjärnberg A, Weishaupt H, Swartling FJ, Nordling TRE, et al. Perturbation-based gene regulatory network inference to unravel oncogenic mechanisms. Sci Rep. 2020;10(1):14149.
Article CAS PubMed PubMed Central Google Scholar
Dehghannasiri R, Yoon BJ, Dougherty ER. Optimal experimental design for gene regulatory networks in the presence of uncertainty. IEEE/ACM Trans Comput Biol Bioinf. 2014;12(4):938–50.
Article Google Scholar
Ud-Dean SM, Gunawan R. Optimal design of gene knockout experiments for gene regulatory network inference. Bioinformatics. 2016;32(6):875–83.
Article CAS PubMed Google Scholar
Sussex S, Uhler C, Krause A. Near-optimal multi-perturbation experimental design for causal structure learning. Adv Neural Inf Process Syst. 2021;34:777–88.
Google Scholar
Bonnaffoux A, Herbach U, Richard A, Guillemin A, Gonin-Giraud S, Gros PA, et al. WASABI: a dynamic iterative framework for gene regulatory network inference. BMC Bioinform. 2019;20(1):1–19.
Article Google Scholar
Herbach U, Bonnaffoux A, Espinasse T, Gandrillon O. Inferring gene regulatory networks from single-cell data: a mechanistic approach. BMC Syst Biol. 2017;11(1):1–15.
Article Google Scholar
Richard A, Boullu L, Herbach U, Bonnafoux A, Morin V, Vallin E, et al. Single-cell-based analysis highlights a surge in cell-to-cell molecular variability preceding irreversible commitment in a differentiation process. PLoS Biol. 2016;14(12): e1002585.
Article PubMed PubMed Central Google Scholar
Gandrillon O, Schmidt U, Beug H, Samarut J. TGF-$\beta$ cooperates with TGF-$\alpha$ to induce the self-renewal of normal erythrocytic progenitors: evidence for an autocrine mechanism. EMBO J. 1999;18(10):2764–81.
Article CAS PubMed PubMed Central Google Scholar
Vershik AM. Kantorovich metric: initial history and little-known applications. J Math Sci. 2006;133(4):1410–7.
Article Google Scholar
Sutton RS, Barto AG. Reinforcement learning: an introduction. MIT Press; 2018.
Google Scholar
Nguyen H, La H. Review of deep reinforcement learning for robot manipulation. In: 2019 third IEEE international conference on robotic computing (IRC). IEEE; 2019. p. 590–595.
Bresson C, Gandrillon O, Gonin-Giraud S. sca2: a new gene involved in the self-renewal of erythroid progenitors. Cell Prolif. 2008;41:726–38.
Article Google Scholar
Mejia-Pous C, Damiola F, Gandrillon O. Cholesterol synthesis-related enzyme oxidosqualene cyclase is required to maintain self-renewal in primary erythroid progenitors. Cell Prolif. 2011;44(5):441–52. https://doi.org/10.1111/j.1365-2184.2011.00771.x.
Article CAS PubMed PubMed Central Google Scholar
Zreika S, Fourneaux C, Vallin E, Modolo L, Seraphin R, Moussy A, et al. Evidence for close molecular proximity between reverting and undifferentiated cells. BMC Biol. 2022;20(1):155. https://doi.org/10.1186/s12915-022-01363-7.
Article CAS PubMed PubMed Central Google Scholar
Seçilmiş D, Hillerton T, Nelander S, Sonnhammer EL. Inferring the experimental design for accurate gene regulatory network inference. Bioinformatics. 2021;37(20):3553–9.
Article PubMed PubMed Central Google Scholar
Xu H, Ang YS, Sevilla A, Lemischka IR, Ma’ayan A. Construction and validation of a regulatory network for pluripotency and self-renewal of mouse embryonic stem cells. PLoS Comput Biol. 2014;10(8): e1003777.
Article PubMed PubMed Central Google Scholar
Woodhouse S, Piterman N, Wintersteiger CM, Göttgens B, Fisher J. SCNS: a graphical tool for reconstructing executable regulatory networks from single-cell genomic data. BMC Syst Biol. 2018;12:1–7.
Article Google Scholar
Cannoodt R, Saelens W, Deconinck L, Saeys Y. dyngen: a multi-modal simulator for spearheading new single-cell omics analyses. BioRxiv. 2020; p. 2020–02.
Tjärnberg A, Morgan DC, Studham M, Nordling TE, Sonnhammer EL. GeneSPIDER-gene regulatory network inference benchmarking with controlled network and data properties. Mol BioSyst. 2017;13(7):1304–12.
Article PubMed Google Scholar
Ud-Dean SM, Gunawan R. Ensemble inference and inferability of gene regulatory networks. PLoS One. 2014;9(8): e103812.
Article PubMed PubMed Central Google Scholar
Aluru M, Shrivastava H, Chockalingam SP, Shivakumar S, Aluru S. EnGRaiN: a supervised ensemble learning method for recovery of large-scale gene regulatory networks. Bioinformatics. 2022;38(5):1312–9.
Article CAS PubMed Google Scholar
Gutjahr WJ, Pflug GC. Simulated annealing for noisy cost functions. J Glob Optim. 1996;8(1):1–13.
Article Google Scholar
Tóth J, Tomán H, Hajdu A. Efficient sampling-based energy function evaluation for ensemble optimization using simulated annealing. Pattern Recogn. 2020;107: 107510.
Article Google Scholar
Foundation PS.: Python language reference, version 3.10. http://www.python.org.
Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020;17:261–72. https://doi.org/10.1038/s41592-019-0686-2.
Article CAS PubMed PubMed Central Google Scholar
Seabold S, Perktold J. statsmodels: econometric and statistical modeling with python. In: 9th Python in science conference; 2010. .
Inc PT. Collaborative data science. Montreal: Plotly Technologies Inc. https://plot.ly.
Gandrillon FNO, Michaille J, Roze L, Zile M, Samarut J. C-erbA$\alpha$/T3R and RARs control commitment of hematopoietic self-renewing progenitor cells to apoptosis or differentiation and are antagonized by the v-erbA oncogene. Oncogene. 1994;04(9):749–58.
Google Scholar
Concordet JP, Haeussler M. CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Res. 2018;46(W1):W242–5.
Article CAS PubMed PubMed Central Google Scholar
Richard A, Vallin E, Romestaing C, Roussel D, Gandrillon O, Gonin-Giraud S. Erythroid differentiation displays a peak of energy consumption concomitant with glycolytic metabolism rearrangements. PLoS One. 2019;14(9): e0221472.
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We would like to thank all members of the SBDM, Dracula and Vidium teams for their valuable advice and fruitful discussions. We also thank Alice Hugues, Emma Risson and Kaushik Karambelkar for participating as interns to the very early phases of that work. We thank the BioSyL Federation and the LabEx Ecofect (ANR-11-LABX-0048) of the University of Lyon for inspiring scientific events.

Funding

This work was supported by Vidium Solutions (MB and AB) and by Agence Nationale de la Recherche (Grant SinCity; ANR-17-CE12-0031 to OG). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Authors and Affiliations

Laboratoire de Biologie Moléculaire de la Cellule, Lyon, France
Matteo Bouvier, Souad Zreika, Elodie Vallin, Camille Fourneaux, Sandrine Gonin-Giraud & Olivier Gandrillon
Vidium Solutions, Lyon, France
Matteo Bouvier & Arnaud Bonnaffoux
Inria Grenoble, Rhône-Alpes Research Center, Lyon, France
Matteo Bouvier & Olivier Gandrillon

Authors

Matteo Bouvier
View author publications
You can also search for this author in PubMed Google Scholar
Souad Zreika
View author publications
You can also search for this author in PubMed Google Scholar
Elodie Vallin
View author publications
You can also search for this author in PubMed Google Scholar
Camille Fourneaux
View author publications
You can also search for this author in PubMed Google Scholar
Sandrine Gonin-Giraud
View author publications
You can also search for this author in PubMed Google Scholar
Arnaud Bonnaffoux
View author publications
You can also search for this author in PubMed Google Scholar
Olivier Gandrillon
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

MB designed and developed TopoDoE and Josiann. SZ, EV, CF, OG and SG designed the experimental process and generated the data analyzed in this work. MB, AB and OG drafted the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Matteo Bouvier.

Ethics declarations

Ethics approval and consent to participate

According to the directive 2010/63/eu of the European Parliament and of the Council of 22 September 2010 on the protection of animals used for scientific purposes, ‘procedure’ excludes the killing of animals solely for the use of their organs or tissues. Since this is exactly what has been done in the present paper, since no procedure was involved, no approval was required This was confirmed by the CECCAPP (Comité d’éthique en expérimentation animale n°15), our institutional ethics committee.

Consent for publication

Not applicable

Competing interests

The results of this work will be exploited within the frame of the company Vidium Solutions. MB and AB are full time employees of Vidium Solutions. All other authors declare that they do not have any kind of Conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1.

Supplementary Material 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Bouvier, M., Zreika, S., Vallin, E. et al. TopoDoE: a design of experiment strategy for selection and refinement in ensembles of executable gene regulatory networks. BMC Bioinformatics 25, 245 (2024). https://doi.org/10.1186/s12859-024-05855-x

Download citation

Received: 22 June 2023
Accepted: 03 July 2024
Published: 19 July 2024
DOI: https://doi.org/10.1186/s12859-024-05855-x

TopoDoE: a design of experiment strategy for selection and refinement in ensembles of executable gene regulatory networks

Abstract

Background

Results

Conclusion

Background

Results

Initial setting

Step 1: Topological analysis

Step 2: In silico perturbations

Predictions of gene expression variation

Step 3: In vitro perturbations

Step 4: GRN selection and refinement

Discussion

Validation of the inference algorithm

Identification of WASABI’s limitations

Quality of the predictions

GRN selection and refinement

Expanding to other perturbations

Iterating the DoE strategy

Applicability

Conclusions

Methods

Average number of differences between GRNs

Mechanistic model of gene expression

Knock-Out perturbation implementation

Variance Indices

Measure of distance between multivariate distributions

Simulation balancing

Simulation initialization

Measure of information gained after perturbation

Statistical tests

Cell culture

CRISPR plasmids construction

Cells transfection

Single cells sorting

Identification of FNIP1 KO clones by sequencing

Single-cell RT-qPCR analysis

Availability of data and materials

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher's Note

Supplementary Information

Supplementary Material 1.

Supplementary Material 2.

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Bioinformatics

Contact us