 Research
 Open access
 Published:
BayesianSSA: a Bayesian statistical model based on structural sensitivity analysis for predicting responses to enzyme perturbations in metabolic networks
BMC Bioinformatics volume 25, Article number: 297 (2024)
Abstract
Background
Chemical bioproduction has attracted attention as a key technology in a decarbonized society. In computational design for chemical bioproduction, it is necessary to predict changes in metabolic fluxes when up/downregulating enzymatic reactions, that is, responses of the system to enzyme perturbations. Structural sensitivity analysis (SSA) was previously developed as a method to predict qualitative responses to enzyme perturbations on the basis of the structural information of the reaction network. However, the network structural information can sometimes be insufficient to predict qualitative responses unambiguously, which is a practical issue in bioproduction applications. To address this, in this study, we propose BayesianSSA, a Bayesian statistical model based on SSA. BayesianSSA extracts environmental information from perturbation datasets collected in environments of interest and integrates it into SSA predictions.
Results
We applied BayesianSSA to synthetic and real datasets of the central metabolic pathway of Escherichia coli. Our result demonstrates that BayesianSSA can successfully integrate environmental information extracted from perturbation data into SSA predictions. In addition, the posterior distribution estimated by BayesianSSA can be associated with the known pathway reported to enhance succinate export flux in previous studies.
Conclusions
We believe that BayesianSSA will accelerate the chemical bioproduction process and contribute to advancements in the field.
Background
Chemical production using microbes, known as chemical bioproduction, has attracted attention as a key technology in a decarbonized society. Chemical bioproduction is expected to be essential for sustainable development [1], such as the production of medicines [2, 3], fuels [4], and foods [5], and the absorption of CO2 [6, 7]. For efficient chemical bioproduction, computational designs of metabolic networks are typically employed to reduce the cost of comprehensive wet lab experiments [8, 9].
One powerful strategy for computational design is to predict changes in metabolic fluxes when up/downregulating enzymatic reactions, that is, responses to enzyme perturbations [10, 11]. Up/downregulation of enzymatic reactions through genetic manipulations, such as modification, overexpression, and knockout [12, 13], can alter the metabolic fluxes. Increasing the flux of the target chemical leads to efficient chemical bioproduction, and prediction is an essential step in this process.
There are two types of computational methods for predicting responses to enzyme perturbations. The first is flux balance analysis (FBA)based methods [14,15,16,17], which maximize an objective function, instead of considering kinetics. When applied to unfamiliar strains, FBAbased methods suffer from dependence on the objective function, which is typically the biomass objective function [18]. Specifically, extensive wet lab experiments need to be conducted to measure phenotypes, such as biomass, of the strain of interest and to identify the objectives of the strain in biological activities [19]. The second is kineticsbased methods, such as sensitivity analysis in kinetic models [20,21,22] and structural sensitivity analysis (SSA) [10, 23]. Sensitivity analysis in kinetic models allows predicting quantitative responses to enzyme perturbations. There are global and local sensitivity analysis [22, 24], and the local sensitivity analysis is the more typical method for applying to kinetic models, which is called metabolic control analysis [25] and has been successfully used in many applications [26,27,28]. To construct a kinetic model, it is necessary to obtain parameters and functional forms of reaction rates of all reactions in the metabolic network of interest under specific environmental conditions [20, 21]. Therefore, parameter estimation is typically essential for unfamiliar strains, whose known information is rarely available [29]. Even though many parameter estimation methods have been developed [30,31,32,33], it can still be a cumbersome process due to the high dimensionality of the parameter space [30, 34]. In contrast, SSA can predict qualitative responses, which are the signs of responses, to enzyme perturbations only from structural information of the metabolic network. SSA does not need to determine the functional forms and parameters of the reaction rates. In other words, SSA removes the burden of parameter determination entirely. Because of its parameterfree nature, SSA could be widely applicable to chemical bioproduction.
SSA is originally a method to predict qualitative responses of a chemical reaction system to perturbations in enzyme amounts (or activities) only from structural information of the reaction network. In SSA, change in the chemical concentrations/fluxes to a perturbation is given in the form of a rational function of “SSA variables,” which are defined as derivatives of reaction rates with respect to chemical concentrations. From their definitions (cf. the “Our model” section), the SSA variables vary depending on environmental conditions, such as aerobic/anaerobic states, pH levels, and nutrient availability, as well as individual differences among microbes. By the SSA theory, whether the rational function is zero or nonzero, indicating the absence or presence of a response, is unambiguously determined from the structural information of the network alone. Furthermore, the sign of the rational function, indicating a positive or negative response, could potentially be determined by considering the signs of SSA variables, which are usually deducible from general considerations or biological knowledge. However, if the range of the rational function happens to include zero, its sign becomes undeterminable. For example, when a rational function is a subtraction of two positive variables, its sign depends on the quantitative values of these two variables. We refer to such structurally undeterminable response predictions as “indefinite predictions.” Indefinite predictions are true as the general consequences of responses drawn from structural information alone, which SSA aims to obtain rather than specific consequences.
However, these indefinite predictions in SSA can present practical challenges in the application to specific species and culture environments for chemical bioproduction, due to the necessity of precisely detecting reactions to up/downregulate. They often arise in complex and intertwined metabolic networks. Indeed, the central metabolic pathway analyzed in this study exhibits many indefinite predictions (Supplementary Table S1). As mentioned above, the SSA variables vary depending on the environmental conditions, fluctuating in accordance with the individual differences among microbes. For specific microbial species and culture environments, constraining the possible values of SSA variables on the basis of environmental information may decrease the number of indefinite predictions. This implies that application methods of SSA using environmental information are practically beneficial for chemical bioproduction.
In this study, we propose BayesianSSA, a Bayesian statistical model that extracts environmental information from perturbation datasets collected in environments of interest and integrates it into SSA predictions. BayesianSSA considers the variables of SSA as stochastic variables, and they are estimated using the perturbation data. Although BayesianSSA introduces new parameters to estimate, it still requires fewer parameters per reaction than kinetic modeling requires. For example, for a onesubstrate reaction, BayesianSSA requires one parameter while kinetic modeling with the MichaelisMenten equation requires two parameters, \(V_{\text{max}}\) and \(K_{\text{m}}\). In addition, BayesianSSA does not need to explore the functional forms of the reaction rates. The introduction of stochastic variables in BayesianSSA brings two additional advantages. First, it allows for the consideration of the uncertainty caused by individual differences among microbes. The variables of SSA may depend not only on environmental conditions but also on individual differences among microbes, and considering their uncertainty may contribute to predictive performance. Second, it enables a probabilistic interpretation of indefinite prediction by positivity confidence values. These values are defined as the probability that the predictive response is positive. We report the results of applying BayesianSSA to synthetic and real datasets of the central metabolic pathway of E. coli. To validate the practicability of BayesianSSA and assess whether BayesianSSA can integrate environmental information into SSA predictions, we compared predictive performances between BayesianSSA and a base method, which is the same as the BayesianSSA model but utilizes an initial prior distribution without incorporating perturbation datasets, as well as a naive Bayes model on these datasets. Utilizing environmental information may enhance predictions for outofsample perturbations, where chemical reactions to be perturbed are not included in the sample used to fit BayesianSSA for a given target chemical. To examine this effect, we evaluated predictions of outofsample perturbations by BayesianSSA.
Theoretical background
Structural sensitivity analysis
Algorithm
We explain the SSA algorithm [10, 11] briefly in this section. Consider the following ordinary differential equation (ODE):
where \(x_m\) is the concentration of the mth metabolite, J is the number of different reactions, \(\nu _{m,j}\) is the (m, j) element of the stoichiometric matrix \(\varvec{\nu }\) that indicates metabolites as rows and reactions as columns, \(F_j(\cdot )\) is the jth reaction rate function, which shows the reaction flux, \(k_j\) is the reaction rate constant of the jth reaction, \(\textbf{x}=(x_1,\ldots ,x_M)^{\text{T}}\) is the vector of metabolite concentrations, and M is the number of different metabolites. Under the situation where the system obeys this ODE and the assumption that \(F_j\) monotonically increases with respect to \(k_j\), the SSA algorithm can derive the rational function of a response to a perturbation of the jth reaction. Here, the perturbation and response are represented as an operation changing \(k_j\) to \(k_j + \delta k_j\) [11] and the change in metabolite concentrations/reaction fluxes from the initial steadystate to the eventual steadystate after the perturbation, respectively.
The first step of the SSA algorithm is to make a matrix \(\textbf{R}(\textbf{r})\). The (j, m) element of \(\textbf{R}(\textbf{r})\), denoted by \(r_{j,m}\), is equal to \(\partial F_j / \partial x_m\). We write nonzero elements of \(\textbf{R}(\textbf{r})\) collectively as \(\textbf{r}\in \mathbb {R}^P\), with P being the total number of nonzero values in \(\textbf{R}(\textbf{r})\). That is, the matrix \(\textbf{R}(\textbf{r})\) indicates the dependence of reactions to metabolites. For example, \(r_{j,m}>0\) if the mth metabolite is a substrate of the jth reaction and \(r_{j,m}=0\) otherwise. Using the matrix \(\textbf{R}(\textbf{r})\), the matrix \(\textbf{A}(\textbf{r})\in \mathbb {R}^{(J+L)\times (M+K)}\) is defined as
where \(\textbf{0}_{L \times K} \in \mathbb {R}^{L \times K}\) is a zero matrix, whose elements are all zero, \(\textbf{C}=(\textbf{c}_1,\ldots ,\textbf{c}_K) \in \mathbb {R}^{J \times K}\), \(\textbf{c}_k\) is the kth basis of \(\text{ker}\,\varvec{\nu }\), which indicates the right null space of \(\varvec{\nu }\), K is the number of the bases of \(\text{ker}\,\varvec{\nu }\), \(\textbf{D}=(\textbf{d}_1,\ldots ,\textbf{d}_L) \in \mathbb {R}^{M \times L}\), \(\textbf{d}_l\) is the lth basis of \(\text{ker}\,\varvec{\nu }^{\text{T}}\), which indicates the right null space of \(\varvec{\nu }\), and L is the number of the bases of \(\text{ker}\,\varvec{\nu }^{\text{T}}\). Note that \(\textbf{A}(\textbf{r})\) is in general a square matrix, i.e., \(J+L=M+K\). A sensitivity matrix is defined as
Here, the (m, j) element of \(\textbf{S}(\textbf{r})\in \mathbb {R}^{(M+K)\times (J+L)}\) is proven to be equal to a constant multiple of a quantitative response of the mth metabolite concentration/flux to a perturbation of the jth reaction/conserved quantity [11]. Even though we calculate the sensitivity matrix, we cannot determine the quantitative response value. However, each element of the sensitivity matrix has the same sign (positive, negative, or zero) as the corresponding quantitative response value, enabling us to discuss qualitative responses. Although flux responses are represented as sets of reactions that are nonzero in \(\textbf{c}_k\), we can obtain a flux response corresponding to each reaction by calculating linear combinations. Let \(\textbf{T}(\textbf{r})\in \mathbb {R}^{J \times (J+L)}\) be a matrix indicating the responses of each reaction flux, which can be written as
where \(\textbf{S}(\textbf{r})_{M+1:M+K,1:J+L} \in \mathbb {R}^{K \times (J+L)}\) is a block matrix of \(\textbf{S}(\textbf{r})\), which is extracted from the \((M+1)\)th to \((M+K)\)th rows.
Qualitative response prediction
SSA can predict qualitative responses (positive, negative, zero, or indefinite) to perturbations using only structural information, which is the metabolic network and the constraints on the \(\textbf{r}\) values. Here, we describe this in a precise way.
Let \(\textbf{Q}(\textbf{r})\in \{1, 0, 1\}^{(M+J) \times (J+L)}\) be the qualitative response matrix, which is given by
where \(\text{sign}(\cdot )\) is the elementwise sign function, which returns a matrix whose element equals 1, 0, and \(1\) if the sign of the corresponding element of the given matrix is positive, zero, and negative, respectively. The responses of the mth metabolite concentration and the jth reaction flux to perturbations correspond to the mth and \((M + j)\)th rows of \(\textbf{Q}(\textbf{r})\), respectively. We refer to the mth row and the jth column of \(\textbf{Q}(\textbf{r})\) as the mth “observation target” and the jth “perturbation target”, respectively. In addition, we call the perturbation experiment/prediction/response for the mth observation and jth perturbation targets the (m, j) experiment/prediction/response.
Since SSA is concerned with general results of qualitative responses, the elements of \(\textbf{Q}(\textbf{r})\) are examined across all possible values of \(\textbf{r}\). It is important to note that there are typically constraints on the \(\textbf{r}\) values, such as \(r_{j,m}>0\), and \(\textbf{Q}(\textbf{r})\) is evaluated within these constraints. Let \(q_{m,j}(\textbf{r})\) denote the (m, j) element of \(\textbf{Q}(\textbf{r})\) (\(m = 1,\ldots , (M+J)\), \(j = 1,\ldots ,(J+L)\)). The qualitative response for each \(q_{m,j}(\textbf{r})\) can be classified into one of four categories; (1) \(q_{m,j}(\textbf{r})\) is zero for any \(\textbf{r}\). This case is a consequence of structural properties, as explained by a theorem known as the law of localization and buffering structures [11, 35]. (2) \(q_{m,j}(\textbf{r})\) is positive for any \(\textbf{r}\). (3) \(q_{m,j}(\textbf{r})\) is negative for any \(\textbf{r}\). (4) \(q_{m,j}(\textbf{r})\) varies depending on the quantitative values of \(\textbf{r}\), making the sign of the response indefinite. For example, including a term \(r_1r_2\) in the symbolic expression corresponding to \(q_{m,j}(\textbf{r})\) makes \(q_{m,j}(\textbf{r})\) positive if \(r_1>r_2\) and negative if \(r_1<r_2\), where \(r_i\) is the ith element of \(\textbf{r}\). These four categories are referred to as zero, positive, negative, and indefinite, respectively.
There are several methods for evaluating \(q_{m,j}(\textbf{r})\) for all possible \(\textbf{r}\). One approach is to perform symbolic calculations, regarding \(\textbf{r}\) as symbolic variables [10]. Although this method is rigorous, it is only practical for relatively small metabolic networks due to its computational complexity. An alternative, more computationally tractable method is to draw a sufficient number of \(\textbf{r}\) samples from an arbitrary probabilistic distribution and numerically evaluating \(\textbf{Q}(\textbf{r})\) [36]. This latter approach has inspired us to introduce stochastic variables into SSA in this study.
Methods
Our model
In SSA, the predictive response is obtained by considering all cases of \(\textbf{r}\). \(\textbf{r}\) depends on \({\left\{ {k_j}\right\} }_j\) and \({\left\{ {x_m}\right\} }_m\), which, in turn, depend on environmental conditions, such as aerobic/anaerobic states, pH levels, and nutrient availability, as well as individual differences among microbes. The SSA qualitative response prediction described in the previous section thus focuses on the general consequences of responses drawn from structural information alone. We propose a Bayesian statistical model based on SSA, named BayesianSSA, to integrate environmental information extracted from perturbation data into SSA predictions. We consider a probability of each \(\textbf{r}\) value, regarding \(\textbf{r}\) as a stochastic variable. The BayesianSSA posterior probability of \(\textbf{r}\) reflects perturbation data and thus extracts environmental information. Predicting responses on the basis of the posterior distribution of \(\textbf{r}\) and positivity confidence values enables us to integrate the environmental information into SSA predictions. A positivity confidence value, which we proposed in this study, indicates the probability that the predictive response is positive. The positivity confidence values enable us to interpret indefinite predictions stochastically. We will describe the details in the “Positivity confidence value” section.
Prior distribution and likelihood
In this section, we describe the likelihood function and the prior distribution of BayesianSSA. Suppose that we have a perturbation dataset \(\textbf{y}\), which shows the signs of experimentally observed responses. We define the perturbation record, an element of \(\textbf{y}\), as follows:
where \(y_i\) is the ith perturbation record, obtained from the \((m_i, j_i)\) experiment, and “\(\text{increase}\)” and “\(\text{decrease}\)” indicate the cases where the observation target increases and decreases when the perturbation target is perturbed, respectively (see the “Qualitative response prediction” section for the definition of observation and perturbation targets). We consider the probability that \(y_i\ne q_{m_i,j_i}(\textbf{r})\) because experimental errors may occur even if \(q_{m,j}(\textbf{r})\) makes the correct prediction, assuming that the likelihood function is the following:
where \(\rho _{m,j}\in (0, 1)\) is a parameter indicating the reliability of the (m, j) experiment, and \(\varvec{\rho }\) is a matrix whose (m, j) element is \(\rho _{m,j}\). The likelihood function means that the probability of \(y_i\) is \(\rho _{m_i,j_i}\) if \(\textbf{r}\) can accurately predict \(y_i\), and is \(1\rho _{m_i,j_i}\) otherwise. In other words, BayesianSSA assumes that the result of the ith experiment can stochastically vary in accordance with the probability of \(\rho _{m_i,j_i}\). \(\rho _{m,j}\) is different for each (m, j), and each (m, j) experiment is assumed to have different reliability in BayesianSSA. This assumption is reasonable because the distribution of measured values is supposed to be different for each (m, j) experiment.
We consider the prior distribution of \(\rho _{m,j}\) as
where \(\mathcal {B}\left( \cdot  a, b \right)\) denotes the beta distribution probability density function with parameters \(a \in \mathbb {R}_{>0}\) and \(b \in \mathbb {R}_{>0}\). The prior distribution of \(\textbf{r}\) should be chosen in accordance with the constraint on \(\textbf{r}\). A typical constraint on \(r_{j,m}\) is \(r_{j,m}>0\) with the mth metabolite being the substrate of the jth reaction, and we consider only such type of constraints in this study. We used a weighted empirical distribution with samples drawn from a lognormal distribution as the prior distribution. Specifically, the prior distribution of \(\textbf{r}\) is the following:
where \(\mathcal{W}\mathcal{E}\left( \textbf{r}=\textbf{r}^{(v)}  \textbf{w} \right)\) denotes the weighted empirical distribution probability mass function with a stochastic variable \(\textbf{r}\), a weight parameter \(\textbf{w}={\left( {w_1, \ldots , w_V}\right) }^{\text{T}}\), and a parameter sample set \({\left\{ {\textbf{r}^{(v)}}\right\} }_{v=1}^{V}\), \(w_v>0\) and \(\sum _{v=1}^{V} w_v = 1\), and V is the size of the sample set. Here, we generated the parameter sample set \({\left\{ {\textbf{r}^{(v)}}\right\} }_{v=1}^{V}\) as follows:
where \(\mathcal{L}\mathcal{N}\left( \varvec{\mu }, \varvec{\Sigma } \right)\) denotes the lognormal distribution with parameters \(\varvec{\mu }\in \mathbb {R}^{P}\) and \(\varvec{\Sigma }\in \mathbb {R}^{P \times P}\) (\(\varvec{\mu }\) and \(\varvec{\Sigma }\) correspond to the mean and covariance matrix parameter of the normal distribution, respectively), and \(a \sim \mathcal {D}\) indicates that a stochastic variable a is drawn from a distribution \(\mathcal {D}\).
The purpose of BayesianSSA is to obtain a better distribution \(p(\textbf{r})\) for calculating positivity confidence values (Eq. (7)). Therefore, we need the marginal posterior distribution \(p(\textbf{r}\textbf{y})\). The marginal likelihood is obtained as
where \(\text{Beta}{\left( {\cdot , \cdot }\right) }\) is the beta function, \(\hat{a}_{m,j,v}=t_{m,j,v}+ a\), \(\hat{b}_{m,j,v}=f_{m,j,v}+ b\), and \(t_{m,j,v}\) and \(f_{m,j,v}\) are the number of true and false (m, j) predictions based on \(\textbf{r}^{(v)}\) in \(\textbf{y}\), respectively. Here, \(t_{m,j,v}=f_{m,j,v}=0\) for a (m, j) experiment that has not been conducted.
If a continuous prior and posterior distribution is desired, one can use them and obtain samples from the posterior distribution using the Markov chain Monte Carlo (MCMC) method. Since we discretized the lognormal distribution (Eq. (4)), we can calculate the posterior distribution without approximation using MCMC (details are described in the “Calculating posterior distribution” section).
Calculating posterior distribution
The marginalized posterior distribution \(p(\textbf{r} \textbf{y})\) in BayesianSSA can be calculated by normalizing the following formula:
We can omit calculating the constant of this formula, and we derive that
where
We implemented these calculations using the logsumexp trick.
Calculating predictive distribution
To evaluate predictive performance, we calculate the predictive probability \(p(\textbf{y}^{\text{new}}\textbf{y})\), where \(\textbf{y}^{\text{new}}\) is a new sample that is not used in BayesianSSA fitting. Using the posterior distribution \(p(\textbf{r}, \varvec{\rho } \textbf{y})\) (Supplementary Section S1), the predictive probability can be obtained as
where
and \(t_{m,j,v}^{\text{new}}\) and \(f_{m,j,v}^{\text{new}}\) are the number of true and false (m, j) predictions based on \(\textbf{r}^{(v)}\) in \(\textbf{y}^{\text{new}}\), respectively. The detailed derivation is described in Supplementary Section S2.
Bayesian updating
Bayesian updating, a procedure where the posterior distribution is used as a prior distribution for the next estimation, can be easily applied to BayesianSSA. In BayesianSSA, updating \(\hat{a}_{m,j,v}\) and \(\hat{b}_{m,j,v}\) to \(\hat{a}_{m,j,v}^{\text{new}}\) and \(\hat{b}_{m,j,v}^{\text{new}}\) is equivalent to Bayesian updating (Supplementary Section S3). This update is also derived from the fact that the final updated posterior distribution in Bayesian updating does not depend on the order of the given perturbation data due to Bayes’ theorem [37].
Positivity confidence value
The introduction of stochastic variables \(\textbf{r}\) enables interpreting indefinite predictions in SSA. We define positivity confidence values as the probabilities that the responses are positive for each indefinite prediction. The positivity confidence value of the (m, j) prediction for a distribution \(p^{\star }(\textbf{r})\) is written as
where \(\mathbb {E}_{p^{\star }(\textbf{r})}[\cdot ]\) denotes the expectation with respect to \(p^{\star }(\textbf{r})\), and \(\mathbb {I}_{a=b}\) is an indicator that is equal to 1 if \(a=b\) and 0 otherwise. A higher positivity confidence value indicates that the qualitative response is more likely to be positive, and the probability of being negative is higher than that of being positive when the positivity confidence value is below 0.5. Note that we use the predictive distributions derived in the “Calculating predictive distribution” section rather than positivity confidence values to evaluate predictive performance on perturbation datasets. This is because positivity confidence values do not consider experimental error.
Used data
Metabolic network information
We utilized the central metabolic pathway of E. coli MG1655 used in a previous study [38]. This metabolic network was originally from the EcoCyc database [39] and modified by Trinh et al. [40] and Toya et al. [38]. We preprocessed this dataset as follows:

1.
Remove the biomass objective function.

2.
Remove metabolites that have no reactions that produce or use them.

3.
Remove reactions that no longer have substrates or products due to the previous processes.

4.
Integrate cytoplasm and extracellular metabolites.
The second step is necessary to apply SSA to the network because it is typically impossible to calculate the inverse of the matrix \(\textbf{A}(\textbf{r})\) (Eq. (1)) when metabolites not involved in the flow are included in the network. Here, metabolites in flow refer to metabolites that serve as both inputs and outputs of reactions included in the network. The fourth step aims to reduce computation time, and this procedure does not alter the results from SSA. After these preprocessing steps, we converted the resulting network into the stoichiometric matrix \(\varvec{\nu }\). Constraints on \(\textbf{R}(\textbf{r})\) were determined solely by the metabolic network, where we set \(r_{j,m}> 0\) if the jth metabolite is a substrate of the mth reaction and \(r_{j,m}= 0\) otherwise. The resulting metabolite and reaction lists are shown in Supplementary Table S2 and Supplementary Table S3. We use the abbreviations in Supplementary Table S2 and Supplementary Table S3 in the following.
Synthetic data generation
We generated synthetic data to compare BayesianSSA with random and base methods. The synthetic perturbation dataset is generated in accordance with a Bernoulli distribution with parameters generated in accordance with beta distributions. Then, we replaced all 0 with \(1\) to match the support of the Bernoulli distribution and the range of the perturbation record \(y_i\) (Eq. (2)). We used GND, PTS, and PPC as the perturbation targets and succinate export (SUCCt) as the observation target. We used a beta distribution with an expectation is close to 0 for GND and PTS and a beta distribution with an expectation is close to 1 for PPC. These distribution settings were in accordance with previous reports [41,42,43,44]. We used \((a, b)=(0.1, 0.3)\) and \((a, b)=(0.3, 0.1)\) as parameters of beta distributions with expectations are close to 0 and 1, respectively. Note that these parameters are used only for the data generation and different from the parameters of the prior distributions, which are weakly informative. We used 30 as the number of records for each perturbation target. The obtained synthetic dataset is shown in Table 1.
Wet lab experiments
We conducted perturbation experiments for the reactions CS, FBP, ICL, LDH, ME1, ME2, PCK, PPC, PPS, and PTA. The genes corresponding to the reactions, listed in Table 2, were introduced into the plasmid vector pLEAD5 (NIPPON GENE CO., LTD.). We amplified the gene sequences using PrimeSTAR GXL DNA Polymerase (Takara Bio Inc.) with E. coli DH5\(\alpha\) (NIPPON GENE CO., LTD.) as a template sequence. The primer sequences were derived from NCBI Genes [45]. The nucleotide sequences of DNAs were analyzed using the BigDye\(\circledR\) Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) with SeqStudio\(^{\text{TM}}\) Genetic Analyzer (Applied Biosystems). The nucleotide sequence data were processed using GENETYXMac NETWORK software, version 15 (GENETYX CORPORATION). We introduced the constructed plasmid vectors into E. coli JM109 (NIPPON GENE CO., LTD.). The modified strains were aerobically cultured and anaerobically fermented at 37\(^{\circ }\)C in M9 minimal medium (Thermo Fisher Scientific Inc.). To measure succinate concentrations, we used EnzyChrom Succinate Assay Kit (#ESNT100, BioAssay Systems) and the absorbance meter SH8000(CORONA ELECTRIC Co.,Ltd.) at 570 nm. The resulting absorbance, which is a constant multiple of the succinate concentrations, was normalized by the optical density of the bacterial liquid. We calculated the differences between the resulting values and the absorbances of controls, which are of a strain with only pLEAD5, and the signs of the values were used as the real perturbation data. We constructed and measured three replicates for each perturbed reaction. The obtained dataset is shown in Table 2. To validate the overexpression, we performed realtime PCR using QuantStudio5 Realtime PCR system (Thermo Fisher SCIENTIFIC) and confirmed the genes were successfully overexpressed (Supplementary Table S4).
Evaluation
Naive Bayes model
To evaluate the effectiveness of incorporating SSA, we constructed a naive Bayes model as follows:
where \(\eta _{m,j}\in (0, 1)\) is a parameter indicating the probability that \(y_i=1\) where the ith experiment is of (m, j), and \(\varvec{\eta }\) is a matrix whose (m, j) element is \(\eta _{m,j}\). The predictive distribution of one new perturbation record \(y^{\text{new}}\) is as follows:
where \(n_{m,j}^{(\text{p})}\) and \(n_{m,j}^{(\text{n})}\) are the numbers of the (m, j) experiments in \(\textbf{y}\) where \(y_i=1\) and \(y_i=1\), respectively. We used \(\mathcal {B}\left( \eta _{m,j}  1,1 \right)\) as the prior distribution.
Crossentropy loss
To evaluate the performance of BayesianSSA, we used the crossentropy loss as follows:
where N is the number of trials, which means how many times data are added, and \(p_i(\cdot )\) is the ith predictive distribution of BayesianSSA or the naive Bayes model. The ith predictive distribution is calculated using \({\left\{ {y_{i^\prime }}\right\} }_{i^\prime = 1}^{i1}\). We used \(p_t(y^{\text{new}}=1) = 0.5\) and \(p_t(y^{\text{new}}= y_i) = p_1(y^{\text{new}}= y_i)\) as the “random method” and “base method” to calculate the crossentropy loss, respectively, to be compared with BayesianSSA. Here, \(p_1(y^{\text{new}}= y_i)\) indicates the initial distribution of BayesianSSA.
Results
\(m_{\text{name}}\) denotes the index of the observation target in the following. In other words, the \(m_{\text{name}}\)th observation target is a metabolite or a reaction whose name is “name.” Similarly, \(j_{\text{name}}\) denotes the index of the perturbation target “name” in the following. Unless otherwise stated, we used \(V=10000\), \(w_v=\frac{1}{V}\), \(\varvec{\mu }=\textbf{0}_{P}\), \(\varvec{\Sigma }=\textbf{I}_{P \times P}\), and \((a, b)=(3, 1)\) as the hyperparameters of BayesianSSA where \(\textbf{0}_{P}\) is the Pdimensional zero vector, and \(\textbf{I}_{P \times P}\) is the \(P \times P\) identity matrix.
Performance evaluation on synthetic dataset
To compare predictive performance between BayesianSSA and the base method under an ideal situation that the experimental error is low, we examined the crossentropy loss \(\text{CE}(N)\) (cf. the “Crossentropy loss” section) on the synthetic dataset shown in Table 1. Figure 1 shows the crossentropy loss trajectory for each method. While crossentropy loss values at the last trial of BayesianSSA with \((a,b)=(9,1)\), \((a,b)=(6,2)\), \((a,b)=(3,1)\), and \((a,b)=(2,1)\) are 19.6, 23.3, 21.4, and 22.3, respectively, those of the random and base methods are 62.4 and 66.6, respectively. BayesianSSA outperformed the random and base method from the perspective of prediction accuracy under the ideal situation.
Positivity confidence value transition on pseudo data
To examine the transition of positivity confidence values, we applied BayesianSSA to a pseudo perturbation dataset (shown in Table 3) based on the previous studies [41,42,43]. Figure 2 shows the transition of positivity confidence values for increasing SUCCt, i.e., \(c_p(m_{\text{SUCCt}}, \cdot )\), when fitting BayesianSSA to the pseudo perturbation dataset. We found that the positivity confidence values were updated for multiple reactions rather than one reaction. The first update (from Fig. 2a to b) shows the EDA (6PG \(\rightarrow\) G3P + PYR) positivity confidence value for the production of succinate becomes higher. The second update (from Fig. 2b to c) shows the reactions included by the flux from FBP to PEP become lower. One of the reaction candidates to increase the SUCCt flux was PPC, whose reaction formula was PEP + CO2 \(\rightarrow\) OAA. While the initial PPC positivity confidence value was 0.714 (Supplementary Table S6), the updated PPC positivity confidence value was 0.869 (Supplementary Table S7). A previous report showed the succinate production of E. coli increases when PPC was upregulated [44], and the PPC positivity confidence value estimated by BayesianSSA is consistent with the report. Similarly, those of GND and PTS were updated from 0.530 and 0.779 to \(7.84\times 10^{3}\) and \(3.23\times 10^{2}\), respectively. Although these results are also consistent with the reports [41,42,43], they are naive because the information of these reports was used directly to fit BayesianSSA. The initial positivity confidence values, which were calculated on the basis of the prior distribution, and positivity confidence values updated with the pseudo perturbation dataset are shown in Supplementary Table S6 and S7, respectively.
Performance evaluation on real data
To compare the performances between BayesianSSA and the naive Bayes model, which uses only data without SSA, we examined the crossentropy loss \(\text{CE}(N)\) (cf. the “Crossentropy loss” section). Figure 3 shows the crossentropy loss trajectory for each method. The perturbation records were randomly shuffled and used in each trial. The loss of BayesianSSA is comparable to that of the base method in the early trials but smaller in the late trials. While the crossentropy loss values at the last trial of BayesianSSA with \((a,b)=(9,1)\), \((a,b)=(6,2)\), \((a,b)=(3,1)\), and \((a,b)=(2,1)\) are 14.8, 15.7, 15.4, and 16.1, respectively, those of the random method, the base method, and the naive Bayes model are 20.8, 19.0, and 17.2, respectively. BayesianSSA outperformed the random and base methods and the naive Bayes model from the perspective of prediction accuracy. While BayesianSSA made a prediction on the basis of the perturbation dataset and SSA, the prediction of the base method is only based on SSA. This result suggests that BayesianSSA can integrate environmental information of the real dataset into SSA predictions. Similarly, the difference between BayesianSSA and the naive Bayes is whether or not SSA is incorporated. This result also indicates the practicability of BayesianSSA and that incorporating SSA into statistical models improves predictive performance.
The main difference between BayesianSSA and the naive Bayes model is in the predictions for outofsample perturbations, which are of new (m, j) experiments. To calculate the predictive distribution of a (m, j) experiment, the naive Bayes model uses only the results of (m, j) experiments (Eq. (8)) while BayesianSSA uses the results of all the \((m, j) \in \Lambda\) experiments (Eq. (6)). That is, BayesianSSA can leverage data to predict the responses to outofsample perturbations through the \(\textbf{r}\) posterior distribution. To validate the predictive performance for outofsample perturbations, we examined the predictive probabilities by splitting the real dataset. For example, the predictive probabilities of the three replicates for the reaction CS were calculated by fitting BayesianSSA to the real dataset except for CS. Figure 4 shows the distribution of predictive probabilities calculated by BayesianSSA for outofsample perturbations. Here, each replicate was evaluated separately. The median of the distribution was 0.67, which is better than random (0.5), and this result indicates that fitting BayesianSSA contributes to the predictive performance for outofsample perturbations.
Positivity confidence values on real data
To interpret the BayesianSSA estimation results, we examined the positivity confidence values after BayesianSSA was fitted to the real data whose observation target is the succinate export flux. Figure 5 shows the positivity confidence values in the metabolic network. All positivity confidence values updated by the real perturbation dataset are shown in Supplementary Table S8. We found high positivity confidence values of THD_r and NDH (\(c_p(m_{\text{SUCCt}}, j_{\mathrm {THD\_r}})=0.95\), \(c_p(m_{\text{SUCCt}}, j_{\text{NDH}})=0.92\)), which both use NADH. NDH produces Q8H2, which is required for FRD. In the tricarboxylic acid (TCA) cycle, ICL, MDH, FRD, FUM_r, and CS show high positivity confidence values (\(>0.85\)). We also applied BayesianSSA with another prior distribution of \(\textbf{r}\), which is based on lognormal distributions with random parameters, to the real dataset (Supplementary Figure S1). All positivity confidence values using this prior distribution are shown in Supplementary Table S9. The reactions with \(c_{p}(m,j)>0.85\) are almost shared in the two BayesianSSA results. The only difference is the presence or absence of NDH (cf. Supplementary Table S8 and Supplementary Table S9). These results suggest that BayesianSSA is robust to changes in the prior distributions of \(\textbf{r}\). We have also examined other observation targets besides succinate export flux and found that they can also be updated to high positivity confidence values (Supplementary Figure S2 and S3).
Positivity confidence values calculated by the BayesianSSA posterior distribution were consistent with previous reports. The positivity confidence values of CS and ICL, which are included in the glyoxylate pathway, were high. The glyoxylate pathway was reported as an essential pathway for succinate production [46, 47]. Similarly, the reductive pathway, which includes FUM_r and FRD, was reported as another essential pathway [46, 47]. Despite these consistencies, other previous reports for several reactions, such as PPC [44] and PTS [42, 43] are inconsistent with our results.
Posterior distribution of \(\textbf{r}\)
To examine the differences between the prior and posterior distribution of \(\textbf{r}\), we compared \(p(\textbf{r} \textbf{y})\) with \(p(\textbf{r})\). Figure 6 shows \(p(\textbf{r})\) and \(p(\textbf{r} \textbf{y})\) on the first and second principal components of \({\left\{ {\textbf{r}^{(v)}}\right\} }_{v=1}^{V}\). We found that the posterior distribution had several peaks where the prior distribution only had one peak. This result indicates that the distribution of \(\textbf{r}\) is tailored to several cases of environments in which the used dataset was collected.
Discussion
We demonstrated the effectiveness of BayesianSSA on the basis of predictive performance using the real perturbation dataset where the observation target is the succinate export flux (SUCCt). BayesianSSA outperformed the base method (Fig. 3), which is not fitted to the dataset, and this result showed BayesianSSA could integrate environmental information into SSA predictions. For validation of practicability, BayesianSSA also outperformed the random method and the naive Bayes model, and the performance of BayesianSSA for outofsample perturbations, which are of new (m, j) experiments, was better than random. These results show that BayesianSSA can consider the relationships between different (m, j) perturbations through \(\textbf{r}\), and that this consideration contributes to predictive performance.
We considered \(\rho _{m,j}\) as a parameter and set the prior distribution in this study. As another option, \(\rho _{m,j}\) can be given by an error rate of the measurement equipment used for the perturbation experiment. For example, consider a case where the error distribution of the used measurement equipment is a normal distribution with a mean parameter that equals the true value and a variance parameter \(\sigma ^2 = 1\) as an error distribution. If the experimental value obtained by the perturbation is 1, the probability that the true value is less than zero is approximately 16%. Therefore, setting \(1\rho _{m,j}=0.16\) can make BayesianSSA consider the error distribution of the measurement equipment. In this way, we can set a certain value of \(\rho _{m,j}\). Note that we can easily calculate the posterior distribution of this model (Supplementary Section S4).
There are two directions for future work related to \(\textbf{r}\). First, the updated \(p(\textbf{r})\) may be used for response predictions in another metabolic network. When the reaction rate function \(F_j\) and the probability distribution of \(\textbf{x}\) are equal between the two metabolic networks, \(p(r_{j,m})\) can be used in another system that includes the ith reaction and the mth metabolite. Second, as previously discussed [10], we can consider allosteric regulation. Allosteric regulation is a type of regulation that increases/decreases reaction rates as a metabolite concentration increases [48, 49]. We can easily consider allosteric regulation by setting \(r_{j,m}\ne 0\). Technically, \(r_{j,m}< 0\) can be implemented by reversing the sign of \(r_{j,m}\) after sampling \(\textbf{r}^{(v)}\). However, we need to know the (j, m) pairs that have allosteric regulation in advance, and we omitted considering allosteric regulations in this study.
There is room for choice regarding the prior distribution of \(\textbf{r}\). First, continuous distributions can be adopted. We used the empirical distribution with samples from lognormal distributions as the \(\textbf{r}\) prior distribution for all experiments (cf. Eq. (4)). As long as the constraint on \(\textbf{r}\) is satisfied, other distributions can be adopted. However, the likelihood function changes discretely (cf. Eq. (3)), and the advantage of adopting a continuous distribution with employing MCMC methods may be limited. Second, ensemble approaches for several types of prior distributions may be effective. As shown in Fig. 5 and Supplementary Figure S1, the effect of the prior distribution is not negligible when dealing with a limited sample size. Using several types of prior distributions may contribute to making robust predictions.
Although we omitted the biomass production processes in this study, considering them can improve the representation of real biological activity. One commonly used approach in FBA involves optimizing the biomass objective function [18, 19]. However, since the biomass objective function depends on the specific strains and environmental conditions [50], it is difficult to use the biomass objective function when analyzing unfamiliar strains. Another difficulty is that the biomass objective function is a pseudoreaction that contains multiple reactions and cannot be treated kinetically. Unlike FBAbased methods, which can consider biomass production processes as a single reaction, kineticsbased methods including BayesianSSA need to faithfully model the biomass production process. That is, it is necessary to define the biomass production rate equations as in a previous study [21].
Utilizing the positivity confidence value calculation (the “Positivity confidence value” section) and Bayesian updating (the “Bayesian updating” section) in BayesianSSA, we can construct an iterative designbuildtestlearn (DBTL) cycle [51] on the basis of BayesianSSA for proposals of reactions to be perturbed. Specifically, the procedure is as follows:

1.
Calculate positivity confidence values by Eq. (7).

2.
Obtain a proposal of which perturbation and observation targets are validated in accordance with positivity confidence values.

3.
Conduct perturbation experiments in accordance with the proposal.

4.
Update the posterior distributions by Eq. (5).

5.
Return to the first step.
One advantage of this scheme is the high efficiency because the experimental validation of proposals obtained by BayesianSSA is also a process collecting data for updating the BayesianSSA posterior distribution.
Conclusions
In this study, we proposed BayesianSSA, a Bayesian statistical model based on SSA. SSA was previously developed as a method to predict qualitative responses to enzyme perturbations on the basis of the structural information of the reaction network. However, the network structural information can sometimes be insufficient to predict qualitative responses unambiguously, which is a practical issue in bioproduction applications. To address this, BayesianSSA extracts environmental information from perturbation datasets collected in environments of interest and integrates it into SSA predictions. We applied BayesianSSA to synthetic and real datasets of the central metabolic pathway of E. coli. As a result, BayesianSSA outperformed the base method, which is the same as the BayesianSSA model but utilizes an initial prior distribution without incorporating perturbation datasets. This result shows that BayesianSSA can successfully integrate environmental information extracted from perturbation data into SSA predictions. In addition, the positivity confidence values estimated by BayesianSSA for increasing the succinate export flux were consistent with the known pathways reported to enhance the flux in previous studies. We believe that BayesianSSA will accelerate the chemical bioproduction process and contribute to advancements in the field.
Availability of data and materials
Supplementary material is available from the journal website. The implementation of the algorithm is available on GitHub (https://github.com/shionhosodahitachi/BayesianSSA).
References
Ögmundarson Ó, Sukumara S, Herrgård MJ, Fantke P. Combining environmental and economic performance for bioprocess optimization. Trends Biotechnol. 2020;38(11):1203–14. https://doi.org/10.1016/j.tibtech.2020.04.011.
Tsopanoglou A, Jiménez del Val I. Moving towards an era of hybrid modelling: advantages and challenges of coupling mechanistic and datadriven models for upstream pharmaceutical bioprocesses. Curr Opin Chem Eng. 2021;32: 100691. https://doi.org/10.1016/j.coche.2021.100691.
Rathore AS, Mishra S, Nikita S, Priyanka P. Bioprocess control: current progress and future perspectives. Life. 2021;11(6):557. https://doi.org/10.3390/life11060557.
Behera B, Unpaprom Y, Ramaraj R, Maniam GP, Govindan N, Paramasivan B. Integrated biomolecular and bioprocess engineering strategies for enhancing the lipid yield from microalgae. Renew Sustain Energy Rev. 2021;148: 111270. https://doi.org/10.1016/j.rser.2021.111270.
Voigt CA. Synthetic biology 2020–2030: six commerciallyavailable products that are changing our world. Nat Commun. 2020;11(1):6379. https://doi.org/10.1038/s41467020201222.
Keasling J, Garcia Martin H, Lee TS, Mukhopadhyay A, Singer SW, Sundstrom E. Microbial production of advanced biofuels. Nat Rev Microbiol. 2021;19(11):701–15. https://doi.org/10.1038/s4157902100577w.
Zhuang X, Zhang Y, Xiao AF, Zhang A, Fang B. Applications of synthetic biotechnology on carbon neutrality research: a review on electrically driven microbial and enzyme engineering. Front Bioeng Biotechnol. 2022;10: 826008.
Wang L, Dash S, Ng CY, Maranas CD. A review of computational tools for design and reconstruction of metabolic pathways. Synth Syst Biotechnol. 2017;2(4):243–52. https://doi.org/10.1016/j.synbio.2017.11.002.
Sveshnikova A, MohammadiPeyhani H, Hatzimanikatis V. Computational tools and resources for designing new pathways to small molecules. Curr Opin Biotechnol. 2022;76: 102722. https://doi.org/10.1016/j.copbio.2022.102722.
Mochizuki A, Fiedler B. Sensitivity of chemical reaction networks: a structural approach. 1. Examples and the carbon metabolic network. J Theor Biol. 2015;367:189–202. https://doi.org/10.1016/j.jtbi.2014.10.025.
Okada T, Mochizuki A. Sensitivity and network topology in chemical reaction systems. Phys Rev E. 2017;96(2): 022322. https://doi.org/10.1103/PhysRevE.96.022322.
Fisher AK, Freedman BG, Bevan DR, Senger RS. A review of metabolic and enzymatic engineering strategies for designing and optimizing performance of microbial cell factories. Comput Struct Biotechnol J. 2014;11(18):91–9. https://doi.org/10.1016/j.csbj.2014.08.010.
Guo W, Sheng J, Feng X. Minireview: in vitro metabolic engineering for biomanufacturing of highvalue products. Comput Struct Biotechnol J. 2017;15:161–7. https://doi.org/10.1016/j.csbj.2017.01.006.
Pharkya P, Maranas CD. An optimization framework for identifying reaction activation/inhibition or elimination candidates for overproduction in microbial systems. Metab Eng. 2006;8(1):1–13. https://doi.org/10.1016/j.ymben.2005.08.003.
Ranganathan S, Suthers PF, Maranas CD. OptForce: an optimization procedure for identifying all genetic manipulations leading to targeted overproductions. PLoS Comput Biol. 2010;6(4):1000744. https://doi.org/10.1371/journal.pcbi.1000744.
McAnulty MJ, Yen JY, Freedman BG, Senger RS. Genomescale modeling using flux ratio constraints to enable metabolic engineering of clostridial metabolism in silico. BMC Syst Biol. 2012;6(1):42. https://doi.org/10.1186/17520509642.
Jiang S, OteroMuras I, Banga JR, Wang Y, Kaiser M, Krasnogor N. OptDesign: identifying optimum design strategies in strain engineering for biochemical production. ACS Synth Biol. 2022;11(4):1531–41. https://doi.org/10.1021/acssynbio.1c00610.
Orth JD, Thiele I, Palsson BØ. What is flux balance analysis? Nat Biotechnol. 2010;28(3):245–8. https://doi.org/10.1038/nbt.1614.
Feist AM, Palsson BO. The biomass objective function. Curr Opin Microbiol. 2010;13(3):344–9. https://doi.org/10.1016/j.mib.2010.03.003.
Usuda Y, Nishio Y, Iwatani S, Van Dien SJ, Imaizumi A, Shimbo K, Kageyama N, Iwahata D, Miyano H, Matsui K. Dynamic modeling of Escherichia coli metabolic and regulatory systems for aminoacid production. J Biotechnol. 2010;147(1):17–30. https://doi.org/10.1016/j.jbiotec.2010.02.018.
Kurata H, Sugimoto Y. Improved kinetic model of Escherichia coli central carbon metabolism in batch and continuous cultures. J Biosci Bioeng. 2018;125(2):251–7. https://doi.org/10.1016/j.jbiosc.2017.09.005.
Yasemi M, Jolicoeur M. Modelling cell metabolism: a review on constraintbased steadystate and kinetic approaches. Processes. 2021;9(2):322. https://doi.org/10.3390/pr9020322.
Mochizuki A. A structural approach to understanding enzymatic regulation of chemical reaction networks. Biochem J. 2022;479(11):1265–83. https://doi.org/10.1042/BCJ20210545.
Di Maggio J, Diaz Ricci JC, Soledad Diaz M. Global sensitivity analysis in dynamic metabolic networks. In: Jeżowski J, Thullie J, editors. Computer aided chemical engineering. 19 European symposium on computer aided process engineering, vol. 26. Hoboken: Elsevier; 2009. p. 1075–80. https://doi.org/10.1016/S15707946(09)701798.
Kacser H. The control of flux. Symp Soc Exp Biol. 1973;28:65–101.
Wildermuth MC. Metabolic control analysis: biological applications and insights. Genome Biol. 2000;1(6):1031–1. https://doi.org/10.1186/gb200016reviews1031.
Rizk ML, Liao JC. Ensemble modeling for aromatic production in Escherichia coli. PLoS ONE. 2009;4(9):6903. https://doi.org/10.1371/journal.pone.0006903.
Nishiguchi H, Hiasa N, Uebayashi K, Liao J, Shimizu H, Matsuda F. Transomics datadriven, ensemble kinetic modeling for systemlevel understanding and engineering of the cyanobacteria central metabolism. Metab Eng. 2019;52:273–83. https://doi.org/10.1016/j.ymben.2019.01.004.
Strutz J, Martin J, Greene J, Broadbelt L, Tyo K. Metabolic kinetic modeling provides insight into complex biological questions, but hurdles remain. Curr Opin Biotechnol. 2019;59:24–30. https://doi.org/10.1016/j.copbio.2019.02.005.
Saa PA, Nielsen LK. Construction of feasible and accurate kinetic models of metabolism: a Bayesian approach. Sci Rep. 2016;6(1):29635. https://doi.org/10.1038/srep29635.
John PCS, Strutz J, Broadbelt LJ, Tyo KEJ, Bomble YJ. Bayesian inference of metabolic kinetics from genomescale multiomics data. PLoS Comput Biol. 2019;15(11):1007424. https://doi.org/10.1371/journal.pcbi.1007424.
Gopalakrishnan S, Dash S, Maranas C. KFIT: an accelerated kinetic parameterization algorithm using steadystate fluxomic data. Metab Eng. 2020;61:197–205. https://doi.org/10.1016/j.ymben.2020.03.001.
Zhang X, Su Y, Lane AN, Stromberg AJ, Fan TWM, Wang C. Bayesian kinetic modeling for tracerbased metabolomic data. BMC Bioinform. 2023;24(1):108. https://doi.org/10.1186/s12859023052115.
Choudhury S, Moret M, Salvy P, Weilandt D, Hatzimanikatis V, Miskovic L. Reconstructing kinetic models for dynamical studies of metabolism using generative adversarial networks. Nat Mach Intell. 2022;4(8):710–9. https://doi.org/10.1038/s4225602200519y.
Okada T, Mochizuki A. Law of localization in chemical reaction networks. Phys Rev Lett. 2016;117(4): 048101. https://doi.org/10.1103/PhysRevLett.117.048101.
Hishida A, Okada T, Mochizuki A. Patterns of change in regulatory modules of chemical reaction systems induced by network modification. PNAS Nexus. 2023. https://doi.org/10.1093/pnasnexus/pgad441.
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian data analysis. Boca Raton: CRC Press; 2013. p. 9.
Toya Y, Shiraki T, Shimizu H. SSDesign: computational metabolic pathway design based on flux variability using elementary flux modes. Biotechnol Bioeng. 2015;112(4):759–68. https://doi.org/10.1002/bit.25498.
Keseler IM, ColladoVides J, GamaCastro S, Ingraham J, Paley S, Paulsen IT, PeraltaGil M, Karp PD. EcoCyc: a comprehensive database resource for Escherichia coli. Nucleic Acids Res. 2005;33(Database Issue):334–7. https://doi.org/10.1093/nar/gki108.
Trinh CT, Unrean P, Srienc F. Minimal Escherichia coli cell for the most efficient production of ethanol from hexoses and pentoses. Appl Environ Microbiol. 2008;74(12):3634–43. https://doi.org/10.1128/AEM.0270807.
Mienda BS, Shamsir MS, Illias RM. Modelguided metabolic gene knockout of gnd for enhanced succinate production in Escherichia coli from glucose and glycerol substrates. Comput Biol Chem. 2016;61:130–7. https://doi.org/10.1016/j.compbiolchem.2016.01.013.
Chatterjee R, Millard CS, Champion K, Clark DP, Donnelly MI. Mutation of the ptsG gene results in increased production of succinate in fermentation of glucose by Escherichia coli. Appl Environ Microbiol. 2001;67(1):148–54. https://doi.org/10.1128/AEM.67.1.148154.2001.
Liang Q, Zhang F, Li Y, Zhang X, Li J, Yang P, Qi Q. Comparison of individual component deletions in a glucosespecific phosphotransferase system revealed their different applications. Sci Rep. 2015;5(1):13200. https://doi.org/10.1038/srep13200.
Millard CS, Chao YP, Liao JC, Donnelly MI. Enhanced production of succinic acid by overexpression of phosphoenolpyruvate carboxylase in Escherichia coli. Appl Environ Microbiol. 1996;62(5):1808–10. https://doi.org/10.1128/aem.62.5.18081810.1996.
...Sayers EW, Bolton EE, Brister JR, Canese K, Chan J, Comeau DC, Connor R, Funk K, Kelly C, Kim S, Madej T, MarchlerBauer A, Lanczycki C, Lathrop S, Lu Z, ThibaudNissen F, Murphy T, Phan L, Skripchenko Y, Tse T, Wang J, Williams R, Trawick BW, Pruitt KD, Sherry ST. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2021;50(D1):20–6. https://doi.org/10.1093/nar/gkab1112.
Vemuri GN, Eiteman MA, Altman E. Effects of growth mode and pyruvate carboxylase on succinic acid production by metabolically engineered strains of Escherichia coli. Appl Environ Microbiol. 2002;68(4):1715. https://doi.org/10.1128/AEM.68.4.17151727.2002.
van Heerden CD, Nicol W. Continuous and batch cultures of Escherichia coli KJ134 for succinic acid fermentation: metabolic flux distributions and production characteristics. Microb Cell Fact. 2013;12(1):80. https://doi.org/10.1186/147528591280.
Laskowski RA, Gerick F, Thornton JM. The structural basis of allosteric regulation in proteins. FEBS Lett. 2009;583(11):1692–8. https://doi.org/10.1016/j.febslet.2009.03.019.
Changeux JP. 50th anniversary of the word “allosteric’’. Prot Sci. 2011;20(7):1119–24. https://doi.org/10.1002/pro.658.
Simensen V, Schulz C, Karlsen E, Bråtelund S, Burgos I, Thorfinnsdottir LB, GarcíaCalvo L, Bruheim P, Almaas E. Experimental determination of Escherichia coli biomass composition for constraintbased metabolic modeling. PLoS ONE. 2022;17(1):0262450. https://doi.org/10.1371/journal.pone.0262450.
Liao X, Ma H, Tang YJ. Artificial intelligence: a solution to involution of designbuildtestlearn cycle. Curr Opin Biotechnol. 2022;75: 102712. https://doi.org/10.1016/j.copbio.2022.102712.
Acknowledgements
We would like to thank Y. Mizunuma for her technical assistance with the wet lab experiments. We also thank Dr. K. Yokoyama for technical guidance. We appreciate the advice and technical support with the wet lab experiments from Dr. T. Takeya. We are grateful to Dr. A. Kandori, Dr. K. Watanabe, and Dr. S. Yabuuchi for their guidance throughout our project. This research did not receive any specific grant from funding agencies in the public, commercial, or notforprofit sectors.
Funding
Not applicable.
Author information
Authors and Affiliations
Contributions
SH, TO, AM, and MS conceptualized this study. SH devised the model, designed the algorithms, implemented the software, and performed all the computational experiments. HI performed all the wet lab experiments. TM, HI, and MT established the wet lab experimental procedures. SH, MS, TO, and AM interpreted the computational results. SH and MS investigated previous researches for biological insights. SH wrote the draft. TO, AM, MS, MT, HI, and TM revised the manuscript critically. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons AttributionNonCommercialNoDerivatives 4.0 International License, which permits any noncommercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/byncnd/4.0/.
About this article
Cite this article
Hosoda, S., Iwata, H., Miura, T. et al. BayesianSSA: a Bayesian statistical model based on structural sensitivity analysis for predicting responses to enzyme perturbations in metabolic networks. BMC Bioinformatics 25, 297 (2024). https://doi.org/10.1186/s12859024059214
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12859024059214