Skip to main content
  • Methodology article
  • Open access
  • Published:

Estimating the size of the solution space of metabolic networks

Abstract

Background

Cellular metabolism is one of the most investigated system of biological interactions. While the topological nature of individual reactions and pathways in the network is quite well understood there is still a lack of comprehension regarding the global functional behavior of the system. In the last few years flux-balance analysis (FBA) has been the most successful and widely used technique for studying metabolism at system level. This method strongly relies on the hypothesis that the organism maximizes an objective function. However only under very specific biological conditions (e.g. maximization of biomass for E. coli in reach nutrient medium) the cell seems to obey such optimization law. A more refined analysis not assuming extremization remains an elusive task for large metabolic systems due to algorithmic limitations.

Results

In this work we propose a novel algorithmic strategy that provides an efficient characterization of the whole set of stable fluxes compatible with the metabolic constraints. Using a technique derived from the fields of statistical physics and information theory we designed a message-passing algorithm to estimate the size of the affine space containing all possible steady-state flux distributions of metabolic networks. The algorithm, based on the well known Bethe approximation, can be used to approximately compute the volume of a non full-dimensional convex polytope in high dimensions. We first compare the accuracy of the predictions with an exact algorithm on small random metabolic networks. We also verify that the predictions of the algorithm match closely those of Monte Carlo based methods in the case of the Red Blood Cell metabolic network. Then we test the effect of gene knock-outs on the size of the solution space in the case of E. coli central metabolism. Finally we analyze the statistical properties of the average fluxes of the reactions in the E. coli metabolic network.

Conclusion

We propose a novel efficient distributed algorithmic strategy to estimate the size and shape of the affine space of a non full-dimensional convex polytope in high dimensions. The method is shown to obtain, quantitatively and qualitatively compatible results with the ones of standard algorithms (where this comparison is possible) being still efficient on the analysis of large biological systems, where exact deterministic methods experience an explosion in algorithmic time. The algorithm we propose can be considered as an alternative to Monte Carlo sampling methods.

Background

Cellular metabolism is a complex biological problem. It can be viewed as a chemical engine that transforms available raw materials into energy or into the building blocks needed for the biological function of the cells. In more specific terms a metabolic network is indeed a processing system transforming input metabolites (nutrients), into output metabolites (amino acids, lipids, sugars etc.) according to very strict molecular proportions, often referred as stoichiometric coefficients of the reactions.

Although the general topological properties of these networks are well characterized [13], and non-trivial pathways are well known for many species [4] the cooperative role of these pathways is hard to comprehend. In fact, the large sizes of these networks, usually containing hundreds of metabolites and even more reactions, makes the comprehension of the principles that govern their global function a challenging task. Therefore, a necessary step to achieve this goal is the use of mathematical models and the development of novel statistical techniques to characterize and simulate these networks.

It is well known that under evolutionary pressure, prokaryote cells like E. coli behave optimizing their growth performance [5]. Flux Balance Analysis (FBA) provides a powerful tool to predict the optimal growth and production fluxes, but is not reliable about phenotypic state the cell will acquire. This is mainly due to the fact that among the infinitely many potential network states compatible with the stoichiometric constraints, FBA chooses a single one whose biological meaning is at least questionable under generic external conditions. FBA maximizes a linear function (usually the growth rate of the cell) subject to biochemical and thermodynamic constraints [6]. On the other hand, cells with genetically engineered knockouts or bacterial strains that were not exposed to evolution pressures, need not to optimize their growth. In fact, the method of minimization metabolic adjustment [7] has shown that knockout metabolic fluxes undergo a minimal redistribution with respect to the flux configuration of the wild type. Yet, in more general situations, the results are unpredictable, therefore, a tool to characterize the shape and volume of the whole space of possible phenotypic solutions must be welcome.

So far, apart from exact algorithms evaluating the volume of the space of possible solutions, that are unsuitable for analyzing metabolic networks larger than some dozens metabolites [8, 9], the best technique allowing for such a characterization is based on Monte Carlo sampling (MCS) of the steady-state flux space [1014]. This method is known to perform very well on intermediate size metabolic networks (up to a hundred of metabolites) [10, 11] where different strategies of MCS have been implemented giving comparable results. Some improved variant of MCS seems to perform well also on organism-wide where the number of variables is in the range order of a thousand [13, 14]. Although MCS has in general some intrinsically associated problems, mainly due to the fact that the convergence (or mixing) time is hard to assess and often is exponential, in the case of polytope volume estimations it turns out that sampling strategies such as hit and run have mixing times that scale only polynomially with system size [15]. However in many concrete cases a practical problem is to give a precise condition for convergence therefore we believe that an alternative independent technique could be more than welcome even in the cases where MCS is applicable.

As a concrete step toward an efficient characterization of the set of fluxes compatible with the stoichiometric constraints, we propose a novel message-passing technique derived from the field of statistical physics and information theory.

Mathematical Model

As already mentioned, a metabolic network is an engine that converts metabolites into other metabolites through a series of intra-cellular intermediate steps. The fundamental equation characterizing all functional states of a reconstructed biochemical reaction network is a mass conservation law that imposes simple linear constraints between the incoming and outcoming fluxes at any chemical reaction:

ρ t = i + S ν o MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqcfa4aaSaaaeaacqGHciITiiWacqWFbpGCaeaacqGHciITcqWG0baDaaGccqGH9aqpieqacqGFPbqAcqGHRaWkcuWGtbWugaqcaiabgwSixlab=17aUjabgkHiTiab+9gaVbaa@3DA3@
(1)

where ρ is the vector of the M metabolite concentrations in the network. i (o) is the input (output) vector of fluxes, and ν are the reaction fluxes governed by the M × N stoichiometric linear operator S MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafm4uamLbaKaaaaa@2D14@ (usually named stoichiometric matrix) encoding the coefficient of the M intra-cellular relations among the N fluxes.

As long as just steady-state cellular properties are concerned one can assume that a variation in the concentration of metabolites in a cell can be ignored and considered as constant. Therefore in case of fixed external conditions one can assume metabolites (quasi) stationarity and consequently the lhs of 1 can be set to zero. Under these hypotheses the problem of finding the metabolic fluxes compatible with flux-balance is mathematically described by the linear system of equations

S ν = o i b MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafm4uamLbaKaacqGHflY1iiWacqWF9oGBcqGH9aqpieqacqGFVbWBcqGHsislcqGFPbqAcqGHHjIUcqGFIbGyaaa@3932@
(2)

where b is the net metabolite uptake by the cell. Without loss of generality we can assume that the stoichiometric matrix S MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafm4uamLbaKaaaaa@2D14@ has full rows rank, i.e. that rank( S MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafm4uamLbaKaaaaa@2D14@ ) = M, since linearly dependent equations can be easily identified and removed. Given that the number of metabolites M is lower than the number of fluxes N the subspace of solutions is a (N-M)-dimensional manifold embedded in the N-dimensional space of fluxes. In addition, the positivity of fluxes, together with the experimentally accessible values for the maximal fluxes, limit further the space of feasible solutions. This fact may be expressed by the following inequalities:0ννmax

in such a way that together, 2 and 3, define the convex set of all the allowed time-independent phenotipic states of a given metabolic network.

Sub-dimensional volumes

Mathematically speaking, the space of feasible solutions consistent with the Equations 2 is an affine space V Nof dimension N - M. The set of inequalities 3 then defines a convex polytope Π V that, from the metabolic point of view, may be considered as the allowed configuration space for the cell states. The main goal of this work is computation of the volume of this space of solutions and of certain subspaces of it. Although conceptually simple, the notion of sub-dimensional volume like that of Π requires some new definitions.

Consider any linear parameterization ϕ : N-MV N(see explicative scheme in Figure 1). A popular choice for ϕ is, for instance, the inverse of the so called lexicographical projection i.e., the projection over the first N-M coordinates such that its restriction to V has an inverse. Being ϕ linear, the (N - M) × N Jacobian matrix λ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafq4UdWMbaKaaaaa@2D99@ is constant and coincides with the matrix of ϕ in the canonical bases. Denoting λ = det ( λ λ ) 1 2 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeiikaGIafq4UdWMbaKaadaahaaWcbeqaaiabbccigcaakiqbeU7aSzaajaGaeiykaKYaaWbaaSqabKqbagaadaWcaaqaaiabigdaXaqaaiabikdaYaaaaaaaaa@3540@ , the Euclidean metric in Ninduces a measure on V (which does not depend on ϕ):

Figure 1
figure 1

Embedding. Sketch of the the parameterization ϕ = N - MV N, in a simple case where N = 3 and M = 1. The mass balance equations defined in Equation 2 define the hyperplane V and the set of inequalities defined in Equation 3 restrict V to polytope Π indicated as the grey triangle in the left part of the figure. The application ϕ map the full-dimensional counter-image of Π (the grey triangle on the right indicate with U) onto Π.

V f ( ν ) d ν λ f ( φ ( u ) ) d u MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaWaa8qeaeaacqWGMbGzcqGGOaakiiWacqWF9oGBcqGGPaqkcqWGKbazcqWF9oGBcqGHHjIUaSqaaiabdAfawbqab0Gaey4kIipakiabeU7aSnaapeaabaGaemOzayMaeiikaGIaeqOXdyMaeiikaGccbmGae4xDauNaeiykaKIaeiykaKIaemizaqMae4xDauhaleqabeqdcqGHRiI8aaaa@475A@
(4)

allowing to compute the volume of our polytope

v o l V ( Π ) V 1 Π ( ν ) d ν = λ 1 φ 1 ( Π ) ( u ) d u MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaacbeGae8NDayNae83Ba8Mae8hBaW2aaSbaaSqaaiabdAfawbqabaGccqGGOaakcqqHGoaucqGGPaqkcqGHHjIUdaWdraqaaiab=fdaXmaaBaaaleaacqqHGoauaeqaaOGaeiikaGcccmGae4xVd4MaeiykaKIaemizaqMae4xVd4galeaacqWGwbGvaeqaniabgUIiYdGccqGH9aqpcqaH7oaBdaWdbaqaaiab=fdaXmaaBaaaleaacqaHgpGzdaahaaadbeqaaiabgkHiTiabigdaXaaaliabcIcaOiabfc6aqjabcMcaPaqabaGccqGGOaakieWacqqF1bqDcqGGPaqkcqWGKbazcqqF1bqDaSqabeqaniabgUIiYdaaaa@55D4@
(5)

where 1Π (·) is the indicator function of the set Π. It is worth pointing out that given the linear structure of the metabolic equations, the determinant of the mapping is a (scalar) constant. Although the coefficient λ could be explicitly calculated, it turns out that as far as only relative volume quantities are concerned, as in the case of the in silico flux knock-outs introduced below, this term factors out and therefore we will drop it from the rest of the computation.

Probabilistic framework

The problem of describing the polytope Π can be cast into a probabilistic framework. We define the probability density P MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeeiuaafaaa@2CFC@ as:

P ( ν ) = v o l V ( Π ) 1 1 Π ( ν ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeeiuaaLaeiikaGcccmGae8xVd4MaeiykaKIaeyypa0dcbeGae4NDayNae43Ba8Mae4hBaW2aaSbaaSqaaiabdAfawbqabaGccqGGOaakcqqHGoaucqGGPaqkdaahaaWcbeqaaiabgkHiTiabigdaXaaakiab+fdaXmaaBaaaleaacqqHGoauaeqaaOGaeiikaGIae8xVd4MaeiykaKcaaa@42A8@
(6)

Marginal flux probabilities over a given set of fluxes are obtained by integrating out all remaining degrees of freedom. In particular we can define single flux marginal probability densities as integrals on the affine subspace W = V ∩ {ν i = ν ¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafqyVd4Mbaebaaaa@2DA5@ }

P i ( ν i ) = W P ( ν ) j i d ν j = v o l W ( Π W ) v o l V ( Π ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiuaa1aaSbaaSqaaiabdMgaPbqabaGccqGGOaakcqaH9oGBdaWgaaWcbaGaemyAaKgabeaakiabcMcaPiabg2da9maapebabaGaeeiuaaLaeiikaGcccmGae8xVd4MaeiykaKYaaebuaeaacqWGKbazcqaH9oGBdaWgaaWcbaGaemOAaOgabeaakiabg2da9KqbaoaalaaabaacbeGae4NDayNae43Ba8Mae4hBaW2aaSbaaeaacqWGxbWvaeqaaiabcIcaOiabfc6aqjabgMIihlabdEfaxjabcMcaPaqaaiab+zha2jab+9gaVjab+XgaSnaaBaaabaGaemOvayfabeaacqGGOaakcqqHGoaucqGGPaqkaaaaleaacqWGQbGAcqGHGjsUcqWGPbqAaeqaniabg+GivdaaleaacqWGxbWvaeqaniabgUIiYdaaaa@5DD2@
(7)

where the normalization term vol W (Π ∩ W) is the (sub dimensional) volume of the intersection between the polytope Π and the hyperplane {ν i = ν ¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafqyVd4Mbaebaaaa@2DA5@ } as displayed schematically in Figure 2 where the marginal probability at point ν i = ν ¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafqyVd4Mbaebaaaa@2DA5@ is proportional to length of the blue segment which is the intersection between the polytope Π and the plane ν i = ν ¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafqyVd4Mbaebaaaa@2DA5@

Figure 2
figure 2

Marginal probability. Geometrical interpretation of the marginal probability distributions: the marginal probability at point ν i = ν ¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafqyVd4Mbaebaaaa@2DA5@ is proportional to length of the blue segment which is the intersection between the polytope Π (the grey triangle) and the plane ν i = ν ¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafqyVd4Mbaebaaaa@2DA5@ .

Approximate volume computation

¿From a computational point of view, the problem of the exact computation of the volume of a polytope with current methods requires the enumeration of all its vertexes. The vertex enumeration problem is #P-hard [16, 17], but even the problem of computing the volume, given the set of all vertexes is a big computational challenge. Various algorithms exist for calculating the exact volume of a polytope from its vertexes (for a review see [18]), and many software packages are available in the Internet. Computational limitations restrict however exact algorithmic strategies to cope with polytopes in relatively few dimensions (e.g. N - M around 10 or so). To overcome such severe limitations we will introduce a very efficient approximate computational strategy that will allow us to compute the volume and the shape of the space of solutions for real-world metabolic networks.

We will adopt the following three steps strategy:

  • We discretize the problem a la Riemann considering an a N-dimensional square lattice whose elementary cell is of size εN. The approximated volume is then proportional to the number of cells intersecting the polytope Π. Of course the smaller ε the better is the approximation.

  • We consider an integer constraint satisfaction problem where each of the mass-balance equations set a hard constraint over the involved discretized fluxes.

  • We solve the constraint satisfaction problem using a message-passing algorithm called Belief Propagation.

Discretization

Consider the regular orthogonal grid Λ ε of side ε partitioning Nas in the simple sketch of Figure 3. This grid maps via ϕ-1 into a partition Γ ε of ϕ-1(Π). The number of cells N ε MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae8xdX70aaSbaaSqaaiabew7aLbqabaaaaa@3936@ ε of Λ ε intersecting Π is proportional to the numbers of cells of Γ ε intersecting ϕ-1(Π). Finally, the volume in Eq. 5 is proportional to limε → 0εN-M N ε MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae8xdX70aaSbaaSqaaiabew7aLbqabaaaaa@3936@ ε . For any given ε one we have then defined discrete variables ν i {0,1,..., q i max MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyCae3aa0baaSqaaiabdMgaPbqaaiGbc2gaTjabcggaHjabcIha4baaaaa@32EE@ }, for q i max MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyCae3aa0baaSqaaiabdMgaPbqaaiGbc2gaTjabcggaHjabcIha4baaaaa@32EE@ equal to the integer part of q max × ν i max MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyCae3aaWbaaSqabeaacyGGTbqBcqGGHbqycqGG4baEaaGccqGHxdaTcqaH9oGBdaqhaaWcbaGaemyAaKgabaGagiyBa0MaeiyyaeMaeiiEaGhaaaaa@3B1A@ , where the integer qmax is the granularity of the approximation.

Figure 3
figure 3

Tiling. Discretization of the volume: in the left part of the figure we display the the regular orthogonal grid Λ ε of side ε partitioning N. The counter-image of Λ ε via ϕ is given by the the grid Γ ε . Let number of Γ ε squares intersecting ϕ-1(Π) is proportional to the number of Λ ε cubes intersecting Π. The smaller the ε the better the intersection of the grid Λ ε with the polytope Π will approximate Π.

The constraint satisfaction problem

When dealing with integer coefficients s i, a , as the ones appearing in normal stoichiometric relations, the discrete version of Eqs. 2 close in the set of positive integers defining a constraint satisfaction problem. The ε-approximation of the volume is then the number of elementary cells that are solution to the discretized mass-balance equations. It is interesting to note that in the case of fractional stoichiometric relations one can multiply all terms for the minimum common multiple of all denominators, getting an equivalent mass-balance equation with integer coefficients.

Belief Propagation

The integer constraint satisfaction is solved using Belief Propagation (BP), a local iterative algorithm that allows for the computations of marginal probability distributions. BP is exact on trees, and perform reasonably well on locally tree-like structures [1922]. This approximation scheme allows for the computation of the logarithm of the number of solutions via the entropy that can be expressed in terms of flux marginals. We will give a detailed derivation of the equations in section Methods.

Results and Discussion

Performance on low dimensional systems

In this section we will analyze the performance of our algorithm against an exact algorithm on low dimensional polytopes. Among the different packages available in the Internet, we have chosen LRS [8], a program based on the reverse search algorithm presented by Avis and Fukuda in [9] that can compute the volume of non-full dimensional polytopes. Actually, it computes the volume of the lexicographically smallest representation of the polytope, that for the benchmark used below, coincides with the conventional volume estimated by our algorithm.

We have devised a specific benchmark generating random diluted stoichiometric matrices at a given ratio α = M/N and fixed number of terms different from zero K in each of the reactions. All fluxes were constrained inside the hypercube 0 ≤ ν i ≤ 1. As a general strategy we have calculated several random instances of the problem and measured the volume (entropy) of the polytope using the LRS and BP algorithm. In particular, we have first generated 1000 realizations of random stoichiometric matrices with N = 12, M = 4. Note that N = 12 is around the maximum that allows simulations with LRS in reasonable time (around one hour per instance). For each polytope then we have computed the two entropies S LRS and S BP with both algorithms, fixing the same maximum value for the discretization qmax = 1024 for all fluxes.

In Figure 4 we show how the quality of the BP measure is affected by the discretization, by displaying the histogram of the relative differences δ S = S B P S L R S S L R S MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqiTdqMaem4uamLaeyypa0tcfa4aaSaaaeaacqWGtbWudaWgaaqaaiabdkeacjabdcfaqbqabaGaeyOeI0Iaem4uam1aaSbaaeaacqWGmbatcqWGsbGucqWGtbWuaeqaaaqaaiabdofatnaaBaaabaGaemitaWKaemOuaiLaem4uamfabeaaaaaaaa@3E5A@ with an increasing number of bins per variable qmax = 16 , 64, 256, 1024. One can see how a finer binning of messages improves the quality of the approximation, seemingly converging to a single distribution of errors. It is expected that for larger N the histograms would peak around the true value: upon increasing the number of fluxes, loops become larger and the overall topology of the graph becomes more locally tree-like, validating the hypothesis behind the Bethe approximation. Unfortunately, the huge increase of computer time experimented in the calculation of the volumes using LRS made impossible to test systems large enough to make any reasonable scaling analysis.

Figure 4
figure 4

Discrepance histogram. Histograms of δS = (S BP - S LRS )/S LRS , where S LRS and S BP are the estimates of the logarithm of the volume computed using respectively the program LRS [8], and our message-passing algorithm. The measurement if δS are taken over a set of 1000 random realizations of a N = 12 and M = 4 stoichiometric matrix where each of the metabolites participates to K = 3 different reactions. The four histograms represent the distribution of the experimental frequencies of the measured δS. Measurements were taken at different values of of the discretization parameter qmax = 16, 64, 256, 1024. For larger values of q the distributions of the measured δ S peak around zero.

Finally we address the issue of the computational complexity of the algorithm which is a crucial one if one is interested in approaching real world metabolic networks whose size typically is at least 50 times the size of the largest network that can be analyzed with exact algorithms. In Figure 5 we display the running time of both LRS and BP as a function of the number of fluxes N. Interesting, LRS outperforms BP up to sizes N ~12 where the running time of LRS explodes exponentially while BP maintains a modest almost-linear behavior.

Figure 5
figure 5

Running time. Logarithm of the running time τ expressed in seconds vs. N for LRS algorithm, and BP algorithm. Averages are taken over 5000 realizations of random stoichiometric matrices at each value of N except in the N = 12 case where we analyzed 500 realizations. LRS outperforms BP up to sizes N ~12 where the running time of LRS explodes exponentially while BP maintains a modest almost-linear behavior.

Distribution of fluxes in Red Blood Cell

We used our BP algorithm and MCS [11] to obtain the marginal flux distributions for each of the reactions in the Red Blood Cell (RBC) metabolism taking the same stoichiometric matrix presented in [10, 11] (see Additional file 1). The network contains 46 reactions and 34 metabolites. The first comparison of MCS with our BP algorithm is done setting all ν i max MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqyVd42aa0baaSqaaiabdMgaPbqaaiGbc2gaTjabcggaHjabcIha4baaaaa@333B@ to 1 (the resulting marginal probability distribution for the fluxes are displayed in Figure 6). The second comparison is done by setting all ν i max MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqyVd42aa0baaSqaaiabdMgaPbqaaiGbc2gaTjabcggaHjabcIha4baaaaa@333B@ to the values used in [10] (results are displayed in Figure 7). In both settings we compared BP with a set of 5000 feasible solutions generated by MCS, while for the BP algorithm we used a qmax = 2048. As it can be seen in the two cases, the predictions of both methods compare rather well. Assuming the accuracy of MCS, differences are probably due to small loops structures in the graph. We leave for a future work the issue of a more detailed comparison of both methods. However we really think that the emerging scenario of RBC metabolism captured by BP is analogous to that obtained by MCS both quantitatively and qualitatively.

Figure 6
figure 6

Distribution of fluxes in Blood Cells. Marginal probability distributions of the flux values for each of the 46 reaction in the red blood cell network computed using our message-passing algorithm (filled area) and the MC method (lines). Panels are arranged following the same sequence of figure 5 in [10]. The maximum values of the fluxes are all identical, ν i max MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqyVd42aa0baaSqaaiabdMgaPbqaaiGbc2gaTjabcggaHjabcIha4baaaaa@333B@ = 1.

Figure 7
figure 7

Distribution of fluxes in Blood Cells. Marginal probability distributions of the flux values for each of the 46 reaction in the red blood cell network computed using our message-passing algorithm (filled area) and the MC method (lines). Panels are arranged following the same sequence of figure 5 in [10]. The maximum values of the fluxes were taken from [10].

The MCS method appears to be quite efficient (the authors of [11] reported in a similar network less then 30 seconds of computer computation in a Dell Dimension 8200 to obtain their distributions) while our algorithm converged to the same result in less than 3 seconds on a similar machine (Intel CPU 6600 2.40 GHz). Of course, no stringent statement can be done at this level about the comparative performance of the two algorithms. This positive result encourages us to face the problem of metabolism at organism-wide scale.

Analysis of gene knock-out in E. coli central metabolism

In this section we study the influence of partial flux knock-out on the volume of the solution space. We concentrate on E. coli central metabolism [23]. The network has 62 metabolites, 104 reactions (75 internal reactions and 29 exchange fluxes). All reactions were considered irreversible following their nominal directions, and maximum flux rates were set to 1.

In this numerical experiment we first run BP on the unperturbed system measuring the volume of the space of solution S0. Then the maximum flux values are kept constant and equal to 1 for all the reactions but one (say reaction ν i ). The partial knock-out effect on flux i is then obtained reducing repeatedly ν i max MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqyVd42aa0baaSqaaiabdMgaPbqaaiGbc2gaTjabcggaHjabcIha4baaaaa@333B@ and computing again the volume S KO ( ν i max MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqyVd42aa0baaSqaaiabdMgaPbqaaiGbc2gaTjabcggaHjabcIha4baaaaa@333B@ ) at each time until ν i max MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqyVd42aa0baaSqaaiabdMgaPbqaaiGbc2gaTjabcggaHjabcIha4baaaaa@333B@ = 0. In principle at each reduction step of ν i max MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqyVd42aa0baaSqaaiabdMgaPbqaaiGbc2gaTjabcggaHjabcIha4baaaaa@333B@ one should converge again the BP equations. In practice for most of the fluxes convergence is very fast since at each reduction of ν i max MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqyVd42aa0baaSqaaiabdMgaPbqaaiGbc2gaTjabcggaHjabcIha4baaaaa@333B@ the new stationary point will be in general close enough to the old one. However we have experimentally noted that the larger is the impact of a given flux on the volume of the space of solutions, the longer is the convergence time at each reduction step. Let us point out that doing so we are tacitly assuming that each knock-out is independent from the others although it is known that some reactions might be associated with the same enzyme. However there is no computational restriction to analyze multiple knock-outs. An analogous technique was presented in [11], where maximal fluxes were reduced to the value of 75% of their original maximal allowed flux to mimic enzymopathies.

In Figure 8 we display the whole set of S0 - SKO( ν i max MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqyVd42aa0baaSqaaiabdMgaPbqaaiGbc2gaTjabcggaHjabcIha4baaaaa@333B@ ) vs. knock-out percentage curves. We can observe how heterogeneous is the impact of the different fluxes on the volume. Moreover one can observe how different curves may cross depending on the knock-out percentage displaying thus an intriguing scenario of differential flux-reduction impacts. Let us now concentrate on the 20 fluxes having larger impact on the space of solutions: in Figure 9 we display complete knock-outs values S0 -S KO ( ν i max MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqyVd42aa0baaSqaaiabdMgaPbqaaiGbc2gaTjabcggaHjabcIha4baaaaa@333B@ = 0). Focusing only on internal fluxes (the first two fluxes are indeed exchange fluxes of water and protons), one can observe that the first half of them basically compose the backbone of glycolysis showing little pathway redundancy in the network, while reactions like FUM, ACONT, SUC and SUCCD1i appear in the Krebs cycle and again show little pathway redundancy in the network (see Additional file 2).

Figure 8
figure 8

Flux knock-out curves. Flux knock-out impact on the volume of the space of solutions in E. coli central metabolism. On the x-axis we display the percentage of reduction of any given flux, and on the y-axis the relative volume difference with respect to the unperturbed system. We use upper-case keys for internal fluxes, and lower-case to exchange fluxes. We indicate keys only for the 20 fluxes having larger impact on the volume (dots) and we display the rest fluxes with thin scattered lines.

Figure 9
figure 9

Top impact flux knock-outs. Impact of the 20 fluxes which have a larger impact on the volume of the space of solution. Here S KO = S0 - S KO ( ν i max MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqyVd42aa0baaSqaaiabdMgaPbqaaiGbc2gaTjabcggaHjabcIha4baaaaa@333B@ = 0)measures the difference of the volume of the unperturbed system and that modified by setting ν i max MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqyVd42aa0baaSqaaiabdMgaPbqaaiGbc2gaTjabcggaHjabcIha4baaaaa@333B@ = 0. In the x-axis we indicate the relative flux name using upper-case keys for internal fluxes, and lower-case to exchange fluxes.

Finally in Figure 10 we display the correlations between the changes in the entropy for different reaction knock-outs and the average flux ν i = ∫P i (ν)ν dν in the unperturbed network. At 75% knock-out, two kinds of regimes are divided by a clear threshold at ν ~ 0.6: (i) for ν i < 0.6, S0 - S KO ( ν i max MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqyVd42aa0baaSqaaiabdMgaPbqaaiGbc2gaTjabcggaHjabcIha4baaaaa@333B@ ) has a small positive correlation with ν, (ii) at larger average fluxes, correlations increase rapidly but with larger fluctuations. The presence of this threshold can be understood noting that reactions belonging to the linear (glycolysis) an circular (Krebs cycle) pathways are, in the wild cell, fast flux reactions, with average flux values larger than 0.5. An analogous scenario emerges in the case of a 100% knock-outs, but now fluctuation are wider, and also fluxes with intermediate average value start becoming important. This is the case for instance of the 6 large impact fluxes having average flux around 0.3. A closer inspection reveals that among them there is an exchange fluxes (glycogen) and 5 internal fluxes (G1PP, GLCP, HEX1, CS, PDH). The first three (G1PP, GLCP, HEX1) are the first steps of glycolysis, while PDH is the input of the Krebs cycle, and CS is a segment of the cycle strictly related to PDH. It is interesting that this peculiar behavior (large impact and relatively small average flux) are related to key check-points of central metabolism.

Figure 10
figure 10

Correlation of impact and value. Change in entropy S0 - S KO after reaction knockouts and the average as a function of the average flux value ν of the unperturbed system. Red dots are relative to 75% knock-out and blue one to complete knock-out. The six red dots with average flux ~0.3 and entropy change larger than 0.15 are G1PP, GLCP, HEX1, CS, PDH, glycogen which are key check-points of central metabolism.

E. coli metabolism at organism-wide scale

Finally we analyze the average fluxes distribution function in the metabolic network of E. coli. The network used contains, in its original format, 1035 reactions and 626 metabolites [23]. The network has been preprocessed using Gaussian elimination procedure: when a trivial mass-balance equation of the type ν i = ν j is present we eliminate variable ν i using elementary row operations. Doing so new trivial mass balance equations appear, an then we iterate the eliminations until all trivial mass-balance equations disappear. During Gaussian elimination, some mass balance equations become effective dead-end metabolite (i.e metabolites that are only consumed or produced). Fluxes participating to dead-end metabolites are set to zero at the beginning and removed from the system. The new set of equations is equivalent to the original one but with a lower number of fluxes. Reversible reactions have been considered using bidirectional signed fluxes. Using BP we are able to compute the marginal flux distributions in around 40 minutes, using qmax = 64.

We ran our algorithm on this network and computed the average fluxes in each reaction. We have performed a statistical analysis of these averages in the spirit of [14, 24, 25]. The probability distribution function (pdf) of these average values is displayed in Figure 11. As can be clearly seen the distribution is large and may be fitted with a power law distribution of P(ν) (ν + ν0)-γ. The fit gives an exponent γ around 1.5, a result that compares very well with previous simulations found in: (i) [14] where FBA optimal fluxes were averaged over many different external condition, (ii) [24] where a maximal local-output strategy is used, and (iii) [25] where a Gaussian approximation for the marginal flux distributions is used. It is claimed in [14] that the robustness of this value γ is a signature of universality in cell's metabolism.

Figure 11
figure 11

Distribution of average fluxes. Distribution of average fluxes (ν) of the reactions for the E. coli metabolism (points) obtained from the marginal probability distributions computed using our message-passing algorithm. We also display a power-law fit function Pfit(ν) = A/(ν + ν 0)γ (solid line) with ν0 = 0.0002 and γ = 1.48.

A more careful analysis of the data may reveals however that the distribution of averages fluxes has a richer structure. In Figure 12 we present the cumulative distribution function P < ( ν ) ν P ( y ) d y MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiuaa1aaSbaaSqaaiabgYda8aqabaGccqGGOaakcqaH9oGBcqGGPaqkcqGHHjIUdaWdXaqaaiabdcfaqjabcIcaOiabdMha5jabcMcaPiabdsgaKjabdMha5bWcbaGaeyOeI0IaeyOhIukabaGaeqyVd4ganiabgUIiYdaaaa@40CC@ of theaverage fluxes of the reactions and a clear jump appears for ν ≈ 0.5, and smaller ones for ν ≈ 0.4 and μ ≈ 0.6. At present we do not know whether these jumps are just due to statistical fluctuations (and are correctly smeared out in the usual binning process done to plot the pdf in double logarithmic scale) or they reflect relevant biological or structural information about the network.

Figure 12
figure 12

Integrated distribution of average fluxes.Integrated distribution P<of average fluxes ν of reactions in E. coli metabolism obtained from the marginal probability distributions computed using our message-passing algorithm. Note the jumps for ν = 0.4, ν = 0.5 and ν = 0.6.

Conclusion

We proposed a novel algorithm to estimate the size and shape of the affine space of a non full-dimensional convex polytope in high dimensions. The algorithm was tested in specific benchmark, i.e. random diluted stoichiometric matrices at a given ratio α = M/N and fixed number of terms different from zero K, in each of the reactions, with results that compare very well with those of exact algorithms. Moreover, we show that while the running time of exact algorithms increases more than exponentially for already moderate sizes, our algorithm is polynomial. The program was run on the Red Blood Cell metabolism producing with shorter computational time results that are in both quantitatively and qualitatively in good agreement with those obtained by MCS presented in [10, 11].

Then, our program was used to study the E. coli central metabolism, and as expected, reactions with little redundancy turned out to be the ones with larger impact in the size of the space of the metabolic solutions. Specifically, most of the reactions associated with the transformation of glucose in pyruvate, belong to this set, as well as some reactions in the citric cycle. In addition we show strong correlations between the characteristics of the flux distributions of the wild type network and the changes in size of the space of solutions after reaction knock-outs. Finally, we calculate the distribution of the average values of the fluxes in the metabolism of the E. coli and present results that are consistent with those of the literature. In the present approach we have mainly followed a discretization strategy that although polynomial, becomes computationally rather heavy for mass balance equation containing a large number of fluxes. We are currently investigating other representation schemes for the messages. The final hope is to obtain an algorithm that allows for on-line analysis of organism-wide metabolic networks.

Let us conclude by noting that in principle the presented approach can be extended to deal with constraints whose functional form is more general than linear, provided that the number of variables involved in each of the constraints remains small, as in the case of inequalities enforcing the second law of thermodynamics for the considered reactions [26]. Work is in progress also in this direction.

Methods

Belief Propagation

In the previous section we have seen how the metabolic problem can be cast into a constraint satisfaction framework where each of M mass-balance equations imposes a constraint onto a subset of the metabolic fluxes. Let A be the set of equations and I the set of fluxes. Consider the a-th row of S MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafm4uamLbaKaaaaa@2D14@ , and let {i a} ≡ {i1, ..., i n a MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyAaK2aaSbaaSqaaiabd6gaUnaaBaaameaacqWGHbqyaeqaaaWcbeaaaaa@3044@ } I be the labels of the fluxes involved in the considered equation having stoichiometric coefficients different from zero. Let also {a i} ≡ {a1, ..., a n i MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyyae2aaSbaaSqaaiabd6gaUnaaBaaameaacqWGPbqAaeqaaaWcbeaaaaa@3044@ } A be the labels of the equations in which flux i participates. The emerging structure is a bipartite graph, with two types of nodes: variable nodes representing the fluxes of the reactions and factor nodes imposing mass conservation. In this case marginals become q-modal probability densities that for large values of q i max MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyCae3aa0baaSqaaiabdMgaPbqaaiGbc2gaTjabcggaHjabcIha4baaaaa@32EE@ will approximate better and better the continuous set of probabilities.

Under the hypothesis that the factor graph is a tree it can be shown [19, 27] that a given flux vector ν satisfying all flux-balance constraints can be expressed as a product of flux and reaction marginals [1921]:

P ( ν ) = a A P a ( { ν l } l a ) i I P i ( ν i ) 1 d i MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeeiuaaLaeiikaGcccmGae8xVd4MaeiykaKIaeyypa0ZaaebuaeaacqWGqbaudaWgaaWcbaGaemyyaegabeaakiabcIcaOiabcUha7jabe27aUnaaBaaaleaacqWGSbaBaeqaaOGaeiyFa03aaSbaaSqaaiabdYgaSjabgIGiolabdggaHbqabaGccqGGPaqkaSqaaiabdggaHjabgIGiolabdgeabbqab0Gaey4dIunakmaarafabaGaemiuaa1aaSbaaSqaaiabdMgaPbqabaGccqGGOaakcqaH9oGBdaWgaaWcbaGaemyAaKgabeaakiabcMcaPmaaCaaaleqabaGaeGymaeJaeyOeI0Iaemizaq2aaSbaaWqaaiabdMgaPbqabaaaaaWcbaGaemyAaKMaeyicI4SaemysaKeabeqdcqGHpis1aaaa@596D@
(8)

where d i is the number of equations in which flux ν i participates (i.e. the degree of site i). The marginal probabilities are defined as:

P i ( ν i ) = { ν j } j i P ( ν ) p a ( { ν l } l a ) = { ν j } j a P ( ν ) . MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqbaeWabiqaaaqaaiabdcfaqnaaBaaaleaacqWGPbqAaeqaaOGaeiikaGIaeqyVd42aaSbaaSqaaiabdMgaPbqabaGccqGGPaqkcqGH9aqpdaaeqbqaaiabbcfaqjabcIcaOGGadiab=17aUjabcMcaPaWcbaGaei4EaSNaeqyVd42aaSbaaWqaaiabdQgaQbqabaWccqGG9bqFdaWgaaadbaGaemOAaOMaeyiyIKRaemyAaKgabeaaaSqab0GaeyyeIuoaaOqaaiabdchaWnaaBaaaleaacqWGHbqyaeqaaOGaeiikaGIaei4EaSNaeqyVd42aaSbaaSqaaiabdYgaSbqabaGccqGG9bqFdaWgaaWcbaGaemiBaWMaeyicI4SaemyyaegabeaakiabcMcaPiabg2da9maaqafabaGaeeiuaaLaeiikaGIae8xVd4MaeiykaKcaleaacqGG7bWEcqaH9oGBdaWgaaadbaGaemOAaOgabeaaliabc2ha9naaBaaameaacqWGQbGAcqGHGjsUcqWGHbqyaeqaaaWcbeqdcqGHris5aOGaeiOla4caaaaa@699F@
(9)

One can then define an entropy in terms of the marginal probability distributions which amounts to the logarithm of the number of solutions of the associated constraint satisfaction problem. From a more geometrical point of view, this entropy is a count of the (logarithm of) the number of elementary ε-cells intersecting the polytope Π (see Figure 3).

S ν P ( ν ) ln P ( ν ) = a A { ν j } j a P a ( { ν j } j a ) log P a ( { ν j } j a ) i I ν i ( d i 1 ) P i ( ν i ) log P i ( ν i ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqbaeWabiqaaaqaaiabdofatjabggMi6kabgkHiTmaaqafabaGaeeiuaaLaeiikaGcccmGae8xVd4MaeiykaKIagiiBaWMaeiOBa4MaemiuaaLaeiikaGIae8xVd4MaeiykaKcaleaacqWF9oGBaeqaniabggHiLdaakeaacqGH9aqpdaaeqbqaamaaqafabaGaemiuaa1aaSbaaSqaaiabdggaHbqabaGccqGGOaakcqGG7bWEcqaH9oGBdaWgaaWcbaGaemOAaOgabeaakiabc2ha9naaBaaaleaacqWGQbGAcqGHiiIZcqWGHbqyaeqaaOGaeiykaKIagiiBaWMaei4Ba8Maei4zaCMaemiuaa1aaSbaaSqaaiabdggaHbqabaGccqGGOaakcqGG7bWEcqaH9oGBdaWgaaWcbaGaemOAaOgabeaakiabc2ha9naaBaaaleaacqWGQbGAcqGHiiIZcqWGHbqyaeqaaOGaeiykaKIaeyOeI0YaaabuaeaadaaeqbqaaiabcIcaOiabdsgaKnaaBaaaleaacqWGPbqAaeqaaOGaeyOeI0IaeGymaeJaeiykaKIaemiuaa1aaSbaaSqaaiabdMgaPbqabaGccqGGOaakcqaH9oGBdaWgaaWcbaGaemyAaKgabeaakiabcMcaPiGbcYgaSjabc+gaVjabcEgaNjabdcfaqnaaBaaaleaacqWGPbqAaeqaaOGaeiikaGIaeqyVd42aaSbaaSqaaiabdMgaPbqabaGccqGGPaqkaSqaaiabe27aUnaaBaaameaacqWGPbqAaeqaaaWcbeqdcqGHris5aaWcbaGaemyAaKMaeyicI4SaemysaKeabeqdcqGHris5aaWcbaGaei4EaSNaeqyVd42aaSbaaWqaaiabdQgaQbqabaWccqGG9bqFdaWgaaadbaGaemOAaOMaeyicI4SaemyyaegabeaaaSqab0GaeyyeIuoaaSqaaiabdggaHjabgIGiolabdgeabbqab0GaeyyeIuoaaaaaaa@9C5C@
(10)

One may wonder how such an approach could be useful in a real-world situation where the graph is not a tree. One can hope that typical loop length are large enough to ensure weak statistical dependence of neighboring sites which lay at the heart of the Bethe approximation [28, 29]. It is interesting to note that the Bethe approximation is successfully used in many different problems with loopy graph topologies. This is for instance the case for LDPC error correcting codes in Information Theory [30] used in wireless Internet transmission technologies such as WiMAX, or in many inference problems such as graphical models [31], binary perceptron learning [32], and in constraint satisfaction problems such as Satisfiability or Coloring [33, 34]. In all these cases although there is no mathematically rigorous proof about the quality of the solution, whenever the algorithm converges, the result generally provides a good estimate of marginal probability distributions.

The algorithm is based on two type of messages exchanged from variable nodes to functional nodes, and vice versa:

  • μia(ν): the probability that flux i takes value ν in the absence of reaction a.

  • mai(ν): the non-normalized probability that the balance in reaction a is fulfilled given that flux i takes value ν.

The two quantities satisfy the following set of functional equations:

m a i ( ν i ) = { ν l } l a \ i δ ( l a s a , l ν l ; b a ) l a \ i u l a ( ν l ) μ i a ( ν i ) = C i a b i \ a m b i ( ν i ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqbaeWabiWaaaqaaiabd2gaTnaaBaaaleaacqWGHbqycqGHsgIRcqWGPbqAaeqaaOGaeiikaGIaeqyVd42aaSbaaSqaaiabdMgaPbqabaGccqGGPaqkaeaacqGH9aqpaeaadaaeqbqaaiabes7aKnaabmaabaWaaabuaeaacqWGZbWCdaWgaaWcbaGaemyyaeMaeiilaWIaemiBaWgabeaakiabe27aUnaaBaaaleaacqWGSbaBaeqaaOGaei4oaSJaemOyai2aaSbaaSqaaiabdggaHbqabaaabaGaemiBaWMaeyicI4SaemyyaegabeqdcqGHris5aaGccaGLOaGaayzkaaaaleaacqGG7bWEcqaH9oGBdaWgaaadbaGaemiBaWgabeaaliabc2ha9naaBaaameaacqWGSbaBcqGHiiIZcqWGHbqycqGGCbaxcqWGPbqAaeqaaaWcbeqdcqGHris5aOWaaebuaeaacqWG1bqDdaWgaaWcbaGaemiBaWMaeyOKH4QaemyyaegabeaakiabcIcaOiabe27aUnaaBaaaleaacqWGSbaBaeqaaOGaeiykaKcaleaacqWGSbaBcqGHiiIZcqWGHbqycqGGCbaxcqWGPbqAaeqaniabg+GivdaakeaacqaH8oqBdaWgaaWcbaGaemyAaKMaeyOKH4QaemyyaegabeaakiabcIcaOiabe27aUnaaBaaaleaacqWGPbqAaeqaaOGaeiykaKcabaGaeyypa0dabaGaem4qam0aaSbaaSqaaiabdMgaPjabgkziUkabdggaHbqabaGcdaqeqbqaaiabd2gaTnaaBaaaleaacqWGIbGycqGHsgIRcqWGPbqAaeqaaOGaeiikaGIaeqyVd42aaSbaaSqaaiabdMgaPbqabaGccqGGPaqkaSqaaiabdkgaIjabgIGiolabdMgaPjabcYfaCjabdggaHbqab0Gaey4dIunaaaaaaa@981C@
(11)

where { ν l } l a \ i MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaWaaabeaeaaaSqaaiabcUha7jabe27aUnaaBaaameaacqWGSbaBaeqaaSGaeiyFa03aaSbaaWqaaiabdYgaSjabgIGiolabdggaHjabcYfaCjabdMgaPbqabaaaleqaniabggHiLdaaaa@3B0C@ means the sum over all values of fluxes around metabolite a but i, b i\a, is the set of metabolites in reaction i but a, Ciais a constant enforcing the normalization of the probability μia(ν), and δ (·; ·) is the Kronecker delta function (δ (a; b) is 1 if a = b and 0 otherwise). The set of Equations 11 can be solved iteratively and upon convergence of the algorithm one can compute the marginal flux distributions as:

P a ( { ν l } l a ) = C a { ν l } l a δ ( l a s a , l ν l ; b a ) l a u l a ( ν l ) P i ( ν ) = C i l i m l i ( ν ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqbaeWabiWaaaqaaiabdcfaqnaaBaaaleaacqWGHbqyaeqaaOGaeiikaGIaei4EaSNaeqyVd42aaSbaaSqaaiabdYgaSbqabaGccqGG9bqFdaWgaaWcbaGaemiBaWMaeyicI4SaemyyaegabeaakiabcMcaPaqaaiabg2da9aqaaiabdoeadnaaBaaaleaacqWGHbqyaeqaaOWaaabuaeaacqaH0oazdaqadaqaamaaqafabaGaem4Cam3aaSbaaSqaaiabdggaHjabcYcaSiabdYgaSbqabaGccqaH9oGBdaWgaaWcbaGaemiBaWgabeaakiabcUda7iabdkgaInaaBaaaleaacqWGHbqyaeqaaaqaaiabdYgaSjabgIGiolabdggaHbqab0GaeyyeIuoaaOGaayjkaiaawMcaaaWcbaGaei4EaSNaeqyVd42aaSbaaWqaaiabdYgaSbqabaWccqGG9bqFdaWgaaadbaGaemiBaWMaeyicI4SaemyyaegabeaaaSqab0GaeyyeIuoakmaarafabaGaemyDau3aaSbaaSqaaiabdYgaSjabgkziUkabdggaHbqabaGccqGGOaakcqaH9oGBdaWgaaWcbaGaemiBaWgabeaakiabcMcaPaWcbaGaemiBaWMaeyicI4SaemyyaegabeqdcqGHpis1aaGcbaGaemiuaa1aaSbaaSqaaiabdMgaPbqabaGccqGGOaakcqaH9oGBcqGGPaqkaeaacqGH9aqpaeaacqWGdbWqdaWgaaWcbaGaemyAaKgabeaakmaarafabaGaemyBa02aaSbaaSqaaiabdYgaSjabgkziUkabdMgaPbqabaGccqGGOaakcqaH9oGBcqGGPaqkaSqaaiabdYgaSjabgIGiolabdMgaPbqab0Gaey4dIunaaaaaaa@8CDE@
(12)

A brute force integration of the discrete set of equation would be much too inefficient for analyzing large networks, due to the multiple dimensional sum over { 0 , ... , q l max } l a \ i MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaei4EaSNaeGimaaJaeiilaWIaeiOla4IaeiOla4IaeiOla4IaeiilaWIaemyCae3aa0baaSqaaiabdYgaSbqaaiGbc2gaTjabcggaHjabcIha4baakiabc2ha9naaBaaaleaacqWGSbaBcqGHiiIZcqWGHbqycqGGCbaxcqWGPbqAaeqaaaaa@424F@ in the previous equation. The first of Equations 11 includes the computation of the convolution of all μjamessages except one, an expression of the form C a i = j a \ i μ j a MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaem4qam0aaSbaaSqaaiabdggaHjabgkziUkabdMgaPbqabaGccqGH9aqpcqGHxkcXdaWgaaWcbaGaemOAaOMaeyicI4SaemyyaeMaeiixaWLaemyAaKgabeaakiabeY7aTnaaBaaaleaacqWGQbGAcqGHsgIRcqWGHbqyaeqaaaaa@4230@ (where denotes convolution) and requires normally n a - 1 convolutions for each outputting message, i.e. n a (n a - 1) convolutions in total for constraint a. In the case of the complete E. coli network we have mass-balance equations with n a as large as 500, then reducing the computational complexity of the iteration has a cogent practical implication on the performance of the proposed algorithm. There is a way to reduce this quadratic load to just O (n a ) convolutions. The method we propose is not confined to convolutions and works generally for any associative operation. Note that for operations that are efficiently invertible and commutative, one could just operate all n a terms (n a initial operations) and then for each outputting message operate with the inverse of the undesired element (just one more operation for each message, totalizing just 2n a operations for constraint a plus invertions) i.e. C a i = ( j a μ j a ) μ i a 1 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaem4qam0aaSbaaSqaaiabdggaHjabgkziUkabdMgaPbqabaGccqGH9aqpdaqadaqaaiabgEPiepaaBaaaleaacqWGQbGAcqGHiiIZcqWGHbqyaeqaaOGaeqiVd02aaSbaaSqaaiabdQgaQjabgkziUkabdggaHbqabaaakiaawIcacaGLPaaacqGHxkcXcqaH8oqBdaqhaaWcbaGaemyAaKMaeyOKH4QaemyyaegabaGaeyOeI0IaeGymaedaaaaa@4B84@ . When elements are not invertible, nor conmutative or are just difficult to invert, the following more general scheme can be applied. Let us concentrate for instance on metabolite a and let us assume that n a is a power of two, say 2n(this assumption can be easily relaxed). We will iteratively build a n × n a auxiliary matrix h i l MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiAaG2aa0baaSqaaiabdMgaPbqaaiabdYgaSbaaaaa@3017@ setting as initial condition h i 0 = μ i a MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiAaG2aa0baaSqaaiabdMgaPbqaaiabicdaWaaakiabg2da9iabeY7aTnaaBaaaleaacqWGPbqAcqGHsgIRcqWGHbqyaeqaaaaa@3729@ and then building the following sequence: h i d = h 2 i d 1 h 2 i + 1 d 1 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiAaG2aa0baaSqaaiabdMgaPbqaaiabdsgaKbaakiabg2da9iabdIgaOnaaDaaaleaacqaIYaGmcqWGPbqAaeaacqWGKbazcqGHsislcqaIXaqmaaGccqGHxkcXcqWGObaAdaqhaaWcbaGaeGOmaiJaemyAaKMaey4kaSIaeGymaedabaGaemizaqMaeyOeI0IaeGymaedaaaaa@42FE@ for d = 1, ..., n - 1. This is equivalent to operate consecutive pairs of fluxes over metabolite a, then consecutive quadruples and so on. One needs just 2n a operations for computing the full matrix at this stage.

To compute Caione needs additionally n = log2 n a operations: we just have to operate the complement of i in its pair with the complement of this pair in its quartet and so on on all levels, i.e. in a compact form d = 0 n h [ i / 2 d ] xor  1 d MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaey4LIq8aa0baaSqaaiabdsgaKjabg2da9iabicdaWaqaaiabd6gaUbaakiabdIgaOnaaDaaaleaacqGGBbWwcqWGPbqAcqGGVaWlcqaIYaGmdaahaaadbeqaaiabdsgaKbaaliabc2faDjabbIha4jabb+gaVjabbkhaYjabbccaGiabigdaXaqaaiabdsgaKbaaaaa@42D1@ (See Figure 13). The total number of operations for all out messages becomes then 2n a + n a log2 n a . Moreover, if all messages are to be computed sequentially this number can be further reduced to O (n a ).

Figure 13
figure 13

Linear cavity summation. An example showing the summation strategy for a metabolite with 8 fluxes. Once all elements of the table are computed (6 convolutions are needed), one needs only 3 more convolutions to compute each one of the 8 cavity messages. In the figure the table elements that are needed to compute j ≠ 5h j = (h1 h2 h3 h4) (h7 h8) h6 are colored in blue.

References

  1. Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL: The large-scale organization of metabolic networks. Nature 2000, 407(6804):651–654. 10.1038/35036627

    Article  CAS  PubMed  Google Scholar 

  2. Fell DA, Wagner A: The small world of metabolism. Nature Biotechnology 2000, 18: 1121–1122. 10.1038/81025

    Article  CAS  PubMed  Google Scholar 

  3. Z D, Qin ZS: Structural comparison of metabolic networks in selected single cell organisms. BMC Bioinformatics 2005., 6(8):

  4. Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita K, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M: From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res 2006, 34: D354–7. 10.1093/nar/gkj102

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  5. Ibarra AU, Edwards J, Palsson B: Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature 2002, 420: 186–189. 10.1038/nature01149

    Article  CAS  PubMed  Google Scholar 

  6. Varma A, Palsson B: Metabolic Capabilities of Escherichia coli: I.Synthesis of biosynthetic precursors and cofactors. J theor Biol 1993, 165: 477–502. 10.1006/jtbi.1993.1202

    Article  CAS  PubMed  Google Scholar 

  7. Segré D, Vitkup D, Church G: Analysis of optimality in natural and perturbed metabolic networks. PNAS 2002, 99: 15112–15117. 10.1073/pnas.232349399

    Article  PubMed Central  PubMed  Google Scholar 

  8. LRS package[http://cgm.cs.mcgill.ca/~avis/C/lrs.html]

  9. Avis D, Fukuda K: A pivoting algorithm for convex hulls and vertex enumeration of arrangements and polyhedra. Discrete Comput Geom 1992, 8(3):295–313. 10.1007/BF02293050

    Article  Google Scholar 

  10. Wiback S, Famili I, Greenberg HJ, Palsson B: Monte Carlo sampling can be used to determine the size and shape of the steady-state flux space. J Theor Biol 2004, 228: 437–447. 10.1016/j.jtbi.2004.02.006

    Article  PubMed  Google Scholar 

  11. Price N, Schellenberger J, Palsson B: Uniform Sampling of Steady-State Flux Spaces: Means to Design Experiments and to Interpret Enzymopathies. Biophysical Journal 2004, 87: 2172–2186. 10.1529/biophysj.104.043000

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  12. Thiele I, Price N, Vo T, Palsson B: Impact of diabetes, ischemia, and diet. J Biol Chem 2005, 280: 11683–11695. 10.1074/jbc.M409072200

    Article  CAS  PubMed  Google Scholar 

  13. Price N, Thiele I, Palsson B: Candidate States of Helicobacter pylori's Genome-Scale Metabolic Network upon Application of "Loop Law" Thermodynamic Constraints. Biophysical Journal 2006, 90: 3919–3928. 10.1529/biophysj.105.072645

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  14. Almaas E, Kovacs B, Vicsek T, Oltval Z, Barabasi AL: Global organization of metabolic fluxes in the bacterium Escherichia col. Nature 2004, 427: 839–843. 10.1038/nature02289

    Article  CAS  PubMed  Google Scholar 

  15. Simonovits M: How to compute the volume in high dimension? Math Progr 2003, 97(1–2):337–374.

    Google Scholar 

  16. Dyer M, Frieze A: On the complexity of computing the volume of a polyhedron. SIAM J Comput 1988, 17(5):967–97. 10.1137/0217060

    Article  Google Scholar 

  17. Khachiyan L: Complexity of volume computation. In New trends in discrete and computational geometry, Springer-Verlag Edited by: Pach J. 1993, 91–101.

    Chapter  Google Scholar 

  18. Bëuler B, Enge A, Fukuda K: Exact volume computation for convex polytopes: A practical study. In Polytopes-combinatorics and computation, Birkhauser Edited by: Ziegler GM, Kalai G. 2000, 131–154.

    Chapter  Google Scholar 

  19. Yedidia J, Freeman W, Weiss Y: Generalized belief propagation. In Advances in Neural Information Processing Systems (NIPS) 13, Denver, CO Edited by: press M. 2001, 772–778.

    Google Scholar 

  20. Kschischang FR, Frey BJ, Loeliger HA: Factor graphs and the sum-product algorithm. Information Theory, IEEE Transactions on 2001, 47(2):498–519. 10.1109/18.910572

    Article  Google Scholar 

  21. Braunstein A, Mezard M, Zecchina R: Survey propagation: An algorithm for satisfiability. Random Struct Algorithms 2005, 27: 201–226. 10.1002/rsa.20057

    Article  Google Scholar 

  22. MacKay DJC: Information Theory, Inference, and Learning Algorithms. Cambridge University Press; 2003.

    Google Scholar 

  23. Reed J, Vo T, Schilling C, Palsson B: An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biology 2003, 4(9):R54. 10.1186/gb-2003-4-9-r54

    Article  PubMed Central  PubMed  Google Scholar 

  24. De Martino A, Martelli RMC, Castillo IP: Von Neumann expanding model on random graphs. Journal of Statistical Mechanics: Theory and Experiment 2007, 2007(05):P05012. 10.1088/1742-5468/2007/05/P05012

    Article  Google Scholar 

  25. Bianconi G, Zecchina R: Viable flux distribution in metabolic networks. Technical report 2007. ArXiv:cond-mat/0705.2816 [http://aps.arxiv.org/pdf/0705.2816]

    Google Scholar 

  26. Beard D, Babson E, Curtis E, Qian H: Thermodynamic constraints for biochemical networks. J Theor Biol 2004, 228: 327–333. 10.1016/j.jtbi.2004.01.008

    Article  CAS  PubMed  Google Scholar 

  27. Baxter B: Exactly Solved Models in Statistical Mechanics. London: Academic Press Inc; 1989.

    Google Scholar 

  28. Mezard M, Parisi G: The Bethe lattice spin glass revisited. The European Physical Journal B 2001, 20: 217. 10.1007/PL00011099

    Article  CAS  Google Scholar 

  29. Mezard M, Parisi G: The cavity method at zero temperature. J Stat Phys 2003, 111: 1. 10.1023/A:1022221005097

    Article  Google Scholar 

  30. Richardson T, Urbanke R: The capacity of low-density parity check codes under message passing decoding. IEEE, Trans Info Theory 2001, 47: 599–618. 10.1109/18.910577

    Article  Google Scholar 

  31. Weiss Y, Freeman WT: Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology. Neural Comp 2001, 13(10):2173–2200. 10.1162/089976601750541769

    Article  CAS  Google Scholar 

  32. Baldassi C, Braunstein A, Brunel N, Zecchina R: Efficient supervised learning in networks with binary synapses. PNAS 2007, 104: 11079–11084. [http://aps.arxiv.org/pdf/0707.1295] 10.1073/pnas.0700324104

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  33. Mezard M, Parisi G, Zecchina R: Analytic and Algorithmic Solution of Random Satisfiability Problems. Science 2002, 297: 812. [http://aps.arxiv.org/pdf/0707.1295] 10.1126/science.1073287

    Article  CAS  PubMed  Google Scholar 

  34. Mulet R, Pagnani A, Weigt M, Zecchina R: Coloring random graphs. Phys Rev Lett 2002, 89: 268701. [http://arxiv.org/abs/cond-mat/0208460] 10.1103/PhysRevLett.89.268701

    Article  CAS  PubMed  Google Scholar 

Download references

Acknowledgements

AB was supported by Microsoft TCI grant. RM wants to thank the International Center for Theoretical Physics in Trieste and the Center for Molecular Immunology of La Habana, for their cordial hospitality during the completion of this work. We are also very grateful to Ginestra Bianconi, Michele Leone, Martin Weigt, and Riccardo Zecchina, for very interesting discussions, and in particular to Carlotta Martelli for sharing with us a human readable E. coli data set. We are really grateful to Jan Schellenberger for providing us the MCS data used in Figure 6 and 7.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Roberto Mulet or Andrea Pagnani.

Additional information

Authors' contributions

Authors equally contributed to this work. All authors read and approved the final manuscript.

Electronic supplementary material

12859_2007_2225_MOESM1_ESM.tgz

Additional file 1: Human red blood cell metabolism supplementary files. The archive contains 4 text files: rbc.sto (stoichiometric matrix where each column is a reaction and each line is a metabolite), rbc.flu (one column, contains the sign of exchange fluxes: negative are incoming positive outcoming, zero free), rbc.names (contains the name of both reactions and metabolites), and rbc.max (contains maximum flux values of the reactions). (TGZ 2 KB)

12859_2007_2225_MOESM2_ESM.tgz

Additional file 2: E. coli central metabolism supplementary files. The archive contains 3 text files: central.sto (stoichiometric matrix where each column is a reaction and each line is a metabolite), central.flu (one column, contains the sign of exchange fluxes: negative are incoming, positive outcoming, zero free), central.names (contains the name of both reactions and metabolites). (TGZ 988 bytes)

Authors’ original submitted files for images

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article is distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Braunstein, A., Mulet, R. & Pagnani, A. Estimating the size of the solution space of metabolic networks. BMC Bioinformatics 9, 240 (2008). https://doi.org/10.1186/1471-2105-9-240

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/1471-2105-9-240

Keywords