Analysis on relationship between extreme pathways and correlated reaction sets

Background Constraint-based modeling of reconstructed genome-scale metabolic networks has been successfully applied on several microorganisms. In constraint-based modeling, in order to characterize all allowable phenotypes, network-based pathways, such as extreme pathways and elementary flux modes, are defined. However, as the scale of metabolic network rises, the number of extreme pathways and elementary flux modes increases exponentially. Uniform random sampling solves this problem to some extent to study the contents of the available phenotypes. After uniform random sampling, correlated reaction sets can be identified by the dependencies between reactions derived from sample phenotypes. In this paper, we study the relationship between extreme pathways and correlated reaction sets. Results Correlated reaction sets are identified for E. coli core, red blood cell and Saccharomyces cerevisiae metabolic networks respectively. All extreme pathways are enumerated for the former two metabolic networks. As for Saccharomyces cerevisiae metabolic network, because of the large scale, we get a set of extreme pathways by sampling the whole extreme pathway space. In most cases, an extreme pathway covers a correlated reaction set in an 'all or none' manner, which means either all reactions in a correlated reaction set or none is used by some extreme pathway. In rare cases, besides the 'all or none' manner, a correlated reaction set may be fully covered by combination of a few extreme pathways with related function, which may bring redundancy and flexibility to improve the survivability of a cell. In a word, extreme pathways show strong complementary relationship on usage of reactions in the same correlated reaction set. Conclusion Both extreme pathways and correlated reaction sets are derived from the topology information of metabolic networks. The strong relationship between correlated reaction sets and extreme pathways suggests a possible mechanism: as a controllable unit, an extreme pathway is regulated by its corresponding correlated reaction sets, and a correlated reaction set is further regulated by the organism's regulatory network.


Background
In the past decades, genome-scale metabolic networks capable of simulating growth have been reconstructed for about twenty organisms [1]. A framework for constraintbased reconstruction and analysis (COBRA) has been developed to model and simulate the steady states of metabolic networks [2][3][4]. As reviewed in the literature [5], COBRA has been successfully applied to studying the possible phenotypes. Thus, it has attracted many attentions and gets rapid progress.
The COBRA framework represents a metabolic network as a stoichiometric matrix S. With the homeostatic-steadystate hypothesis and fluxes boundaries, all allowable steady-state flux distributions are limited in a space which can be represented as where S m × n is the stoichiometric matrix for a network consisting of m metabolites and n fluxes and v n × 1 is a vector of the flux levels through each reaction in the system [6].
Given the reversibility of reactions, an internal reversible reaction can be decoupled into two separate reactions for the forward and reverse directions separately. It means all fluxes should take a non-negative value and the solution space is now a convex polyhedral cone in high-dimensional space [6,7]. This convex cone can be spanned by a set of extreme pathways (ExPa), (p i , i = 1, ..., k) [8,9]. Every possible steady-state flux distribution in the solution space may therefore be represented as a non-negative combination of extreme pathways (ExPa): Extreme pathways (ExPa) have the following properties which make them biologically meaningful [10,11]: 1. The ExPa set of a given network is unique.
2. Each ExPa uses least reactions to be a functional unit. 3. The ExPa set is systemically independent which means an ExPa can't be decomposed into a non-negative combination of the remaining ExPas.
A similar network-based pathway definition as ExPa is elementary flux modes (EM) [12][13][14]. The algorithm for elementary flux modes (EM) treats internal reversible reactions differently from that for ExPas. Actually, ExPa set is a systemically independent subset of elementary flux modes (EM) and each EM can be represented by a non-negative combination of ExPas. The relationship and difference between ExPa and EM have been studied and articulated in literatures [10,15].
ExPas and EMs lead to a systems view of network properties [16] and they also provide a promising way to understand network functionality, robustness as well as regulation [17,18]. However, the number of ExPas for a reaction network grows exponentially with network size which makes the use of ExPas for large-scale networks computationally difficult [19,20].
A rapid and scalable method to quantitatively characterize all allowable phenotypes of genome-scale networks is uniform random sampling [21]. It studies the contents of the available phenotypes by sampling the points in the solution space. The set of flux distributions obtained from sampling can be analyzed to measure the pairwise correlation coefficients between all reaction fluxes and can be further used to define correlated reaction sets (CoSet). Correlated reaction sets (CoSet) are unbiased, conditiondependent definitions of modules in metabolic networks in which all the reactions have to be co-utilized in precise stoichiometric ratios [22]. It means the fluxes of the reactions in the same correlated reaction sets (CoSet) go up or down together in fixed ratios. We may think about whether CoSets provide clues about regulated procedures of a metabolic network.
Both ExPas and CoSets are determined by the topology of a metabolic network. Although lots of analyses were done on them separately [23][24][25], few attention has been paid to the relationship between them. Here, our aim is to reveal the relationship between ExPas and CoSets. We select Escherichia coli core metabolic network, human red blood cell metabolic network and Saccharomyces cerevisiae metabolic network as examples to start our research.

Escherichia coli core metabolic network
We use the E. coli core model published on the web site of UCSD's systems biology research group. It is "a condensed version of the genome-scale E. coli reconstruction and contains central metabolism reactions" [26]. Details of this model can also be found in a published book [27]. The network contains 62 internal reactions, 14 exchange reactions and a biomass objective function.
The computation of the extreme pathways for E. coli core model results in 7784 ExPas, in which 7748 are type I or II ExPas and 36 are type III ExPas (Calculation and classification of ExPas are discussed in Methods section). The type I and II ExPas will be focused on herein and the reason for neglecting type III ExPas will be explained in Methods section. Twenty CoSets are identified on this net- work based on pairwise correlation coefficients between all reaction fluxes and listed in table 1.
For each CoSet C j , we check how many type I and II ExPas use k reactions in C j , where k ranges from zero to the size of C j . The result is shown in table 2. Taking CoSet 3 as an  example, from table 1  We also calculate, for each ExPa p i , the ratio of reactions in any CoSet which is fully covered by p i to all reactions in p i . The distribution of the ratios is shown in Figure 1. Each ExPa of E. coli core model covers at least one CoSet. The coverage rates are higher than 40% which implies that ExPas are under well control of CoSets.

Human red blood cell metabolic network
Human red blood cell (RBC) metabolic network has been well reconstructed and simulated [28][29][30][31]  Both 'Ex_NADP' and 'Ex_NADPH' belong to CoSet 1, indicating the need of RBC cell to balance the NADPH/ NADP ratio. According to "historically" partition of metabolic pathways, when pathway PPP is up-regulated, the quantity of NADP increases. When metabolic pathway EMP is up-regulated, the quantity of NADPH comes up.   thus form a CoSet. Among the sample ExPas, we find that some of them utilize 'AKGMAL' and 'AKGt2r' while others use 'MALt2r' only. But, there are also some ExPas utilizing 'AKGt2r' while we don't find any sample ExPas that use the other two reactions in the CoSet. However, according to the above analysis, there should be some complemental ExPas utilizing reactions in the CoSet other than 'AKGt2r'. Otherwise, the cell will die due to the insupportable internal environment. Since the whole ExPa set is extremely large, the available ExPa sample set can only give a glance at the tremendous ExPa set and will certainly lose some information.
The scale of S. cerevisia metabolic network is much larger. However, complementary relationship on usage of reactions in a CoSet is repeated as that in E. coli core metabolic network and RBC metabolic network.

Conclusion
In this study, we investigated the relationship between CoSets and ExPas on the in-silicon representations of three metabolic networks. These models are different in species and scale. However, the experiment on each model leads to similar results that ExPas show strong complementary relationship on the usage of reactions in the same CoSet.

It implies that this kind of relationship may exist in most
Metabolic maps of RBC Figure 2 Metabolic maps of RBC. The graph is adapted from [25]. CoSet label of each reaction is added and different symbols are used to represent forward(→) and reverse(→) directions separately.
of organisms. Since both CoSets and ExPas are generated from the topology information of metabolic networks, this phenomenon may reflect some inherent properties resulting from the topology constraints composed on the networks.
Moreover, our study not only reveals the interesting relationship between CoSets and ExPas but also provides a new sight of how the metabolic network works and how it is adjusted. The strong relationship between CoSets and ExPas provides clues about regulated procedure of a metabolic network, thus suggests a possible mechanism of how a metabolic network transferring its phenotype from one steady state to another. As functional units, ExPas are in control of the regulatory structure of the metabolic network, and the regulatory command usually spreads from regulated reactions to CoSets and finally to the related ExPas. As fluxes through each ExPa change according to the regulatory orders from its corresponding CoSets, the flux distribution of the whole metabolic network transfers towards a new steady state. By interrogating the relationship between CoSets and ExPas, we can tell which ExPas are possible to be up (down) regulated caused by an increasing (decreasing) flux in a given CoSet. This result    may help predict the function of regulatory factors acting on metabolism. However, in order to answer the question which ExPas are really regulated, more information should be considered, such as regulatory structure of the metabolic networks as well as kinetic and thermodynamic constraints, which will be our future work.

ExPas computation and classification
ExPas are computed by an open source tool, 'expa', developed by Steven L. Bell and Bernhard O. Palsson [34]. The exchange fluxes can be separated into two groups: primary exchange fluxes and currency exchange fluxes. Primary exchange fluxes are external fluxes and currency exchange fluxes are fluxes external to metabolism but internal to the cell [27]. ExPas can be divided into three categories according to their use of exchange fluxes [35]. Type I ExPas utilize primary exchange fluxes as well as currency exchange fluxes. Type II ExPas involve currency exchange fluxes only. Type III ExPas are solely internal cycles without any exchange fluxes. Since type III ExPas are thermodynamically infeasible [36], we neglect type III ExPas and only focus on those of type I and II.

CoSets computation
The CoSets of each metabolic model is generated by COBRA toolbox, an integrated toolbox of functions which are useful for analysis and simulation of organism's metabolic behavior [22]. For each model, uniform random sampling has been done first in the condition of optimum growth and results in 100,000 unique sample flux distributions that are available to the network. Then, 10,000 samples have been randomly selected and used to measure the pairwise correlation coefficients between reactions. We set the threshold of square pairwise correlation CoSets coverage rate of ExPas of S. cerevisia model Figure 5 CoSets coverage rate of ExPas of S. cerevisia model. The y-axis indicates the number of extreme pathways which have the corresponding CoSets coverage rates; the x-axis lists the Cosets coverage rates, ranging from 0 to 1.
CoSets coverage rate of ExPas of RBC metabolic network Figure 3 CoSets coverage rate of ExPas of RBC metabolic network. The y-axis indicates the number of ExPas which have the corresponding CoSets coverage rates; the x-axis represents the Cosets coverage rates, ranging from 0 to 1.
Length of iND750's sample ExPas Figure 4 Length of iND750's sample ExPas. The y-axis indicates the number of ExPas which consist of the corresponding number of reactions; the x-axis represents the number of reactions contained in a single ExPa. The ExPa sampling process found no ExPa whose length is less than 20 or more than 80.
coefficient to 1 -1e -8 while identifying CoSets of each metabolic network assuring that reactions in the same CoSets have strong correlation with each other. The procedure of CoSets identification has been carried out 20 times for each model and the results are quite stable.

Sampling for ExPa subset
We randomly delete a few reactions in S. cerevisiae's iND750 model, and enumerate all the ExPas of the sub network. Then, the dimensions of deleted reactions are inserted back with zeros to these ExPas. As proved in Theorem 1, the ExPa set derived from sampling is a subset of the whole ExPa set of iND750. One thousand ExPa sets of different sub networks of iND750 model have been generated and merged together without redundancy. The union of all these ExPas constitute the sample set of ExPas used in the analysis on Saccharomyces cerevisiae metabolic network.

Theorem 1. Suppose G is a metabolic network and ‫ސ‬ is the ExPa set of G, then for any sub network G', its ExPa set ‫'ސ‬ is a subset of ‫.ސ‬
Proof. We assume that the available steady state flux distribution (v) of G lies in the convex cone : Without loss of generality, we assume G' is generated from G by deleting reactions v k , v k+1 , ..., v n , then the steady state flux distribution of G' lies in the convex cone : Assuming that = {a i | a i ∈ and = 0, j = k, ..., n}.

List of abbreviations used
The abbreviations used in this study are listed in table 7.   Table 5. For each CoSet, we calculate how many ExPas cover k reactions in it where k ranges from 0 to size of this CoSet. Xylulose-5-phosphate epimerase Abbreviations used in this study are divided into three parts. They are concept abbreviations, metabolite abbreviations and pathway/reaction abbreviations.