Genome-scale analysis to the impact of gene deletion on the metabolism of E. coli: constraint-based simulation approach

Abstract

Background

To date, genome-scale models of metabolism have been analyzed only with the constraint-based modelling philosophy. Several gene deletion studies on genome-scale in silico models of organisms have been carried out, but most of them focus on distinguishing lethal from non-lethal genes or on growth rate. The impact of gene deletion on flux redistribution, the functions and characteristics of key genes, and the behavior of individual reactions across the full set of single-gene deletions remain little studied.

Results

We carried out three studies of the metabolism of E. coli under gene deletion. The first concerned key genes and subsystems. By calculating the deletion impact p of each of the 1261 genes, one at a time, on the metabolic flux redistribution of E. coli _iAF1260, we found that p describes the change in the organism's metabolism in more detail than the growth rate does. We then identified 195 important (high-p) genes, which outnumber the essential genes (those whose deletion reduces the growth rate f to zero). We therefore speculate that, under some circumstances, deleting an important gene causes a large change in the metabolic system of E. coli, which may then use alternative reaction routes to survive. By determining the functional subsystems to which the 195 key genes belong, we found that they are unevenly distributed: most are related to just three subsystems, and all 8 important but not essential genes appear only in "Oxidative Phosphorylation". The second study concerned three characteristics of p. We analyzed the correlation between p and d (the connection degree of a gene) and between p and v_gene (the flux sum controlled by a gene) and found that neither is linear, whereas p and f are highly linearly correlated. The third study concerned highly-affected reactions. We found 16 reactions with an R_g value above 2000, where R_g measures the impact on a reaction across all 1261 single-gene deletions. We speculate that highly-affected reactions are involved in the metabolism of basic compounds.

Conclusion

In summary, the results we obtained have biological significance, and this work should shed new light on future research.

Background

Since various 'omics' datasets are becoming available, biology has transitioned from a data-poor to a data-rich environment. This has underscored the need for systems analysis in biology, and systems biology has become a rapidly growing field [1].

A change in mathematical modelling philosophy is also necessitated, and that is based on building and validating in silico models. Modern biological models need to meet new sets of criteria: organism-specific, data-driven, easily scalable, and so on. Many modelling approaches, such as kinetic, stochastic and cybernetic approaches, are currently being used to model cellular processes. Owing to the computational complexity and the large number of parameters needed, it is currently difficult to use these methods to model genome-scale networks. To date, genome-scale models of metabolism have only been analyzed with the constraint-based modelling philosophy [2, 3]. Genome-scale network models of diverse cellular processes such as signal transduction, transcriptional regulation and metabolism have been generated. Gene-protein-reaction (GPR) associated models can keep track of associations between genes, proteins, and reactions [4], and there have been several genome-scale GPR models, such as E. coli [4, 5], S. aureus [6], H. pylori [7], M. barkeri [8], S. cerevisiae [9] and B. subtilis [10]. A reconstruction is herein defined as the list of biochemical reactions occurring in a particular cellular system and the associations between these reactions and relevant proteins, transcripts and genes [2]. A reconstruction can include the assumptions necessary for computational simulation, such as maximum reaction rates and nutrient uptake rates [11].

Computer simulation of complex biological systems becomes essential as soon as the computational capability becomes available. As reconstructed networks have been made publicly available, researchers around the world have undertaken new computational studies using these networks [12]. Many studies apply a core set of basic in silico methods and often also describe novel methods to investigate different models. An extensive set of methods for analyzing these genome-scale models has been developed and applied to a growing number of biological problems [12]. As mentioned above, however, genome-scale models of metabolism have so far been analyzed only with the constraint-based philosophy [2, 3].

In silico models can be applied to generate novel, testable and often quantitative predictions of cellular behavior [13]. The impact of a gene deletion experiment on cellular behavior can be simulated in a manner similar to linear optimization of growth [14]. The results can be used to guide the design of informative confirmation experiments and are helpful for metabolic engineering. Several gene deletion studies on genome-scale in silico models of organisms have been carried out [4–10, 15–19], but most of them focus on distinguishing lethal from non-lethal genes or on growth rate [4–10, 15–22]. The impact of gene deletion on flux redistribution, the characteristics and functions of key genes, and the behavior of individual reactions across the full set of single-gene deletions remain little studied.

The Results section of this paper reports three studies. In the first, we calculated the flux distribution of E. coli _iAF1260 and then the deletion impact of each of the 1261 genes, one at a time, on the metabolic flux redistribution of E. coli _iAF1260 (using p to describe the deletion impact of one gene). We then identified the important genes that affect the metabolic flux distribution most strongly and determined their functional subsystems. In the second, we analyzed the correlation between p and f (the growth rate after each of the 1261 gene deletions), between p and d (the connection degree of a gene) and between p and v_gene (the flux sum controlled by a gene). In the third, we investigated which reactions are affected most strongly across all 1261 gene deletions (using R_g to measure the impact). The Methods section introduces the GPR model, some properties of the in silico model of E. coli _iAF1260 (in SBML (Systems Biology Markup Language) format) and the method of constraint-based analysis.

Results and discussion

Metabolic flux distribution of E. coli _iAF1260

As a baseline for later comparison, we first calculate the flux distribution of E. coli _iAF1260. We use E. coli _iAF1260_ flux1.xml, one of the two SBML files provided with the reconstruction of E. coli [5]. The computational method is flux balance analysis (FBA) [11], one of the fundamental genome-scale phenotypic calculations, which can simulate cellular growth. FBA is based on linear optimization of an objective function, typically biomass formation. Given an uptake rate for key nutrients and the biomass composition of the cell (usually in mmol component gDW^-1 and defined in the biomass objective function), the maximum possible growth rate of the cells can be predicted in silico. We use the COBRA toolbox [11] to carry out the FBA computation. The flux distribution of E. coli _iAF1260 is illustrated in Figure 1.

Figure 1

Flux distribution of E. coli _iAF1260. The x-axis indicates each reaction in rxns (in the same order as in rxns, 2382 in total) and the y-axis indicates the value of its flux (in mmol gDW^-1 h^-1). rxns is the reaction set in the model.

Impact of gene deletion on the metabolic flux redistribution and key genes

As our first study, we examine the impact of gene deletion on the metabolic system of E. coli. We first calculate the deletion impact of each of the 1261 genes, then identify the important genes and the functional subsystems to which these key genes belong.

1) Impact of gene deletion on the metabolic flux redistribution and key genes that affect metabolism most greatly

There are 1261 genes in the model of E. coli _iAF1260. If a single gene is associated with multiple reactions, the deletion of that gene will result in the removal of all associated reactions. On the other hand, a reaction that can be catalyzed by multiple non-interacting gene products will not be removed in a single gene deletion. With the aid of the COBRA toolbox [11], we can calculate the impact of each deletion. We define the impact of one gene deletion on the whole metabolic flux redistribution as p:

$$ p = \sum_{i \in R} (v'_i - v_i)^2 \qquad (1) $$

where v_i and v'_i are the flux values of the i-th reaction of the E. coli _iAF1260 model before and after a single gene deletion, respectively, and R is the whole reaction set. In most studies of gene deletion [4–10, 15–22], the change in growth rate f is used to describe the impact of gene deletion. We define p as the impact of gene deletion because we believe it describes the change in the organism's metabolism in more detail: p accounts for the flux change at every reaction by summing the squared differences between v_i and v'_i. By contrast, f is a single aggregate measure and does not distinguish the flux changes taking place at individual reactions.
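As an illustration of equation (1), the following minimal Python sketch computes p from two flux vectors. The function name `deletion_impact` and the four-reaction flux values are hypothetical and are not taken from the iAF1260 model; in the paper the flux vectors come from FBA runs with the COBRA toolbox.

```python
def deletion_impact(v_before, v_after):
    """Sum of squared flux changes over all reactions, as in equation (1)."""
    if len(v_before) != len(v_after):
        raise ValueError("both flux vectors must cover the same reaction set R")
    return sum((after - before) ** 2 for before, after in zip(v_before, v_after))


# Hypothetical fluxes of a 4-reaction toy network (mmol gDW^-1 h^-1).
wild_type = [10.0, 5.0, 5.0, 0.0]   # v_i, before the deletion
knockout = [10.0, 0.0, 0.0, 5.0]    # v'_i, after the deletion

p = deletion_impact(wild_type, knockout)
print(p)
```

In this toy case three reactions each change by 5 flux units, so p is non-zero even though the first flux is unchanged; this is the sense in which p registers rerouting that a single aggregate number can miss.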

Figure 2 shows the deletion impact of these 1261 genes. Table 1 gives the p ranges, the number of genes falling within each range and the corresponding percentages.

Figure 2

The deletion impact p of the 1261 genes of the E. coli _iAF1260 model. The x-axis indicates each of the 1261 genes (in the same order as in genes) and the y-axis indicates its impact p. genes is the set of genes in the model.

Table 1 p ranges, gene number (GN) and percentages

Figure 3 shows the impact of deleting each of these 1261 genes on the growth rate f of E. coli. Each deletion yields a new value of f.

Figure 3

The impact of deleting each of the 1261 genes on f in the E. coli _iAF1260 model. The x-axis indicates each of the 1261 genes and the y-axis indicates the new f after its deletion.

We define genes with p > 9800 as key genes or high-p genes; there are 195 in total. In 187 cases f = 0; the corresponding genes are usually called essential genes or zero-f genes, and all of them have p > 9800. These 187 essential genes are consistent with the previous literature [5], except "s0001", which is not included in the report of Ref. [5]. The remaining 8 genes, with p > 9800 and f ≠ 0, are shown in bold in Table 2; we call them INE (Important but Not Essential) genes. Additional file 1 provides the details. Compared with the experimental observations of Ref. [22], six (b3731, b3733, b3734, b3735, b3736, b3738) of the 8 INE genes are essential; compared with those of Ref. [23], two (b3731, b3736) are essential. Conversely, two genes (b0529 and b3956) are reported as essential in Ref. [5] but are not key genes in our computation; b3956 is reported as nonessential in both Refs. [22] and [23], and b0529 is reported as nonessential in Ref. [22]. These comparisons suggest that p has an advantage over f in describing the change in the organism's metabolism.
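The three gene categories used above can be written as a small decision rule. The sketch below is only an illustration: the function name `classify_gene`, the numerical tolerance `tol` for treating f as zero, and the example values are assumptions, while the threshold 9800 is the one stated in the text.

```python
P_THRESHOLD = 9800.0  # cutoff for key (high-p) genes, as stated in the text


def classify_gene(p, f, tol=1e-9):
    """Label a gene from its deletion impact p and post-deletion growth rate f.

    tol is an assumed numerical tolerance for deciding that f equals zero.
    """
    is_key = p > P_THRESHOLD
    is_essential = abs(f) <= tol
    if is_key and not is_essential:
        return "INE"        # important but not essential
    if is_essential:
        return "essential"  # zero-f gene
    return "other"


# Hypothetical (p, f) pairs, not values from the paper.
print(classify_gene(12000.0, 0.0))   # high p, no growth
print(classify_gene(12000.0, 0.3))   # high p, growth persists
print(classify_gene(10.0, 0.7))      # low p, growth persists
```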

Table 2 The functional subsystems (SS) and their related genes of E. coli _iAF1260

We also note that there are 8 genes with p > 9800 and f ≠ 0. From this we speculate that, under some circumstances, deleting an important gene causes a large change in the metabolic system of E. coli, which may then use alternative reaction routes to survive. This may reflect the robustness of microbial metabolic networks. We regard this as an important and interesting result.

2) Functional subsystems to which these key genes belong

If a gene catalyzes a reaction which belongs to a certain subsystem, we say that the gene belongs to the subsystem. Functional subsystems about important genes in the metabolic system of micro-organism are seldom reported. We have hereinabove defined those genes with p>9800 as key genes. We now list the functional subsystems to which every key gene belongs, 23 subsystems in total, and several genes appear in more than one subsystem, shown in Table 2. The 23 functional subsystems are "Threonine and Lysine Metabolism (TLM), Cofactor and Prosthetic Group Biosynthesis (CPGB), Valine Leucine and Isoleucine Metabolism (VLIM), Cell Envelope Biosynthesis (CEB), Lipopolysaccharide Biosynthesis Recycling (LBR), Methionine Metabolism (MM), Arginine and Proline Metabolism (APM), Glycerophospholipid Metabolism (GM), Membrane Lipid Metabolism (MLM), Purine and Pyrimidine Biosynthesis (PPB), Nucleotide Salvage Pathway (NSP), Citric Acid Cycle (CAC), Glycine and Serine Metabolism (GSM), Tyrosine Tryptophan and Phenylalanine Metabolism (TTPM), Transport Inner Membrane (TIM), Alanine and Aspartate Metabolism (AAM), Folate Metabolism (FM), Alternate Carbon Metabolism (ACM), Histidine Metabolism (HM), Cysteine Metabolism (CM), Inorganic Ion Transport and Metabolism (IITM), Oxidative Phosphorylation (OP), Unassigned (U)".

The 195 key genes are unevenly distributed across subsystems: most of them are related to the "Cofactor and Prosthetic Group Biosynthesis", "Cell Envelope Biosynthesis" and "Purine and Pyrimidine Biosynthesis" subsystems, especially CPGB. In addition, all of the important but not essential (INE) genes, 8 in total, appear in "Oxidative Phosphorylation".

Many high-p genes probably belong to only a few metabolic subsystems because these subsystems involve many reactions and support other subsystems. The INE genes probably belong only to the "Oxidative Phosphorylation (OP)" subsystem because, under the given media condition, it is in the OP subsystem that E. coli is able to use alternative reaction routes to carry out this kind of metabolism.

Analysis of the three characteristics of p

As our second study, we examine some properties of the metabolic network of E. coli, namely three characteristics of p. Some properties of the metabolic networks of micro-organisms have been reported in the literature [15–22]. Because the measure we define is different, our study provides further evidence about these properties.

1) Correlation between p and f (describing growth rate in the deletion of 1261 genes)

Figure 4 is the scatter diagram of (p, f), with 1261 data pairs in total. Many data pairs coincide and lie at the same place, so the figure shows relatively few distinct points. The diagram shows a highly linear correlation between p and f: high p corresponds to low f.

Figure 4

The scatter diagram of (p, f). The x-axis indicates p and the y-axis indicates f, with 1261 data pairs in total. Many data pairs lie at the same points.

2) Correlation between p and d (connection degree of every gene in network)

We compute the number of related reactions d for each of the 1261 genes of the E. coli _iAF1260 model, as illustrated in Figure 5. The figure shows that a small number of genes have a high d value, but it does not show whether they strongly affect the metabolic flux distribution.

Figure 5

The number of related reactions of each of the 1261 genes of the E. coli _iAF1260 model. The x-axis indicates each of the 1261 genes (in the same order as in genes) and the y-axis indicates the number of its related reactions.

Figure 6 is the scatter diagram of (d, p), with 1261 data pairs in total. Again, many data pairs coincide. The diagram shows that the relationship between d and p is not linear, so a high connection degree does not by itself imply a high deletion impact: high-d genes and low-d genes can be equally important to the metabolism of E. coli _iAF1260.

Figure 6

The scatter diagram of (d, p). The x-axis indicates d (the connection degree of each gene) and the y-axis indicates the corresponding gene impact p.

3) Correlation between p and v_gene (flux sum controlled by every gene)

We define the flux sum controlled by every gene as

$$ v_{gene} = \sum_{j \in R_{gene}} |v_j| \qquad (2) $$

where v_j is the flux value of the j-th reaction of the E. coli _iAF1260 model before a single gene deletion and R_gene is the set of reactions controlled by the given gene. We compute the flux sum v_gene for each of the 1261 genes of the model, as illustrated in Figure 7. The figure shows that a small number of genes have a high v_gene value, but do they strongly affect the metabolic flux distribution?
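Equation (2) can be sketched in a few lines of Python. The function name `flux_sum`, the reaction indices and the flux values below are hypothetical; in the paper the controlled reaction set of a gene would come from the model's gene-reaction associations.

```python
def flux_sum(controlled_reactions, v):
    """Sum of absolute wild-type fluxes over the reactions a gene controls (equation 2).

    controlled_reactions: indices j of the reactions in R_gene
    v: wild-type flux vector, one entry per reaction
    """
    return sum(abs(v[j]) for j in controlled_reactions)


# Hypothetical wild-type fluxes; a negative value is a reversible reaction running backwards.
v = [10.0, -4.0, 0.0, 2.5]
v_gene = flux_sum([1, 3], v)  # this gene controls reactions 1 and 3
print(v_gene)
```

Taking absolute values means that a reversible reaction carrying flux in the reverse direction still counts towards the flux controlled by the gene.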

Figure 7

The controlled reaction number of each of the 1261 genes of the E. coli _iAF1260 model. The x-axis indicates each of the 1261 genes (in the same order as in genes) and the y-axis indicates the number of its controlled reactions.

Figure 8 is the scatter diagram of (v_gene, p), with 1261 data pairs in total, many of which coincide. The diagram shows that the relationship between v_gene and p is not linear either, so a high flux sum does not by itself imply a high deletion impact: genes with high v_gene and genes with low v_gene can be equally important to the metabolism of E. coli _iAF1260.

Figure 8

The scatter diagram of (v_gene, p). The x-axis indicates v_gene (the flux sum controlled by each gene) and the y-axis indicates the impact p.

Impact of gene deletion on every metabolic reaction

As our third study, we investigate which reactions are affected most strongly across all 1261 gene deletions. Highly-affected reactions (HAR) are often neglected in the gene deletion literature.

1) Impact of gene deletion on every metabolic reaction

There are 2382 reactions in the in silico model of E. coli _iAF1260. We define R_g to measure the impact on a reaction across all 1261 single-gene deletions:

$$ R_g = \sum_{k \in G} |v_k - v_0| \qquad (3) $$

where v_0 and v_k are the flux values of a given reaction of the E. coli _iAF1260 model before and after deletion of the k-th gene, respectively, and G is the set of all 1261 genes.
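A minimal sketch of equation (3), computed for every reaction at once, is given below. The function name `reaction_impact` and the toy flux vectors (three reactions, two knockouts) are hypothetical and stand in for the 2382 reactions and 1261 deletions of the model.

```python
def reaction_impact(v0, knockout_fluxes):
    """Return R_g for every reaction (equation 3).

    v0: wild-type flux vector, one entry per reaction
    knockout_fluxes: one flux vector per deleted gene k in G
    """
    impact = [0.0] * len(v0)
    for v_k in knockout_fluxes:
        if len(v_k) != len(v0):
            raise ValueError("every knockout must cover the same reactions as v0")
        for i, (after, before) in enumerate(zip(v_k, v0)):
            impact[i] += abs(after - before)
    return impact


# Hypothetical toy data: 3 reactions, 2 single-gene knockouts.
v0 = [10.0, 5.0, 0.0]
knockouts = [
    [10.0, 0.0, 5.0],  # fluxes after deleting gene 1
    [8.0, 5.0, 0.0],   # fluxes after deleting gene 2
]
r_g = reaction_impact(v0, knockouts)
print(r_g)
```

Whereas p sums over reactions for one deleted gene, R_g sums over deleted genes for one reaction, so the two measures are the row-wise and column-wise views of the same table of flux changes (with squared and absolute differences, respectively).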

Figure 9 shows R_g for each of the 2382 reactions, and Table 3 gives the R_g ranges, the number of reactions within each range and the corresponding percentages. In the following section, we determine which reactions are highly affected.

Figure 9

The R_g of each of the 2382 reactions of E. coli _iAF1260. The x-axis indicates each of the 2382 reactions (in the same order as in rxns) and the y-axis indicates its R_g value.

Table 3 R_g ranges, reaction number (RN) and their percentages

2) Highly-affected reactions (HAR)

There are 42 reactions with an R_g value above 500. Of these, 16 reactions have an R_g value above 2000: "ATPS4rpp, CO2tex, CO2tpp, CYTBO3_4pp, ENO, EX_co2(e), EX_h2o(e), EX_o2(e), GAPD, H2Otex, H2Otpp, NADH16pp, O2tex, O2tpp, PGK, PGM".

Why are these 16 reactions more sensitive to gene deletion? It may be because they are involved in the metabolism of basic compounds such as H2O, ATP, O2 and NADH.

Conclusion

In this paper, we have carried out three studies of the metabolism of E. coli under gene deletion. The first identified its important genes and the subsystems to which they belong, the second analyzed the characteristics of p, and the third identified the reactions most affected by gene deletion.

In the first study, we used p to describe the impact of a gene deletion. Our first finding was that p may describe the change in the organism's metabolism under gene deletion in more detail than f. After calculating the deletion impact of the 1261 genes, we identified 195 important genes (high-p genes, p > 9800), which outnumber the essential genes (genes with f = 0). Our second finding was therefore that, under some circumstances, deleting an important gene greatly changes the metabolic system of E. coli, which may then use alternative reaction routes to survive. The third finding was that the 195 key genes are unevenly distributed across subsystems, with most of them related to three subsystems ("Cofactor and Prosthetic Group Biosynthesis", "Cell Envelope Biosynthesis" and "Purine and Pyrimidine Biosynthesis"), and that all 8 important but not essential (INE) genes appear only in the "Oxidative Phosphorylation" subsystem. We have also proposed possible explanations.

In the second study, we examined three characteristics of p, namely its relationships with f, d and v_gene. We found that p and f are highly linearly correlated, whereas neither the p-d nor the p-v_gene relationship is linear. Because the measure we defined is different, our study provides further evidence about the properties of the metabolic network.

In the third study, we defined R_g to measure the impact on a reaction across all 1261 single-gene deletions. We calculated the R_g value of each of the 2382 reactions and tabulated the R_g ranges and the corresponding numbers of reactions. Finally, we identified 16 reactions with an R_g value above 2000. We have also proposed an explanation, namely that these highly-affected reactions are involved in the metabolism of basic compounds.

In summary, because the in silico model of E. coli _iAF1260 is credible, we conclude that the results we obtained have biological significance and that this work should shed new light on future research. As a next step, we will extend the study of E. coli to more media conditions, and will also carry out similar work on other organisms and compare the results with those for E. coli.

Methods

Gene-protein-reaction (GPR) associated model

The association between genes and reactions is not one-to-one. Several genes may encode subunits of a protein that catalyzes one reaction, while other genes encode so-called promiscuous enzymes that can catalyze several different reactions. It is therefore necessary to keep track of the associations between genes, proteins and reactions and to distinguish "AND" and "OR" associations in GPR models. Examples of different types of GPR associations are illustrated in Refs. [4, 14].

GPR model structure of E. coli _iAF1260

The in silico model that we use is E. coli _iAF1260 [5], a metabolic reconstruction consisting of the chemical reactions that transport and interconvert metabolites within E. coli K-12 MG1655. This network reconstruction was based on a previous reconstruction, termed E. coli _iJR904 [4]. The general features of E. coli _iAF1260 are shown in Ref. [5].

The SBML file of the E. coli _iAF1260 model can be downloaded from the supplementary information of Ref. [5]. Two SBML files are provided with the reconstruction, each containing a different flux distribution. The SBML file properties are given in the supplementary information of Ref. [5]. The dimensions of rxns, mets and genes are 2382, 1668 and 1261, respectively.

The minimal media of in silico model is an important aspect. The computational minimal media of E. coli _iAF1260 is also included in the supplementary information of Ref. [5]. In the method of constraint-based analysis, the biomass objective function (BOF) should be defined. The BOF was generated by defining all of the major and essential constituents that make up the cellular biomass content of E. coli [5].

Gene-protein-reaction associations are embodied in the rxnGeneMat matrix, which has as many rows as there are reactions in the model and as many columns as there are genes. The entry in the i-th row and j-th column is one if the j-th gene in genes is associated with the i-th reaction in rxns, and zero otherwise.
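The structure of rxnGeneMat, and how the related reaction number d of a gene can be read from it, is sketched below in plain Python. The 4 x 3 matrix and the helper names `reactions_of_gene` and `connection_degree` are hypothetical; the real matrix has 2382 rows and 1261 columns and is stored as a MATLAB matrix by the COBRA toolbox.

```python
# Hypothetical rxnGeneMat for 4 reactions (rows) and 3 genes (columns).
# Entry [i][j] is 1 if gene j is associated with reaction i, and 0 otherwise.
rxn_gene_mat = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]


def reactions_of_gene(mat, j):
    """Indices of the reactions associated with gene j (column j of the matrix)."""
    return [i for i, row in enumerate(mat) if row[j] == 1]


def connection_degree(mat, j):
    """Number of reactions associated with gene j, i.e. the measure d in the text."""
    return len(reactions_of_gene(mat, j))


degrees = [connection_degree(rxn_gene_mat, j) for j in range(3)]
print(degrees)
```

Note that the matrix records only whether a gene and a reaction are associated; it does not say whether the association is of the "AND" or the "OR" type, which has to be taken from the GPR rules themselves.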

Methodology of constraint-based analysis

1) Constraint-based analysis

In silico modelling and simulation of genome-scale biological systems differ from the modelling practiced in the physicochemical sciences. A network can fundamentally have many different states, or many different solutions. Which states (or solutions) are picked is up to the cell and depends on the selection pressure experienced, and such choices can change over time. Constraint-based approaches [2, 3] have therefore proven very useful for the analysis of complex biological systems. The differences between modelling in biology and modelling in the physicochemical sciences or engineering are illustrated in Ref. [14]. Theory-based considerations (as in engineering and physics) lead one to seek an "exact" solution, typically computed from the laws of physics and chemistry. Constraint-based considerations (as in biology) instead delimit a range of allowable solutions. Not only can a network have many different behaviors that are picked based on the evolutionary history of the organism, but these networks can also carry out the same function in many different and equivalent ways [14].

2) Representation of reconstructed metabolic network

Before calculation and simulation, the reconstructed metabolic network must be represented mathematically. The stoichiometric matrix, S, is the centerpiece of a mathematical representation of genome-scale metabolic networks. It represents each reaction as a column and each metabolite as a row, where each numerical element is the corresponding stoichiometric coefficient.

An upper and a lower bound on the allowable flux through each reaction must also be defined. These represent the lowest and highest reaction rates possible for each reaction. The upper and lower bounds are represented as two separate vectors, each containing as many components as there are columns in S, and in the same order. In many cases, reversible reactions are defined to have an arbitrarily large upper bound and an arbitrarily large negative lower bound. Irreversible reactions have a lower bound that is nonnegative, usually zero.

To predict meaningful fluxes, setting upper and lower bounds is especially important for exchange reactions, which take up compounds into the cell or secrete compounds from the cell. With this orientation, the lower bound of the exchange reaction for a consumed substrate (e.g., glucose) must be a finite negative number. The upper bound of an exchange reaction must be greater than zero. At least one of the reactions in the model must have a constrained lower or upper bound; typically, the substrate (e.g., glucose or oxygen) uptake rates are set to experimentally measured values. The upper and lower bounds for exchange reactions are quantitative in silico representations of the growth media environment.

3) Biomass objective function (BOF) and minimal media

The constraint-based approach is based on the assumption that cells strive to maximize their growth rate. This assumption which provides an acceptable starting point for many types of computations is satisfied by simulating maximal production of the molecules required to make new cells (biomass precursor molecules). In spite of their limitations, the predictive power of genome-scale models of metabolic networks has been demonstrated in diverse situations through careful experimentation [11].

The biomass objective function (the function v_growth, see below) is a special reaction that takes all biomass metabolites, ATP and water as substrates and produces ADP, protons and phosphate (as a result of the non-growth associated ATP maintenance requirement) [6].

The minimal media is determined computationally by systematically testing distinct inputs. Different combinations of molecules are allowed to enter the reaction network until the minimal group that allows biomass production, or non-zero Z (see below), is found [6]. This test only establishes that some amount of biomass production is calculated; it does not discriminate between extremely slow, inefficient growth and rapid growth.

4) Computation of phenotypic states

In genome-scale metabolic networks, the fluxes within a cell usually cannot be uniquely calculated because a range of feasible values exists when fluxes are subjected to known constraints. Flux balance analysis (FBA) is used to find optimal growth phenotypes. Briefly, a large-scale linear program is solved to find a complete set of metabolic fluxes (v) that are consistent with the steady-state condition (eq. 4) and the reaction rate bounds (eq. 5) and that at the same time maximize the biomass objective function in the defined ratio. This corresponds to the following linear programming problem [6]:

$$ \max Z = v_{growth} $$

subject to

$$ S \cdot v = 0 \qquad (4) $$

$$ \alpha_i \le v_i \le \beta_i \qquad (5) $$

where S is the stoichiometric matrix, and α_i and β_i are the lower and upper bounds on the flux v_i through each reaction. The flux range was set arbitrarily high for all internal reactions so that no internal reaction restricted the network, with the exception of irreversible reactions, which have a minimum flux of zero. The inputs to the system were restricted to a minimal media.
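To make the two constraints concrete, the sketch below checks whether a candidate flux vector satisfies the steady-state condition (eq. 4) and the bounds (eq. 5) for a hypothetical three-reaction chain. It does not solve the linear program; in practice an LP solver (as used by the COBRA toolbox) searches among all feasible vectors for the one with maximal v_growth. The network, the bounds, the tolerance `tol` and the name `is_feasible` are all assumptions made for illustration.

```python
def is_feasible(S, v, lower, upper, tol=1e-9):
    """True if v satisfies S.v = 0 (eq. 4) and lower <= v <= upper (eq. 5)."""
    steady_state = all(
        abs(sum(s_ij * v_j for s_ij, v_j in zip(row, v))) <= tol for row in S
    )
    within_bounds = all(
        lo - tol <= v_j <= hi + tol for v_j, lo, hi in zip(v, lower, upper)
    )
    return steady_state and within_bounds


# Hypothetical toy network: uptake -> A, A -> B, B -> growth.
# Rows are metabolites A and B; columns are the three reactions.
S = [
    [1, -1, 0],
    [0, 1, -1],
]
lower = [0.0, 0.0, 0.0]        # all three reactions are irreversible
upper = [10.0, 1000.0, 1000.0]  # only the uptake is tightly constrained

print(is_feasible(S, [10.0, 10.0, 10.0], lower, upper))  # balanced, within bounds
print(is_feasible(S, [10.0, 5.0, 5.0], lower, upper))    # metabolite A accumulates
print(is_feasible(S, [12.0, 12.0, 12.0], lower, upper))  # uptake exceeds its bound
```

In this chain the uptake bound is the only active limit, which mirrors the remark above that internal reactions are left effectively unconstrained while the media inputs restrict the system.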

The value of Z computed with the above procedure can either be zero (predicting no growth) or greater than zero (corresponding to cellular growth) depending on the inputs and outputs that are allowed, according to the nutrients provided in the media.

5) Gene deletion study

The effect of a gene deletion experiment on cellular growth can be simulated in a manner similar to linear optimization of growth [5, 11]. Gene-reaction associations model the logical relationship between genes and their corresponding reactions. If a single gene is associated with multiple reactions, the deletion of that gene will result in the removal of all associated reactions, i.e. to simultaneously restrict the fluxes (upper and lower flux bounds) of these reactions to zero prior to computing maximal biomass objective function. On the other hand, a reaction that can be catalyzed by multiple non-interacting gene products will not be removed in a single gene deletion. The possible results from a simulation of a single gene deletion are unchanged maximal growth (non-lethal), reduced maximal growth or no growth (lethal). Those genes were considered essential if no biomass could be produced without their usage.
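The rule for deciding which reactions a single gene deletion removes can be sketched as a small Boolean evaluation. The representation below (a gene name, or a tuple whose first element is "and" or "or") and the names `rule_active` and `removed_reactions` are assumptions chosen for this illustration; they are not the COBRA toolbox data structures. The fluxes of the removed reactions would then have both bounds set to zero before the growth optimization is repeated.

```python
def rule_active(rule, deleted):
    """Evaluate a GPR rule: True if the reaction can still be catalyzed."""
    if isinstance(rule, str):          # a single gene
        return rule not in deleted
    op, *operands = rule               # ("and", ...) or ("or", ...)
    results = [rule_active(r, deleted) for r in operands]
    return all(results) if op == "and" else any(results)


def removed_reactions(gpr, deleted):
    """Reactions whose GPR rule can no longer be satisfied after the deletion."""
    return sorted(rxn for rxn, rule in gpr.items() if not rule_active(rule, deleted))


# Hypothetical GPR associations for four reactions and three genes.
gpr = {
    "R1": "g1",                 # one gene, one reaction
    "R2": ("or", "g1", "g2"),   # isozymes: either gene product suffices
    "R3": ("and", "g2", "g3"),  # enzyme complex: both subunits are required
    "R4": "g1",                 # g1 is also needed for a second reaction
}

print(removed_reactions(gpr, {"g1"}))  # R1 and R4 are lost, R2 survives via g2
print(removed_reactions(gpr, {"g2"}))  # only the complex-dependent R3 is lost
```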

Abbreviations

GPR: Gene-protein-reaction

SBML: Systems Biology Markup Language

FBA: Flux balance analysis

INE: Important but Not Essential

HAR: Highly-affected reactions

BOF: Biomass objective function

TLM: Threonine and Lysine Metabolism

CPGB: Cofactor and Prosthetic Group Biosynthesis

VLIM: Valine Leucine and Isoleucine Metabolism

CEB: Cell Envelope Biosynthesis

LBR: Lipopolysaccharide Biosynthesis Recycling

MM: Methionine Metabolism

APM: Arginine and Proline Metabolism

GM: Glycerophospholipid Metabolism

MLM: Membrane Lipid Metabolism

PPB: Purine and Pyrimidine Biosynthesis

NSP: Nucleotide Salvage Pathway

CAC: Citric Acid Cycle

GSM: Glycine and Serine Metabolism

TTPM: Tyrosine Tryptophan and Phenylalanine Metabolism

TIM: Transport Inner Membrane

AAM: Alanine and Aspartate Metabolism

FM: Folate Metabolism

ACM: Alternate Carbon Metabolism

HM: Histidine Metabolism

CM: Cysteine Metabolism

IITM: Inorganic Ion Transport and Metabolism

OP: Oxidative Phosphorylation

U: Unassigned

References

  1. Bork P: Is there biological research beyond Systems Biology? A comparative analysis of terms. Mol Syst Biol. 2005, 1: Art. No. 2005.0012.

  2. Price ND, Papin JA, Schilling CH, Palsson BO: Genome-scale microbial in silico models: the constraints-based approach. Trends Biotechnol. 2003, 21: 162-169. 10.1016/S0167-7799(03)00030-1.

  3. Price ND, Reed JL, Palsson BO: Genome-scale models of microbial cells: evaluating the consequences of constraints. Nature Reviews Microbiology. 2004, 2: 886-897. 10.1038/nrmicro1023.

  4. Reed JL, Vo TD, Schilling CH, Palsson BO: An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biology. 2003, 4: R54. 10.1186/gb-2003-4-9-r54.

  5. Feist AM, Henry CS, Reed JL, Krummenacker M, Joyce AR, Karp PD, Broadbelt LJ, Hatzimanikatis V, Palsson BØ: A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Molecular Systems Biology. 2007, 3: Art. No. 121.

  6. Becker SA, Palsson BO: Genome-scale reconstruction of the metabolic network in Staphylococcus aureus N315: an initial draft to the two-dimensional annotation. BMC Microbiol. 2005, 5: Art. No. 8.

  7. Thiele I, Vo TD, Price ND, Palsson BO: Expanded metabolic reconstruction of Helicobacter pylori (iIT341 GSM/GPR): an in silico genome-scale characterization of single- and double-deletion mutants. Journal of Bacteriology. 2005, 187: 5818-5830. 10.1128/JB.187.16.5818-5830.2005.

  8. Feist AM, Scholten J, Palsson BO, Brockman F, Ideker T: Modeling methanogenesis with a genome-scale metabolic reconstruction of Methanosarcina barkeri. Molecular Systems Biology. 2006, 2: Art. No. 2006.0004.

  9. Duarte N, Herrgard MJ, Palsson BO: Reconstruction and validation of Saccharomyces cerevisiae iND750, a fully compartmentalized genome-scale metabolic model. Genome Res. 2004, 14: 1298-1309. 10.1101/gr.2250904.

  10. Oh YK, Palsson BO, Park SM, Schilling CH, Mahadevan R: Genome-scale reconstruction of metabolic network in Bacillus subtilis based on high-throughput phenotyping and gene essentiality data. J Biol Chem. 2007, 282: 28791-28799. 10.1074/jbc.M703759200.

  11. Becker SA, Feist AM, Mo ML, Hannum G, Palsson BO, Herrgard MJ: Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox. Nat Protoc. 2007, 2: 727-738. 10.1038/nprot.2007.99.

  12. Feist AM, Palsson BO: The growing scope of applications of genome-scale metabolic reconstructions using Escherichia coli. Nature Biotechnology. 2008, 26: 659-667. 10.1038/nbt1401.

  13. Di Ventura B, Lemerle C, Michalodimitrakis K, Serrano L: From in vivo to in silico biology and back. Nature. 2006, 443: 527-533. 10.1038/nature05127.

  14. Palsson BO: Systems Biology: Properties of Reconstructed Networks. 2006, New York: Cambridge University Press

  15. da Silva JPM, Acencio ML, Mombach JCM, Vieira R, da Silva JC, Lemke N, Sinigaglia M: In silico network topology-based prediction of gene essentiality. Physica A-Statistical Mechanics and Its Applications. 2008, 387: 1049-1055. 10.1016/j.physa.2007.10.044.

  16. Ghim CM, Goh KI, Kahng B: Lethality and synthetic lethality in the genome-wide metabolic network of Escherichia coli. Journal of Theoretical Biology. 2005, 237: 401-411. 10.1016/j.jtbi.2005.04.025.

  17. Papp B, Pál C, Hurst L: Metabolic network analysis of the causes and evolution of enzyme dispensability in yeast. Nature. 2004, 429: 661-664. 10.1038/nature02636.

  18. Mahadevan R, Palsson BO: Properties of Metabolic Networks: Structure versus Function. Biophysical Journal. 2005, 88: L07-L09. 10.1529/biophysj.104.055723.

  19. Edwards JS, Palsson BO: The Escherichia coli MG1655 in silico metabolic genotype: Its definition, characteristics, and capabilities. Proceedings of the National Academy of Sciences of the United States of America. 2000, 97: 5528-5533. 10.1073/pnas.97.10.5528.

  20. Becker SA, Palsson BO: Three factors underlying incorrect in silico predictions of essential metabolic genes. BMC Syst Biol. 2008, 2: 14. 10.1186/1752-0509-2-14.

  21. Famili I, Förster J, Nielsen J, Palsson BO: Saccharomyces cerevisiae phenotypes can be predicted by using constraint-based analysis of a genome-scale reconstructed metabolic network. Proceedings of the National Academy of Sciences of the United States of America. 2003, 100: 13134-13139. 10.1073/pnas.2235812100.

  22. Joyce AR, Reed JL, White A, Edwards R, Osterman A, Baba T, Mori H, Lesley S, Palsson BO, Agarwalla S: Experimental and Computational Assessment of Conditionally Essential Genes in Escherichia coli. J Bacteriol. 2006, 188: 8259-8271. 10.1128/JB.00740-06.

  23. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, Mori H: Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol. 2006, 2: 2006.0008. 10.1038/msb4100050.

Acknowledgements

Support for this work was provided by the China Postdoctoral Science Foundation (20070420960), the Jiangsu Planned Projects for Postdoctoral Research Funds (0701026B), and the Southeast University Foundation of Science and Technology (XJ2008318). We thank the Systems Biology Research Group at UCSD (University of California, San Diego) for providing the COBRA Toolbox and the BiGG database, and Dr. Nicolo Giorgetti at IEEE for providing the Glpkmex program, which was used to solve the linear optimization problems.

This article has been published as part of BMC Bioinformatics Volume 10 Supplement 1, 2009: Proceedings of The Seventh Asia Pacific Bioinformatics Conference (APBC) 2009. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/10?issue=S1

Author information

Corresponding author

Correspondence to Zixiang Xu.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

ZX collected the data, carried out the computation, performed the analysis, and drafted the manuscript. XS and SY participated in the design of the study. All authors read and approved the final manuscript.

Electronic supplementary material

12859_2009_3245_MOESM1_ESM.xls

Additional file 1: The detailed comparison. The file (detailed_comparison.xls) contains two comparisons: the first is between the 188 essential genes reported in Ref. [5] and the 195 important genes obtained from our computation; the second is between p and f for every gene deletion in our computation. (XLS 98 KB)

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Xu, Z., Sun, X. & Yu, S. Genome-scale analysis to the impact of gene deletion on the metabolism of E. coli: constraint-based simulation approach. BMC Bioinformatics 10 (Suppl 1), S62 (2009). https://doi.org/10.1186/1471-2105-10-S1-S62

  • DOI: https://doi.org/10.1186/1471-2105-10-S1-S62

Keywords