First incomplete E. coli model
To create the first incomplete E. coli model, we removed the following reaction from the original E. coli model:
Note: in this subsection and the following subsections, the MetaCyc reaction unique identifier is shown between parentheses. In the reaction just shown, that identifier is branched-chainaminotransferleu-rxn, which can be used as an unambiguous keyword to search for more information about that reaction at BioCyc.org.
That reaction, going right to left, is the last step in the biosynthesis pathway of the amino acid L-leucine and, going left to right, the first step in the L-leucine degradation pathway. No other reaction in the model produces L-leucine. Because L-leucine is part of the biomass reaction of the model, we expected gap-filling to suggest adding this reaction back. Indeed, the MetaFlux gap-filling MILP solution suggested adding the same reaction, in the right-to-left direction. MetaFlux can report the relevant direction as part of the solution because every reversible candidate reaction is split into two reactions, one for each direction. MetaFlux did not suggest adding the reaction in the opposite direction because that direction is not essential for growth given the nutrients. SCIP required 125 seconds to obtain this optimal MILP solution.
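The direction-splitting step described above can be sketched as follows. This is a minimal illustration, not the actual MetaFlux implementation: the `Reaction` class, the `split_reversible` function, and the `_L2R`/`_R2L` suffixes are hypothetical names introduced here, and the compounds A to D are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    rid: str                  # reaction identifier (hypothetical naming)
    reactants: tuple          # compounds on the left side
    products: tuple           # compounds on the right side
    reversible: bool = False


def split_reversible(reactions):
    """Return irreversible candidates; a reversible reaction yields two."""
    directed = []
    for r in reactions:
        # Left-to-right direction always exists.
        directed.append(Reaction(r.rid + "_L2R", r.reactants, r.products))
        if r.reversible:
            # Right-to-left direction: swap the two sides.
            directed.append(Reaction(r.rid + "_R2L", r.products, r.reactants))
    return directed


# Toy input: one reversible and one irreversible candidate reaction.
candidates = split_reversible([
    Reaction("toy-rxn-1", ("A", "B"), ("C", "D"), reversible=True),
    Reaction("toy-rxn-2", ("C",), ("A",)),
])
for c in candidates:
    print(c.rid, c.reactants, "->", c.products)
```

Because each direction becomes its own candidate, a solver can select one direction without the other, which is why a solution can name the direction in which a reaction should be added.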
When the FastGapFilling algorithm was run on the same incomplete model, adding the same reaction, in the same direction, was suggested. Four iterations (that is, four LP runs) were needed before that solution was found; the first three iterations found solutions with three reactions. The total solver running time of these four iterations was 6 seconds.
This first simple example shows that FastGapFilling can find the exact same solution as the MILP technique in much less time.
Second incomplete E. coli model
The second incomplete E. coli model is derived by removing, in addition to the reaction removed in the first incomplete E. coli model, three reactions that produce the compound tetrahydrofolate, which is part of the biomass. These additional reactions are:
Notice that the reaction of the previous incomplete model and these three reactions are in different metabolic pathways.
The MILP development mode of MetaFlux suggested adding three of the reactions that were removed: the first two reactions above, and the reaction removed in the first incomplete model. That is, the third removed reaction above was not suggested. The SCIP solver required 7,794 seconds (about 2 hours 10 minutes) to find this optimal solution.
We ran the FastGapFilling algorithm on the same incomplete model. The smallest set of suggested reactions included the same three reactions. Other solutions proposed included four reactions. FastGapFilling performed 12 LP solving iterations, with a total solver execution time of 16 seconds.
This second example shows that FastGapFilling can find the same solution as the MILP technique in substantially less time: in this case, more than two orders of magnitude faster (16 seconds versus 7,794 seconds).
Third incomplete E. coli model
For this third example, we removed four reactions of the tricarboxylic acid cycle (TCA cycle) pathway from the original E. coli model:
The MILP solution suggested adding two reactions, both in the taxonomic range of E. coli:
Although these two reactions were not among the four reactions removed, the second reaction does produce the compound 2-oxoglutarate, which is also produced by one of the removed reactions. Notice that the development mode of MetaFlux outputs only one of the possible optimal solutions, even when other minimal-cost solutions with the same objective value exist. In this example, an optimal solution might include two of the four reactions removed, but we cannot confirm it. SCIP required 9,729 seconds (about 2 hours 42 minutes) to find this solution.
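To illustrate why several distinct minimal solutions can exist, the sketch below enumerates every smallest set of candidate reactions that restores a target compound in a toy network. It rests on stated assumptions: it uses a simple reachability test (network expansion) in place of the flux-based MILP that MetaFlux solves, brute-force enumeration in place of a solver, and all names (`producible`, `minimal_fills`, the compounds, and the reactions `c1` to `c3`) are hypothetical.

```python
from itertools import combinations


def producible(seeds, reactions):
    """Compounds reachable from the seeds by repeatedly firing reactions."""
    have = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if set(reactants) <= have and not set(products) <= have:
                have |= set(products)
                changed = True
    return have


def minimal_fills(seeds, model, candidates, target):
    """Return all smallest candidate subsets that make `target` producible."""
    names = sorted(candidates)
    for size in range(len(names) + 1):
        hits = [
            combo
            for combo in combinations(names, size)
            if target in producible(seeds, model + [candidates[n] for n in combo])
        ]
        if hits:  # first size with any solution is the minimum size
            return hits
    return []


# Toy network: nutrient "glc", one model reaction, three candidates.
model = [(("glc",), ("X",))]
candidates = {
    "c1": (("X",), ("T",)),    # produces the target from X
    "c2": (("glc",), ("T",)),  # produces the target from the nutrient
    "c3": (("X",), ("Y",)),    # does not help
}
solutions = minimal_fills({"glc"}, model, candidates, "T")
print(solutions)
```

In this toy case two different single-reaction sets have the same minimal size, so a solver that reports one optimum gives no indication that the other exists.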
FastGapFilling, in contrast, found a three-reaction solution after 12 iterations lasting 13 seconds. It consists of the two reactions of the MILP solution plus the following reaction:
The third reaction appears to be redundant: its flux, as given by FastGapFilling, is 0.00056, much lower than the fluxes of the other two reactions, which are both 0.110394.
Indeed, the low-flux reactions suggested by FastGapFilling might simply be a way to increase the biomass and might not be reactions essential for growth. In general, this possibility can be verified by adding only the suggested reactions with relatively high fluxes and solving the resulting network. If the biomass can still be generated, the low-flux reactions are nonessential.
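The verification procedure just described could be sketched as below. This is a hedged illustration only: the function names, the relative threshold of 0.1, and the placeholder reaction identifiers `rxn-1` to `rxn-3` are assumptions introduced here, and `can_grow` stands in for whatever routine adds the reactions to the model and solves the resulting network. Only the flux values are taken from the third example.

```python
def split_by_flux(suggestions, rel_threshold=0.1):
    """Partition suggested reactions into (high_flux, low_flux) lists.

    `suggestions` maps reaction id -> flux. A reaction counts as high flux
    when |flux| >= rel_threshold * largest |flux| (threshold is an assumption).
    """
    if not suggestions:
        return [], []
    top = max(abs(f) for f in suggestions.values())
    high = [r for r, f in suggestions.items() if abs(f) >= rel_threshold * top]
    low = [r for r in suggestions if r not in high]
    return high, low


def low_flux_nonessential(suggestions, can_grow, rel_threshold=0.1):
    """Add only the high-flux reactions and test whether biomass is produced.

    `can_grow(reactions)` is a hypothetical callback returning True when the
    model gap-filled with `reactions` can generate biomass.
    """
    high, low = split_by_flux(suggestions, rel_threshold)
    return can_grow(high), high, low


# Fluxes from the third example; the reaction ids are placeholders.
fluxes = {"rxn-1": 0.110394, "rxn-2": 0.110394, "rxn-3": 0.00056}


def toy_can_grow(reactions):
    # Stand-in solver: pretend growth needs exactly rxn-1 and rxn-2.
    return {"rxn-1", "rxn-2"} <= set(reactions)


grows, high, low = low_flux_nonessential(fluxes, toy_can_grow)
print(grows, high, low)
```

A relative threshold is used because absolute flux magnitudes depend on the nutrient bounds of the model. In practice the stand-in `toy_can_grow` would be replaced by an actual flux balance computation.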
Fourth incomplete E. coli model
The fourth incomplete model is the original E. coli model with 14 reactions removed. These reactions were selected because they produced at least one of the following metabolites: L-lysine, L-leucine, L-isoleucine, and L-histidine. All these metabolites participate in the biomass reaction. Essentially, this is an example where many biosynthesis pathways have been disturbed. The following reactions were removed.
The SCIP solver could not find an optimal solution to the MILP problem after running for 24 hours. FastGapFilling, however, produced a solution after 12 iterations that took 14 seconds with the SCIP solver. It suggested three reactions: branched-chainaminotransferileu-rxn, diaminopimdecarb-rxn, and histaldehyd-rxn. These three reactions are among the 14 reactions that were removed.
This example shows that FastGapFilling can be very useful in practice: no optimal or near-optimal solution could be found after 24 hours using the MILP approach, whereas FastGapFilling quickly found a gap-filling solution.
Incomplete yeast model
As a last example of applying the FastGapFilling algorithm, we used a yeast model. As with the E. coli model, the original yeast model can grow. The original yeast model is based on the YeastCyc database (version 17.5). The model includes 1,454 enzymatic reactions, including the instantiated generic enzymatic reactions. The biomass reaction is composed of 41 metabolites. The growth medium is composed of glucose, oxygen, ammonium, phosphate, sulfate, and iron. The secretions are carbon dioxide, carbon monoxide, formate, hydrogen peroxide, glycolaldehyde, and water. An upper bound of 12 mmol/g/h constrained the uptake of glucose.
To generate an incomplete model, four reactions responsible for the biosynthesis of five lipids (ergosterol, zymosterol, episterol, fecosterol, and lanosterol) were removed. These five lipids are part of the biomass. The reactions removed are:
The first reaction (GPPSYN-RXN) appears in three different pathways (trans-farnesyl diphosphate biosynthesis, geranyl diphosphate biosynthesis, and hexaprenyl diphosphate biosynthesis), whereas each of the other three reactions occurs in a separate pathway of its own.
The optimal MILP solution for gap-filling this incomplete yeast model required 21,027 seconds (about 5 hours 50 minutes) and suggested adding the four reactions above. This set is the smallest that can be expected because MetaCyc (version 17.5) includes no other reactions to produce these lipids.
FastGapFilling found the same solution after 12 iterations lasting 14 seconds using the SCIP solver, a much shorter execution time than that of the MILP approach. The binary search of FastGapFilling also found other solution sets with up to 34 candidate reactions, but the smallest set consisted of exactly the four reactions that were removed.
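The solver times reported for the examples in this section can be compared directly with a few lines of arithmetic. The sketch below uses only the times quoted in the text; the variable and function names are illustrative, and the fourth E. coli model is omitted because the MILP run did not finish within 24 hours.

```python
import math

# (MILP seconds, FastGapFilling seconds) as reported in the text.
timings = {
    "E. coli 1": (125, 6),
    "E. coli 2": (7794, 16),
    "E. coli 3": (9729, 13),
    "yeast": (21027, 14),
}


def speedup(milp_seconds, fast_seconds):
    """Ratio of MILP solver time to FastGapFilling solver time."""
    return milp_seconds / fast_seconds


speedups = {name: speedup(m, f) for name, (m, f) in timings.items()}
for name, ratio in speedups.items():
    # log10 of the ratio gives the speedup in orders of magnitude.
    print(f"{name}: {ratio:.0f}x, {math.log10(ratio):.1f} orders of magnitude")
```

For the second E. coli model, for instance, the ratio is 7,794 / 16, or roughly 487, which lies between two and three orders of magnitude.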