A principal components method constrained by elementary flux modes: analysis of flux data sets

Background
Non-negative linear combinations of elementary flux modes (EMs) describe all feasible reaction flux distributions for a given metabolic network under the quasi steady state assumption. However, only a small subset of EMs contribute to the physiological state of a given cell.

Results
In this paper, a method is proposed that identifies the subset of EMs that best explain the physiological state captured in reaction flux data, referred to as principal EMs (PEMs), given a pre-specified universe of EM candidates. The method avoids the evaluation of all possible combinations of EMs by using a branch and bound approach which is computationally very efficient. The performance of the method is assessed using simulated and experimental data of Pichia pastoris and experimental fluxome data of Saccharomyces cerevisiae. The proposed method is benchmarked against principal component analysis (PCA), commonly used to study the structure of metabolic flux data sets.

Conclusions
The overall results show that the proposed method is computationally very effective in identifying the subset of PEMs within a large set of EM candidates (cases with ~100 and ~1000 EMs were studied). In contrast to the principal components in PCA, the identified PEMs have a biological meaning enabling identification of the key active pathways in a cell as well as the conditions under which the pathways are activated. This method clearly outperforms PCA in the interpretability of flux data providing additional insights into the underlying regulatory mechanisms.

Electronic supplementary material
The online version of this article (doi:10.1186/s12859-016-1063-0) contains supplementary material, which is available to authorized users.

which is equal to the result that Gram-Schmidt orthonormalization of the normalized EMs would yield, provided that the weights of both normalized EMs are greater than zero. The same can be shown for more factors. The orthonormalization of the elementary modes in the proposed method is achieved implicitly by subtracting the captured variance from the data, Eq. (A3), and subsequently analyzing the variance remaining in the data, Eq. (A4), as in principal component analysis. This is an important feature, because it means that: i) the calculated contributions can simply be summed, since they are independent; ii) the iteratively calculated variances for the EMs can be summed (see also Eq. (B6)), which allows Eq. (7) to be simplified to Eq. (8) of the paper; and iii) the amount of information that two EMs share can be assessed, as shown in the following section.

B: Analysis of the interference between two successive EMs
The flux pattern captured by the combination of the two normalized EMs is described by the sum of their contributions. Inserting the relations presented before and rearranging yields Eq. (B6). If the inner product of the two normalized EMs were zero, the contributions would be independent and the last term in Eq. (B6) would vanish. Otherwise, the subtracted term is the information that both EMs share; if it were not subtracted, it would contribute twice to the summed flux pattern. Note that it is exactly the contribution of this term that is eliminated by orthonormalization in the Gram-Schmidt approach, and that is implicitly accounted for in the proposed algorithm by subtracting the captured variance from the data, Eq. (A3), and subsequently analyzing the variance remaining in the data, Eq. (A4). The decompositions of the data using the first and then the second EM are thus independent.
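
The role of the shared term can be checked numerically. The following is a minimal sketch, not the authors' implementation. It assumes that each contribution is the least-squares projection of the (remaining) data onto a unit-length EM, and it omits the non-negativity constraint on the weights. The names `R`, `e1`, `e2`, `sequential_contributions` and `shared_term`, as well as the random toy data, are hypothetical.

```python
import numpy as np


def sequential_contributions(R, e1, e2):
    """Contributions of two unit-length EMs, the second one computed
    on the data that remain after subtracting the first contribution."""
    C1 = np.outer(R @ e1, e1)    # flux pattern captured by the first EM
    R1 = R - C1                  # data remaining after the first EM
    C2 = np.outer(R1 @ e2, e2)   # flux pattern captured by the second EM
    return C1, C2


def shared_term(R, e1, e2):
    """Information shared by both EMs; zero when e1 and e2 are orthogonal."""
    return (e1 @ e2) * np.outer(R @ e1, e2)


rng = np.random.default_rng(0)
R = rng.normal(size=(12, 6))     # toy data: 12 conditions x 6 fluxes
e1 = rng.random(6)
e1 /= np.linalg.norm(e1)         # normalize to unit length
e2 = rng.random(6)
e2 /= np.linalg.norm(e2)

C1, C2 = sequential_contributions(R, e1, e2)

# Summing the two independent projections and subtracting the shared term
# gives the same flux pattern as the sequential (subtract-then-fit) scheme.
independent_sum = np.outer(R @ e1, e1) + np.outer(R @ e2, e2)
closed_form = independent_sum - shared_term(R, e1, e2)
```

Under these assumptions `C1 + C2` and `closed_form` agree to numerical precision, and `shared_term` is zero whenever the two EMs are orthogonal.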

C: Pichia pastoris Simulation Case
This simulation case study is based on the metabolic network of Pichia pastoris proposed by Tortajada et al. [1], and the 98 EMs provided there are used herein for flux data generation. Twelve different experimental conditions were simulated, see Table 1. Considering all of the different conditions, the EMs were filtered using the following rationale. If a compound is not present and cannot be produced, then its contribution in the EM must be zero. If a compound is present and is consumed, then it cannot be secreted at the same time. The reduced set of EMs that are possibly active under these scenarios comprised 76 EMs. The rank of the reduced set was determined (rank = 20), and a subset of 20 EMs that spans the same space (i.e., provides a basis for the reduced set) was chosen from it. For each scenario, only those EMs of this subset that can be active were assumed to contribute randomly to the flux pattern, while obeying the minimum and maximum values of the fluxes given in Tortajada et al. [1]. In total, only 16 EMs were active, namely EMs = [1, 3, 7, 12, 13, 14, 16, 19, 20, 22, 23, 24, 28, 32, 33, 37].
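
The filtering and basis-selection steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the sign convention (negative exchange flux = uptake, positive = secretion), the simplified reading of the two filtering rules, the greedy basis selection, and all names and toy numbers (`E`, `exchange_cols`, `feasible_ems`, `independent_subset`) are hypothetical.

```python
import numpy as np


def feasible_ems(E, exchange_cols, supplied):
    """Boolean mask of EMs (rows of E) compatible with one condition.

    exchange_cols maps a compound to the column of its exchange flux
    (assumed convention: negative = uptake, positive = secretion).
    supplied is the set of compounds present in the medium.
    """
    mask = np.ones(E.shape[0], dtype=bool)
    for compound, col in exchange_cols.items():
        if compound in supplied:
            mask &= E[:, col] <= 0   # a supplied substrate is not secreted
        else:
            mask &= E[:, col] >= 0   # an absent compound cannot be taken up
    return mask


def independent_subset(E):
    """Greedily pick indices of linearly independent rows of E."""
    chosen = []
    for i in range(E.shape[0]):
        if np.linalg.matrix_rank(E[chosen + [i]]) > len(chosen):
            chosen.append(i)
    return chosen


# Toy EM matrix: columns = [glucose exch., methanol exch., biomass, internal]
E = np.array([
    [-1.0,  0.0,  1.0, 1.0],
    [ 0.0, -1.0,  1.0, 0.0],
    [-1.0, -1.0,  2.0, 1.0],   # sum of the first two EMs
    [ 1.0, -1.0,  0.0, 0.0],
    [ 1.0,  1.0, -1.0, 0.0],   # secretes both substrates: never feasible
])
exchange_cols = {"glucose": 0, "methanol": 1}
conditions = [{"glucose"}, {"methanol"}, {"glucose", "methanol"}]

# An EM is kept if it is feasible under at least one simulated condition.
keep = np.zeros(E.shape[0], dtype=bool)
for supplied in conditions:
    keep |= feasible_ems(E, exchange_cols, supplied)

reduced_idx = np.flatnonzero(keep)
reduced = E[reduced_idx]
rank = np.linalg.matrix_rank(reduced)
basis_idx = [int(reduced_idx[i]) for i in independent_subset(reduced)]
```

In this toy case the last EM is removed by the filter, the reduced set has rank 3, and the greedy pass keeps three of the four remaining EMs as a basis. A greedy pass returns one of many possible bases; the source does not state how the 20-EM subset was selected.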

Active Elementary Modes
In the following ten pages, the active elementary flux modes selected by PEMA with ten factors are highlighted in different colors in the metabolic network, which was adapted from Hayakawa et al. [2].