### Establishing the network

Metabolic reactions were extracted from the EcoCyc database (version 9, [33]). A graph was established by defining neighbourhood between metabolites: two metabolites were neighbours if and only if an enzymatic reaction existed that required one of the metabolites as input (substrate) and produced the other as output (product). Note that in this representation, enzymes are the edges and metabolites the nodes. This network was clustered to group enzymes into parts of the network with their major connections (the clustering algorithm is described below in the section "The clustering method"). For each cluster, the clustering algorithm produced a symmetric sub-matrix of the cluster matrix whose rows and columns corresponded to the metabolites. The matrix contained a "1" entry at position (i, j) if an enzyme existed that connected the metabolites of row i and column j; otherwise the entry was "0" (Figure 7).
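The graph construction described above can be sketched as follows. This is a minimal illustration with hypothetical reaction data, not the EcoCyc extraction itself: each reaction is given as a pair of substrate and product sets, and every substrate–product pair becomes a symmetric edge between metabolite nodes.

```python
from itertools import product

def build_adjacency(reactions, metabolites):
    """Build the symmetric 0/1 adjacency matrix of the metabolite graph.

    reactions: list of (substrates, products) pairs, each a set of
    metabolite names. Two metabolites are neighbours iff some reaction
    consumes one and produces the other (enzymes are edges)."""
    idx = {m: i for i, m in enumerate(metabolites)}
    n = len(metabolites)
    A = [[0] * n for _ in range(n)]
    for substrates, products in reactions:
        for s, p in product(substrates, products):
            A[idx[s]][idx[p]] = 1
            A[idx[p]][idx[s]] = 1  # neighbourhood is symmetric
    return A

# Toy example (metabolite names are illustrative):
reactions = [({"glucose"}, {"g6p"}), ({"g6p"}, {"f6p"})]
mets = ["glucose", "g6p", "f6p"]
A = build_adjacency(reactions, mets)
```

In this toy example, glucose and g6p are neighbours, as are g6p and f6p, while glucose and f6p are not, since no single reaction links them directly.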

### Mapping gene expression data onto the cluster-matrices

For our case study, we collected raw intensity values of gene expression data from the work of Covert et al. [15], which we downloaded from the ASAP database [34]. Covert et al. determined the mRNA levels of all open reading frames by hybridisations on Affymetrix oligo microarrays. We normalised them with an established variance normalisation method [35] and selected the data of 43 hybridisations from the following samples: strain K-12 MG1655 wild-type, the *ΔarcA*, *ΔappY*, *Δfnr*, *ΔoxyR* and *ΔsoxS* single mutants, and the *ΔarcAΔfnr* double mutant (for generation of the knockouts and growth conditions, see [36, 37], respectively). The mutated genes are key transcriptional regulators of the oxygen response [15]. They affect a major portion of all genes in *E. coli* and therefore stimulated variance in the respiratory and fermentative regulation of the investigated strains. All gene expression experiments were done in triplicate under aerobic and anaerobic conditions, respectively, except for the anaerobic wild-type, which was repeated four times. The gene expression data of each dataset was mapped onto the corresponding reactions of the transcribed proteins. Mean values were taken if a reaction was catalysed by a complex of proteins. The expression data of all samples was mapped onto each cluster-matrix, yielding 43 different patterns for each cluster.
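The mapping rule for protein complexes can be illustrated in a few lines. Gene and value names here are hypothetical; the point is only that a reaction catalysed by a multi-protein complex receives the mean expression value of its subunits.

```python
def reaction_value(gene_expr, catalysing_genes):
    """Expression value assigned to one reaction: the mean over all
    genes whose products form the catalysing complex."""
    vals = [gene_expr[g] for g in catalysing_genes]
    return sum(vals) / len(vals)

# Hypothetical normalised expression values for two complex subunits:
expr = {"sdhA": 2.0, "sdhB": 4.0}
v = reaction_value(expr, ["sdhA", "sdhB"])  # mean of the complex: 3.0
```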

### Pattern discovery: defining the features with the Haar wavelet transform

We wanted to calculate a value for every possible expression pattern of neighbouring genes and groups of genes within a cluster that may show essential differences between samples of different conditions. Therefore, we performed a Haar wavelet transform of each cluster-matrix. The wavelet-transformed expression values served as features for the classifier (for the classification method, see next section). This allowed the identification of regions whose pattern varied between aerobic and anaerobic conditions. The wavelet transform is described in the following. Each cluster-matrix was divided into disjoint 2 × 2 pixel sub-sections (e.g. a cluster-matrix of size 8 × 8 was divided into 16 sub-sections). Clusters with non-fitting sizes (e.g. 3 × 3, 5 × 5) were extended with rows and columns of zeros to yield matrices that could be divided into 2 × 2 pixel sub-sections. For each sub-section, all combinations of row-wise means and differences were calculated. This yielded four combined values for each 2 × 2 pixel sub-section: 1st, the mean of the mean of the upper row and the mean of the lower row; 2nd, the difference of the mean of the upper row and the mean of the lower row; 3rd, the mean of the difference of the upper row and the difference of the lower row; and 4th, the difference of the difference of the upper row and the difference of the lower row. All four combined values of each 2 × 2 pixel sub-section were stored and applied as features for the classifier. This was done for all sub-sections of the matrix. All 1st combined values (means of means) formed a new matrix that was again grouped into 2 × 2 fractions and combined in the same manner, yielding again four new features for every fraction. This procedure was repeated until no further grouping was possible. Such a Haar wavelet transform can be regarded as a low pass filter when calculating the mean, and a high pass filter when calculating the difference between neighbouring value pairs.
The transform applied each filter first in the horizontal and subsequently in the vertical direction. The procedure repeatedly applied high and low pass filters to the image, so that either the high frequency or the low frequency portions of the signal were calculated and stored until the maximal possible compositions were obtained. This procedure was carried out for all clusters of every sample, and the results of the transforms were stored as the corresponding features of every sample.
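The four combined values per 2 × 2 sub-section and the recursion on the mean-of-means matrix can be sketched as below. This is a plain-Python illustration under the convention that means and differences are halved at each step; the original scaling may differ.

```python
def haar_block(a, b, c, d):
    """Four Haar features of one 2x2 sub-section [[a, b], [c, d]]."""
    mu, ml = (a + b) / 2, (c + d) / 2   # row means (low pass)
    du, dl = (a - b) / 2, (c - d) / 2   # row differences (high pass)
    return ((mu + ml) / 2,              # 1st: mean of row means
            (mu - ml) / 2,              # 2nd: difference of row means
            (du + dl) / 2,              # 3rd: mean of row differences
            (du - dl) / 2)              # 4th: difference of row differences

def haar_level(M):
    """One transform level: returns four half-sized matrices, one per
    combined value; the first (mean-of-means) matrix is the input for
    the next level. M must have even side length (zero-pad otherwise)."""
    n = len(M)
    out = [[[0.0] * (n // 2) for _ in range(n // 2)] for _ in range(4)]
    for i in range(0, n, 2):
        for j in range(0, n, 2):
            f = haar_block(M[i][j], M[i][j + 1], M[i + 1][j], M[i + 1][j + 1])
            for k in range(4):
                out[k][i // 2][j // 2] = f[k]
    return out
```

Calling `haar_level` repeatedly on the first output matrix until it shrinks to a single entry reproduces the recursion described in the text.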

### Extracting essential features and their sub-graphs with the classifier

The SAM method [16], a modified t-test, was performed to rank the features according to their p-values. High-ranking features (low p-values) were selected to focus the classifier on the most relevant patterns (9,996 out of 70,912). For classification, we applied the Support Vector Machine implementation provided by the R package MCRestimate [17]. To obtain a reliable feature extraction result, a 10-fold cross validation was performed and repeated 10 times with different splittings of the data. A linear kernel was applied for the feature extraction as described elsewhere [17]. Parameter optimisation was performed for the regularisation term that defined the costs of false classifications (range: 2^{n}, n = -4, -2, ..., 8, 10). This optimisation was realised by an internal three-fold cross validation during every iteration. To determine the most relevant features, a recursive feature elimination [17] was applied during the parameter optimisation procedure. This yielded a set of discriminating features for every run. These features were ranked according to their selection frequency over all 100 runs. Note that high-ranking features yielded the corresponding sub-graphs (clusters of the cluster matrix) of the reaction network that contained well discriminating patterns of the expression data. We defined a cut-off criterion for selecting only substantial features by comparing the selection frequency of each feature with random selections. We assumed a binomial distribution, neglecting the cases in which the same feature may have been chosen twice in one run. The overall number of drawings was the sum of all selections (8,191 selections). The probability of drawing the respective feature was the reciprocal of the number of all features (1/9,996). The number of drawings for the respective feature was its selection frequency.
As we calculated this for every feature, the resulting p-values were corrected for multiple testing by multiplying them with the number of all features (Bonferroni correction [18]).
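The cut-off criterion can be sketched as follows: a feature's selection frequency is compared with a binomial null model (8,191 draws, success probability 1/9,996), and the resulting upper-tail p-value is Bonferroni-corrected. The log-space summation and the significance threshold of 0.05 are implementation choices of this sketch, not taken from the original.

```python
from math import lgamma, log, exp

def log_binom_pmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p), via log-gamma."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def binom_sf(k, n, p):
    """Upper tail P(X >= k), summing pmf terms until negligible."""
    total, i = 0.0, k
    while i <= n:
        t = exp(log_binom_pmf(i, n, p))
        total += t
        if t < total * 1e-18:  # terms far past the mode no longer matter
            break
        i += 1
    return total

def significant(freq, n_draws=8191, n_features=9996, alpha=0.05):
    """Is a selection frequency higher than expected under random draws?"""
    p = binom_sf(freq, n_draws, 1.0 / n_features)
    return p * n_features < alpha  # Bonferroni correction
```

Under this model a feature selected only once is far from significant, while a feature selected in ten of the runs already passes the corrected threshold.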

### The clustering method

The metabolic network was represented by an adjacency matrix: an entry at row i and column j was set to "1" if there existed a common reaction downstream of metabolite i and upstream of metabolite j, or *vice versa*. Note that for the pattern discovery method described above, this entry was the corresponding gene expression value of the reaction. The adjacency matrix consisted of 2,754 rows and columns and 2 × 17,233 non-zero entries (entries (i, j) and (j, i) were equal). Hence, non-zero entries were rare (≈ 0.45%). The matrix was transformed by the following clustering method to obtain regions enriched with non-zero elements.

Given the metabolic network as a graph G(V,E) with node set V (metabolites) and edge set E (reactions), our goal was to identify clusters of G, where each cluster was given by the node set of a highly connected sub-graph. Note that we did not require the clusters to be mutually disjoint. The clustering algorithm was based on the so-called *weighted simultaneous consecutive ones problem*. Clusters were represented by symmetric 0/1-matrices. Rows and columns of the matrix corresponded to the nodes of G. A "1" entry at position (i, j) indicated that there was at least one cluster containing both node i and node j. A "0" entry indicated that there was no cluster containing both nodes. The diagonal entries were fixed to "1" (Figure 7). We call such a matrix a *cluster-matrix*. Using a suitable permutation to rearrange both the rows and the columns of the cluster-matrix, one can obtain a matrix with consecutive "1" entries in every row as well as in every column. A matrix has the *simultaneous consecutive ones property (SCO)* if such a rearrangement of rows and columns is possible. A characterisation of such matrices was given by Tucker [27], and there is a linear time algorithm for checking this property using the PQ-tree algorithm introduced by Booth and Lueker [28]. This algorithm outputs a data structure from which all node permutations that establish the SCO property can be generated. The basic task was to convert the adjacency matrix A of G into a cluster-matrix X containing "1" entries only in the clusters. Altering an entry of the adjacency matrix (a "0" to "1" flip) was allowed but penalised. The main part of this process was to search for simultaneous permutations (of rows and columns) that led to a minimum number of "0" to "1" flips. Figure 8 shows an example. We parameterised the objective function using a negative parameter d for adjusting the ratio between rewarding an edge inside a cluster and penalising a "0" to "1" flip, i.e.
for placing two nodes that were not connected inside the same cluster. Formally, given a symmetric adjacency matrix A ∈ {0, 1}^{n × n} with a_{ii} = 1 for all i = 1, ..., n, we wanted to maximise the objective function

$$c^{X} := \sum_{i,j} c_{ij} x_{ij} \qquad (1)$$

where the coefficients c_{ij} were defined by

$$c_{ij} := \begin{cases} 1 & \text{if } a_{ij} = 1 \\ d & \text{else} \end{cases} \qquad (2)$$

X = (x_{ij}) denoted the cluster-matrix of A to be optimised. As a starting point, X equaled A with all diagonal entries fixed to "1". The goal was to transform X by simultaneous column and row permutations, yielding a matrix that consisted of quadratic sub-matrices with "1" entries, and "0" entries elsewhere.
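Evaluating objective (1) with coefficients (2) for a candidate cluster-matrix is straightforward; the following toy illustration shows how a negative d penalises placing unconnected node pairs inside the same cluster.

```python
def objective(A, X, d):
    """Objective value c^X of cluster-matrix X for adjacency matrix A:
    each "1" in X scores 1 if the edge exists in A, and d (< 0) if it
    corresponds to a "0"-to-"1" flip."""
    n = len(A)
    return sum((1 if A[i][j] == 1 else d) * X[i][j]
               for i in range(n) for j in range(n))

# Toy graph: nodes 0 and 1 connected, node 2 isolated (diagonal fixed to 1).
A = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
ones = [[1] * 3 for _ in range(3)]  # one big cluster containing all nodes
```

Here keeping X = A scores 5, while merging all three nodes into one cluster (X = ones) scores only 1 with d = -1, since four non-edges are penalised.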

Note that, as we allowed missing connections inside a cluster, non-connected nodes could still enter the same cluster. The objective function was penalised by the negative value d for every c_{ij} inside a cluster of X that was a "0" entry in A. This avoided clusters with too many non-existing connections. If the nodes i and j were not connected by an edge (i.e. a_{ij} = 0) but assigned to the same cluster (i.e. x_{ij} = 1), the negative parameter d reduced the objective function value c^{X}. Note that, as the number of inequalities for describing the SCO grows exponentially with the number of input elements, it was computationally infeasible to solve the integer program exactly for this study. Therefore, we designed a fast heuristic based on algorithms for the consecutive ones problem. Taking the adjacency matrix A of our graph G as input, the heuristic computed the clustering matrix X^{opt} as follows:

1 Fix all diagonal entries a_{ii} of A to 1

2 Set c^{Xopt} = - ∞

3 Define X^{opt} to have 1-entries only in the main diagonal

4 Compute the coefficient matrix C = (c_{ij}) defined by c_{ij} := 1 if a_{ij} = 1 and c_{ij} := d if a_{ij} = 0

5 FOR i = 1,..., k DO

5.1 Initialise a random permutation π_{i} of X

5.2 FOR j = 0, ..., l DO

5.2.1 Use dynamic programming to compute an optimal cluster matrix X for this fixed permutation π_{i}

5.2.2 IF c^{X} > c^{Xopt}

5.2.2.1 Update X^{opt} and c^{Xopt}

5.2.3 Use the PQ-tree algorithm to select randomly a permutation from all permutations that keep the found clusters together

6 Try to improve X^{opt} by m iterations of loop 5.2

The choice of the loop parameters depended on the size of the input and was set to k = 250, l = 10 and m = 100. The two loops were independent of each other. The outer loop 5 restarted the main part of the heuristic (loop 5.2) for several different start permutations, always saving the best result found.
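The core of step 5.2.1 can be illustrated with a strongly simplified sketch: for one fixed node order, dynamic programming partitions the nodes into disjoint consecutive blocks (clusters) maximising objective (1). Note that this toy version only produces disjoint blocks; the original heuristic additionally allows overlapping clusters via the PQ-tree machinery, which is not reproduced here.

```python
def best_blocks(C):
    """Given the coefficient matrix C (1 for edges, d < 0 otherwise),
    already permuted into the fixed node order, find the partition of
    0..n-1 into consecutive blocks maximising the within-block sum.
    Returns (score, list of (first, last) block index pairs)."""
    n = len(C)

    def block_score(j, i):
        # objective contribution of one cluster spanning nodes j..i
        return sum(C[a][b] for a in range(j, i + 1) for b in range(j, i + 1))

    best = [0.0] * (n + 1)  # best[i]: optimum over the first i nodes
    cut = [0] * (n + 1)     # cut[i]: start of the last block in that optimum
    for i in range(1, n + 1):
        best[i], cut[i] = max(
            (best[j] + block_score(j, i - 1), j) for j in range(i))
    # Reconstruct the blocks by walking the cut positions backwards.
    blocks, i = [], n
    while i > 0:
        blocks.append((cut[i], i - 1))
        i = cut[i]
    return best[n], blocks[::-1]
```

For a toy coefficient matrix with nodes 0 and 1 connected, node 2 isolated and d = -2, the sketch recovers the two clusters {0, 1} and {2} rather than merging everything, since the penalised non-edges would outweigh the gain.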