In recent years, time course analyses of metabolic perturbations have become increasingly important for understanding metabolic networks based on experimental data [1, 2]. One way to analyze metabolic networks is to systematically investigate linear relationships between all analyzed metabolites (variables), construct networks from positively identified components, and eventually compare network topologies [3] between different physiological or genetic conditions [4, 5]. Simulations of metabolic reactions have shown that even stochastic influences on metabolism may result in linear metabolic co-regulation, because initial metabolic perturbations can be propagated and amplified through the cellular biochemical network [6]. Such linear co-regulation of pairs of metabolites may point to changes in biochemical control (chemical equilibrium, mass conservation, asymmetric control distribution) as well as to transcriptional regulation [7]. Variance in metabolite levels can be caused by three different factors:

(I) concentrations alter, and hence variance increases, when experimental conditions are intentionally changed, for example by altering environmental parameters such as external nutrients or by using different genotypes [8],

(II) metabolite data will be found to vary in a stochastic manner because of the imprecision of the analytical method [9] used for acquiring the metabolite data, and

(III) interestingly, even under tightly controlled environmental conditions, a high degree of biological variation is found in metabolite levels due to stochastic biological events that trickle through the biochemical network and thus reflect the underlying control structure under the particular biological condition [6].
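As a minimal sketch, the interplay of the three variance sources can be mimicked in a short simulation. All numbers here (shifts, standard deviations) are purely illustrative assumptions, not values from the study:

```python
import random

random.seed(0)

def simulate_metabolite(n, treatment_shift=0.0, bio_sd=0.3, tech_sd=0.1):
    """Simulate n replicate measurements of one metabolite level.

    treatment_shift : deliberate change of conditions (source I)
    bio_sd          : stochastic biological variation (source III)
    tech_sd         : analytical imprecision, i.e. technical error (source II)
    """
    base = 1.0 + treatment_shift
    return [base + random.gauss(0, bio_sd) + random.gauss(0, tech_sd)
            for _ in range(n)]

def sd(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

control = simulate_metabolite(50)                       # sources II + III only
treated = simulate_metabolite(50, treatment_shift=0.5)  # source I added

# Even without any treatment, replicate-to-replicate variance remains,
# because biological and technical noise are always present.
print(sd(control), sd(treated))
```

The point of the sketch is that the within-condition spread never vanishes even for the untreated group, which is exactly why replicate numbers matter for the network analyses discussed below.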

Therefore, if enough biological replicates are analyzed for a given organism in a given physiological situation, the metabolic phenotype can be investigated not only through its average metabolic values but also through a snapshot of its corresponding metabolic network. However, biologists often do not know the inherent biological variability in advance and hence tend to use just a few independent biological replicates based on preliminary power analysis. The resulting data may be sufficient to estimate arithmetic means of metabolite levels but do not enable comparing the linear control structure between different biological conditions. One of the challenges in calculating linearity networks is to compute the likelihood or significance of the presence of a truly linear relationship, with the aim of excluding both false negative and false positive detections of linearities.

Estimating optimal linearity parameters was solved decades ago for cases in which linear dependence of variables could be assumed based on background knowledge. However, in metabolic data sets, the control structure of metabolites is unknown *a priori*. Therefore, two fundamental questions need to be answered:

(a) For which pairs of variables can a linear relationship be hypothesized?

(b) Are there subsets of the data that reflect differences in the linear behavior of variables? For example, linearity may hold for one group of data but be absent in another, or the linearity parameters may differ between these groups.
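Question (b) can be made concrete with a small synthetic example (all slopes and noise levels below are hypothetical, chosen only for illustration): two subsets of samples each follow a clean linear law, but with different slopes, so a single global fit describes neither group:

```python
import random

random.seed(1)

# Subset A follows y = 2x, subset B follows y = 0.5x (illustrative parameters),
# each with a small amount of Gaussian noise.
xs_a = [random.uniform(0, 10) for _ in range(30)]
ys_a = [2.0 * x + random.gauss(0, 0.2) for x in xs_a]
xs_b = [random.uniform(0, 10) for _ in range(30)]
ys_b = [0.5 * x + random.gauss(0, 0.2) for x in xs_b]

def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Fitting each subset recovers its own slope, while pooling the data
# yields a compromise slope that is valid for neither group.
slope_a = ols_slope(xs_a, ys_a)
slope_b = ols_slope(xs_b, ys_b)
slope_pooled = ols_slope(xs_a + xs_b, ys_a + ys_b)
print(slope_a, slope_b, slope_pooled)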

An unbiased analysis of linear relationships between pairs of variables needs to test whether there are one or more valid linear hypotheses that could explain the data in complex data sets. This procedure defines a novel approach for testing biological data: instead of testing pre-defined hypotheses [10], the likelihood of hypotheses is calculated that may be used to explain a complex process. This hypothesis-building is fundamentally different from estimating the best parameters of an assumed linear relationship by regression equations [11–13], for which various software packages exist, and from solutions for estimating parameters of multiple linear relationships [14]. However, regressions do not test the probability of the presence of linear relationships, especially in high-dimensional data sets. Instead, regressions are founded on the presence of linearity that is justified by background knowledge. In biochemistry, the existence of linear relationships cannot generally be taken for granted but must receive thorough statistical evaluation. In addition, regressions usually do not account for technical errors [15], which are critical in practice. All measurements comprise technical errors that are due to inadequacies of the total chemical-analytical method, specifically the extraction, sample preparation and the instrumental data acquisition. Hence, the degree of technical error will vary with the chemical nature of the metabolites, their absolute concentrations and the influences of different sample matrices. Furthermore, outliers and missing data further obscure the detection of linear hypotheses. For regressions, on the other hand, the impact of outliers has been studied extensively, and multiple measures to assess and weigh the influence of outliers have been developed. Assumptions on the degree of technical errors may further refine weighting factors in regression analysis, and such factors can be optimized, for example, using the EM algorithm. Nevertheless, regression is not an explorative tool for data analysis. Additionally, metabolomic data do not distinguish between dependent and independent variables. All variables are subject to varying degrees of noise (analytical-chemical measurement errors, i.e. technical errors); no controlled observations of supposedly independent variables can be acquired [16, 17]. Consequently, the control structure of metabolic linearity networks can only be assessed with a tool that solves the following tasks:

- (1)
Linear relationships must be detected in an unbiased and observer-independent manner.

- (2)
Subsets of data need to be grouped according to the presence of (multiple) linear relationships.

- (3)
Criteria have to be applied that verify linear hypotheses based on test statistics.

- (4)
Varying degrees of analytical-chemical measurement errors (technical errors) and missing data have to be accounted for.

In particular, the potential presence of multiple linear relationships and the independence of both variables pose problems for simple regression analyses. As a substitute for regression, the degree of correlation has been used for detecting linear relationships, despite the fact that correlation only relates the covariance to the total variance and does not verify genuine linearities. Moreover, Pearson's correlation coefficients lack robustness against outliers, especially for multivariate data sets, and a number of different approaches have been suggested to link estimates to better test statistics [18]. In practice, however, empirical or heuristic thresholds are used to distinguish strong from weak correlations, but no mathematical basis exists on which such thresholds can safely be founded. In some cases, *p*-values from Student's statistics have been used in an effort to validate Pearson's correlations [19]. Unfortunately, such *p*-values only describe the significance of the non-randomness of data pairs but do not test the hypothesis that data pairs can be described by (single or multiple) linear functions. Consequently, correlation networks based on Pearson's correlations may be strongly distorted [20].
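The outlier sensitivity of Pearson's correlation is easy to reproduce. The sketch below (synthetic data, plain-Python Pearson estimate) shows that a single extreme value can produce a high correlation between otherwise unrelated variables, so a high coefficient alone does not verify a genuine linear law:

```python
import random

random.seed(2)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Two independent variables: the true correlation is zero.
xs = [random.gauss(0, 1) for _ in range(20)]
ys = [random.gauss(0, 1) for _ in range(20)]
r_clean = pearson(xs, ys)

# A single extreme outlier drags r towards +1, although no linear
# relationship exists in the underlying data.
r_outlier = pearson(xs + [10.0], ys + [10.0])
print(r_clean, r_outlier)
```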

A further approach uses partial correlations, which deconvolute contributions from additional parameters in order to reduce the list of correlations to basic dependencies [21] and may present a link from correlation to causality [22, 23]. This method is valuable for investigating the control structure within a given correlation network, but it does not remove the principal robustness problem of correlation estimates. Simple correlation coefficients always decline with increasing variance that is introduced by method errors during data acquisition. In contrast, partial correlation coefficients may increase, decrease or even change algebraic sign with increasing method errors [20]. To remedy this situation, scientists tend to select high Pearson's correlation thresholds [24], which implies that the variance caused by method errors is small in relation to the biological variance. The latter assumption is often true when comparing widely different metabolic phenotypes, such as certain mutant genotypes, or severe stress conditions such as acute (metabolic) diseases in comparison to healthy states. However, metabolic theory predicts that even incremental changes in enzymatic properties can have large effects on metabolic control, especially when multiple enzymes are affected [25]. Such changes might be too subtle to cause large differences in average concentrations but would still affect the pathway control structure and hence the linearities in pairs of metabolite data. Consequently, the metabolic control structure can only be assessed with a robust tool for linearity detection.
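For reference, a first-order partial correlation, which removes the linear contribution of a third variable z from the correlation of x and y, follows directly from the three pairwise Pearson coefficients (standard textbook formula; the numeric values below are illustrative):

```python
def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / (((1 - r_xz**2) * (1 - r_yz**2)) ** 0.5)

# If both x and y merely follow z (r_xz = 0.9, r_yz = 0.8), their raw
# correlation r_xy = 0.72 is high, yet the partial correlation collapses
# to (numerically) zero, exposing the dependency as indirect.
print(partial_corr(0.72, 0.9, 0.8))
```

Note that this deconvolution inherits all of the robustness problems of the underlying Pearson estimates, which is exactly the limitation raised above.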

We here present a different approach. Using Bayes' law [26], a likelihood formula is derived that is based on information about the measurement error of a specific technical method. This formula is then transformed in a way that allows searching for local maxima of linear parameters within the total hypothesis space. Such likelihood maxima are subsequently assessed for the residuals of the corresponding linearity parameters using simulated test statistics. We demonstrate the power of this approach using a synthetic data set with a given set of true linear relationships, which is subsequently subjected to both increasing technical errors and increasing numbers of samples.
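The core idea, a likelihood for a linear hypothesis that explicitly contains the known technical error, can be sketched as follows. This is a deliberately simplified errors-in-variables Gaussian likelihood scanned over a grid of slopes (with the intercept held fixed for brevity), intended only to illustrate the principle of searching the hypothesis space for likelihood maxima, not the derivation given later:

```python
import math
import random

random.seed(3)

def neg_log_likelihood(slope, intercept, xs, ys, tech_sd):
    """Negative log-likelihood of the line y = slope*x + intercept when both
    coordinates carry the same known technical error tech_sd; the residual
    variance is the error propagated through the line."""
    var = tech_sd**2 * (1 + slope**2)
    nll = 0.0
    for x, y in zip(xs, ys):
        r = y - (slope * x + intercept)
        nll += 0.5 * (r * r / var + math.log(2 * math.pi * var))
    return nll

# Synthetic data from a known line; technical error corrupts BOTH variables.
true_slope, true_icpt, tech_sd = 1.5, 0.3, 0.1
latent = [random.uniform(0, 5) for _ in range(40)]
xs = [x + random.gauss(0, tech_sd) for x in latent]
ys = [true_slope * x + true_icpt + random.gauss(0, tech_sd) for x in latent]

# Grid search over slope hypotheses for the likelihood maximum.
grid = [s / 100 for s in range(50, 301)]
best = min(grid, key=lambda s: neg_log_likelihood(s, true_icpt, xs, ys, tech_sd))
print(best)  # close to the true slope
```

In the approach presented below, such maxima are then evaluated against simulated test statistics rather than being accepted at face value.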