 Research
 Open access
 Published:
Analyzing postprandial metabolomics data using multiway models: a simulation study
BMC Bioinformatics volume 25, Article number: 94 (2024)
Abstract
Background
Analysis of timeresolved postprandial metabolomics data can improve the understanding of metabolic mechanisms, potentially revealing biomarkers for early diagnosis of metabolic diseases and advancing precision nutrition and medicine. Postprandial metabolomics measurements at several time points from multiple subjects can be arranged as a subjects by metabolites by time points array. Traditional analysis methods are limited in terms of revealing subject groups, related metabolites, and temporal patterns simultaneously from such threeway data.
Results
We introduce an unsupervised multiway analysis approach based on the CANDECOMP/PARAFAC (CP) model for improved analysis of postprandial metabolomics data guided by a simulation study. Because of the lack of ground truth in real data, we generate simulated data using a comprehensive human metabolic model. This allows us to assess the performance of CP models in terms of revealing subject groups and underlying metabolic processes. We study three analysis approaches: analysis of fastingstate data using principal component analysis, T0corrected data (i.e., data corrected by subtracting fastingstate data) using a CP model and fulldynamic (i.e., full postprandial) data using CP. Through extensive simulations, we demonstrate that CP models capture meaningful and stable patterns from simulated meal challenge data, revealing underlying mechanisms and differences between diseased versus healthy groups.
Conclusions
Our experiments show that it is crucial to analyze both fastingstate and T0corrected data for understanding metabolic differences among subject groups. Depending on the nature of the subject group structure, the best group separation may be achieved by CP models of T0corrected or fulldynamic data. This study introduces an improved analysis approach for postprandial metabolomics data while also shedding light on the debate about correcting baseline values in longitudinal data analysis.
Background
Postprandial metabolomics data (also referred to as meal challenge test data) includes the metabolic transition from the fasting to the fed state. Analysis of such data can reveal the underlying biological processes and improve the understanding of metabolic mechanisms [1, 2]. Examples include the study of a highfat meal response for over one thousand subjects, which revealed subclass patterns in lipoprotein metabolism [3]. Analysis of postprandial metabolomics data also holds the promise to reveal new biomarkers especially for cardiometabolic diseases [4, 5]. In addition to detecting biomarkers and advancing prediagnosis, studying the postprandial state has proven useful in designing personalized nutrition [6]. Because of the high variability between individuals in terms of their postmeal glucose and triglyceride levels, personalized dietary interventions can be considered to lower individual postmeal glucose, thus may help decrease the risk of diseases such as prediabetes [6, 7].
There are various methods available for analyzing meal challenge test data. These are mainly supervised approaches, i.e., assuming that the group information (e.g., labels such as healthy and diseased) is known and incorporated into the analysis. In the aforementioned and other related work, univariate analyses are the most used tools for analyzing postprandial metabolism, e.g., the repeated measures analysis of variance (ANOVA) method is used to test group differences [3, 5]; a linear mixed model (LMM), which adds the random effect in the analysis, is used to explore the time and group interaction effect [8]. Univariate methods are simple and powerful in terms of studying group differences; however, such analyses are performed per metabolite, thus unable to reveal the relation between metabolites. Since metabolites are interlinked via pathways and hold dependent or independent relations with other metabolites, multivariate methods are promising in terms of capturing the underlying patterns in the data [4, 9]. Supervised multivariate methods used to analyze metabolomics data from challenge tests often combine ANOVA or LMM with principal component analysis (PCA). Such methods include ANOVASimultaneous Component Analysis (ASCA)[10], ANOVAPCA [11], ANOVAPartial Least Squares (PLS) [12], and their extensions [13, 14]. While these methods are suitable for timeresolved data, they rely on a priori known group information.
When the goal is exploratory analysis to reveal unknown stratification of subjects such as subgroups among healthy or diseased subjects, the workhorse data analysis approaches are unsupervised methods based on PCA and clustering [9]. However, these approaches cannot exploit the threeway structure of postprandial metabolomics data. They rely on either measurements at one time point or summaries such as clusters of time profiles obtained using data averaged across subjects [2, 15]. An effective way to preserve the multiway nature of the data and extract the underlying patterns is to use tensor factorizations [16,17,18], which are extensions of matrix factorizations such as PCA to multiway arrays (also referred to as higherorder data sets).
In this paper, we arrange timeresolved metabolomics data as a threeway array with modes: subjects, metabolites, and time, and use the CANDECOMP/PARAFAC (CP) [19, 20] tensor model to reveal the underlying patterns. The CP model summarizes the threeway array as a sum of a small number of factor triplets as in Fig. 1. Among various tensor factorization methods, we use the CP model because of its uniqueness properties [16, 21], which facilitate interpretability of the extracted factors. In addition, the CP model is less sensitive to noise due to the concise structure (parsimony) of the model [22]. The CP model has previously been mentioned as a promising analysis tool for meal challenge test data [4, 9], but no analysis results have been presented for the threeway data arranged as in Fig. 1. Recently, the effectiveness of the CP model in terms of analyzing other types of longitudinal data has been studied, e.g., using gut microbiome data from infants studying microbial changes and how those relate to the birth mode [23], urine metabolomics data from newborns exploring the use of the CP model for compositional data [24] and simulated dynamic metabolomics data relying on smallscale metabolic pathway models [25]. Unlike previous studies, we provide a comprehensive study of the CP model for analyzing timeresolved metabolomics data, in particular, focusing on the human metabolism in response to a meal challenge test and studying metabolic changes from a baseline state to dynamic states.
In the literature, analysis of fulldynamic data (i.e., full postprandial data without baseline correction) as well as T0corrected data (i.e., the data corrected by subtracting the fasting state, similar to the method of analysis of changes [26]) have been considered [1, 2, 5, 27, 28]. Note that the fulldynamic data has information from the fasting as well as the T0corrected (puredynamic) state. Understanding metabolic differences between these two states can help understand the metabolic mechanisms for postprandial metabolomics data. In this paper, we explore whether the CP model should be applied to T0corrected data or fulldynamic data for postprandial metabolomics data analysis. Furthermore, we investigate the added value of analysis of the postprandial state compared to the fasting state.
In order to assess the performance of analysis approaches in terms of revealing the underlying patterns in the data, we generate simulated postprandial metabolomics data with known groundtruth information. To mimic the evolution of metabolite concentrations during a meal challenge, we used a human metabolic model [29] that describes metabolic pathways in eight organs in mechanistic detail by enzyme kinetic equations. Variation of diseaserelevant parameters generated different subject groups (i.e., control versus diseased). Through extensive computational experiments, we assess the performance of analysis of fastingstate data using PCA, T0corrected data using a CP model and fulldynamic data also using a CP model. We demonstrate that (i) CP models of postprandial metabolomics data reveal meaningful underlying patterns, (ii) for understanding metabolic differences among subject groups, it is crucial to analyze both fastingstate and T0corrected data, (iii) depending on the nature of the subject group structure, the best performance in terms of revealing subject groups may be achieved by CP models of T0corrected or fulldynamic data, and (iv) patterns extracted by CP models are reliable (i.e., consistently observed in a number of different settings such as when subsets of subjects are leftout, in the presence of a higher level of withingroup variation, and with an unbalanced number of control and diseased subjects).
Materials and methods
Simulated postprandial metabolomics data
Human wholebody metabolic model
To generate simulated postprandial metabolomics data, we consider a human wholebody metabolic model proposed by Kurata [29]. The model is defined by a set of ordinary differential equations, with metabolites as variables and kinetic constants as parameters. Consisting of 202 metabolites, 217 reaction rates and 1,140 kinetic parameters, this multiscale and multiorgan model is the largest, comprehensive, and highly predictive kinetic model of the wholebody metabolism [29]. It describes each reaction with a reversible MichaelisMenten type rate equation and includes the action of insulin and glucagon, key hormones that govern the response to a meal (Fig. 2). The metabolic model considers both intracellular and extracellular metabolites. The meal in this simulated model only considers glucose and fat, where 87 grams (g) carbohydrate and 33 g fat is given after a 10hour fasting. Although the metabolic model provides two parameters to regulate glucose and fat intake, we observe no apparent improvement in comparison with the real data for different amounts of glucose and fat intake.
Generation of the simulated data
To test whether our proposed unsupervised approach is able to reveal different subject groups, we generate simulated blood metabolite concentrations using two types of variation: (i) betweengroup variation and (ii) withingroup variation. The betweengroup variation is introduced by changing a specific parameter, i.e., Km_Ins_B_M or Km_inssyn_Glc_B, in the differential equations, as demonstrated in [29]. These two parameters are used to regulate the insulinstimulated glucose uptake in skeletal muscle or the glucosestimulated insulin secretion by the pancreas, respectively. Each type of betweengroup variation mimics one disease, which is as follows [29]:

Insulin resistance in skeletal muscle: It is used for the study of type 2 diabetes mellitus and simulated by multiplying the default Km_Ins_B_M with 1.5,

Betacell dysfunction: Its extreme case is type 1 diabetes mellitus and it is simulated by multiplying the default Km_inssyn_Glc_B with 1.1.
For the withingroup variation, i.e., individual variation, we randomly perturb the kinetic constants in the liver (see Additional file 1 for a list of perturbed parameters). The perturbation level is denoted by \(\alpha\). For example, \(\alpha =0.2\) indicates that related kinetic constants are set to be random parameters ranging from \((10020)\%\) to \((100+20)\%\) of their default values.
Based on these definitions of between and withingroup variations, we generate data for each subject as follows:

1.
Get the 10hour fasting concentrations for each individual (one individual has one set of random kinetic constants) by running the human wholebody metabolic model with the default initial values of model variables in [29], with an exception of the initial concentrations for particular metabolites presented in the next paragraph;

2.
Start the meal challenge, i.e., run the human wholebody model using each individual’s 10hour fasting state, and take the concentrations of metabolites at time points \(t=[0, 0.25, 0.5, 1, 1.5, 2, 2.5, 4]\) hours (with \(T_0=0\)h, \(T_1=0.25\)h, \(\cdots\), \(T_7=4\)h).
We choose the above time points to match the time samples in the real data, which we use to assess how realistic the simulations are (see section “Simulated postprandial metabolomics data are realistic”). In step one, we set initial values of Insulin (Ins) and blood metabolites Glucose (Glc), Pyruvate (Pyr), Lactate (Lac), Alanine (Ala), \(\beta\)hydroxybutyrate (Bhb), Triglyceride and total Cholesterol to the median of fasting concentrations in the real data. Although the metabolic model involves 202 metabolites (including hormones) in different organs, we only consider hormone Ins and blood metabolites Glc, Pyr, Lac, Ala and Bhb since they are also measured in the real data. In addition, they are involved in the same metabolic network (see Fig. 2), which makes the comparison of real and simulated data easier. When simulating data for a diseased subject, we use the parameter values given for insulin resistance and betacell dysfunction; otherwise, default values for parameters Km_Ins_B_M and Km_inssyn_Glc_B are used to simulate data for control subjects.
The simulated data from multiple subjects is arranged as a threeway array (\(\#\) subjects \(\times\) 6 metabolites \(\times\) \(\#\) time points) as in Fig. 1.
Simulated postprandial metabolomics data are realistic
We compare the simulated data with the real data to demonstrate how realistic the simulations are. We use the real data corresponding to Nuclear Magnetic Resonance (NMR) spectroscopy measurements of plasma samples collected during a meal challenge test from the COPSAC\(_{2000}\) cohort [30]. The real data is not publicly available but may be shared by COPSAC through a collaboration agreement. The study was conducted in accordance with the Declaration of Helsinki and was approved by The Copenhagen Ethics Committee (KF 01289/96 and H16039498) and The Danish Data Protection Agency (2015413696). Written consent was obtained from the participants. The cohort considered in this work consists of 299 healthy 18yearold subjects (144 males and 155 females). The blood samples were collected at eight time points following an overnight (at least 9hour) fasting during the fasting and postprandial states, i.e., at \(t=[0, 0.25, 0.5, 1, 1.5, 2, 2.5, 4]\)h. Over two hundred features were measured. We select the six features mentioned in section “Generation of the simulated data” since these are the ones available in both real and simulated data.
Time profiles of metabolites in simulated and real data match well for most metabolites except for Pyr and Lac, where we noticed deviations (See Additional file 2: Fig. S2.1). The simulated Pyr differed most from the standard reference range [0.04, 0.1]mmol/L. To get a better correspondence for the initial concentrations, we performed a sensitivity analysis on the model and tuned the parameters related to Pyr and Lac (see Additional file 2: Fig. S2.2). After parameter tuning, all initial (fasting) concentrations, except Lac, corresponded well between the real and simulated data (see Fig. 3). For Lac, we note that also in [29] the measured concentration of Lac is higher than the simulated value (1 versus 0.5 mM). In our subjects, the Lac concentration is even higher, suggesting a difference between the cohorts. We do not necessarily expect a perfect correspondence between the real and simulated data throughout the time course since the simulated meal (taken from [29]) differs from the real meal (see Additional file 2: Table S2.1). In addition, the large individual variability in the real data puts the discrepancy between the real and simulated data in perspective, as demonstrated in Fig. 3. The simulated model is realistic for our study in the sense that we observe that responses of key metabolites upon the meal challenge are in the physiological range, which is what we are interested in. Changing the metabolic model to simulate a mealintake similar to the real data is outside the scope of this study.
CANDECOMP/PARAFAC (CP) model
The CP model, which stands for Canonical Decomposition (CANDECOMP) [20] and Parallel Factor Analysis (PARAFAC) [19], stems from the polyadic form of a tensor [32]. Similar to the matrix Singular Value Decomposition (SVD), the main idea of the CP model is to represent a tensor as the sum of a minimum number of rankone tensors (Fig. 1). The CP model has been successfully used in many disciplines, including data mining [17, 33], neuroscience [34, 35], chemometrics [22], and recently also in longitudinal omics data analysis [23] in terms of revealing the underlying patterns from complex data sets.
For a thirdorder tensor \({{\mathscr {X}}}\in \mathbb {R}^{I\times J\times K}\), an Rcomponent CP model represents the tensor as follows:
where \(\circ\) denotes the vector outer product, \({{\mathbf {{A}}}} = [{{\mathbf {{a}}}}_{1} \hspace{0.05in}... \hspace{0.05in} {{\mathbf {{a}}}}_{R}] \in {\mathbb R}^{I \times R}\), \({{\mathbf {{B}}}}= [{{\mathbf {{b}}}}_{1} \hspace{0.05in}... \hspace{0.05in} {{\mathbf {{b}}}}_{R}]\in {\mathbb R}^{J \times R}\), and \({{\mathbf {{C}}}}= [{{\mathbf {{c}}}}_{1} \hspace{0.05in}... \hspace{0.05in} {{\mathbf {{c}}}}_{R}]\in {\mathbb R}^{K \times R}\) are the factor matrices corresponding to each mode. Columns of the factor matrices, e.g., \({{\mathbf {{a}}}}_{r}, {{\mathbf {{b}}}}_{r}\), and \({{\mathbf {{c}}}}_{r}\) corresponding to the rth component, reveal the underlying patterns in the data. When assessing how well the CP model fits the data, we use the explained variance (also referred to as model fit):
where \(\hat{\mathscr {X}} =\sum _{r=1}^{R} {{\mathbf {{a}}}}_{r}\circ {{\mathbf {{b}}}}_{r}\circ {{\mathbf {{c}}}}_{r}\) is the data approximation based on the CP model, and \(\left\ .\right\\) denotes the Frobenius norm. A fit value close to \(100\%\) indicates that data \({{\mathscr {X}}}\) is explained well by the model; otherwise, there is an unexplained part left in the residuals.
The CP model has been widely used in applications as a result of its uniqueness property, i.e., factors are unique up to permutation and scaling ambiguities under mild conditions, without imposing additional constraints such as orthogonality [16, 21]. This uniqueness property facilitates interpretation. In the CP model, each rankone component (\({{\mathbf {{a}}}}_{r}, {{\mathbf {{b}}}}_{r}, {{\mathbf {{c}}}}_{r}\)), i.e., rth column of factor matrices in all modes, can be interpreted, for instance, in terms of identifying groups of subjects (subjects with similar coefficients in \({{\mathbf {{a}}}}_{r}\)) positively or negatively related to specific sets of metabolites (metabolites with large coefficients in \({{\mathbf {{b}}}}_{r}\)) following certain temporal profiles (given by \({{\mathbf {{c}}}}_{r}\)). We normalize vectors (\({{\mathbf {{a}}}}_{r}, {{\mathbf {{b}}}}_{r}, {{\mathbf {{c}}}}_{r}\)) to unit norm when we interpret the factors. Due to the scaling ambiguity in the CP model, the norms can be absorbed arbitrarily by the vectors as long as the product of the norms of the vectors stays the same. In the presence of missing entries, which is common in applications, it is still possible to analyze such incomplete data using a CP model [36, 37].
The selection of the number of components (i.e., R) when fitting a CP model is a challenging task [16, 38], and an active research topic [39]. Among existing approaches, we make use of the model fit and the core consistency diagnostic [40]. In addition, we choose the number of components based on the interpretability of the model and replicability of the factors [39, 41] in subsets of the data (see Additional file 3: for details about the selection of number of components).
The performance of the CP model does not depend on the data size, i.e., the number of dimensions in each mode: I, J, K. If the data follows a CP structure, i.e., a sum of rankone tensors, the CP model can reveal the underlying patterns even with data from a few subjects. For instance, fluorescence spectroscopy measurements of mixtures can be arranged as a thirdorder tensor with modes: mixtures, emission wavelengths, and excitation wavelengths. When such data is analyzed using a CP model, each rankone component can capture one of the chemicals in the mixtures, and has done so successfully with only few, e.g., five, mixtures [22]. However, there are still challenges that will be encountered in the case of small data set size in any mode. First, the number of factors that can be uniquely extracted from the data will be limited [16, 21] as uniqueness conditions depend on the number of dimensions in each mode. Furthermore, using replicability to determine the number of components will not be possible with a small number of samples (that holds for a resampling strategy for any model). Finally, in the time mode, it is important to have enough time points to get temporal profiles.
Analysis approach
We analyze each postprandial metabolomics data set using three methods: (i) PCA of the fastingstate data, (ii) CP model of the T0corrected data, and (iii) CP model of the fulldynamic data. We compare the fastingstate analysis with T0corrected analysis to understand metabolic differences between these two states. We compare these three approaches to obtain a better picture in terms of subject group differences and underlying metabolic mechanisms. When assessing subject group differences, we apply the unpaired (twosample) ttest to each subject component and the null hypothesis is that the two groups come from independent random samples from normal distributions with equal means, without the assumption of equal variances. The main focus of the paper is to investigate metabolic differences associated with subject group differences during the fasting and puredynamic states, rather than striving for the best group separation.
Experimental setup and implementation details
In our experiments with simulated data, we consider the analysis of eight data sets generated using different types of betweengroup variation, different levels of individual variations, and balanced or unbalanced groups (see Table 1 for a complete list of data sets). In the balanced case, there are 50 controls and 50 diseased subjects; in the unbalanced case, there are 70 control and 30 diseased subjects.
Before the CP analyses, the threeway data is preprocessed by centering across the subjects mode (to remove the common intercept from all subjects) and then scaling within the metabolites mode (to adjust for scale differences in different metabolites), i.e., dividing by the root mean squared value of each slice in the metabolites mode (see [42] for details about preprocessing multiway arrays). Before PCA, the fastingstate data is preprocessed similarly, i.e., centered and scaled.
For fitting CP models, we use cpopt [43] and cpwopt [37] from the Tensor Toolbox version 3.1 [44] with the Limited Memory BFGS optimization algorithm. cpwopt uses weighted optimization [37] for fitting CP to incomplete data. For PCA, we use the svd function from MATLAB. Multiple random initializations are used to avoid local minima when fitting CP models. For the unpaired ttest, we use the ttest2 function from MATLAB.
When comparing CP models for different data sets, we use the factor match score (FMS) to quantify the similarity of factors in specific modes, e.g., the metabolites and time modes. FMS is defined as follows:
where \({{\mathbf {{b}}}}_{r}\), \({{\mathbf {{c}}}}_{r}\) and \({\hat{{\mathbf {b}}}}_{r}\), \({\hat{{\mathbf {c}}}}_{r}\) denote the rth column of factor matrices in the metabolites and time modes extracted by CP models from two different data sets (after finding the best permutation of the columns). FMS values are between 0 and 1, and an FMS value close to 1 indicates high similarity between the CP factors extracted from two different data sets.
All experiments are performed in MATLAB 2020a. Simulated data sets in Table 1 and example scripts are available in the GitHub repository for the paper: https://github.com/Lusource/projectofchallengetestdata.
Results
CP models extract similar factors from the simulated versus real meal challenge data
Since subjects in the real data are considered healthy, we expect that the CP model extracts similar patterns in the metabolites and time modes from real data (299 subjects) versus the simulated data with only the 50 control subjects (\(\alpha =0.2\)). Patterns extracted from the subjects mode are omitted since the subjects mode corresponds to different individuals in simulated versus real data. Fig. 4 shows that the 3component CP model of each data set reveals similar patterns from the metabolites and time modes, to a certain extent.
The first component (\({{\mathbf {{b}}}}_{1}\) and \({{\mathbf {{c}}}}_{1}\)) in Fig. 4 mainly models metabolites Pyr, Lac and Ala (i.e., metabolites with large coefficients) which share similar dynamic patterns, i.e., being accumulated and then consumed. It makes sense that these three metabolites stay close since they are tied to each other with reactions Pyr\(\leftrightarrow\)Lac and Pyr\(\leftrightarrow\)Ala as shown in the pathway in Fig. 2. However, we observe different peak heights and different time points for the highest peak in the time mode in real versus simulated data. The different peak heights are possibly because meal compositions are different in real versus simulated data. The different time points at the highest peak may be due to the fact that different individuals reach the highest peak at different time in the real data, and the first component of real data (with \({{\mathbf {{c}}}}_{1}\)) captures the temporal profiles of a group of subjects with the highest peak appearing at around 0.5h in metabolites Lac, Ala and Pyr (see Additional file 4 for details).
The second component (see \({{\mathbf {{b}}}}_{2}\) and \({{\mathbf {{c}}}}_{2}\) in Fig. 4) mainly captures Ins and Glc, which are accumulated and then consumed, after the meal challenge but at a different speed than Lac and Ala. However, we observe that the coefficient for Pyr is significantly different in real versus simulated data. For the simulated data, the coefficient of Pyr is almost zero since the time point at the highest peak for all individuals in the simulated data is the same for each metabolite, and Pyr reaches to the highest peak at different time point compared with Ins and Glc. On the other hand, in real data, there are several subjects with dynamic patterns of Pyr reaching the highest peak at the same time point compared with Ins and Glc (see Additional file 4 for more details). This results in a positive coefficient for Pyr in \({{\mathbf {{b}}}}_{2}\) in real data.
The third component (\({{\mathbf {{b}}}}_{3}\) and \({{\mathbf {{c}}}}_{3}\) in Fig. 4) mainly models Bhb, which behaves differently from other metabolites after a meal challenge, i.e., being consumed first and then getting accumulated. This behaviour has been observed in a consistent way in both real and simulated data.
Insulinresistant versus control group with low withingroup variation
In this section, we show that fasting and T0corrected analysis together reveal the metabolic differences between the fasting and dynamic states. Furthermore, we demonstrate that T0corrected analysis performs better than fulldynamic analysis in terms of revealing subject group differences.
Fastingstate analysis
PCA of the fastingstate data captures the subject group difference. The scatter plot of PC3 (the unpaired ttest using PC3 gives a pvalue of \(2 \times 10^{9}\)) and PC5 (pvalue of \(2\times 10^{7}\)) in Fig. 5 shows the best group separation. Note that PC3 and PC5 together explain only \(21\%\) of the data, indicating that the individual variation may dominate the total variation in the data. Figure 5 demonstrates that all metabolites except Ins and Glc are related to the subject group separation at the fasting state. Among them, Pyr contributes the most and is negatively related to the insulinresistant group (i.e., the insulinresistant group has mainly positive (negative) score values, and Pyr has a negative (positive) coefficient on PC3 (PC5)). This negative association indicates that the insulinresistant group has lower concentration of Pyr at the fasting state. This results from a lower insulinstimulated glucose uptake into the skeletal muscle in the insulinresistant group, leading to less conversion of glucose into pyruvate, which may be supported by the observations in the literature [45, 46].
T0corrected analysis
We choose the 4component CP model for the T0corrected data (see Additional file 3 for the selection of number of components). This model reveals the subject group difference, as shown in Fig. 6 (subjects mode). Among all components, the fourth component gives the best subject group separation (see the comparison of \({{\mathbf {{a}}}}_{4}\) with \({{\mathbf {{a}}}}_{1}\), \({{\mathbf {{a}}}}_{2}\) and \({{\mathbf {{a}}}}_{3}\)). In the metabolites mode (see \({{\mathbf {{b}}}}_{4}\)), metabolite Pyr has the largest coefficient, followed by Glc, while others have almost zero coefficients. In the time mode, \({{\mathbf {{c}}}}_{4}\) captures the dynamic behavior of mainly Pyr, i.e., it increases first and then slightly decreases. The increase of Pyr may be due to the glycolysis after the meal intake and its decrease afterwards may result from its conversion to other metabolites, e.g., Bhb, Lac and Ala, as shown in the pathway plot (Fig. 2). \({{\mathbf {{a}}}}_{4}\), \({{\mathbf {{b}}}}_{4}\) and \({{\mathbf {{c}}}}_{4}\) together indicate that the concentration of Pyr increases more in the insulinresistant group than in the control group; this suggests that the T0corrected Pyr is positively related to the insulinresistant group. Compared with \({{\mathbf {{a}}}}_{4}\), \({{\mathbf {{a}}}}_{1}\) and \({{\mathbf {{a}}}}_{2}\) reveal a weaker subject group separation due to the dynamic behavior of Lac & Ala (see \({{\mathbf {{b}}}}_{1}\) and \({{\mathbf {{c}}}}_{1}\)) and Ins & Glc (see \({{\mathbf {{b}}}}_{2}\) and \({{\mathbf {{c}}}}_{2}\)), respectively. Without capturing any group separation (see \({{\mathbf {{a}}}}_{3}\)), the third component reveals that metabolite Bhb (it has the largest coefficient on \({{\mathbf {{b}}}}_{3}\) while others have almost zero coefficients) responds differently compared to other metabolites after the meal intake (see \({{\mathbf {{c}}}}_{3}\) versus \({{\mathbf {{c}}}}_{1}\), \({{\mathbf {{c}}}}_{2}\) and \({{\mathbf {{c}}}}_{4}\)), i.e., it decreases first and then slowly increases, as also shown in an earlier study [47].
Fulldynamic analysis
Figure 7 demonstrates the 4component CP model of the fulldynamic data. This model captures both the fastingstate information (\({{\mathbf {{c}}}}_{4}\) reveals a constant temporal profile) and the T0corrected information (i.e., the first and second components). The first component, i.e., \({{\mathbf {{a}}}}_{1}\), \({{\mathbf {{b}}}}_{1}\) and \({{\mathbf {{c}}}}_{1}\), and the second component, i.e., \({{\mathbf {{a}}}}_{2}\), \({{\mathbf {{b}}}}_{2}\) and \({{\mathbf {{c}}}}_{2}\), reveal the positive relation of Lac, Ala & Pyr and Ins & Glc with the insulinresistant group and their dynamic response after the meal intake, respectively. The fourth component mainly captures the information from the fasting state, as evidenced by a nearly constant time profile in the time mode \({{\mathbf {{c}}}}_{4}\). At the fasting state, insulinresistant subjects have higher glucose concentrations [45, 46], which may result in lower concentrations of pyruvate since the conversion of glucose to pyruvate is less efficient due to insulin resistance. Such a negative association between pyruvate and insulinresistant subjects is captured by the fourth component, where we see negative coefficients for insulinresistant subjects on \({{\mathbf {{a}}}}_{4}\) and a positive coefficient for pyruvate on \({{\mathbf {{b}}}}_{4}\).
Comparison of three analysis approaches
Through three analysis approaches, we observe four underlying mechanisms related to (i) Pyr, (ii) Ins & Glc, (iii) Lac, Ala & Pyr, and (iv) Bhb.
Pyr is identified by all methods as an essential metabolite related to the subject group separation. However, fastingstate versus T0corrected analysis reveals a different relation between Pyr and the insulinresistant group. This, in turn, results in T0corrected analysis performing better than fulldynamic analysis in terms of revealing group differences (see the comparison of the subjects mode in Fig. 6 and 7). The best separation in T0corrected analysis is given by the fourth component, where Pyr has the dominant coefficient in the metabolites mode. The left panel of Fig. 8 shows that although the insulinresistant group has a lower concentration of Pyr at the fasting state, Pyr accumulates much faster in the insulinresistant group during the postprandial state (especially from 0.5h to 2 h). In this sense, subtracting the fastingstate data from each time slice of the fulldynamic data leads to a more evident group difference between control versus insulinresistant subjects, as demonstrated in the right panel of Fig. 8.
The other mechanisms, (ii) Ins & Glc, (iii) Lac, Ala & Pyr, and (iv) Bhb, are captured through similar patterns in both T0corrected and fulldynamic analysis.
Betacell dysfunction versus control group with low withingroup variation
In this section, we demonstrate that fasting and T0corrected analysis together reveal the metabolic differences between the fasting and dynamic states. In addition, our analysis results indicate that fulldynamic analysis performs better than T0corrected analysis in terms of separating subject groups.
Fastingstate analysis
The fastingstate analysis captures the subject group difference. The scatter plot of PC3 (with pvalue of \(8\times 10^{7}\) from the unpaired ttest) and PC5 (with pvalue of \(2\times 10^{8}\)) gives the best separation between the groups (Fig. 9). Figure 9 also shows that metabolites Glc and Pyr contribute the most since they have large absolute coefficients along the discrimination direction (line \(y=x\)). In addition, it is shown that metabolite Glc (Pyr) is positively (negatively) related to the betacell dysfunction group. This is consistent with Additional file 2: Fig. S2.3, which shows that the betacell dysfunction group has a higher concentration of Glc and a lower concentration of Pyr at the fasting state compared with the control group. The higher plasma Glc in the betacell dysfunction group results from inadequate glucose sensing to stimulate insulin secretion. Due to this inefficient secretion, glycolysis becomes less efficient, and less Pyr is produced, leading to a lower level of plasma Pyr in the betacell dysfunction group.
T0corrected analysis
We choose the 4component CP model for the T0corrected data. This model captures the subject group separation, and the best separation is achieved by \({{\mathbf {{a}}}}_{2}\) (see Fig. 10). In the metabolites mode (\({{\mathbf {{b}}}}_{2}\)), Ins has the largest coefficient followed by Glc, while other metabolites almost have no contribution. This indicates that the dynamic changes of Ins and Glc contribute to the subject group separation, and Ins is the most significant one. Ins is shown to be negatively associated with the betacell dysfunction group. It makes sense that Glc stays close to Ins and shares a similar dynamic pattern as Ins (see Additional file 2: Fig. S2.4) since Ins and Glc regulate each other. In the time mode, \({{\mathbf {{c}}}}_{2}\) captures the dynamic pattern shared by Ins and Glc, i.e., an increase after the meal intake and then a decrease due to the glycolysis. \({{\mathbf {{a}}}}_{1}\) (with pvalue of \(8\times 10^{5}\)) and \({{\mathbf {{a}}}}_{4}\) (with pvalue of \(2\times 10^{4}\)) capture weak group separation in the subjects mode. In the metabolites mode, \({{\mathbf {{b}}}}_{1}\) captures the fact that Pyr, Lac and Ala respond to the meal challenge similarly, and this makes sense since they are close to each other, as shown in the pathway in Fig. 2.
Fulldynamic analysis
We choose the 4component CP model to analyze the fulldynamic data, with factor plots shown in Fig. 11. \({{\mathbf {{a}}}}_{4}\) reveals the subject group differences best. In the metabolites mode (\({{\mathbf {{b}}}}_{4}\)), metabolites Glc and Pyr have the largest coefficients in terms of absolute values. This component also reveals the positive (negative) relation of metabolite Glc (Pyr) with the betacell dysfunction group (i.e., the betacell dysfunction group has negative coefficients in the subjects mode while metabolite Glc (Pyr) has a positive (negative) coefficient in the metabolites mode). In the time mode, we observe a nonconstant dynamic profile (\({{\mathbf {{c}}}}_{4}\)) with a large coefficient at \(t=0\) comparable to the coefficients of other time points. This indicates that the fourth component captures a mixture of information from the fasting and T0corrected states, and the fastingstate signal is relatively strong. The model also captures the pattern (the second component) in T0corrected analysis (see \({{\mathbf {{a}}}}_{2}\), \({{\mathbf {{b}}}}_{2}\) and \({{\mathbf {{c}}}}_{2}\) in Fig. 11), which shows that the betacell dysfunction group is negatively related with Ins.
Comparison of three analysis approaches
In this case, we observe four underlying mechanisms related to (i) Glc & Pyr, (ii) Ins & Glc, (iii) Lac, Ala & Pyr, and (iv) Bhb.
While both fastingstate analysis and T0corrected analysis can separate subject groups, the mechanisms responsible for group separation are different, with Glc & Pyr playing a role at the fastingstate analysis while Ins & Glc playing the main role in the T0corrected analysis. We also observe that fulldynamic analysis outperforms the other two approaches in terms of capturing the subject groups (see the comparison of Figs. 10 and 11). The best separation in the fulldynamic analysis is provided by the fourth component, which captures a mixture of the information from the fasting and T0corrected states. The Glc & Pyr signal at the fasting state gets stronger through the dynamic information. This is consistent with the more evident group difference observed at the postprandial state from the time profiles of Glc (see Additional file 2: Fig. S2.3).
The other mechanisms (iii) Lac, Ala & Pyr, and (iv) Bhb are captured similarly using both T0corrected and fulldynamic data analysis.
The proposed analysis approach reveals metabolic aberrations
The crucial question in an unsupervised analysis is whether such an analysis can find groups among subjects pointing to metabolic differences or even metabolic deficiencies. The two studied metabolic deficiencies clearly show different patterns for the metabolites Ins, Glc and Pyr in the different analyses. These different patterns can be understood using the underlying metabolic network, thereby validating the power of the data analysis approach in terms of its ability to discriminate between different types of metabolic differences. More specifically, insulinresistant subjects are expected to have higher fasting Glc levels, lower fasting Pyr levels, and higher puredynamic Ins levels. On the other hand, betacell dysfunction subjects are expected to have higher fasting Glc levels, lower fasting Pyr levels, and lower puredynamic Ins levels ([29] and references therein). The primary metabolic differences between these two metabolic deficiencies are related to Ins levels, with insulinresistant group having a reduced ability to respond to Ins, resulting in higher Glc levels, which in turn stimulate further insulin secretion by the pancreas; conversely, those with betacell dysfunction have impaired insulin production, resulting in lower Ins levels. Figure 5 together with Fig. 6, and Fig. 9 together with Fig. 10 indicate that our analyses can capture such metabolic differences between these two different types of metabolic deficiencies. Although we considered here only two metabolic deficiencies, we expect this property to hold also for other types of metabolic differences, especially when  in real data  there are more than six metabolites.
CP models reveal stable patterns
Withingroup variation can be enormous in real data making data analysis more challenging. For example, large withingroup variations in healthy subjects might mask the changes of metabolites in response to diseases when exploring biomarker candidates [48]. Our numerical experiments demonstrate that CP models reveal stable patterns even when the level of individual variation is higher. Figure 12 shows the CP factors extracted using T0corrected analysis from data sets generated with insulinresistant and control groups using two different levels of individual variation, i.e., \(\alpha =0.2\) and \(\alpha =0.4\). We observe very similar factors in the metabolites and time modes even with different levels of individual variation. In addition, we consider data sets with an unbalanced number of control and diseased subjects. Table S2.2 and Table S2.3 in Additional file 2 show that CP factors are similar for data sets using balanced versus unbalanced samples, and using different levels of individual variations, with the smallest FMS value of 0.90.
The behaviours of CP models are also tested on data sets with random Gaussian noise mimicking the measurement error. The relative standard deviation (RSD) of the measurements encountered in this study can range from \(4\%\) to \(15\%\), which aligns closely with the range mentioned in [49]. Our experimental findings indicate that the CP models are robust to noise when RSD of a replicate measurement is moderate. For example, when the RSD is \(10\%\) or \(6\%\) for the data set with insulinresistant versus control groups or the data set with betacell dysfunction versus control groups, respectively, the deconvoluted CP patterns are very similar compared to their noisefree versions with the FMS values (noisy versus noiseless patterns) greater than 0.92. With a further increase in RSD, the added noise may have an impact on the number of components in the CP models and the extracted patterns. More detailed results about comparisons of the patterns extracted by the CP models from noisy versus noiseless data can be found in Additional file 5.
Discussion
In this paper, we have proposed an unsupervised approach based on the CP model to improve the analysis of the postprandial metabolomics data. The main focus is to demonstrate how the proposed method can effectively capture the differences in metabolic responses among different subject groups during a meal challenge test (in particular, a priori unknown stratifications rather than the setting where there are a priori labels as in, e.g., clinical interventions). Using the simulated data with known ground truth, we have demonstrated several benefits of the proposed approach: One benefit is the extraction of the underlying patterns in all modes simultaneously, which reveals subjects groups, related metabolites as well as corresponding temporal patterns from timeresolved postprandial data. In addition, we have shown that CP models can extract reliable patterns under various settings, e.g., even for data with considerable withingroup variation. Another benefit is an overall picture of metabolic differences at the fasting versus T0corrected state, which can be obtained by using PCA for the fastingstate data and the CP model for the T0corrected data analysis. Understanding such metabolic differences helps to explore the added value in the postprandial state due to metabolites’ dynamic responses. This, in turn, provides insightful guidance on why the fulldynamic analysis may capture a stronger or weaker group difference compared to the separate analyses of the fastingstate data and T0corrected data in longitudinal analysis. The simulation studies indicate that such metabolic differences depend on the nature of the betweengroup variation, i.e., the data itself, which may be related to metabolic deficiency or aberration due to genetic variations in different groups of subjects. Therefore, to gain a more comprehensive understanding of the underlying metabolic mechanisms, particularly the interplay between the fasting and T0corrected signals, we recommend using these three analyses together. Also, note that while our experiments rely on data sets consisting of a control and a diseased group, the proposed analysis approach holds the promise to reveal subject group differences due to different diseases, e.g., insulinresistant versus betacell dysfunction versus control group, as well as subgroups of diseases, e.g., insulinresistant versus less insulinresistant versus control group.
We expect the proposed approach to be useful also for other applications of timeresolved data analysis, for example, other types of challenge test data such as exercise challenge tests [50]. A common feature of such data is that they are timeresolved and hold both the baseline and puredynamic information. The essential idea is to analyze the baseline data together with the baselinecorrected data to understand their differences. If group structures are detected, understanding such differences will facilitate better utilization of the timeresolved information.
A side product of our study is an approach to generate meaningful simulated postprandial metabolomics data. We use a comprehensive metabolic model to create such data. While the metabolic model was previously proposed, we tuned the model based on a comparison with our real data set including hundreds of subjects. Such simulated models may also be beneficial in terms of the optimal design of meal challenge tests.
Nevertheless, there are also several limitations in this study. Using the metabolic model, we have only generated data where the concentration of each metabolite reaches the highest peak at the same time points for all subjects. However, in reality, time points at the highest peak vary from one subject to another. The main difficulty in generating such data is tuning the parameters while keeping the concentrations in reasonable ranges. Since the mathematical metabolic model involves 1140 parameters and 202 metabolites in eight different organs, parameter tuning is not trivial. Other limitations come from assumptions of the data analysis method. The CP model assumes that temporal patterns (i.e., \({{\mathbf {{c}}}}_{r}\)) are the same across subjects as we have in the simulated data; therefore, it may not be able to capture the individual differences such as shifted temporal profiles. Furthermore, another assumption in the CP model is that the factor matrix in the metabolites modes is the same across all time slices. However, those factor matrices may change over time. To account for such individualspecific temporal profiles or timeevolving factors in the metabolites mode, alternative multiway methods such as the PARAFAC2 model [51] may prove useful as in the analysis of neuroimaging signals [52,53,54] and in chemometrics [55]. We plan to study the performance of PARAFAC2 model in terms of analyzing dynamic metabolomics data as well as joint analysis of dynamic metabolomics data together with other omics measurements [56].
Availability of code, data and materials
All code and simulated data are available in a GitHub repository at https://github.com/Lusource/projectofchallengetestdata. NMR measurements of plasma samples collected during the challenge test from the COPSAC_{2000} cohort are available through a collaboration agreement from COPSAC. For data access requests, please contact Morten A. Rasmussen (morten.arendt@dbac.dk).
References
Harte AL, Varma MC, Tripathi G, McGee KC, AlDaghri NM, AlAttas OS, Sabico S, O’Hare JP, Ceriello A, Saravanan P, et al. High fat intake leads to acute postprandial exposure to circulating endotoxin in type 2 diabetic subjects. Diabetes Care. 2012;35(2):375–82.
Wopereis S, Stroeve JHM, Stafleu A, Bakker GCM, Burggraaf J, Erk MJ, Pellis L, Boessen R, Kardinaal AAF, Ommen B. Multiparameter comparison of a standardized mixed meal tolerance test in healthy and type 2 diabetic subjects: the phenflex challenge. Genes Nutrit. 2017;12(21):1–14.
Wojczynski MK, Glasser SP, Oberman A, Kabagambe EK, Hopkins PN, Tsai MY, Straka RJ, Ordovas JM, Arnett DK. Highfat meal effect on ldl, hdl, and vldl particle size and number in the genetics of lipidlowering drugs and diet network (goldn): an interventional study. Lipids Health Dis. 2011;10(181):1–11.
Lépine G, TremblayFranco M, Bouder S, Dimina L, Fouillet H, Mariotti F, Polakof S. Investigating the postprandial metabolome after challenge tests to assess metabolic flexibility and dysregulations associated with cardiometabolic diseases. Nutrients. 2022;14(3):472.
Kumar AA, Satheesh G, Vijayakumar G, Chandran M, Prabhu PR, Simon L, Kutty VR, Kartha CC, Jaleel A. Postprandial metabolism is impaired in overweight normoglycemic young adults without family history of diabetes. Sci Rep. 2020;10:353.
Zeevi D, Korem T, Zmora N, Israeli D, Rothschild D, Weinberger A, BenYacov O, Lador D, AvnitSagi T, LotanPompan M, et al. Personalized nutrition by prediction of glycemic responses. Cell. 2015;163(5):1079–94.
Berry SE, Valdes AM, Drew DA, Asnicar F, Mazidi M, Wolf J, Capdevila J, Hadjigeorgiou G, Davies R, Al Khatib H, et al. Human postprandial responses to food and potential for precision nutrition. Nat Med. 2020;26(6):964–73.
Müllner E, Röhnisch HE, Von Brömssen C, Moazzami AA. Metabolomics analysis reveals altered metabolites in lean compared with obese adolescents and additional metabolic shifts associated with hyperinsulinaemia and insulin resistance in obese adolescents: A crosssectional study. Metabolomics. 2021;17(1):1–13.
Vis DJ, Westerhuis JA, Jacobs DM, Duynhoven JP, Wopereis S, Ommen B, Hendriks MM, Smilde AK. Analyzing metabolomicsbased challenge tests. Metabolomics. 2015;11(1):50–63.
Smilde AK, Jansen JJ, Hoefsloot HC, Lamers RJA, Van Der Greef J, Timmerman ME. Anovasimultaneous component analysis (asca): a new tool for analyzing designed metabolomics data. Bioinformatics. 2005;21(13):3043–8.
Harrington P, Vieira NE, Espinoza J, Nien JK, Romero R, Yergey AL. Analysis of varianceprincipal component analysis: a soft tool for proteomic discovery. Anal Chim Acta. 2005;544(1–2):118–27.
Thissen U, Wopereis S, Berg SA, Bobeldijk I, Kleemann R, Kooistra T, Dijk K, Ommen B, Smilde AK. Improving the analysis of designed studies by combining statistical modelling with study design information. BMC Bioinf. 2009;10(1):1–15.
Thiel M, Feraud B, Govaerts B. ASCA+ and APCA+: extensions of ASCA and APCA in the analysis of unbalanced multifactorial designs. J Chemom. 2017;31(6):2895.
Martin M, Govaerts B. LiMMPCA: Combining ASCA+ and linear mixed models to analyse highdimensional designed data. J Chemom. 2020;34(6):3232.
Pellis L, van Erk MJ, van Ommen B, Bakker GCM, Hendriks HFJ, Cnubben NHP, Kleemann R, van Someren EP, Bobeldijk I, Rubingh CM, Wopereis S. Plasma metabolomics and proteomics profiling after a postprandial challenge reveal subtle diet effects on human metabolic status. Metabolomics. 2012;8:347–59.
Kolda TG, Bader BW. Tensor decompositions and applications. SIAM Rev. 2009;51(3):455–500.
Acar E, Yener B. Unsupervised multiway data analysis: a literature survey. IEEE Trans Knowl Data Eng. 2009;21(1):6–20.
Smilde A, Bro R, Geladi P. MultiWay Analysis: Applications in the Chemical Sciences. West Sussex: Wiley; 2004.
Harshman RA. Foundations of the parafac procedure: Models and conditions for an “explanatory” multimodal factor analysis. UCLA working papers in phonetics 1970;16, 1–84.
Carroll JD, Chang JJ. Analysis of individual differences in multidimensional scaling via an nway generalization of “eckartyoung’’ decomposition. Psychometrika. 1970;35(3):283–319.
Kruskal JB. Threeway arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra Appl. 1977;18(2):95–138.
Bro R. PARAFAC. Tutorial and applications. Chemometrics and Intelligent Laboratory Systems 1997;38(2), 149–171.
Martino C, Shenhav L, Marotz CA, Armstrong G, McDonald D, VázquezBaeza Y, Morton JT, Jiang L, DominguezBello MG, Swafford AD, et al. Contextaware dimensionality reduction deconvolutes gut microbial community dynamics. Nat Biotechnol. 2021;39(2):165–8.
Gardlo A, Smilde AK, Hron K, Hrda M, Karlikova R, Friedeckỳ D, Adam T. Normalization techniques for PARAFAC modeling of urine metabolomic data. Metabolomics. 2016;12(12):1–13.
Li L, Hoefsloot H, Graaf AA, Acar E, Smilde AK. Exploring dynamic metabolomics data with multiway data analysis: a simulation study. BMC Bioinf. 2022;23(31):1–22.
Twisk J, Bosman L, Hoekstra T, Rijnhart J, Welten M, Heymans M. Different ways to estimate treatment effects in randomised controlled trials. Contemp Clin Trials Commun. 2018;10:80–5.
Mattes RD. Oral fat exposure alters postprandial lipid metabolism in humans. Am J Clin Nutr. 1996;63(6):911–7.
Wopereis S, Wolvers D, Erk M, Gribnau M, Kremer B, Dorsten FA, Boelsma E, Garczarek U, Cnubben N, Frenken L, et al. Assessment of inflammatory resilience in healthy subjects using dietary lipid and glucose challenges. BMC Med Genom. 2013;6:44.
Kurata H. Virtual metabolic human dynamic model for pathological analysis and therapy design for diabetes. iScience. 2021;24(2): 102101.
Bisgaard H. The Copenhagen Prospective Study on Asthma in Childhood (COPSAC): design, rationale, and baseline data from a longitudinal birth cohort study. Ann Allergy Asthma Immunol. 2004;93(4):381–9.
Stroeve JH, Wietmarschen H, Kremer BH, Ommen B, Wopereis S. Phenotypic flexibility as a measure of health: the optimal nutritional stress response test. Genes Nutrit. 2015;10(3):1–21.
Hitchcock FL. The expression of a tensor or a polyadic as a sum of products. J Math Phys. 1927;6(1):164–89.
Papalexakis EE, Faloutsos C, Sidiropoulos ND. Tensors for data mining and data fusion: models, applications, and scalable algorithms. ACM Trans Intell Syst Technol. 2016;8(2):1.
Acar E, Bingol CA, Bingol H, Bro R, Yener B. Multiway analysis of epilepsy tensors. Bioinformatics. 2007;23(13):10–8.
Williams AH, Kim TH, Wang F, Vyas S, Ryu SI, Shenoy KV, Schnitzer M, Kolda TG, Ganguli S. Unsupervised discovery of demixed, lowdimensional neural dynamics across multiple timescales through tensor component analysis. Neuron. 2018;98(6):1099–115.
Tomasi G, Bro R. Parafac and missing values. Chemom Intell Lab Syst. 2005;75(2):163–80.
Acar E, Dunlavy DM, Kolda TG, Mørup M. Scalable tensor factorizations for incomplete data. Chemom Intell Lab Syst. 2011;106(1):41–56.
Håstad J. Tensor rank is npcomplete. J Algorithms. 1990;11(4):644–54.
Adali T, Kantar F, Akhonda MABS, Strother S, Calhoun VD, Acar E. Reproducibility in matrix and tensor decompositions: focus on model match, interpretability, and uniqueness. IEEE Signal Process Mag. 2022;39(4):8–24.
Bro R, Kiers HA. A new efficient method for determining the number of components in PARAFAC models. J Chemom. 2003;17(5):274–86.
Harshman RA, De Sarbo WS. An application of PARAFAC to a small sample problem, demonstrating preprocessing, orthogonality constraints, and splithalf diagnostic techniques in Research Methods for Multimode Data Analysis. New York: Praeger; 1984. p. 602–42.
Bro R, Smilde AK. Centering and scaling in component analysis. J Chemom. 2003;17(1):16–33.
Acar E, Dunlavy DM, Kolda TG. A scalable optimization approach for fitting canonical tensor decompositions. J Chemom. 2011;25(2):67–86.
Bader BW, Kolda TG, et al. Matlab Tensor Toolbox, Version 3.1. https://www.tensortoolbox.org
Petersen KF, Dufour S, Savage DB, Bilz S, Solomon G, Yonemitsu S, Cline GW, Befroy D, Zemany L, Kahn BB, et al. The role of skeletal muscle insulin resistance in the pathogenesis of the metabolic syndrome. Proc Natl Acad Sci. 2007;104(31):12587–94.
Merz KE, Thurmond DC. Role of skeletal muscle in insulin resistance and glucose uptake. Compr Physiol. 2011;10(3):785–809.
Kahler A, Zimmermann M, Langhans W. Suppression of hepatic fatty acid oxidation and food intake in men. Nutrition. 1999;15(11–12):819–28.
Saito K, Maekawa K, Pappan KL, Urata M, Ishikawa M, Kumagai Y, Saito Y. Differences in metabolite profiles between blood matrices, ages, and sexes among caucasian individuals and their interindividual variations. Metabolomics. 2014;10(3):402–13.
Smilde AK, Werf MJ, Schaller JP, Kistemaker C. Characterizing the precision of massspectrometrybased metabolic profiling platforms. Analyst. 2009;134(11):2281–5.
Vilozni D, Bentur L, Efrati O, Barak A, Szeinberg A, Shoseyov D, Yahav Y, Augarten A. Exercise challenge test in 3to 6yearold asthmatic children. Chest. 2007;132(2):497–503.
Harshman RA. Parafac2: Mathematical and technical notes. UCLA working papers in phonetics, 1972;22, 30–44
Madsen KH, Churchill NW, Mørup M. Quantifying functional connectivity in multisubject fMRI data using component models. Hum Brain Mapp. 2017;38:882–99.
Roald M, Bhinge S, Jia C, Calhoun V, Adali T, Acar E. Tracing network evolution using the PARAFAC2 model. In: ICASSP’20, 2020;1100–1104 .
Acar E, Roald M, Hossain KM, Calhoun VD, Adali T. Tracing evolving networks using tensor factorizations vs. icabased approaches. Front Neurosci 2022;16:861402.
Bro R, Andersson CA, Kiers HA. Parafac2–part ii. modeling chromatographic data with retention time shifts. J Chemomet 1999;13:295–309.
Jendoubi T, Ebbels TMD. Integrative analysis of time course metabolic data and biomarker discovery. BMC Bioinf. 2020;21:11.
Acknowledgements
We thank the children and families of the COPSAC\(_{2000}\) cohort for their contribution, and the clinical team at COPSAC for conducting the clinical testing.
Funding
The work presented in this article is supported by Research Council of Norway project 300489 (L. L, S. Y, M. A. R, A. K. S, E. A) and Novo Nordisk Foundation Grant NNF19OC0057934 (L. L, M. A. R, A. K. S, E. A).
Author information
Authors and Affiliations
Contributions
A. K. S and E. A formulated the research problem. B. C and M. A. R prepared for the real data. B. M. B, H. H and L. L prepared for the simulated data. L. L performed the data analysis. S. Y and E. A validated the analysis. L. L, E. A, A. K. S, B. M. B and H. H interpreted the simulated data. L. L, E. A, A. K. S, B. M. B, H. H and D. H performed the comparison between the real and simulated data. L. L, A. K. S and E. A prepared the original draft. All authors were involved in the preparation of the final manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
All procedures performed in studies involving human participants were in accordance with the Declaration of Helsinki and was approved by The Copenhagen Ethics Committee (KF 01289/96 and H16039498) and The Danish Data Protection Agency (2015413696). Written consent was obtained from the participants.
Consent for publication
Not applicable.
Competing interests
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be considered as a potential competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1
. A list of parameters that are randomly perturbated to generate different individuals.
Additional file 2
. Supplementary figures and tables for the main manuscript.
Additional file 3
. Selection of the number of components in CP models.
Additional file 4
. Comparison of CP models for real vs. simulated data.
Additional file 5
. Experiments on data sets with random noise.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Li, L., Yan, S., Bakker, B.M. et al. Analyzing postprandial metabolomics data using multiway models: a simulation study. BMC Bioinformatics 25, 94 (2024). https://doi.org/10.1186/s1285902405686w
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s1285902405686w