
Table 4 Synopsis of popular statistical methods for metabolomics studies

From: Statistical methods and resources for biomarker discovery using metabolomics

 

| Category | Methods | Strengths | Limitations |
| --- | --- | --- | --- |
| Univariate | t-test; Mann-Whitney; Chi-square; ANOVA; Kruskal-Wallis | Straightforward application; easy to interpret the results | Requires prior knowledge of data; gives no information about inter-variable relationships, which are crucial in a biological set-up; outliers cannot be determined |
| Univariate | Multiple linear regression with Bonferroni correction (with one explanatory variable) | Easy to apply and interpret | Significance level affected by sample size; does not account for intercorrelation |
| Univariate | Multiple linear regression with false discovery rate (with one explanatory variable) | Easy to use and interpret; preferred over Bonferroni method | Increases the number of false negatives |
| Multivariate | Principal component analysis | Effective in variable reduction; uses the complete collected data; easy to manage complex data; focuses on the inter-variable relationships; requires no prior knowledge of data | No clarity on how to rank the metabolites; biological interpretation may be challenging |
| Multivariate | Partial least squares discriminant analysis; orthogonal partial least squares discriminant analysis | Dimensional reduction to comprehensible level; no data wastage; shows relationship between variables, apt in a biological setting; handles large, complex data | Prior knowledge of data required; over-fitting issues; no significance level of the most important metabolites; abundant variables mask the effect of less abundant variables; cross-validation steps required to predict accuracy of model |
| Multivariate | Random Forest, SVM and other ML methods | Handles complex data; robust to outliers; finds complex relationships between metabolites and between metabolites and other factors | Excessive tuning may be required to retrieve best model; less efficient for truly linear data; does not provide metabolite selection |
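
The two multiple-testing corrections named in the table can be compared on a small example. The sketch below is illustrative only: the function names and the p-values are hypothetical and do not come from the source article, and the false discovery rate is controlled here with the Benjamini-Hochberg procedure, which is one common choice and is assumed rather than stated in the table.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag p-values that pass the Bonferroni-adjusted threshold alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag p-values rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per metabolite (not taken from any real study).
p_values = [0.001, 0.008, 0.012, 0.039, 0.30]

print("Bonferroni:", bonferroni(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg(p_values))
```

On these made-up values Bonferroni flags two metabolites and Benjamini-Hochberg flags four, which illustrates that Bonferroni is the more conservative of the two corrections.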