Statistical methods and resources for biomarker discovery using metabolomics

Anwardeen, Najeha R.; Diboun, Ilhame; Mokrab, Younes; Althani, Asma A.; Elrayess, Mohamed A.

doi:10.1186/s12859-023-05383-0

BMC Bioinformatics

Table 4 Synopsis of popular statistical methods for metabolomics studies

From: Statistical methods and resources for biomarker discovery using metabolomics

Category	Methods	Strengths	Limitations
Univariate	t-test; Mann-Whitney U test; Chi-square test; ANOVA; Kruskal-Wallis test	Straightforward application; Easy to interpret the results	Requires prior knowledge of data; No information about inter-variable relationships, which are crucial in a biological setup; Outliers cannot be determined
	Multiple linear regression with Bonferroni correction (with one explanatory variable)	Easy to apply and interpret	Significance level affected by sample size; Does not account for intercorrelation
	Multiple linear regression with false discovery rate (FDR) correction (with one explanatory variable)	Easy to use and interpret; Preferred over Bonferroni method	Increases the number of false negatives
Multivariate	Principal component analysis (PCA)	Effective in variable reduction; Uses the complete collected data; Easy to manage complex data; Focuses on the inter-variable relationships; Requires no prior knowledge of data	No clarity on how to rank the metabolites; Biological interpretation may be challenging
	Partial least squares discriminant analysis (PLS-DA); Orthogonal partial least squares discriminant analysis (OPLS-DA)	Dimensional reduction to a comprehensible level; No data wastage; Shows relationships between variables, apt in a biological setting; Handles large, complex data	Prior knowledge of data required; Over-fitting issues; No significance level for the most important metabolites; Abundant variables mask the effect of less abundant variables; Cross-validation steps required to predict accuracy of model
	Random Forest, SVM and other ML methods	Handles complex data; Robust to outliers; Finds complex relationships between metabolites and between metabolites and other factors	Excessive tuning may be required to retrieve the best model; Less efficient for truly linear data; Does not provide metabolite selection
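
The table contrasts Bonferroni correction with false discovery rate control when many metabolites are tested one at a time. The sketch below illustrates the difference on a list of per-metabolite p-values. It assumes the Benjamini-Hochberg step-up procedure as the FDR method (the table does not name a specific one), and the function names and p-values are hypothetical, chosen only for illustration.

```python
def bonferroni(pvalues, alpha=0.05):
    """Flag p-values that pass the Bonferroni threshold alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Flag discoveries under the Benjamini-Hochberg step-up procedure.

    Finds the largest rank k such that the k-th smallest p-value is
    <= (k / m) * alpha, then rejects the k smallest p-values.
    """
    m = len(pvalues)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per metabolite (not taken from the article).
example_pvalues = [0.001, 0.008, 0.012, 0.03, 0.04, 0.2, 0.5, 0.9]

n_bonferroni = sum(bonferroni(example_pvalues))
n_bh = sum(benjamini_hochberg(example_pvalues))
print(f"Bonferroni discoveries: {n_bonferroni}")
print(f"Benjamini-Hochberg discoveries: {n_bh}")
```

On these illustrative values, Bonferroni keeps fewer metabolites than Benjamini-Hochberg at the same alpha, which is the usual reason FDR control is preferred for exploratory biomarker screening. In practice an established implementation would normally be used rather than hand-written functions.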

ISSN: 1471-2105