From: Statistical methods and resources for biomarker discovery using metabolomics
Category | Methods | Strengths | Limitations |
---|---|---|---|
Univariate | t-test; Mann-Whitney U test; chi-square test; ANOVA; Kruskal-Wallis test | Straightforward to apply; results are easy to interpret | Requires prior knowledge of the data; gives no information about inter-variable relationships, which are crucial in a biological setting; outliers cannot be identified |
 | Multiple linear regression with Bonferroni correction (with one explanatory variable) | Easy to apply and interpret | Significance level is affected by sample size; does not account for intercorrelation |
 | Multiple linear regression with false discovery rate correction (with one explanatory variable) | Easy to use and interpret; preferred over the Bonferroni method | Increases the number of false negatives |
Multivariate | Principal component analysis (PCA) | Effective for variable reduction; uses the complete collected data; makes complex data easy to manage; focuses on inter-variable relationships; requires no prior knowledge of the data | Provides no clear way to rank metabolites; biological interpretation may be challenging |
 | Partial least squares discriminant analysis (PLS-DA); orthogonal partial least squares discriminant analysis (OPLS-DA) | Reduces dimensionality to a comprehensible level; no data wastage; shows relationships between variables, which suits a biological setting; handles large, complex data | Prior knowledge of the data required; prone to over-fitting; no significance level for the most important metabolites; abundant variables mask the effect of less abundant variables; cross-validation required to estimate model accuracy |
 | Random forest, support vector machines (SVM) and other machine learning (ML) methods | Handles complex data; robust to outliers; finds complex relationships among metabolites and between metabolites and other factors | May require extensive tuning to obtain the best model; less efficient for truly linear data; does not provide metabolite selection |
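
To make the contrast between the Bonferroni and false discovery rate rows more concrete, the sketch below adjusts a small set of per-metabolite p-values both ways. It assumes the Benjamini-Hochberg procedure as the FDR method (the table does not name a specific one), and the p-values, the 0.05 threshold, and the function names are invented for illustration only.

```python
from typing import List


def bonferroni(p_values: List[float]) -> List[float]:
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (assumed here as the FDR method)."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, one per metabolite (not taken from any dataset).
p_values = [0.001, 0.008, 0.012, 0.030, 0.200]
alpha = 0.05  # assumed significance threshold

bonf_hits = [p <= alpha for p in bonferroni(p_values)]
bh_hits = [p <= alpha for p in benjamini_hochberg(p_values)]

print("Significant after Bonferroni:", sum(bonf_hits))
print("Significant after Benjamini-Hochberg:", sum(bh_hits))
```

On these made-up inputs the Bonferroni adjustment retains fewer metabolites than the Benjamini-Hochberg adjustment, which is the usual reason the less conservative FDR approach is favoured when many metabolites are tested at once.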