TY  - JOUR
AU  - Lopes, Marta B.
AU  - Veríssimo, André
AU  - Carrasquinha, Eunice
AU  - Casimiro, Sandra
AU  - Beerenwinkel, Niko
AU  - Vinga, Susana
PY  - 2018
DA  - 2018/05/04
TI  - Ensemble outlier detection and gene selection in triple-negative breast cancer data
JO  - BMC Bioinformatics
SP  - 168
VL  - 19
IS  - 1
AB  - Learning accurate models from ‘omics data is bringing many challenges due to their inherent high-dimensionality, e.g. the number of gene expression variables, and comparatively lower sample sizes, which leads to ill-posed inverse problems. Furthermore, the presence of outliers, either experimental errors or interesting abnormal clinical cases, may severely hamper a correct classification of patients and the identification of reliable biomarkers for a particular disease. We propose to address this problem through an ensemble classification setting based on distinct feature selection and modeling strategies, including logistic regression with elastic net regularization, Sparse Partial Least Squares - Discriminant Analysis (SPLS-DA) and Sparse Generalized PLS (SGPLS), coupled with an evaluation of the individuals’ outlierness based on the Cook’s distance. The consensus is achieved with the Rank Product statistics corrected for multiple testing, which gives a final list of sorted observations by their outlierness level.
SN  - 1471-2105
UR  - https://doi.org/10.1186/s12859-018-2149-7
DO  - 10.1186/s12859-018-2149-7
ID  - Lopes2018
ER  - 