Figure 3 | BMC Bioinformatics

Figure 3

From: SparSNP: Fast and memory-efficient analysis of all SNPs for phenotype prediction


Prediction experiments. LOESS-smoothed AUC and explained phenotypic variance (denoted “VarExp”) for the Finnish celiac disease dataset, for increasing model sizes. AUC is estimated over 20×3-fold cross-validation (20 repetitions of 3-fold cross-validation), except for HyperLasso, for which we ran only 2×3-fold cross-validation because of its high computational cost. The explained phenotypic variance is estimated from the AUC using the method of [11], assuming a population prevalence of celiac disease of K = 1%. Note that glmnet, HyperLasso, LIBLINEAR (denoted “LL-L1”), and SparSNP used an ℓ1-penalised model, whereas LIBLINEAR-CDBLOCK (denoted “LL-CD-L2”) used an ℓ2-penalised (non-sparse) model, which includes all 516,504 SNPs; it is therefore shown as a horizontal line across all model sizes. Tuning the ℓ2 penalty for LIBLINEAR-CDBLOCK resulted in very similar AUC.
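The caption converts AUC into explained variance with the method of [11], which is not reproduced on this page. As a sketch, assuming [11] uses the liability-threshold model of Wray et al. (2010), the AUC maps to liability-scale variance h² = 2Q² / [(v − i)² + Q²(i(i − t) + v(v − t))], where t = Φ⁻¹(1 − K) is the liability threshold for prevalence K, i = φ(t)/K and v = −iK/(1 − K) are the mean liabilities of cases and controls, and Q = Φ⁻¹(AUC). The function name `auc_to_varexp` is hypothetical, and this is not presented as the authors' code.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, sd 1) from the Python standard library.
_STD_NORMAL = NormalDist()


def auc_to_varexp(auc: float, prevalence: float) -> float:
    """Hypothetical helper: convert AUC to explained variance on the liability
    scale under a liability-threshold model (assumed, not confirmed, to match [11]).
    """
    if not 0.5 <= auc < 1.0:
        raise ValueError("auc must lie in [0.5, 1)")
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie in (0, 1)")
    k = prevalence
    t = _STD_NORMAL.inv_cdf(1.0 - k)  # liability threshold for prevalence K
    i = _STD_NORMAL.pdf(t) / k        # mean liability of cases
    v = -i * k / (1.0 - k)            # mean liability of controls
    q = _STD_NORMAL.inv_cdf(auc)      # probit-transformed AUC
    s = i * (i - t) + v * (v - t)
    return 2.0 * q * q / ((v - i) ** 2 + q * q * s)


if __name__ == "__main__":
    # K = 1%, the prevalence assumed in the figure; AUC values are illustrative only.
    for auc in (0.6, 0.7, 0.8, 0.9):
        print(f"AUC={auc:.2f}  VarExp={auc_to_varexp(auc, 0.01):.3f}")
```

An AUC of 0.5 maps to zero explained variance, and the conversion grows steeply as AUC approaches 1; for AUC very close to 1 it returns values above 1, which lie outside what the liability model permits.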
