Fig. 1 | BMC Bioinformatics

Fig. 1

From: Robust classification using average correlations as features (ACF)


Concepts and results of the comparison of correlation-based classifiers on simulated datasets. A Concept sketch of ACF. B Concept sketch of the data-generating process for datasets with \(N_{A}\), \(N_{B}\) and \(N_{C}\) samples from the three classes A, B and C, respectively. C Dependence of the properties of the correlation matrices obtained from the data-generating process on the percentage of missing values and on the standard deviation per feature. Lines indicate the mean value per combination of classes A, B and C over 20 repetitions. D Dependence of the macro-averaged F1-score of ACF (red), DBC (green), KNN (blue) and KNN with random oversampling (KNN+ROs, orange) on the average relative noise. ACF may incorporate additional predictive covariates (violet); the predictiveness of the covariate alone is shown in black. Solid lines and shaded areas indicate the mean and standard deviation over 10 repetitions. For ACF, we tested multiple baseline classifiers (support vector classifier, random forest and ridge) and report the scores for the SVC, which performed best in most cases. E Dependence of the macro-averaged F1-score (averaged over 10 repetitions) of the considered correlation-based classifiers on class imbalance. F Top: dependence of the macro-averaged F1-score (averaged over 10 repetitions) of the considered correlation-based classifiers on the size of the training set. Bottom: mean and standard deviation over 10 repetitions of the selected number of nearest neighbors for KNN and KNN+ROs. G Mean and standard deviation over 10 repetitions of the runtime per predicted test instance for a varying number of training instances. Line styles indicate the algorithms (KNN, DBC, F-ACF and F-DBC), and colors indicate the number of reference instances per class (10, 20, 30, or all for the naïve variant). For F-ACF, we chose a support vector classifier (kernel = “rbf”, C = 100) as the baseline classifier.
H Visualization of the trade-off between the number of reference instances for F-ACF and the macro-averaged F1-score achieved at a fixed noise level (averaged over 10 repetitions). Both ACF (orange) and F-ACF (blue) use a support vector machine as the baseline classifier.
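To make the idea behind the title and panel A more concrete, here is a minimal Python/NumPy sketch. It assumes that ACF represents each instance by its mean Pearson correlation with the reference (training) instances of each class. The paper's exact correlation measure and its handling of missing values may differ. The optional `n_ref_per_class` argument imitates the fixed number of reference instances per class used for F-ACF, under the assumption that references are drawn at random. The function names (`pearson_to_references`, `acf_features`) and the toy data are hypothetical. The resulting per-class features would then be passed to a baseline classifier such as scikit-learn's `SVC(kernel="rbf", C=100)`. That step is omitted here so that the sketch depends only on NumPy.

```python
import numpy as np


def pearson_to_references(X, R):
    """Pearson correlation of every row of X with every row of R, shape (len(X), len(R)).

    Rows with zero variance would give NaN; missing values are not handled here.
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    Rc = R - R.mean(axis=1, keepdims=True)
    Xn = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)
    Rn = Rc / np.linalg.norm(Rc, axis=1, keepdims=True)
    return Xn @ Rn.T


def acf_features(X, X_ref, y_ref, n_ref_per_class=None, rng=None):
    """One feature per class: mean correlation of each row of X with that class's references.

    n_ref_per_class=None uses every reference of the class (naive variant);
    an integer draws that many references per class at random (assumed F-ACF-style).
    Note: if X is the training set itself, each row's self-correlation (1.0) is
    included; excluding it is left out of this sketch.
    """
    rng = np.random.default_rng(rng)
    classes = np.unique(y_ref)
    feats = np.empty((X.shape[0], classes.size))
    for j, c in enumerate(classes):
        R = X_ref[y_ref == c]
        if n_ref_per_class is not None and n_ref_per_class < len(R):
            R = R[rng.choice(len(R), size=n_ref_per_class, replace=False)]
        feats[:, j] = pearson_to_references(X, R).mean(axis=1)
    return classes, feats


# Hypothetical toy data: two class prototypes plus Gaussian noise.
gen = np.random.default_rng(0)
proto = gen.normal(size=(2, 50))
y_train = np.repeat([0, 1], 20)
X_train = proto[y_train] + 0.5 * gen.normal(size=(40, 50))
X_test = proto + 0.5 * gen.normal(size=(2, 50))  # row i belongs to class i

classes, F_test = acf_features(X_test, X_train, y_train, n_ref_per_class=10, rng=1)
print(classes)        # [0 1]
print(F_test.shape)   # (2, 2)
print(F_test.round(2))
```

With a small `n_ref_per_class`, each test instance is correlated with only `n_ref_per_class` references per class instead of the whole training set. This is the kind of runtime versus accuracy trade-off that panels G and H examine.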
