Fig. 3
From: Predicting the pathogenicity of bacterial genomes using widely spread protein families

Prediction performance before and after removing highly correlated features from the training set (excluding the validation set). A The percentage of feature pairs whose correlation falls within each range. The labels on the x-axis give the midpoint of each range; each range has a width of 0.1. B Validation set results of the RF classifier trained using the 450 features selected in the first step, and of the RF classifier trained using the 244 features obtained after removing highly correlated features in the second feature selection step.
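
The two ideas in this caption, binning pairwise feature correlations into ranges of width 0.1 and then dropping highly correlated features, can be illustrated with a minimal sketch. The caption does not state the correlation measure, the cutoff, or the removal strategy, so the Pearson correlation, the `threshold=0.9` default, the greedy keep-first rule, and the function names below are all assumptions made for illustration, not the authors' procedure.

```python
import numpy as np


def pairwise_correlation_histogram(X, bin_width=0.1):
    """Percentage of feature pairs per correlation bin (hypothetical helper).

    X is an (n_samples, n_features) array. Pearson correlation is an
    assumption; the caption does not name the measure.
    """
    corr = np.corrcoef(X, rowvar=False)
    # Use each unordered pair once: the strict upper triangle.
    upper = np.triu_indices_from(corr, k=1)
    values = corr[upper]
    # Constant features yield NaN correlations; leave them out of the counts.
    values = values[np.isfinite(values)]
    n_bins = int(round(2.0 / bin_width))
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    counts, _ = np.histogram(values, bins=edges)
    centers = (edges[:-1] + edges[1:]) / 2.0
    return centers, 100.0 * counts / counts.sum()


def drop_correlated_features(X, threshold=0.9):
    """Greedy filter: keep a feature only if its absolute correlation with
    every feature kept so far is at most `threshold`.

    The threshold value and the greedy order are assumptions. Features with
    NaN correlations (for example constant columns) are dropped by this rule.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(corr.shape[0]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept
```

To respect the caption's note that the validation set is excluded, `drop_correlated_features` would be called on the training matrix only, and the returned column indices would then be applied to both the training and validation matrices before fitting the classifier.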