Fig. 1
From: Predicting the pathogenicity of bacterial genomes using widely spread protein families
Dataset pre-processing overview. We report the number of genomes in the WSPC training and test data after each pre-processing step (“Dataset” section). Note that genomes that could not be labeled were removed. WGS: whole-genome sequences.