Table 1 Comparison of the classification accuracies using the simulated dataset

From: A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy

CST = 0.8

| Rank | Classifier | V2 | V4 | V1V3 | V3V5 | V6V9 |
|---|---|---|---|---|---|---|
| Species | BLCA | 0.7594 ± 0.0164* | 0.5331 ± 0.0208 | 0.9323 ± 0.0054* | 0.8335 ± 0.0072* | 0.8690 ± 0.0012* |
| Species | Kraken | 0.7275 ± 0.0054 | 0.5326 ± 0.0181 | 0.8672 ± 0.0072 | 0.7542 ± 0.0087 | 0.7572 ± 0.0056 |
| Species | MEGAN | 0.7290 ± 0.0114 | 0.5238 ± 0.0161 | 0.7071 ± 0.0053 | 0.5206 ± 0.0108 | 0.5227 ± 0.0140 |
| Species | RDP | 0.6102 ± 0.0042 | 0.3928 ± 0.0292 | 0.8549 ± 0.0199 | 0.7307 ± 0.0203 | 0.7823 ± 0.0124 |
| Species | SPINGO | 0.5700 ± 0.0187 | 0.3910 ± 0.0106 | 0.7907 ± 0.0061 | 0.6900 ± 0.0071 | 0.7318 ± 0.0116 |
| Genus | BLCA | 0.9498 ± 0.0019* | 0.8982 ± 0.0107* | 0.9965 ± 0.0012* | 0.9863 ± 0.0011* | 0.9925 ± 0.0012* |
| Genus | Kraken | 0.9072 ± 0.0066 | 0.8612 ± 0.0189 | 0.9691 ± 0.0051 | 0.9463 ± 0.0006 | 0.9437 ± 0.0034 |
| Genus | MEGAN | 0.9334 ± 0.0079 | 0.8830 ± 0.0115 | 0.9528 ± 0.0040 | 0.9002 ± 0.0027 | 0.8939 ± 0.0041 |
| Genus | RDP | 0.8768 ± 0.0065 | 0.8067 ± 0.0139 | 0.9629 ± 0.0072 | 0.9562 ± 0.0065 | 0.9657 ± 0.0042 |
| Genus | SPINGO | 0.8481 ± 0.0002 | 0.7726 ± 0.0077 | 0.9333 ± 0.0057 | 0.9192 ± 0.0034 | 0.9238 ± 0.0067 |
| Family | BLCA | 0.9791 ± 0.0009* | 0.9787 ± 0.0018* | 0.9984 ± 0.0019* | 0.9975 ± 0.0019* | 0.9970 ± 0.0014* |
| Family | Kraken | 0.9594 ± 0.0038 | 0.9480 ± 0.0028 | 0.9882 ± 0.0021 | 0.9850 ± 0.0033 | 0.9799 ± 0.0032 |
| Family | MEGAN | 0.9495 ± 0.0089 | 0.9413 ± 0.0015 | 0.9517 ± 0.0032 | 0.9397 ± 0.0044 | 0.9447 ± 0.0034 |
| Family | RDP | 0.9461 ± 0.0093 | 0.9295 ± 0.0062 | 0.9818 ± 0.0007 | 0.9806 ± 0.0054 | 0.9855 ± 0.0013 |
| Family | SPINGO | NA | NA | NA | NA | NA |

CST = 0.5

| Rank | Classifier | V2 | V4 | V1V3 | V3V5 | V6V9 |
|---|---|---|---|---|---|---|
| Species | BLCA | 0.8485 ± 0.0128* | 0.6813 ± 0.0115* | 0.9629 ± 0.0077* | 0.9050 ± 0.0034* | 0.9315 ± 0.0045* |
| Species | Kraken | 0.7275 ± 0.0054 | 0.5326 ± 0.0181 | 0.8672 ± 0.0072 | 0.7542 ± 0.0087 | 0.7572 ± 0.0056 |
| Species | MEGAN | 0.7290 ± 0.0114 | 0.5238 ± 0.0161 | 0.7071 ± 0.0053 | 0.5206 ± 0.0108 | 0.5227 ± 0.0140 |
| Species | RDP | 0.7526 ± 0.0107 | 0.5692 ± 0.0194 | 0.8997 ± 0.0144 | 0.8221 ± 0.0105 | 0.8621 ± 0.0094 |
| Species | SPINGO | 0.6570 ± 0.0124 | 0.5008 ± 0.0114 | 0.8256 ± 0.0038 | 0.7497 ± 0.0041 | 0.7805 ± 0.0021 |
| Genus | BLCA | 0.9722 ± 0.0028* | 0.9467 ± 0.0031* | 0.9985 ± 0.0019* | 0.9947 ± 0.0013* | 0.9972 ± 0.0002* |
| Genus | Kraken | 0.9072 ± 0.0066 | 0.8612 ± 0.0189 | 0.9691 ± 0.0051 | 0.9463 ± 0.0006 | 0.9437 ± 0.0034 |
| Genus | MEGAN | 0.9334 ± 0.0079 | 0.8830 ± 0.0115 | 0.9528 ± 0.0040 | 0.9002 ± 0.0027 | 0.8939 ± 0.0041 |
| Genus | RDP | 0.9319 ± 0.0044 | 0.8960 ± 0.0086 | 0.9710 ± 0.0049 | 0.9693 ± 0.0046 | 0.9729 ± 0.0003 |
| Genus | SPINGO | 0.8807 ± 0.0034 | 0.8354 ± 0.0041 | 0.9400 ± 0.0030 | 0.9287 ± 0.0024 | 0.9317 ± 0.0083 |
| Family | BLCA | 0.9870 ± 0.0013* | 0.9856 ± 0.0035* | 0.9987 ± 0.0021* | 0.9991 ± 0.0012* | 0.9984 ± 0.0019* |
| Family | Kraken | 0.9594 ± 0.0038 | 0.9480 ± 0.0028 | 0.9882 ± 0.0021 | 0.9850 ± 0.0033 | 0.9799 ± 0.0032 |
| Family | MEGAN | 0.9495 ± 0.0089 | 0.9413 ± 0.0015 | 0.9517 ± 0.0032 | 0.9397 ± 0.0044 | 0.9447 ± 0.0034 |
| Family | RDP | 0.9696 ± 0.0040 | 0.9674 ± 0.0015 | 0.9836 ± 0.0017 | 0.9830 ± 0.0033 | 0.9868 ± 0.0004 |
| Family | SPINGO | NA | NA | NA | NA | NA |
  1. Each entry in the table shows the mean and standard deviation of the F-scores for a particular classifier (rows) at a specific 16S region (columns), based on three random sets of 1000 test sequences. Two confidence score thresholds (CST), 0.8 and 0.5, were applied for BLCA, RDP Classifier, and SPINGO as described in the main text. An asterisk (*) indicates that the F-scores of BLCA are significantly higher than those of the other software, based on a one-tailed paired t-test with a p-value less than 0.05. Similar statistical significance was also obtained using the one-tailed Wilcoxon signed-rank test. Note that the SPINGO program does not produce family-level classifications, hence the NA entries. In addition, Kraken and MEGAN do not provide any probability-based parameters for evaluating the assigned taxa, so we used their default taxonomic assignments for comparison; their values are therefore identical under both CSTs.
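
To make the "mean ± standard deviation" entries and the one-tailed paired t-test in the note above concrete, here is a minimal Python sketch. It rests on several assumptions that the table does not confirm: the standard deviation is taken as the sample standard deviation (n − 1 denominator), the pairing is done over the three random test sets, and the per-replicate F-scores are hypothetical numbers invented for illustration (the table reports only summaries). The function names `summarize` and `paired_t_one_tailed_df2` are also illustrative. With three pairs the t distribution has 2 degrees of freedom, for which the CDF has the closed form F(t) = 1/2 + t / (2·sqrt(2 + t²)), so no statistics library is needed.

```python
import math
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) of replicate F-scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_one_tailed_df2(a, b):
    """One-tailed paired t-test (H1: mean(a) > mean(b)) for exactly 3 pairs.

    With 3 pairs there are 2 degrees of freedom, and the t CDF is
    F(t) = 1/2 + t / (2 * sqrt(2 + t^2)), so p = 1 - F(t).
    """
    if len(a) != 3 or len(b) != 3:
        raise ValueError("this sketch only handles three paired replicates")
    diffs = [x - y for x, y in zip(a, b)]
    sd_d = statistics.stdev(diffs)
    if sd_d == 0:
        raise ValueError("all paired differences are equal; t is undefined")
    t = statistics.mean(diffs) / (sd_d / math.sqrt(len(diffs)))
    p = 0.5 - t / (2 * math.sqrt(2 + t * t))
    return t, p


# Hypothetical per-replicate F-scores (NOT taken from the paper).
method_a = [0.93, 0.94, 0.92]
method_b = [0.86, 0.88, 0.87]

mean_a, sd_a = summarize(method_a)
t_stat, p_value = paired_t_one_tailed_df2(method_a, method_b)

print(f"method A: {mean_a:.4f} ± {sd_a:.4f}")
print(f"t = {t_stat:.3f}, one-tailed p = {p_value:.4f}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

For more than three replicates, or for the Wilcoxon signed-rank test, a statistics package would be the practical choice; the closed-form CDF used here holds only for 2 degrees of freedom.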