
Table 2 BLCA accuracy is insensitive to the inclusion of dissimilar BLAST hits

From: A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy

Taxonomic level                Genus                             Species
16S region  topPercent filter  BLCA             MEGAN            BLCA             MEGAN
V2          5%                 0.9539 ± 0.0038  0.9531 ± 0.0044  0.7747 ± 0.0150  0.8091 ± 0.0153
V2          10%                0.9498 ± 0.0019  0.9334 ± 0.0079  0.7594 ± 0.0164  0.7290 ± 0.0114
V2          20%                0.9487 ± 0.0018  0.8966 ± 0.0080  0.7580 ± 0.0176  0.5983 ± 0.0075
V4          5%                 0.9078 ± 0.0078  0.9230 ± 0.0082  0.5597 ± 0.0175  0.6497 ± 0.0058
V4          10%                0.8982 ± 0.0107  0.8830 ± 0.0115  0.5331 ± 0.0208  0.5238 ± 0.0161
V4          20%                0.8965 ± 0.0092  0.8016 ± 0.0041  0.5317 ± 0.0189  0.3915 ± 0.0119
V1V3        5%                 0.9960 ± 0.0009  0.9778 ± 0.0006  0.9314 ± 0.0058  0.8394 ± 0.0069
V1V3        10%                0.9965 ± 0.0012  0.9528 ± 0.004   0.9323 ± 0.0054  0.7071 ± 0.0053
V1V3        20%                0.9959 ± 0.0009  0.8609 ± 0.0087  0.9321 ± 0.0053  0.4673 ± 0.0150
V3V5        5%                 0.9865 ± 0.0020  0.9550 ± 0.0041  0.8380 ± 0.0064  0.7025 ± 0.0112
V3V5        10%                0.9863 ± 0.0011  0.9002 ± 0.0027  0.8335 ± 0.0072  0.5206 ± 0.0108
V3V5        20%                0.9863 ± 0.0011  0.7369 ± 0.0094  0.8361 ± 0.0039  0.2880 ± 0.0061
V6V9        5%                 0.9933 ± 0.0011  0.9532 ± 0.0050  0.8722 ± 0.0066  0.7258 ± 0.0129
V6V9        10%                0.9925 ± 0.0012  0.8939 ± 0.0041  0.8690 ± 0.0012  0.5227 ± 0.0140
V6V9        20%                0.9931 ± 0.0017  0.7138 ± 0.0083  0.8701 ± 0.0050  0.2691 ± 0.0255
  1. The topPercent parameter keeps only the BLAST hits whose bit scores are within a given percentage of the bit score of the best BLAST hit. The larger the value, the more dissimilar database hits are included in the taxonomic classification of the query sequence. The default value of this parameter in MEGAN is 10%. We set topPercent to 5, 10 and 20% for both BLCA and MEGAN, the range recommended by the original MEGAN publication, to compare the two methods under different stringencies of retaining BLAST hits. Each table entry shows the mean and standard deviation of the F-scores, based on a confidence score threshold of 0.8, for each tested software at the corresponding 16S region. The F-scores of BLCA are much less sensitive to the value of topPercent than those of MEGAN: at the species level for V6V9, for example, raising topPercent from 5% to 20% changes the BLCA F-score from 0.8722 to 0.8701, whereas the MEGAN F-score drops from 0.7258 to 0.2691.
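The topPercent filter described in the footnote can be illustrated with a minimal Python sketch. This is not code from BLCA or MEGAN: the function name `filter_top_percent`, the `(subject_id, bit_score)` tuple layout, and the example hits are hypothetical. The cutoff formula, best bit score × (1 − topPercent/100), is an assumed reading of "within a given percentage of the best BLAST hit".

```python
def filter_top_percent(hits, top_percent):
    """Keep hits whose bit score is within top_percent % of the best bit score.

    hits: list of (subject_id, bit_score) tuples (hypothetical layout).
    top_percent: percentage such as 5, 10 or 20.
    """
    if not hits:
        return []
    best = max(score for _, score in hits)
    # Assumed cutoff: a hit is retained if its score is at least
    # (100 - top_percent) % of the best score.
    cutoff = best * (1 - top_percent / 100.0)
    return [(sid, score) for sid, score in hits if score >= cutoff]


# Made-up bit scores for one query sequence, for illustration only.
hits = [("refA", 500.0), ("refB", 480.0), ("refC", 455.0), ("refD", 410.0)]

for pct in (5, 10, 20):
    kept = [sid for sid, _ in filter_top_percent(hits, pct)]
    print(f"topPercent={pct}%: {kept}")
```

With these made-up scores, a larger topPercent retains progressively lower-scoring (more dissimilar) hits, which is the effect the table probes for both tools.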