Table 2 BLCA accuracy is insensitive to the inclusion of dissimilar BLAST hits

From: A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy

| 16S region | topPercent filter | Genus, BLCA | Genus, MEGAN | Species, BLCA | Species, MEGAN |
|---|---|---|---|---|---|
| V2 | 5% | 0.9539 ± 0.0038 | 0.9531 ± 0.0044 | 0.7747 ± 0.0150 | 0.8091 ± 0.0153 |
| V2 | 10% | 0.9498 ± 0.0019 | 0.9334 ± 0.0079 | 0.7594 ± 0.0164 | 0.7290 ± 0.0114 |
| V2 | 20% | 0.9487 ± 0.0018 | 0.8966 ± 0.0080 | 0.7580 ± 0.0176 | 0.5983 ± 0.0075 |
| V4 | 5% | 0.9078 ± 0.0078 | 0.9230 ± 0.0082 | 0.5597 ± 0.0175 | 0.6497 ± 0.0058 |
| V4 | 10% | 0.8982 ± 0.0107 | 0.8830 ± 0.0115 | 0.5331 ± 0.0208 | 0.5238 ± 0.0161 |
| V4 | 20% | 0.8965 ± 0.0092 | 0.8016 ± 0.0041 | 0.5317 ± 0.0189 | 0.3915 ± 0.0119 |
| V1V3 | 5% | 0.9960 ± 0.0009 | 0.9778 ± 0.0006 | 0.9314 ± 0.0058 | 0.8394 ± 0.0069 |
| V1V3 | 10% | 0.9965 ± 0.0012 | 0.9528 ± 0.004 | 0.9323 ± 0.0054 | 0.7071 ± 0.0053 |
| V1V3 | 20% | 0.9959 ± 0.0009 | 0.8609 ± 0.0087 | 0.9321 ± 0.0053 | 0.4673 ± 0.0150 |
| V3V5 | 5% | 0.9865 ± 0.0020 | 0.9550 ± 0.0041 | 0.8380 ± 0.0064 | 0.7025 ± 0.0112 |
| V3V5 | 10% | 0.9863 ± 0.0011 | 0.9002 ± 0.0027 | 0.8335 ± 0.0072 | 0.5206 ± 0.0108 |
| V3V5 | 20% | 0.9863 ± 0.0011 | 0.7369 ± 0.0094 | 0.8361 ± 0.0039 | 0.2880 ± 0.0061 |
| V6V9 | 5% | 0.9933 ± 0.0011 | 0.9532 ± 0.0050 | 0.8722 ± 0.0066 | 0.7258 ± 0.0129 |
| V6V9 | 10% | 0.9925 ± 0.0012 | 0.8939 ± 0.0041 | 0.8690 ± 0.0012 | 0.5227 ± 0.0140 |
| V6V9 | 20% | 0.9931 ± 0.0017 | 0.7138 ± 0.0083 | 0.8701 ± 0.0050 | 0.2691 ± 0.0255 |

  1. The topPercent parameter keeps only the BLAST hits whose bit scores are within a given percentage of the bit score of the best BLAST hit. The larger the value, the more dissimilar database hits are included in the taxonomic classification of the query sequence. The default value of this parameter in MEGAN is 10%. In our comparisons, we set topPercent to 5, 10 and 20% for both BLCA and MEGAN, the range recommended by the original MEGAN publication, to compare the performance of the two tools under different stringencies for retaining BLAST hits. Each table entry shows the mean and standard deviation of the F-scores, based on a confidence score threshold of 0.8, for each tested software at the corresponding 16S region. The F-scores of BLCA are much less sensitive to the value of topPercent than those of MEGAN.
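The effect of topPercent on the retained hit set can be illustrated with a minimal sketch. This is not the code used by MEGAN or BLCA. The names `BlastHit` and `filter_top_percent`, the example bit scores, and the choice of an inclusive cutoff at `best * (1 - topPercent / 100)` are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class BlastHit(NamedTuple):
    """Hypothetical container for one BLAST hit (illustration only)."""
    subject_id: str
    bit_score: float


def filter_top_percent(hits: List[BlastHit], top_percent: float) -> List[BlastHit]:
    """Keep hits whose bit score is within top_percent % of the best bit score.

    Assumption: the cutoff is inclusive and computed as
    best_bit_score * (1 - top_percent / 100).
    """
    if not hits:
        return []
    best = max(h.bit_score for h in hits)
    cutoff = best * (1.0 - top_percent / 100.0)
    return [h for h in hits if h.bit_score >= cutoff]


# Made-up bit scores for four database hits of one query sequence.
hits = [
    BlastHit("refA", 500.0),
    BlastHit("refB", 480.0),
    BlastHit("refC", 455.0),
    BlastHit("refD", 410.0),
]

# A larger topPercent retains more (and more dissimilar) hits.
for top_percent in (5, 10, 20):
    kept = filter_top_percent(hits, top_percent)
    print(top_percent, [h.subject_id for h in kept])
```

With these made-up scores, the 5% setting keeps two hits, the 10% setting keeps three, and the 20% setting keeps all four, which shows how a looser filter passes more dissimilar hits to the classifier.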