
Table 4 Performance on real sequencing data

From: Sketching and sampling approaches for fast and accurate long read classification

  

| Method | Total number of reads classified | Number of reads classified to multiple genomes | Number of unclassified reads | Number of reads with same prediction as Minimap2 (no threshold) | Number of reads with same prediction as Minimap2 (> 50% of read aligned) |
| --- | --- | --- | --- | --- | --- |
| Vegan (1.90 M Reads) | | | | | |
| Minimap2 (No threshold) | 1,490,713 (78.3%) | 485,233 (25.4%) | 413,446 (21.7%) | N/A | N/A |
| Minimap2 (> 50% of read aligned) | 1,069,306 (56.1%) | 23,199 (1.2%) | 834,853 (43.9%) | N/A | N/A |
| Kraken2 | 1,399,341 (73.5%) | 562,744 (29.6%) | 504,818 (26.5%) | 827,572 | 816,331 |
| MinHash | 1,032,396 (54.2%) | 261,732 (13.7%) | 871,763 (45.8%) | 890,766 | 821,556 |
| Minimizer | 1,029,996 (54.1%) | 262,514 (13.8%) | 874,163 (45.9%) | 891,388 | 820,493 |
| Uniform | 1,021,555 (53.6%) | 264,867 (13.9%) | 882,604 (46.4%) | 883,465 | 819,017 |
| Omnivore (1.79 M Reads) | | | | | |
| Minimap2 (No threshold) | 1,530,795 (85.4%) | 490,501 (27.4%) | 261,351 (14.6%) | N/A | N/A |
| Minimap2 (> 50% of read aligned) | 1,144,452 (63.8%) | 24,271 (1.3%) | 647,694 (36.2%) | N/A | N/A |
| Kraken2 | 1,442,671 (80.5%) | 578,113 (32.2%) | 349,475 (19.5%) | 855,670 | 835,888 |
| MinHash | 1,111,102 (62.0%) | 275,344 (15.4%) | 681,044 (38.0%) | 915,634 | 873,901 |
| Minimizer | 1,105,356 (61.7%) | 278,654 (15.5%) | 686,790 (38.3%) | 916,998 | 872,955 |
| Uniform | 1,098,244 (61.3%) | 281,745 (15.7%) | 693,902 (38.7%) | 911,554 | 871,675 |

Comparison of the classifications from our sketching and sampling approaches against Kraken2 and Minimap2 classifications across two PacBio HiFi gut microbiome datasets. Adding a threshold requiring that > 50% of a read is aligned seems to remove a number of spurious or insignificant calls, increasing concordance between Minimap2 and the other benchmarked approaches.
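
The raw agreement counts in the last two columns fall slightly when the threshold is applied, so the increase in concordance is easiest to see as a rate. The following is a minimal sketch of that arithmetic, assuming concordance is measured as the number of reads with the same prediction as Minimap2 divided by the number of reads Minimap2 classified under the same setting. That denominator is an assumption made here for illustration and is not stated in the table. The names `TABLE4` and `concordance_rates` are hypothetical; the counts are copied from the table.

```python
# Counts copied from Table 4. For each dataset and Minimap2 setting:
# (reads classified by Minimap2, {method: reads with the same prediction as Minimap2}).
TABLE4 = {
    "Vegan": {
        "no_threshold": (1_490_713, {"Kraken2": 827_572, "MinHash": 890_766,
                                     "Minimizer": 891_388, "Uniform": 883_465}),
        "over_50pct": (1_069_306, {"Kraken2": 816_331, "MinHash": 821_556,
                                   "Minimizer": 820_493, "Uniform": 819_017}),
    },
    "Omnivore": {
        "no_threshold": (1_530_795, {"Kraken2": 855_670, "MinHash": 915_634,
                                     "Minimizer": 916_998, "Uniform": 911_554}),
        "over_50pct": (1_144_452, {"Kraken2": 835_888, "MinHash": 873_901,
                                   "Minimizer": 872_955, "Uniform": 871_675}),
    },
}


def concordance_rates(table):
    """Return {(dataset, setting, method): agreeing reads / Minimap2-classified reads}.

    Assumption: the denominator is the Minimap2 classified-read count for the
    same setting; the source table reports only the raw agreement counts.
    """
    rates = {}
    for dataset, settings in table.items():
        for setting, (minimap2_classified, agreeing) in settings.items():
            for method, n_same in agreeing.items():
                rates[(dataset, setting, method)] = n_same / minimap2_classified
    return rates


if __name__ == "__main__":
    for (dataset, setting, method), rate in concordance_rates(TABLE4).items():
        print(f"{dataset:9s} {setting:13s} {method:10s} {rate:.1%}")
```

Under this assumption, the rate for Kraken2 on the Vegan dataset rises from about 55.5% without a threshold to about 76.3% with the > 50% threshold, and every other method and dataset in the table moves in the same direction.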