
Table 1 Comparison of sketching and sampling approaches

From: Sketching and sampling approaches for fast and accurate long read classification

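| Approach         | Index generation time | Total index size       | K-mer query time |
|------------------|-----------------------|------------------------|------------------|
| Uniform          | O(n)                  | O(s)                   | O(1)             |
| MinHash          | O(n log s)            | O(s)                   | O(1)             |
| Weighted MinHash | O(n log s) + O(n)     | O(s) + O(s) weights    | O(1)             |
| Order MinHash    | O(n log s)            | O(s) + O(s) positions  | O(L)             |
| Minimizer        | O(n)                  | O(s)                   | O(1)             |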

  1. Theoretical runtimes for generating and querying screens of size s built from a genome of size n. The three main approaches (Uniform, MinHash and Minimizer) are largely equivalent in computational cost. The augmented approaches (Weighted MinHash and Order MinHash) incur additional overhead, and Order MinHash also requires a more complex query process when comparing two sketches, whose cost depends on the chosen sublist size (L). Exact screen sizes and the number of lookups performed during a classification experiment, as well as the overhead of an exhaustive approach, can be found in Additional file 1: Table 5.
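To make the complexity figures in the table concrete, the following is a minimal Python sketch of two of the screens, assuming a bottom-s MinHash and (w, k)-minimizers computed over a 64-bit k-mer hash derived from BLAKE2b. It is not the authors' implementation: the hash choice and the names `kmer_hash`, `minhash_screen`, `minimizer_screen` and `count_hits` are hypothetical. The MinHash screen keeps a size-s max-heap of the smallest hashes seen so far, which gives the O(n log s) generation time; the minimizer screen uses a monotonic deque so that each k-mer is pushed and popped at most once, which gives O(n). Both screens are stored as hash sets, so each k-mer lookup during classification is O(1) on average.

```python
import hashlib
import heapq
from collections import deque


def kmer_hash(kmer: str) -> int:
    # Hypothetical choice: a deterministic 64-bit hash taken from BLAKE2b.
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_screen(genome: str, k: int, s: int) -> set[str]:
    """Bottom-s MinHash: keep the s distinct k-mers with the smallest hashes.

    A max-heap capped at s entries gives O(n log s) over the n k-mers.
    """
    heap: list[tuple[int, str]] = []  # (-hash, kmer): heap[0] holds the largest kept hash
    kept: set[str] = set()
    for i in range(len(genome) - k + 1):
        kmer = genome[i : i + k]
        if kmer in kept:
            continue  # duplicates must not occupy two sketch slots
        h = kmer_hash(kmer)
        if len(heap) < s:
            heapq.heappush(heap, (-h, kmer))
            kept.add(kmer)
        elif h < -heap[0][0]:
            # Replace the current largest kept hash with the smaller one.
            _, evicted = heapq.heapreplace(heap, (-h, kmer))
            kept.discard(evicted)
            kept.add(kmer)
    return kept


def minimizer_screen(genome: str, k: int, w: int) -> set[str]:
    """(w, k)-minimizers: the smallest-hash k-mer in every window of w consecutive k-mers.

    The deque holds k-mer indices whose hashes increase from front to back.
    """
    hashes = [kmer_hash(genome[i : i + k]) for i in range(len(genome) - k + 1)]
    window: deque[int] = deque()
    picked: set[str] = set()
    for i, h in enumerate(hashes):
        # Drop candidates that can never be a window minimum again.
        while window and hashes[window[-1]] >= h:
            window.pop()
        window.append(i)
        # Drop the front index once it falls out of the window [i - w + 1, i].
        if window[0] <= i - w:
            window.popleft()
        if i >= w - 1:
            j = window[0]
            picked.add(genome[j : j + k])
    return picked


def count_hits(read: str, screen: set[str], k: int) -> int:
    # Each k-mer of the read costs one average-case O(1) hash-set lookup.
    return sum(read[i : i + k] in screen for i in range(len(read) - k + 1))
```

For example, `count_hits(read, minhash_screen(genome, 21, 1000), 21)` counts how many 21-mers of a read fall in a 1,000-entry MinHash screen. For minimizers the screen size s is not fixed directly but is controlled through the window length w.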