Table 1 Summary of search space per k-mer size and number of k-mers found in datasets

From: Cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome NGS data

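| k-mer size | # Canonical k-mer combinations | % of k-mers found per sample (median) | % of k-mers found per sample (MAD) | % of k-mers found per sample, shared by at least two samples (median) | % of k-mers found per sample, shared by at least two samples (MAD) |
|---|---|---|---|---|---|
| 11-mer | 2.10 × 10^6 | 100.00 % | 1.58 % | 100.00 % | 0.00 % |
| 15-mer | 5.40 × 10^8 | 53.59 % | 17.07 % | 100.00 % | 0.00 % |
| 17-mer | 8.60 × 10^9 | 8.90 % | 4.03 % | 98.37 % | 0.99 % |
| 21-mer | 2.20 × 10^12 | 0.05 % | 0.03 % | 81.45 % | 20.55 % |
| 31-mer | 2.30 × 10^18 | 0.000000061 % | 0.000000032 % | 67.05 % | 24.14 % |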

  1. The "# Canonical k-mer combinations" column contains the total number of possible k-mers, calculated as 4^k / 2 (where k is the k-mer size); the division by two is due to canonization. The "% of k-mers found per sample" columns give the median and the Median Absolute Deviation (MAD) of the total number of k-mers found in the samples (Additional file 3: Table S3) divided by the number of possible k-mers, showing the percentage of combinations actually found and, consequently, the saturation of the search space. The "% of k-mers found per sample, shared by at least two samples" columns give the median and MAD of the percentage of valid k-mers (k-mers shared between at least two samples; Additional file 3: Table S3).
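
The calculation described in the note can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names and the per-sample counts are hypothetical, and the MAD is assumed to be the plain, unscaled median of absolute deviations from the median, which the note does not specify.

```python
from statistics import median


def canonical_kmer_space(k: int) -> int:
    """Number of possible canonical k-mers, 4**k / 2, as in the table note.

    For odd k (all sizes in the table) no k-mer equals its own reverse
    complement, so the division by two is exact.
    """
    return 4 ** k // 2


def mad(values):
    """Median absolute deviation (unscaled; an assumption of this sketch)."""
    values = list(values)
    centre = median(values)
    return median(abs(v - centre) for v in values)


def saturation_summary(kmers_per_sample, k):
    """Median and MAD of the percentage of the search space found per sample."""
    space = canonical_kmer_space(k)
    percentages = [100.0 * n / space for n in kmers_per_sample]
    return median(percentages), mad(percentages)


if __name__ == "__main__":
    # Search space for the k-mer sizes listed in the table.
    for k in (11, 15, 17, 21, 31):
        print(f"{k}-mer: {canonical_kmer_space(k):.2e} possible canonical k-mers")

    # Hypothetical per-sample distinct k-mer counts, not data from the paper.
    example_counts = [2_097_152, 2_000_000, 2_097_152]
    print(saturation_summary(example_counts, k=11))
```

With real data, `kmers_per_sample` would hold the number of distinct canonical k-mers counted in each sample for a given k.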