Table 3 Number of minimal absent words and generic absent words for some genomes.

From: On finding minimal absent words

Organism | Reference | Genome size | Minimal absent words | Generic absent words | Length, n
H. sapiens | Release 36.1 | ≈ 2.9 Gb | 104 | 104 | 11
 | | | 44 149 | 44 970 | 12
 | | | 2 039 862 | 2 368 682 | 13
M. musculus | Release m36.1 | ≈ 2.6 Gb | 190 | 190 | 11
 | | | 52 087 | 53 573 | 12
 | | | 2 192 708 | 2 579 838 | 13
D. melanogaster | FB 5 | ≈ 162 Mb | 104 | 104 | 11
 | | | 172 849 | 173 674 | 12
 | | | 10 040 282 | 11 335 034 | 13
C. elegans | WB 170 | ≈ 100 Mb | 2 | 2 | 10
 | | | 7 664 | 7 680 | 11
 | | | 1 092 286 | 1 151 728 | 12
N. crassa | Assembly 7 | ≈ 39 Mb | 2 262 | 2 262 | 11
 | | | 1 064 938 | 1 082 787 | 12
 | | | 20 213 298 | 27 903 272 | 13
S. cerevisiae S228C | SGD 1 | ≈ 12 Mb | 2 | 2 | 9
 | | | 6 435 | 6 450 | 10
 | | | 414 520 | 462 882 | 11
S. aureus MSSA476 | NC002953 | ≈ 2.8 Mb | 248 | 248 | 8
 | | | 11 908 | 13 744 | 9
 | | | 162 113 | 251 497 | 10
T. kodakarensis | NC006624 | ≈ 2.09 Mb | 1 | 1 | 8
 | | | 2 314 | 2 322 | 9
 | | | 136 917 | 154 340 | 10
M. jannaschii | NC000909 | ≈ 1.66 Mb | 3 | 3 | 6
 | | | 126 | 150 | 7
 | | | 3 790 | 4 834 | 8
M. genitalium | NC000908 | ≈ 0.58 Mb | 5 | 5 | 6
 | | | 340 | 380 | 7
 | | | 6 156 | 8 733 | 8
  1. In each row, the first count is the number of minimal absent words of length n associated with string S, and the second is the corresponding number of generic absent words. The generic absent words were generated using publicly available software provided by Herold et al. [3]. The organisms are sorted by decreasing genome size, where genome size refers to the number of unambiguous bases in the genome. The reverse complement of each sequence was taken into account when generating the results.
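
To make the minimal absent word counts concrete, the following is a brute-force Python sketch, not the method used to produce the table. It assumes the usual definition (a word of length n is a minimal absent word if it does not occur in the sequence, while both its prefix and its suffix of length n - 1 do occur), and it assumes that a word counts as occurring when it appears on either strand. The function names are hypothetical, and the approach is only practical for short sequences.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(strands, k):
    """Collect every length-k substring that occurs in any of the strands."""
    return {s[i:i + k] for s in strands for i in range(len(s) - k + 1)}


def minimal_absent_words(seq, n, both_strands=True):
    """Return the set of minimal absent words of length n (n >= 2).

    Assumption: a word is 'present' if it occurs in seq or, when
    both_strands is True, in the reverse complement of seq.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    strands = [seq, reverse_complement(seq)] if both_strands else [seq]
    present_n = kmers(strands, n)
    present_shorter = kmers(strands, n - 1)
    result = set()
    # Every candidate must have a present prefix of length n - 1,
    # so extend each present (n - 1)-mer by one base on the right.
    for prefix in present_shorter:
        for base in "ACGT":
            word = prefix + base
            if word not in present_n and word[1:] in present_shorter:
                result.add(word)
    return result
```

For example, `len(minimal_absent_words(seq, 11))` would give the count for length 11 on a sequence `seq`, under the assumptions stated above.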