Table 3 Number of minimal absent words and generic absent words for some genomes.

From: On finding minimal absent words

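| Organism | Reference | Genome size | Minimal absent words | Generic absent words | Length, n |
|---|---|---|---|---|---|
| H. sapiens | Release 36.1 | ≈ 2.9 Gb | 104 | 104 | 11 |
| | | | 44 149 | 44 970 | 12 |
| | | | 2 039 862 | 2 368 682 | 13 |
| M. musculus | Release m36.1 | ≈ 2.6 Gb | 190 | 190 | 11 |
| | | | 52 087 | 53 573 | 12 |
| | | | 2 192 708 | 2 579 838 | 13 |
| D. melanogaster | FB 5 | ≈ 162 Mb | 104 | 104 | 11 |
| | | | 172 849 | 173 674 | 12 |
| | | | 10 040 282 | 11 335 034 | 13 |
| C. elegans | WB 170 | ≈ 100 Mb | 2 | 2 | 10 |
| | | | 7 664 | 7 680 | 11 |
| | | | 1 092 286 | 1 151 728 | 12 |
| N. crassa | Assembly 7 | ≈ 39 Mb | 2 262 | 2 262 | 11 |
| | | | 1 064 938 | 1 082 787 | 12 |
| | | | 20 213 298 | 27 903 272 | 13 |
| S. cerevisiae S228C | SGD 1 | ≈ 12 Mb | 2 | 2 | 9 |
| | | | 6 435 | 6 450 | 10 |
| | | | 414 520 | 462 882 | 11 |
| S. aureus MSSA476 | NC002953 | ≈ 2.8 Mb | 248 | 248 | 8 |
| | | | 11 908 | 13 744 | 9 |
| | | | 162 113 | 251 497 | 10 |
| T. kodakarensis | NC006624 | ≈ 2.09 Mb | 1 | 1 | 8 |
| | | | 2 314 | 2 322 | 9 |
| | | | 136 917 | 154 340 | 10 |
| M. jannaschii | NC000909 | ≈ 1.66 Mb | 3 | 3 | 6 |
| | | | 126 | 150 | 7 |
| | | | 3 790 | 4 834 | 8 |
| M. genitalium | NC000908 | ≈ 0.58 Mb | 5 | 5 | 6 |
| | | | 340 | 380 | 7 |
| | | | 6 156 | 8 733 | 8 |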

  1. For each length n, the first count is the number of minimal absent words of length n associated with string S, and the second is the corresponding number of generic absent words. The generic absent words were generated using publicly available software provided by Herold et al. [3]. The organisms are sorted by decreasing genome size, where genome size is the number of unambiguous bases in the genome. The reverse complement of each sequence was also considered when generating the results.
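
To make the two counts concrete, here is a minimal brute-force sketch in Python for toy sequences. It is not the method used to produce the table and would not scale to real genomes. It rests on two assumptions: a generic absent word of length n is taken to be any word of length n that occurs in neither the sequence nor its reverse complement, and a minimal absent word is an absent word whose longest proper prefix and longest proper suffix both occur. The function names are hypothetical.

```python
from itertools import product

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of length-k substrings occurring in seq or its reverse complement.

    Each strand is scanned separately so that no artificial k-mer is
    created at a junction between the two strands.
    """
    found = set()
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - k + 1):
            found.add(strand[i:i + k])
    return found


def absent_word_counts(seq, n):
    """Return (minimal, generic) absent-word counts of length n.

    Assumption: 'generic' means any absent word of length n.
    A word is counted as minimal when it is absent but both its
    length n-1 prefix and its length n-1 suffix are present.
    """
    present_n = kmers(seq, n)
    present_shorter = kmers(seq, n - 1)
    minimal = generic = 0
    for letters in product("ACGT", repeat=n):  # enumerate all 4**n words
        word = "".join(letters)
        if word in present_n:
            continue
        generic += 1
        if word[:-1] in present_shorter and word[1:] in present_shorter:
            minimal += 1
    return minimal, generic


if __name__ == "__main__":
    # Toy input; real genomes need an index-based approach instead.
    for n in (1, 2):
        print(n, absent_word_counts("AAAA", n))
```

Under these assumptions the two counts coincide at the shortest length for which any word is absent, and the generic count grows faster at longer lengths because it also includes every extension of a shorter absent word.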