Table 1 The genomes used to benchmark the performance of the algorithms

From: Fast parallel construction of variable-length Markov chains

| Organism | GenBank identifier | Sequence length (bp) | GC content (%) |
|---|---|---:|---:|
| Pandoravirus salinus | NC_022098.1 | 2,473,870 | 61.72 |
| Sorangium cellulosum | GCF_004135735.1 | 11,261,481 | 72.58 |
| Drosophila melanogaster (Fruit fly) | GCA_004798055.1 | 133,403,897 | 42.12 |
| Oryza sativa (Rice) | GCA_001623365.2 | 387,424,359 | 43.61 |
| Symbiodinium kawagutii (Dinoflagellate) | GCA_009767595.1 | 935,067,369 | 45.54 |
| Homo sapiens (Human) | GCA_000001405.28 | 3,099,706,404 | 41.04 |
| Palaemon carinicauda (Crustacean) | GCA_004011675.1 | 6,699,723,695 | 37.37 |
| Pinus taeda (Loblolly Pine) | GCA_000404065.3 | 22,103,635,615 | 37.45 |

  1. The sequences were selected to represent a variety of organisms (viruses, bacteria, insects, plants and animals), with an emphasis on covering a wide range of sequence lengths.
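The two numeric columns can be derived from a genome's FASTA file. The sketch below is one way to compute them, under stated assumptions: it is not taken from the source, which does not say how its figures were obtained. The function name `sequence_stats` is hypothetical. It assumes that header lines start with `>`, that the length counts every sequence character (including ambiguous bases such as `N`), and that GC content is computed as (G + C) / (A + C + G + T) × 100, i.e. over unambiguous bases only. Counts are summed over all records, giving whole-assembly figures.

```python
from typing import Iterable, Tuple


def sequence_stats(lines: Iterable[str]) -> Tuple[int, float]:
    """Return (total length in bp, GC %) summed over all records in FASTA lines.

    Assumptions (not from the source table): '>' header lines are skipped,
    length counts every sequence character including N, and GC % uses only
    unambiguous bases (A, C, G, T) as the denominator.
    """
    length = 0  # all sequence characters, ambiguous bases included
    gc = 0      # count of G and C
    acgt = 0    # count of A, C, G and T
    for line in lines:
        if line.startswith(">"):
            continue  # record header, not sequence
        # Upper-case so soft-masked (lowercase) bases are counted too
        seq = line.strip().upper()
        length += len(seq)
        g_c = seq.count("G") + seq.count("C")
        gc += g_c
        acgt += g_c + seq.count("A") + seq.count("T")
    pct = 100.0 * gc / acgt if acgt else 0.0
    return length, pct
```

For a whole genome, the lines of the file can be passed in directly, e.g. `with open("genome.fa") as fh: sequence_stats(fh)`, where `genome.fa` is a placeholder path. Streaming line by line keeps memory use constant, which matters for inputs as large as the 22 Gbp Pinus taeda assembly.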