Table 1 The genomes used to benchmark the performance of the algorithms

| Organism | GenBank identifier | Sequence length (bp) | GC % |
| --- | --- | --- | --- |
| Pandoravirus salinus | NC_022098.1 | 2,473,870 | 61.72 |
| Sorangium cellulosum | GCF_004135735.1 | 11,261,481 | 72.58 |
| Drosophila melanogaster (Fruit fly) | GCA_004798055.1 | 133,403,897 | 42.12 |
| Oryza sativa (Rice) | GCA_001623365.2 | 387,424,359 | 43.61 |
| Symbiodinium kawagutii (Dinoflagellate) | GCA_009767595.1 | 935,067,369 | 45.54 |
| Homo sapiens (Human) | GCA_000001405.28 | 3,099,706,404 | 41.04 |
| Palaemon carinicauda (Crustacean) | GCA_004011675.1 | 6,699,723,695 | 37.37 |
| Pinus taeda (Loblolly Pine) | GCA_000404065.3 | 22,103,635,615 | 37.45 |
  1. The sequences were selected to represent a variety of organism groups, including viruses, bacteria, insects, plants and animals, with an emphasis on covering a wide range of sequence lengths.
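
The source does not say how the "Sequence length (bp)" and "GC %" columns were computed. As a minimal sketch under one common convention, the Python function below counts every residue toward the length and reports GC % as (G + C) / (A + C + G + T), so ambiguous bases such as N are left out of the denominator. The function name `length_and_gc` and that handling of ambiguous bases are assumptions made for illustration, not the authors' method, and other conventions would give slightly different values.

```python
from collections import Counter
from typing import Iterable, Tuple


def length_and_gc(fasta_lines: Iterable[str]) -> Tuple[int, float]:
    """Return (total length in bp, GC percentage) for FASTA-formatted lines.

    Assumption: GC % is taken over unambiguous bases (A, C, G, T) only,
    while the length counts every residue, including N.
    """
    counts: Counter = Counter()
    for line in fasta_lines:
        line = line.strip()
        # Skip blank lines and FASTA header lines.
        if not line or line.startswith(">"):
            continue
        # Upper-case so soft-masked (lower-case) bases are counted too.
        counts.update(line.upper())

    length = sum(counts.values())
    acgt = sum(counts[base] for base in "ACGT")
    gc = counts["G"] + counts["C"]
    gc_percent = 100.0 * gc / acgt if acgt else 0.0
    return length, gc_percent
```

Because the function consumes lines lazily, it can be given an open file handle (for example `with open(path) as fh: length_and_gc(fh)`, where `path` is a hypothetical FASTA file) without loading a multi-gigabase genome into memory at once.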