Table 1 DNA sequence dataset used in this report.

From: Local Renyi entropic profiles of DNA sequences

| Name | Sequence description | Length [bp] |
|------|----------------------|-------------|
| m3 | random with inserted motif L = 3 'ATC' | 2000 |
| m4 | random with inserted motif L = 4 'ATCG' | 2000 |
| m5 | random with inserted motif L = 5 'ATCGA' | 2000 |
| Es | experimental promoter regions of B. subtilis | 2000 |
| Ec | Escherichia coli K12, complete genome [GenBank:NC_000913] | 4639675 |
| Hi | Haemophilus influenzae Rd KW20, complete genome [GenBank:NC_000907] | 1830138 |

  1. The artificial sequences m3, m4 and m5 are obtained by generating random DNA (with symbol emission probabilities p_A = p_T = p_C = p_G = 0.25) and subsequently implanting the motifs 'ATC', 'ATCG' and 'ATCGA', respectively, at specific positions. The sequence Es is the concatenation of real DNA from 20 promoter regions of Bacillus subtilis [45, 46], which share the known consensus structured motif TTGACA-(space)-TATAAT with at most one point mutation or substitution. The sequences Ec and Hi are the complete genomes of Escherichia coli and Haemophilus influenzae, extracted from NCBI GenBank.
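
The construction of the artificial sequences can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure actually used for the dataset: the table does not list the implant positions, so the `positions` list is hypothetical, as are the helper names `random_dna` and `implant_motif` and the fixed seed. The sketch also assumes the motif overwrites the background symbols, so that the total length stays at 2000 bp.

```python
import random

ALPHABET = "ACGT"  # uniform emission: pA = pT = pC = pG = 0.25


def random_dna(length, rng):
    """Draw `length` symbols independently and uniformly from ALPHABET."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def implant_motif(sequence, motif, positions):
    """Overwrite `sequence` with `motif` at each 0-based start position.

    Assumption: implanting replaces the existing symbols rather than
    inserting new ones, so the sequence length is unchanged.
    """
    symbols = list(sequence)
    for pos in positions:
        if pos < 0 or pos + len(motif) > len(symbols):
            raise ValueError(f"motif does not fit at position {pos}")
        symbols[pos:pos + len(motif)] = motif
    return "".join(symbols)


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
background = random_dna(2000, rng)

# Hypothetical implant positions; the source does not specify them.
positions = [100, 500, 900, 1300, 1700]

m3 = implant_motif(background, "ATC", positions)
m4 = implant_motif(background, "ATCG", positions)
m5 = implant_motif(background, "ATCGA", positions)
```

Each resulting sequence has the 2000 bp length given in the table and carries its motif at every chosen position. The motif may also occur by chance elsewhere in the random background.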