
Table 2 Running Time Analysis

From: XSTREAM: A practical algorithm for identification and architecture modeling of tandem repeats in protein sequences

| Source | Length, Mbp | Time, min (g = 3) | Time, min (g = 0) | Longest period, bp (copy no.) |
|---|---|---|---|---|
| S. cerevisiae Chr. I | 0.23 | 0.25 | 0.12 | 135 (17.9) |
| S. cerevisiae Chr. VIII | 0.56 | 0.58 | 0.29 | 1998 (2) |
| H. sapiens β TCR | 0.68 | 0.77 | 0.36 | 340 (2) |
| S. cerevisiae Chr. XII | 1.0 | 1.2 | 0.49 | 9137 (2) |
| M. magneticum AMB-1 | 4.9 | 6.4 | 2.2 | 1158 (4.2) |
| H. sapiens Chr. I contig | 9.8 | 13.5 | 4.7 | 18557 (2.1) |

| Source | Length, Mbp | Time, min (g = 3) | Time, min (g = 0) | Longest period, bp (copy no.) |
|---|---|---|---|---|
| H. sapiens Chr. XXI | 33.0 | 34.4 | 16.4 | 3379 (2) |
| R. norvegicus Chr. XVI | 80.7 | 86.7 | 39.1 | 2715 (2) |
| H. sapiens Chr. X | 127.6 | 134.7 | 64.1 | 4863 (2) |
| M. musculus Chr. I | 202.5 | 239.1 | 90.0 | 3773 (2) |

| Source | No. of proteins | Time, min (g = 3) | Time, min (g = 0) | No. of TRs (No. of TR-containing proteins) |
|---|---|---|---|---|
| Swiss-Prot v.30 | 40292 | 1.5 | 0.55 | 2428 (3771) |
| Swiss-Prot v.38 | 80000 | 2.6 | 1.1 | 3762 (7012) |
| Swiss-Prot v.45 | 163633 | 5.4 | 2.4 | 5302 (12359) |
| Swiss-Prot v.50.5 | 230150 | 7.3 | 3.5 | 6444 (17097) |

Running times for the analysis of different input sequence datasets are shown with the gap parameter g = 3 or g = 0. The following DNA sequences were downloaded from NCBI: S. cerevisiae Chromosomes I (gi 85666109), VIII (gi 82795252), and XII (gi 85666119); H. sapiens Chromosomes X (gi 89033689) and XXI (gi 89058287), a Chromosome I contig (gi 29789880), and the β T-cell receptor locus (gi 114841177); R. norvegicus Chromosome XVI (gi 109504251); M. musculus Chromosome I (gi 83274080); and the M. magneticum AMB-1 genome (gi 82943940). The shorter DNA sequences at the top (0.23–9.8 Mbp) were run with minD = 20, minP = 1, and all possible maximum periods. The longer DNA sequences (33–202.5 Mbp) were run with minD = 50, minP = 10, and, because of memory limitations, a maximum period of 100 kbp; divide-and-conquer (see Appendix) was used for periods < 1000 (fragment length = 1 Mbp). For each longest period found, the copy number is shown in parentheses. These data show a linear relationship between running time and input sequence length (R² > 0.99). The bottom part of the table shows running times for the analysis of 4 Swiss-Prot datasets using minD = 10 and minP = 1, together with the number of TRs detected (using consensus comparison, see Appendix) and, in parentheses, the number of TR-containing proteins found. XSTREAM running time scaled linearly with Swiss-Prot dataset size (R² > 0.998).
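The linearity claims in the note can be checked directly against the tabulated values with an ordinary least-squares fit. The sketch below is illustrative only: the note does not say which time column the R² refers to or whether the two DNA parameter groups were fitted together, so pooling all ten DNA rows and fitting each time column separately is an assumption here, and the names `linear_fit`, `DNA_MBP`, `SP_PROTEINS`, etc. are hypothetical.

```python
from typing import Sequence, Tuple

# Values transcribed from Table 2, rows in table order.
# Assumption: all ten DNA rows are pooled, although the two DNA groups
# were run with different minD/minP settings.
DNA_MBP = [0.23, 0.56, 0.68, 1.0, 4.9, 9.8, 33.0, 80.7, 127.6, 202.5]
DNA_MIN_G3 = [0.25, 0.58, 0.77, 1.2, 6.4, 13.5, 34.4, 86.7, 134.7, 239.1]
DNA_MIN_G0 = [0.12, 0.29, 0.36, 0.49, 2.2, 4.7, 16.4, 39.1, 64.1, 90.0]
SP_PROTEINS = [40292, 80000, 163633, 230150]
SP_MIN_G3 = [1.5, 2.6, 5.4, 7.3]
SP_MIN_G0 = [0.55, 1.1, 2.4, 3.5]


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit with an intercept, R^2 equals the squared
    # Pearson correlation between x and y.
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    cases = [
        ("DNA, g = 3 (min per Mbp)", DNA_MBP, DNA_MIN_G3),
        ("DNA, g = 0 (min per Mbp)", DNA_MBP, DNA_MIN_G0),
        ("Swiss-Prot, g = 3 (min per protein)", SP_PROTEINS, SP_MIN_G3),
        ("Swiss-Prot, g = 0 (min per protein)", SP_PROTEINS, SP_MIN_G0),
    ]
    for label, xs, ys in cases:
        slope, intercept, r2 = linear_fit(xs, ys)
        print(f"{label}: slope = {slope:.4g}, intercept = {intercept:.3g}, R^2 = {r2:.4f}")
```

Under these assumptions, each of the four fits clears the threshold quoted in the note (0.99 for DNA, 0.998 for Swiss-Prot). On Python 3.10 or later, squaring `statistics.correlation(xs, ys)` gives the same R² value.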