Table 1 Segregation success and run times for different clustering strategies, tabulated for: a) a single peptide pre-sort with increasing numbers of alignment cycles; b) staged alignment cycles from 90 to 50 % identity, each with a single-pass peptide pre-sort; c) as for (b), but with multiple pre-sort passes (number of passes indicated in parentheses)

From: Reduction, alignment and visualisation of large diverse sequence families

a) Single pre-sort

Alignment stages (to 50 %)   Time (s)   Sequences selected   Remaining subfamilies
1                            58.8       1658                 503
2                            91.4       355                  302
3                            226.1      196                  171
4                            462.6      175                  154
5                            727.1      172                  151

b) Staged pre-sort

Alignment stages (3 to X %)   Time (s)   Sequences selected   Remaining subfamilies
90 (1)                        36.30      1597                 947
80 (1)                        1.33       563                  314
70 (1)                        1.00       165                  31
60 (1)                        0.53       104                  24
50 (1)                        0.47       71                   21

c) Staged (multi-pass) pre-sort

Alignment stages (3 to X %)   Time (s)   Sequences selected   Remaining subfamilies
90 (8)                        10.62      3641                 598
80 (4)                        4.26       1034                 93
70 (2)                        2.31       285                  40
60 (1)                        1.10       98                   20
50 (1)                        0.42       62                   21

  1. The data columns indicate the elapsed time in seconds (real time reported by the Linux time utility), the number of the 10,000 starting sequences remaining after each stage and the number of families or subfamilies (defined by sequence adjacency