
Table 1 Clustering of different data-sets of small, medium and large sized protein sequences using different methods

From: CLAP: A web-server for automatic classification of proteins with special reference to multi-domain proteins

Small proteins (10–100 amino acids length)

Number of sequences - 500

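| Method | # of clusters | Threshold | Word-length | Time |
| --- | --- | --- | --- | --- |
| CW | 15 | 0.5 | NA | 0 m 11.835 s |
| k-tuple | 3 | 0.5 | 2 | 0 m 1.539 s |
| CLAP | 7 | 0.5 | 5 | 2 m 28.322 s |
| CLUSS | 68 | NA | 4 | 0 m 11.000 s |
| CD-HIT | 223 | 0.5 | 3 | 0 m 0.034 s |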

Small proteins (10–100 amino acids length)

Number of sequences - 1000

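| Method | # of clusters | Threshold | Word-length | Time |
| --- | --- | --- | --- | --- |
| CW | 23 | 0.5 | NA | 0 m 59.788 s |
| k-tuple | 3 | 0.5 | 2 | 0 m 5.659 s |
| CLAP | 17 | 0.5 | 5 | 9 m 52.099 s |
| CLUSS | NA | NA | NA | 0 m 11.000 s |
| CD-HIT | 607 | 0.5 | 3 | 0 m 0.091 s |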

Medium proteins (400–600 amino acids length)

Number of sequences - 500

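| Method | # of clusters | Threshold | Word-length | Time |
| --- | --- | --- | --- | --- |
| CW | 2 | 0.5 | NA | 8 m 46.895 s |
| k-tuple | 3 | 0.5 | 2 | 0 m 2.25 s |
| CLAP | 3 | 0.5 | 5 | 2 m 50.918 s |
| CLUSS | 95 | NA | 4 | 0 m 3.133 s |
| CD-HIT | 227 | 0.5 | 3 | 0 m 0.592 s |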

Medium proteins (400–600 amino acids length)

Number of sequences - 1000

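| Method | # of clusters | Threshold | Word-length | Time |
| --- | --- | --- | --- | --- |
| CW | 5 | 0.5 | NA | 32 m 50.379 s |
| k-tuple | 2 | 0.5 | 2 | 0 m 7.789 s |
| CLAP | 7 | 0.5 | 5 | 11 m 12.664 s |
| CLUSS | NA | NA | NA | NA |
| CD-HIT | 708 | 0.5 | 3 | 0 m 3.281 s |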

Large proteins (850–1000 amino acids length)

Number of sequences - 500

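| Method | # of clusters | Threshold | Word-length | Time |
| --- | --- | --- | --- | --- |
| CW | 15 | 0.5 | NA | 42 m 1.184 s |
| k-tuple | 4 | 0.5 | 2 | 0 m 2.91 s |
| CLAP | 4 | 0.5 | 5 | 4 m 22.752 s |
| CLUSS | NA | NA | NA | NA |
| CD-HIT | 125 | 0.5 | 3 | 0 m 0.916 s |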

  1. Processing times were measured on the workstation that hosts the CLAP web-server (2.40 GHz Intel Xeon processor, 16 GB RAM, running CentOS). The number of clusters each method generated at the given threshold and word-length is also shown.
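The timings in the table are written as minutes and seconds, which makes them awkward to compare directly. The following is a minimal sketch, not part of the original article, of how a timing cell could be converted to seconds and two methods compared. The function name `to_seconds` is hypothetical, and the two values are copied from the large-protein, 500-sequence rows for CW and CLAP.

```python
import re


def to_seconds(cell: str) -> float:
    """Convert a timing cell such as '4 m 22.752 s' to seconds.

    Hypothetical helper for illustration; it assumes the
    '<minutes> m <seconds> s' layout used in the table.
    """
    match = re.fullmatch(r"\s*(\d+)\s*m\s*(\d+(?:\.\d+)?)\s*s\s*", cell)
    if match is None:
        raise ValueError(f"unrecognised timing cell: {cell!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Timings copied from the table (large proteins, 500 sequences).
cw_seconds = to_seconds("42 m 1.184 s")
clap_seconds = to_seconds("4 m 22.752 s")

# Ratio of CW run time to CLAP run time for this one data-set.
ratio = cw_seconds / clap_seconds
print(f"CW: {cw_seconds:.3f} s, CLAP: {clap_seconds:.3f} s, ratio: {ratio:.2f}")
```

The ratio only describes this single data-set on the workstation described in the footnote; cells marked NA have no timing and would need to be skipped before conversion.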