Table 1 Statistics of the sequences included in the two datasets

From: String kernels for protein sequence comparisons: improved fold recognition

  

                CATH2833             CATH793
CATH fold ID    N^a     L (SD)^b     N      L (SD)
1.10.10         381     79 (26)      36     135 (10)
2.60.40         555     110 (29)     130    140 (16)
3.20.20         251     294 (69)     2      157 (14)
3.30.70         368     182 (59)     52     141 (18)
3.40.50         1278    153 (77)     573    151 (17)

a  Number of proteins in the fold.
b  Mean (standard deviation) of the sequence lengths, in residues, of the proteins in the fold.
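In the table, N is a per-fold count, and L (SD) is the mean and standard deviation of sequence lengths within each fold. The Python sketch below shows one way such per-fold statistics could be computed from (fold ID, sequence length) pairs. The function names and toy records are hypothetical. The sketch also assumes the sample standard deviation (n - 1 denominator), which the table does not specify. As a consistency check, it confirms that the per-fold counts add up to the totals encoded in the dataset names.

```python
from collections import defaultdict
from statistics import mean, stdev


def fold_statistics(records):
    """Group (fold_id, sequence_length) pairs by CATH fold ID.

    Returns {fold_id: (N, mean_length, sd_length)}.
    Assumption: SD is the sample standard deviation (n - 1 denominator);
    a fold with a single sequence is given SD 0.0.
    """
    lengths = defaultdict(list)
    for fold_id, length in records:
        lengths[fold_id].append(length)
    stats = {}
    for fold_id, ls in lengths.items():
        sd = stdev(ls) if len(ls) > 1 else 0.0
        stats[fold_id] = (len(ls), mean(ls), sd)
    return stats


def format_cell(m, sd):
    """Render a mean and SD in the table's 'mean (SD)' style, rounded to integers."""
    return f"{m:.0f} ({sd:.0f})"


# Per-fold counts (N) copied from the table above.
CATH2833_N = {"1.10.10": 381, "2.60.40": 555, "3.20.20": 251,
              "3.30.70": 368, "3.40.50": 1278}
CATH793_N = {"1.10.10": 36, "2.60.40": 130, "3.20.20": 2,
             "3.30.70": 52, "3.40.50": 573}

# Each dataset's name matches the sum of its per-fold counts.
assert sum(CATH2833_N.values()) == 2833
assert sum(CATH793_N.values()) == 793

# Hypothetical toy records, not taken from either dataset.
toy = [("1.10.10", 70), ("1.10.10", 90), ("2.60.40", 110)]
toy_stats = fold_statistics(toy)
n, m, sd = toy_stats["1.10.10"]
print(n, format_cell(m, sd))
```

For the toy records, fold 1.10.10 has two sequences with a mean length of 80 and a sample SD of about 14.1, so the script prints `2 80 (14)`.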