Table 1 Statistics of the sequences included in the two datasets

From: String kernels for protein sequence comparisons: improved fold recognition

                            Dataset
                    CATH2833              CATH793
CATH fold ID    N [a]  L (SD) [b]    N [a]  L (SD) [b]
1.10.10           381    79 (26)        36   135 (10)
2.60.40           555   110 (29)       130   140 (16)
3.20.20           251   294 (69)         2   157 (14)
3.30.70           368   182 (59)        52   141 (18)
3.40.50          1278   153 (77)       573   151 (17)

[a] Number of proteins in the fold
[b] Mean (standard deviation) of the lengths of the proteins in the fold