Table 5 The CD-HIT clustering results for the four benchmark datasets.

From: An improved classification of G-protein-coupled receptors using sequence-derived features

       Dataset
γ      D167   D566   D1238   D365
1.0    167    566    1238    365
0.9    100    346    777     361
0.8    73     226    540     361
0.7    61     169    421     361
0.6    52     142    358     359
0.5    38     106    281     357
0.4    30     69     207     356
  1. γ denotes the sequence identity threshold, given as a fraction (for example, 0.9 corresponds to 90% identity). The row γ = 1.0 gives the total number of proteins in each dataset.
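
One way to read the table is as the fraction of each dataset that remains after clustering at a given γ. The following is a minimal sketch of that arithmetic, using only the counts listed above; the names `CLUSTERS` and `retained_fraction` are hypothetical helpers introduced for illustration and are not part of the original article or of CD-HIT.

```python
# Counts copied from Table 5; outer keys are the identity threshold γ.
# CLUSTERS and retained_fraction are illustrative names, not from the source.
CLUSTERS = {
    1.0: {"D167": 167, "D566": 566, "D1238": 1238, "D365": 365},
    0.9: {"D167": 100, "D566": 346, "D1238": 777, "D365": 361},
    0.8: {"D167": 73, "D566": 226, "D1238": 540, "D365": 361},
    0.7: {"D167": 61, "D566": 169, "D1238": 421, "D365": 361},
    0.6: {"D167": 52, "D566": 142, "D1238": 358, "D365": 359},
    0.5: {"D167": 38, "D566": 106, "D1238": 281, "D365": 357},
    0.4: {"D167": 30, "D566": 69, "D1238": 207, "D365": 356},
}


def retained_fraction(dataset: str, gamma: float) -> float:
    """Return the count at threshold gamma divided by the dataset total.

    The total is taken from the γ = 1.0 row, as the table note states.
    """
    total = CLUSTERS[1.0][dataset]
    return CLUSTERS[gamma][dataset] / total


if __name__ == "__main__":
    # Print the retained fraction for every dataset at every threshold.
    for gamma in sorted(CLUSTERS, reverse=True):
        row = ", ".join(
            f"{name}: {retained_fraction(name, gamma):.1%}"
            for name in CLUSTERS[1.0]
        )
        print(f"γ = {gamma}: {row}")
```

Read this way, D365 keeps 356 of its 365 entries even at γ = 0.4, whereas D167 drops to 30 of 167, which suggests that D365 contains far less sequence redundancy than the other three datasets.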