
Table 5 The CD-HIT clustering results for the four benchmark datasets.

From: An improved classification of G-protein-coupled receptors using sequence-derived features

  

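| γ \ Dataset | D167 | D566 | D1238 | D365 |
|-------------|------|------|-------|------|
| 1.0         | 167  | 566  | 1238  | 365  |
| 0.9         | 100  | 346  | 777   | 361  |
| 0.8         | 73   | 226  | 540   | 361  |
| 0.7         | 61   | 169  | 421   | 361  |
| 0.6         | 52   | 142  | 358   | 359  |
| 0.5         | 38   | 106  | 281   | 357  |
| 0.4         | 30   | 69   | 207   | 356  |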

  1. γ denotes the sequence identity threshold, expressed as a fraction. The γ = 1.0 row gives the total number of proteins in each dataset.
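
As a reading aid, the counts in Table 5 can be expressed as a fraction of each dataset's total (the γ = 1.0 row). The sketch below only does that arithmetic; the variable and function names (`counts`, `retained_fraction`) are illustrative assumptions and are not taken from the paper or from CD-HIT.

```python
# Counts copied from Table 5; outer keys are the identity threshold γ.
counts = {
    1.0: {"D167": 167, "D566": 566, "D1238": 1238, "D365": 365},
    0.9: {"D167": 100, "D566": 346, "D1238": 777, "D365": 361},
    0.8: {"D167": 73, "D566": 226, "D1238": 540, "D365": 361},
    0.7: {"D167": 61, "D566": 169, "D1238": 421, "D365": 361},
    0.6: {"D167": 52, "D566": 142, "D1238": 358, "D365": 359},
    0.5: {"D167": 38, "D566": 106, "D1238": 281, "D365": 357},
    0.4: {"D167": 30, "D566": 69, "D1238": 207, "D365": 356},
}


def retained_fraction(table):
    """Divide every count by the dataset total given in the γ = 1.0 row."""
    totals = table[1.0]
    return {
        gamma: {name: n / totals[name] for name, n in row.items()}
        for gamma, row in table.items()
    }


fractions = retained_fraction(counts)

# Print one line per threshold, from the loosest reduction to the strictest.
for gamma in sorted(fractions, reverse=True):
    rounded = {name: round(f, 3) for name, f in fractions[gamma].items()}
    print(gamma, rounded)
```

Read this way, D365 keeps more than 97% of its entries even at γ = 0.4 (356 of 365), whereas D167 falls to roughly 18% (30 of 167).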