BMC Bioinformatics

Table 5 The CD-HIT clustering results for the four benchmark datasets.

From: An improved classification of G-protein-coupled receptors using sequence-derived features

		Dataset
γ	*D167*	*D566*	*D1238*	*D365*
1.0	167	566	1238	365
0.9	100	346	777	361
0.8	73	226	540	361
0.7	61	169	421	361
0.6	52	142	358	359
0.5	38	106	281	357
0.4	30	69	207	356

γ denotes the sequence identity threshold used for CD-HIT clustering, expressed as a fraction (γ = 0.9 corresponds to 90% sequence identity). The row γ = 1.0 gives the total number of proteins in each dataset.
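
To make the trend in the table easier to compare across datasets, the counts can be normalised by the dataset size given in the γ = 1.0 row. The sketch below is only an illustration: the counts are copied from Table 5, while the name `retained_fraction` and the idea of reporting a "retained fraction" are assumptions introduced here, not quantities defined in the article.

```python
# Counts copied from Table 5; list positions follow the thresholds below.
thresholds = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
counts = {
    "D167": [167, 100, 73, 61, 52, 38, 30],
    "D566": [566, 346, 226, 169, 142, 106, 69],
    "D1238": [1238, 777, 540, 421, 358, 281, 207],
    "D365": [365, 361, 361, 361, 359, 357, 356],
}


def retained_fraction(dataset):
    """Map each threshold to count / total for one dataset.

    Hypothetical helper: the total is taken from the first entry
    (the gamma = 1.0 row), as stated in the table note.
    """
    total = counts[dataset][0]
    return {g: n / total for g, n in zip(thresholds, counts[dataset])}


if __name__ == "__main__":
    for name in counts:
        frac = retained_fraction(name)
        row = "  ".join(f"{g:.1f}: {frac[g]:.1%}" for g in thresholds)
        print(f"{name:6s} {row}")
```

Read this way, D365 keeps about 97.5% of its count even at γ = 0.4, whereas D167 falls to about 18%, which suggests D365 contains far fewer highly similar sequences than the other three datasets.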

ISSN: 1471-2105
