BMC Bioinformatics

Table 6 Number of sequences, families, and super-families in the datasets

From: Evaluation and improvements of clustering algorithms for detecting remote homologous protein families

Dataset	Sequences	(% retained)	Families	(% retained)	Super-families	(% retained)
A-10	3461	(55%)	970	(25%)	589	(30%)
A-20	4260	(60%)	1144	(28%)	684	(34%)
A-30	6532	(72%)	1572	(38%)	868	(44%)
A-50	10816	(84%)	2109	(49%)	1080	(55%)
A-70	13391	(87%)	2306	(54%)	1162	(59%)
A-90	15861	(90%)	2420	(56%)	1222	(62%)
A-95	17505	(91%)	2521	(59%)	1273	(64%)
GOLD	866	(100%)	91	(100%)	5	(100%)

Numbers in parentheses indicate the percentage of sequences, families, and super-families that remained after singletons were removed.
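
The singleton filter described in the footnote can be sketched in a few lines of Python on toy data. This is a minimal sketch only. It assumes that a "singleton" is a family containing exactly one sequence, and that each sequence carries both a family and a super-family label. The function name remove_singletons and all identifiers in the example (s1, f1, sf1, ...) are hypothetical and do not come from the article. The printed numbers come from the toy data and not from the datasets in Table 6.

```python
from collections import Counter


def remove_singletons(seq_to_family, seq_to_superfamily):
    """Drop sequences whose family has exactly one member (assumed definition of
    a singleton) and report, for sequences, families and super-families, how many
    remain and what rounded percentage of the original count that is."""
    # Count how many sequences fall into each family.
    family_sizes = Counter(seq_to_family.values())
    # Keep only sequences that belong to a family with at least two members.
    kept = [seq for seq, fam in seq_to_family.items() if family_sizes[fam] > 1]

    before = {
        "sequences": len(seq_to_family),
        "families": len(set(seq_to_family.values())),
        "super-families": len(set(seq_to_superfamily.values())),
    }
    after = {
        "sequences": len(kept),
        "families": len({seq_to_family[s] for s in kept}),
        "super-families": len({seq_to_superfamily[s] for s in kept}),
    }
    # Pair each remaining count with its rounded percentage of the original count.
    return {
        key: (after[key], round(100 * after[key] / before[key]) if before[key] else 0)
        for key in before
    }


# Toy data (hypothetical): six sequences, four families, three super-families.
seq_to_family = {"s1": "f1", "s2": "f1", "s3": "f2", "s4": "f3", "s5": "f3", "s6": "f4"}
seq_to_superfamily = {"s1": "sf1", "s2": "sf1", "s3": "sf2",
                      "s4": "sf1", "s5": "sf1", "s6": "sf3"}

result = remove_singletons(seq_to_family, seq_to_superfamily)
print(result)
# {'sequences': (4, 67), 'families': (2, 50), 'super-families': (1, 33)}
```

In this toy case, super-families sf2 and sf3 disappear entirely because every family they contain is a singleton. For that reason, the three retained percentages in a row of the table need not agree with one another.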

ISSN: 1471-2105