BMC Bioinformatics

Table 1 Composition of the smaller data sets (#1-#8)

From: Alignment-free clustering of large data sets of unannotated protein conserved regions using minhashing

Data set	Number of sequences	% Bacteria	% Archaea	% Eukaryota
#1	1424	100%	0%	0%
#2	1542	100%	0%	0%
#3	1479	100%	0%	0%
#4	2037	95.4%	2.6%	2.0%
#5	808	93.1%	3.4%	3.5%
#6	2565	63.4%	1.2%	35.4%
#7	2138	29.5%	1.7%	68.8%
#8	1938	11.4%	1.8%	86.8%


ISSN: 1471-2105
