Table 1 Composition of the smaller data sets (#1-#8)

From: Alignment-free clustering of large data sets of unannotated protein conserved regions using minhashing

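| Data set | Number of sequences | % Bacteria | % Archaea | % Eukaryota |
|----------|---------------------|------------|-----------|-------------|
| #1       | 1424                | 100%       | 0%        | 0%          |
| #2       | 1542                | 100%       | 0%        | 0%          |
| #3       | 1479                | 100%       | 0%        | 0%          |
| #4       | 2037                | 95.4%      | 2.6%      | 2.0%        |
| #5       | 808                 | 93.1%      | 3.4%      | 3.5%        |
| #6       | 2565                | 63.4%      | 1.2%      | 35.4%       |
| #7       | 2138                | 29.5%      | 1.7%      | 68.8%       |
| #8       | 1938                | 11.4%      | 1.8%      | 86.8%       |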