From: Alignment-free clustering of large data sets of unannotated protein conserved regions using minhashing
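| Data set | # Sequences | % Bacteria | % Archaea | % Eukaryota |
|----------|-------------|------------|-----------|-------------|
| #1       | 1424        | 100%       | 0%        | 0%          |
| #2       | 1542        |            |           |             |
| #3       | 1479        |            |           |             |
| #4       | 2037        | 95.4%      | 2.6%      | 2.0%        |
| #5       | 808         | 93.1%      | 3.4%      | 3.5%        |
| #6       | 2565        | 63.4%      | 1.2%      | 35.4%       |
| #7       | 2138        | 29.5%      | 1.7%      | 68.8%       |
| #8       | 1938        | 11.4%      | 1.8%      | 86.8%       |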