Fig. 5
From: Alignment-free clustering of large data sets of unannotated protein conserved regions using minhashing

Heatmap of the pairwise local similarity percentages of the sequences in the PF02801.19 domain family of Pfam. The darker rectangles represent sub-clusters whose members are more similar to each other than to the rest of the cluster. The overlaid percentages show the F1 score between each cluster in the output of our algorithm and its matching sub-cluster, where the sub-clusters are obtained by cutting the hierarchical clustering tree, built from the pairwise similarity scores, into four sub-clusters. The F1 scores of the matches, from the largest to the smallest sub-cluster, are 93%, 90%, 13%, and 66%.
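
The F1 score between an output cluster and a reference sub-cluster can be illustrated with a minimal sketch. It assumes that each cluster is a set of sequence identifiers, that precision and recall are computed from the set overlap, and that each reference sub-cluster is matched to the output cluster with the highest F1; the caption does not state the exact matching procedure, so this is an assumption. The function names `f1_score` and `best_matches` and the toy identifiers are hypothetical.

```python
def f1_score(predicted, reference):
    """F1 between two clusters given as collections of sequence IDs."""
    predicted, reference = set(predicted), set(reference)
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)  # fraction of predicted members that are correct
    recall = overlap / len(reference)     # fraction of reference members recovered
    return 2 * precision * recall / (precision + recall)


def best_matches(predicted_clusters, reference_clusters):
    """For each reference sub-cluster, return (index of best predicted cluster, F1).

    Assumes predicted_clusters is non-empty.
    """
    matches = []
    for ref in reference_clusters:
        scores = [f1_score(pred, ref) for pred in predicted_clusters]
        best = max(range(len(scores)), key=scores.__getitem__)
        matches.append((best, scores[best]))
    return matches


# Toy data with hypothetical IDs (not taken from the paper).
reference = [{"s1", "s2", "s3", "s4"}, {"s5", "s6"}]
predicted = [{"s1", "s2", "s3"}, {"s4", "s5", "s6"}]
for ref_idx, (pred_idx, score) in enumerate(best_matches(predicted, reference)):
    print(f"reference {ref_idx} -> predicted {pred_idx}: F1 = {score:.2f}")
```

Under this definition, a low value such as the 13% reported above would indicate that no output cluster overlaps substantially with that reference sub-cluster.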