Fig. 7
From: Alignment-free clustering of large data sets of unannotated protein conserved regions using minhashing

F1-value comparison for NADDA-annotated conserved regions of data set #9 using different numbers of hash functions. The red line represents the F1 score computed at the end of each iteration to check the termination condition (d = 40). A comparison between Pfam and pClust protein clusters (overlapping) and the clustering of proteins generated at each iteration is shown with blue and red lines. The dashed line marks the number of hash functions at which the termination condition is met (for τ = 0.9 and d = 40).