Figure 2From: Gene identification and protein classification in microbial metagenomic sequence data via incremental clusteringPercentage of Unrelated Pairs in Clusters. For all clusters, only Pfam match-containing sequences were considered. For the top curve (labeled All), all clusters with at least two Pfam match-containing sequences were considered where as for the second curve (labeled At least 5), only those clusters with at least five Pfam match-containing sequences were considered. For the later curve, it is seen that 94% of the reported clusters have no unrelated pairs. The bottom two curves show the trends for the "strict" version of unrelatedness.Back to article page