Figure 4 | BMC Bioinformatics

Figure 4

From: Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering


Log-Log Plot of Cluster Size Distribution. The x-axis is the logarithm of the cluster size C, and the y-axis is the logarithm of the number of clusters of size ≥ C. Logarithms are base 10. The blue curve is the observed data, which is consistent with a power law. There is an inflection point around C = 2500 (about 3.4 on the x-axis). The two red lines are the least-squares fits to the data for C ≤ 2500 and C > 2500, respectively. The former line is y = -0.733*x + 5.517, with R² = 0.995, and the latter line is y = -1.686*x + 8.813, with R² = 0.992.
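
The construction described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes NumPy, uses `np.polyfit` as a stand-in for whatever fitting routine was actually used, and takes the breakpoint as a hand-chosen input. The function names `ccdf_loglog`, `two_segment_fit`, and `line_intersection` are hypothetical.

```python
import numpy as np

def ccdf_loglog(sizes):
    """Return (log10 C, log10 N) where N is the number of clusters of size >= C,
    evaluated at each distinct cluster size C."""
    sizes = np.sort(np.asarray(sizes))
    # On a sorted array, return_index gives the first position of each distinct value,
    # so everything from that position onward has size >= that value.
    distinct, first_idx = np.unique(sizes, return_index=True)
    counts_ge = sizes.size - first_idx
    return np.log10(distinct), np.log10(counts_ge)

def two_segment_fit(x, y, x_break):
    """Fit separate least-squares lines to the points with x <= x_break and x > x_break.
    Returns [(slope, intercept), (slope, intercept)]. The breakpoint is an assumption
    supplied by the caller, not estimated from the data."""
    fits = []
    for mask in (x <= x_break, x > x_break):
        slope, intercept = np.polyfit(x[mask], y[mask], 1)
        fits.append((slope, intercept))
    return fits

def line_intersection(fit_a, fit_b):
    """x-coordinate where two lines given as (slope, intercept) cross."""
    (m1, b1), (m2, b2) = fit_a, fit_b
    return (b2 - b1) / (m1 - m2)

# The two fitted lines reported in the caption, as (slope, intercept).
caption_lines = [(-0.733, 5.517), (-1.686, 8.813)]
x_cross = line_intersection(*caption_lines)
```

With the caption's coefficients, `x_cross` comes out near 3.46, which is close to the stated inflection point of about 3.4 on the x-axis. Given real cluster sizes, `two_segment_fit(*ccdf_loglog(sizes), x_break=np.log10(2500))` would produce fits of the same form as the two red lines.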
