Results from phylogenetic profile comparison of 708,645 pairs of proteins chosen from a subset of 1,347 E. coli proteins. (a) Predictive power of phylogenetic profile analysis. Each point in the plot corresponds to one mutual information threshold at which the performance measures were recorded. Reference sets that combine diverse bacterial genomes with a few archaeal and/or eukaryotic genomes (BA, BAE1, BAE2, BAE3a, BAE3b, NR, NR-3, NR-8, LA, and BAE4) outperform the reference set B, which comprises only bacterial genomes. BAE3a and NR perform almost identically in the zoomed-in high-specificity region (inset), which suggests that adding redundancy (different strains of the same organism) to the reference set does not improve performance. Removing evolutionarily closely related (uninformative) genomes from the best-performing reference set, BAE3a (NR-3 and NR-8), decreases performance, but only slightly. (b) Sensitivity versus specificity plot.
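
As a rough illustration of how one point on such a curve could be obtained, the sketch below scores a protein pair by the mutual information of two binary presence/absence profiles and then computes sensitivity and specificity at a single threshold. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the binary profile encoding, and the textbook definitions of sensitivity (TP/(TP+FN)) and specificity (TN/(TN+FP)) are all assumptions and may differ from the definitions used for the figure.

```python
import math
from collections import Counter


def mutual_information(profile_a, profile_b):
    """Mutual information (in bits) between two equal-length profiles.

    Assumption: each profile is a sequence of 0/1 values, one per
    reference genome (1 = homolog present, 0 = absent).
    """
    if len(profile_a) != len(profile_b) or len(profile_a) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(profile_a)
    joint = Counter(zip(profile_a, profile_b))  # counts of (a, b) states
    count_a = Counter(profile_a)                # marginal counts for A
    count_b = Counter(profile_b)                # marginal counts for B
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p(a,b) / (p(a) p(b)) rewritten in terms of raw counts
        mi += p_ab * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


def sensitivity_specificity(scored_pairs, threshold):
    """Sensitivity and specificity at one mutual information threshold.

    Assumption: scored_pairs is an iterable of (score, is_linked) tuples,
    where is_linked marks pairs known to be functionally linked. A pair
    is predicted as linked when score >= threshold.
    """
    tp = fp = tn = fn = 0
    for score, is_linked in scored_pairs:
        predicted = score >= threshold
        if predicted and is_linked:
            tp += 1
        elif predicted and not is_linked:
            fp += 1
        elif not predicted and is_linked:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity
```

Sweeping `threshold` over a range of values and recording the resulting pair of measures at each step would trace out a curve of the kind described in the caption, one point per threshold.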