Figure 5 | BMC Bioinformatics

Figure 5

From: Improving probe set selection for microbial community analysis by leveraging taxonomic information of training sequences


Average pairwise sequence distance metric. This metric measures how inaccurate a probe set's low-fidelity fingerprints are; lower scores are better. The graph was constructed as follows. For each low-fidelity distinct fingerprint of a probe set, the average pairwise sequence difference among its underlying DNA sequences was determined. These averages were binned in 1% increments, and the number of fingerprints falling in each bin was counted. Each point represents the count for one bin, averaged over 100 probe sets. MFPS A (OTU and genus penalties set to 1 and 0, respectively) is superior to MDPS except that it has a few more fingerprints in the 0% to 1% range; scores in this range come from highly similar sequences that belong to OTUs in different genera. MFPS B (OTU and genus penalties set to 1 and 30, respectively) shows further improvement at distances greater than 1% and, unlike MFPS A or MDPS, has markedly fewer low-fidelity distinct fingerprints with sequence distances from 0% to 1%. The improvement at distances greater than 1% is the same windfall seen in HFR scores when the genus-level penalty was set to 30 (see Figure 3). MFPS C (OTU and genus penalties set to 1 and 200, respectively) shows only a small improvement over MFPS B. Error bars are standard deviations across the 100 probe sets; only the upper bars are shown for better visibility.
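
The construction described in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, each low-fidelity fingerprint is assumed to be given as a list of at least two equal-length aligned sequences, the pairwise difference is assumed to be the percentage of mismatched positions, and the population standard deviation is an arbitrary choice because the caption does not say which form was used.

```python
from itertools import combinations
from statistics import mean, pstdev


def pairwise_difference(a, b):
    """Percent of differing positions between two aligned sequences.

    Assumption: a simple mismatch percentage; the source does not
    specify how sequence difference was computed.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


def average_pairwise_difference(sequences):
    """Mean difference over all unordered pairs of sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences per fingerprint")
    return mean(pairwise_difference(a, b)
                for a, b in combinations(sequences, 2))


def bin_counts(fingerprints, n_bins=100):
    """Count fingerprints per 1% bin of average pairwise difference."""
    counts = [0] * n_bins
    for sequences in fingerprints:
        avg = average_pairwise_difference(sequences)
        # Bin k covers [k%, (k+1)%); 100% is folded into the last bin.
        counts[min(int(avg), n_bins - 1)] += 1
    return counts


def summarize(probe_sets, n_bins=100):
    """Per-bin mean count and standard deviation across probe sets.

    Assumption: population standard deviation (pstdev).
    """
    per_set = [bin_counts(fps, n_bins) for fps in probe_sets]
    means = [mean(col) for col in zip(*per_set)]
    sds = [pstdev(col) for col in zip(*per_set)]
    return means, sds
```

Here `probe_sets` would hold one list of low-fidelity fingerprints per probe set (100 of them in the figure), and the returned `means` and `sds` correspond to the plotted points and the error bars.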
