

Figure 2 | BMC Bioinformatics

Figure 2

From: Genome sequence-based species delimitation with confidence intervals and improved distance functions


Results of the correlation analyses between GBDP-derived distances and DDH, compared with the correlations between ANI and DDH. A: The performance of GBDP and ANI in terms of their correlation with wet-lab DDH. The boxplots show the correlation results for the data sets DS1-4, which were created to allow fair comparisons between GBDP, the original ANI implementation [6] and JSpecies [7] (green circles: Kendall’s τ; orange triangles: Pearson’s ρ). For easier visualization, the scale has been bounded by 0 and -1, thus omitting a few outliers greater than 0, and the sign of correlation values involving similarities has been inverted. The correlation coefficients between ANI and DDH are highlighted by horizontal lines, either dotted (DS3, ANI; DS4, ANIm), dot-dashed (DS4, ANIb) or long-dashed (DS4, Tetra). B: GBDP correlations (DS1) depending on the alignment tool used: BLAT (BT), BLAST+ (BP), NCBI-BLAST (NB), WU-BLAST (WU), MUMmer (MU) and BLASTZ (BZ). The dotted lines represent the globally best correlation (i.e., the most negative one), and the boxplots are sorted in increasing order of their most negative Kendall coefficient, i.e., the best setting is at the leftmost position. The same applies to C and D. C: Results for DS1 depending on the algorithms “coverage” (COV), “greedy” (GR) and “greedy-with-trimming” (TR). D: Correlations based on DS1 depending on the distance formulae d0 - d9. Because Kendall’s τ depends only on ranks and the logarithm is strictly monotonic, the distance formulae d0, d1, d4, d6 and d7 yielded the same Kendall correlations as their logarithmized variants d2, d3, d5, d8 and d9.
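The equality of Kendall correlations between a distance formula and its logarithmized variant, noted for panel D, can be illustrated with a minimal sketch. The numbers below are made-up illustrative values, not data from the article, and the helper functions `kendall_tau` and `pearson_r` are hypothetical names written for this example (a simple tau-a without tie correction, which may differ from the variant used by the authors). Under these assumptions, the rank-based coefficient is unchanged by a log transform of the distances, whereas the Pearson coefficient generally changes.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs, no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Made-up illustrative values (assumption): intergenomic distances and DDH percentages.
distances = [0.02, 0.05, 0.11, 0.20, 0.35, 0.60]
ddh = [95.0, 88.0, 70.0, 62.0, 30.0, 35.0]

# Logarithmized variant of the same distances (strictly monotonic transform).
log_distances = [math.log(d) for d in distances]

tau_raw = kendall_tau(distances, ddh)
tau_log = kendall_tau(log_distances, ddh)
r_raw = pearson_r(distances, ddh)
r_log = pearson_r(log_distances, ddh)

print(f"Kendall tau: raw={tau_raw:.4f}, log={tau_log:.4f}")  # identical
print(f"Pearson r:   raw={r_raw:.4f}, log={r_log:.4f}")      # differ
```

Both correlations are negative here because larger distances correspond to lower DDH values, which matches the caption's convention that the most negative coefficient is the best one.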
