Table 1 Preselection of well-performing GBDP methods from the correlation analysis

From: Genome sequence-based species delimitation with confidence intervals and improved distance functions

| Dataset | Correlation type | Correlation estimate | Alignment tool or method | E-value filter | Algorithm | Formula |
|---|---|---|---|---|---|---|
| DS1 | Kendall | -0.761 | BLAT | 10 | Coverage | d6, d8 |
| | | -0.752 | BLAST+ (WL46) | 10 | Coverage | d6, d8 |
| | | -0.677 | BLAST+ (WL46) | 10 | Coverage | d4 |
| | Pearson | -0.956 | BLAT | 10 | Greedy | d6 |
| | | -0.956 | BLAT | 10⁻² | Trimming | d6 |
| | | -0.946 | BLAST+ (WL38) | 10 | Coverage | d4 |
| | | -0.935 | BLAST+ (WL46) | 10 | Coverage | d6, d8 |
| DS2 | Kendall | -0.763 | BLAT | 10 | Coverage | d6, d8 |
| | Pearson | -0.954 | BLAT | 10 | Coverage | d6, d8 |
| DS3 | Kendall | -0.783 | BLAST+ (WL38) | any | Coverage | d6, d8 |
| | | -0.717 | ANI | - | - | - |
| | Pearson | -0.980 | MUMmer (MR20) | - | Greedy | d0, d6 |
| | | -0.973 | ANI | - | - | - |
| DS4 | Kendall | -0.737 | BLAT | 10, 10⁻² | Coverage | d6, d8 |
| | | -0.735 | BLAST+ (WL45) | any | Coverage | d6, d8 |
| | | -0.693 | Tetra | - | - | - |
| | | -0.598 | ANIb | - | - | - |
| | | -0.594 | ANIm | - | - | - |
| | Pearson | -0.957 | BLAT | 10⁻² | Greedy | d6 |
| | | -0.904 | ANIm | - | - | - |
| | | -0.703 | ANIb | - | - | - |
| | | -0.693 | Tetra | - | - | - |
Juxtaposition of DDH correlation values for the best-performing GBDP methods with those of (i) ANI [6] and (ii) the JSpecies [7] implementations (ANIm, ANIb, Tetra). The contents of data sets DS1-DS4 are described in Materials and Methods; the full table with all correlation results is given in Additional file 4. For convenience, the sign of the correlation coefficients for the ANI values is inverted to allow direct comparison with GBDP (-1 is the optimal value). Abbreviations: WL, word length; MR, mumreference. All listed BLAT runs were conducted with the same settings (minScore=30, minIdentity=90, tileSize=12). GBDP-based correlations surpass those of every ANI implementation in all of the respective data sets.
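
The sign convention in the table can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `pearson` and `kendall_tau_b` are hypothetical helpers, and the DDH, distance, and ANI values are made-up toy numbers, not data from DS1-DS4. The sketch assumes that a distance falls as DDH rises (correlation near -1 is best), while a similarity such as ANI rises with DDH, so its coefficient is negated before comparison.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def kendall_tau_b(x, y):
    """Kendall rank correlation (tau-b variant, which accounts for ties)."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from tau-b
        elif dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / sqrt((pairs + ties_x) * (pairs + ties_y))


# Toy values only (assumed for illustration, not taken from the paper).
ddh = [95.0, 82.0, 70.0, 55.0, 30.0]        # DDH similarity in percent
distance = [0.01, 0.05, 0.04, 0.15, 0.30]   # a GBDP-like intergenomic distance
ani = [99.5, 98.0, 96.2, 92.0, 85.0]        # an ANI-like similarity in percent

# Distances correlate negatively with DDH, so they are reported as computed.
print("distance, Kendall:", kendall_tau_b(ddh, distance))
print("distance, Pearson:", pearson(ddh, distance))

# Similarities correlate positively with DDH, so the sign is inverted
# to put them on the same "-1 is optimal" scale as the distances.
print("ANI, Kendall (sign inverted):", -kendall_tau_b(ddh, ani))
print("ANI, Pearson (sign inverted):", -pearson(ddh, ani))
```

With these toy numbers, one of the ten pairs is out of order for the distance, giving a Kendall value of -0.8, whereas the perfectly ordered ANI-like values give -1.0 after sign inversion. Negating the coefficient leaves its magnitude unchanged, so the ranking of methods by correlation strength is unaffected.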