
Table 1 Preselection of well-performing GBDP methods from the correlation analysis

From: Genome sequence-based species delimitation with confidence intervals and improved distance functions

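| Dataset | Correlation | Estimate | Alignment tool or method | E-value filter | Algorithm | Formula |
|---------|-------------|----------|--------------------------|----------------|-----------|---------|
| DS1     | Kendall     | -0.761   | BLAT                     | 10             | Coverage  | d6, d8  |
| DS1     | Kendall     | -0.752   | BLAST+ (WL46)            | 10             | Coverage  | d6, d8  |
| DS1     | Kendall     | -0.677   | BLAST+ (WL46)            | 10             | Coverage  | d4      |
| DS1     | Pearson     | -0.956   | BLAT                     | 10             | Greedy    | d6      |
| DS1     | Pearson     | -0.956   | BLAT                     | 10⁻²           | Trimming  | d6      |
| DS1     | Pearson     | -0.946   | BLAST+ (WL38)            | 10             | Coverage  | d4      |
| DS1     | Pearson     | -0.935   | BLAST+ (WL46)            | 10             | Coverage  | d6, d8  |
| DS2     | Kendall     | -0.763   | BLAT                     | 10             | Coverage  | d6, d8  |
| DS2     | Pearson     | -0.954   | BLAT                     | 10             | Coverage  | d6, d8  |
| DS3     | Kendall     | -0.783   | BLAST+ (WL38)            | any            | Coverage  | d6, d8  |
| DS3     | Kendall     | -0.717   | ANI                      | -              | -         | -       |
| DS3     | Pearson     | -0.980   | MUMmer (MR20)            | -              | Greedy    | d0, d6  |
| DS3     | Pearson     | -0.973   | ANI                      | -              | -         | -       |
| DS4     | Kendall     | -0.737   | BLAT                     | 10, 10⁻²       | Coverage  | d6, d8  |
| DS4     | Kendall     | -0.735   | BLAST+ (WL45)            | any            | Coverage  | d6, d8  |
| DS4     | Kendall     | -0.693   | Tetra                    | -              | -         | -       |
| DS4     | Kendall     | -0.598   | ANIb                     | -              | -         | -       |
| DS4     | Kendall     | -0.594   | ANIm                     | -              | -         | -       |
| DS4     | Pearson     | -0.957   | BLAT                     | 10⁻²           | Greedy    | d6      |
| DS4     | Pearson     | -0.904   | ANIm                     | -              | -         | -       |
| DS4     | Pearson     | -0.703   | ANIb                     | -              | -         | -       |
| DS4     | Pearson     | -0.693   | Tetra                    | -              | -         | -       |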

Juxtaposition of DDH correlation values for the best-performing GBDP methods and for (i) ANI [6] and (ii) the JSpecies [7] implementations (ANIm, ANIb, Tetra). The content of data sets DS1-DS4 is described in Materials and Methods; the full table with all correlation results is provided in Additional file 4. For convenience, the signs of the correlation coefficients obtained for the ANI-type values are inverted to allow direct comparison with GBDP (-1 is the optimal value). Abbreviations: WL, wordlength; MR, mumreference. All listed BLAT runs were conducted under the same settings (minScore=30, minIdentity=90, tileSize=12). In every data set, the best GBDP-based correlations exceed those of all ANI implementations.
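The table reports two kinds of correlation with DDH (Pearson's r and Kendall's tau) and flips the sign of the ANI-type results. The following Python sketch makes both points concrete. GBDP values are distances, which fall as DDH similarity rises, so a perfect method scores -1. ANI values are similarities, which rise with DDH, so their sign must be inverted before the two can be compared.

The helper names (`pearson`, `kendall_tau_b`, `comparable`, `gbdp_wins`) are hypothetical. So are the five toy DDH/distance/ANI values, which are illustrative only and do not come from the paper. The choice of the tie-corrected tau-b variant is also an assumption, since the table does not state which tau variant was used. The only real numbers are the DS3 and DS4 estimates transcribed from the table above. These are used to check the note's claim that GBDP outperforms the ANI-type methods; only DS3 and DS4 list ANI-type values, so only those data sets are checked.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau_b(x, y):
    """Kendall's tau-b (tie-corrected), counting all O(n^2) pairs."""
    conc = disc = only_x_tied = only_y_tied = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: ignored by tau-b
        if dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            conc += 1
        else:
            disc += 1
    denom = math.sqrt((conc + disc + only_x_tied) * (conc + disc + only_y_tied))
    return (conc - disc) / denom


def comparable(r, is_similarity):
    """Invert the sign for similarity measures (e.g. ANI) so that -1 is optimal,
    matching distance measures (e.g. GBDP) that decrease as DDH increases."""
    return -r if is_similarity else r


# Hypothetical toy data (NOT from the paper): DDH in %, a GBDP-like distance
# and an ANI-like similarity for five genome pairs.
ddh = [20.0, 35.0, 50.0, 70.0, 85.0]
gbdp = [0.30, 0.22, 0.15, 0.08, 0.03]
ani = [80.0, 86.0, 91.0, 95.0, 98.0]

tau_gbdp = comparable(kendall_tau_b(ddh, gbdp), is_similarity=False)
tau_ani = comparable(kendall_tau_b(ddh, ani), is_similarity=True)
r_gbdp = comparable(pearson(ddh, gbdp), is_similarity=False)

# Estimates transcribed from the table; ANI-type signs are already inverted
# as described in the table note.
TABLE = {
    ("DS3", "Kendall"): {"GBDP": [-0.783], "ANI-type": [-0.717]},
    ("DS3", "Pearson"): {"GBDP": [-0.980], "ANI-type": [-0.973]},
    ("DS4", "Kendall"): {"GBDP": [-0.737, -0.735],
                         "ANI-type": [-0.693, -0.598, -0.594]},
    ("DS4", "Pearson"): {"GBDP": [-0.957],
                         "ANI-type": [-0.904, -0.703, -0.693]},
}


def gbdp_wins(cell):
    """True if the best GBDP value is closer to -1 than every ANI-type value."""
    return min(cell["GBDP"]) < min(cell["ANI-type"])


for key, cell in TABLE.items():
    print(key, gbdp_wins(cell))
```

On the toy data, both the GBDP-like distance and the sign-inverted ANI-like similarity reach a Kendall tau of -1, because both rank the five pairs exactly as DDH does. In real analyses, `scipy.stats.pearsonr` and `scipy.stats.kendalltau` would replace the hand-written functions. They are spelled out here only to make the pairwise counting behind tau explicit.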