Table 3 Number of correctly and incorrectly predicted species^a for different thresholds^b without clade exclusion. Some methods vastly overpredict the number of species, even when the true number of species is low (in this case the true number of species is 11).

From: Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities

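| Method | No cutoff^b: Correct | No cutoff^b: Incorrect | Cutoff > 0.01 %^b: Correct | Cutoff > 0.01 %^b: Incorrect | Cutoff > 0.1 %^b: Correct | Cutoff > 0.1 %^b: Incorrect | Cutoff > 1 %^b: Correct | Cutoff > 1 %^b: Incorrect |
|---|---|---|---|---|---|---|---|---|
| CARMA3 | 11 | 56 | 11 | 4 | 11 | 0 | 10 | 0 |
| CLARK | 11 | 364 | 11 | 25 | 11 | 5 | 11 | 0 |
| DiScRIBinATE RAPSearch2^c | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Kraken | 11 | 327 | 11 | 25 | 11 | 5 | 11 | 0 |
| Filtered Kraken | 11 | 14 | 11 | 1 | 11 | 0 | 11 | 0 |
| MEGAN4 BlastN | 11 | 110 | 11 | 19 | 11 | 3 | 9 | 1 |
| MEGAN4 RAPSearch2 | 11 | 183 | 11 | 41 | 11 | 1 | 9 | 1 |
| MetaBin | 11 | 561 | 10 | 77 | 10 | 6 | 10 | 1 |
| MetaCV | 11 | 1226 | 11 | 232 | 11 | 6 | 10 | 1 |
| MetaPhyler | 11 | 9 | 11 | 9 | 11 | 5 | 7 | 1 |
| PhymmBL^c | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| RITA | 11 | 466 | 10 | 80 | 10 | 10 | 10 | 1 |
| TACOA^c | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| MG-RAST best hit | 11 | 927 | 10 | 180 | 10 | 36 | 10 | 8 |
| MG-RAST LCA | 11 | 476 | 11 | 69 | 11 | 5 | 11 | 1 |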

  1. ^a Using the FW in vitro dataset of sequenced reads from 11 species.
  2. ^b A cutoff of > x %, for example 0.01 %, indicates that only species with a predicted abundance of at least x % of the total set of predictions were considered. A correctly predicted species is any of the 11 species used to simulate the reads in the dataset; any other predicted species is counted as incorrect.
  3. ^c These methods do not predict to the species level at this read length (they require longer read lengths). See additional analyses at other levels of clade exclusion.
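
The cutoff procedure described in the footnotes can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function name `count_correct_incorrect`, the species labels, and the read counts are all hypothetical, and the sketch assumes that abundance is the share of classified reads assigned to a species and that the cutoff is strict (abundance must exceed x %, following the column headers).

```python
from typing import Mapping, Optional, Set, Tuple


def count_correct_incorrect(
    read_counts: Mapping[str, int],
    true_species: Set[str],
    cutoff_pct: Optional[float] = None,
) -> Tuple[int, int]:
    """Return (correct, incorrect) species counts after an abundance cutoff.

    Assumption: abundance is a species' share of all classified reads,
    expressed in percent, and a species is kept only if that share is
    strictly greater than cutoff_pct (None means no cutoff).
    """
    total = sum(read_counts.values())
    correct = incorrect = 0
    for species, count in read_counts.items():
        if count == 0:
            continue  # a species with no assigned reads was not predicted
        abundance_pct = 100.0 * count / total
        if cutoff_pct is not None and abundance_pct <= cutoff_pct:
            continue  # below the cutoff: ignore this prediction
        if species in true_species:
            correct += 1
        else:
            incorrect += 1
    return correct, incorrect


# Hypothetical toy data (NOT taken from the table): two true species
# and three spurious low-abundance predictions, 100,000 reads in total.
toy_counts = {
    "species_A": 90000,  # 90 %
    "species_B": 9000,   # 9 %
    "species_C": 950,    # 0.95 %
    "species_D": 45,     # 0.045 %
    "species_E": 5,      # 0.005 %
}
toy_truth = {"species_A", "species_B"}

for cutoff in (None, 0.01, 0.1, 1.0):
    print(cutoff, count_correct_incorrect(toy_counts, toy_truth, cutoff))
```

On this toy input, raising the cutoff removes the spurious low-abundance species one by one while both true species are retained, which mirrors the general trend in the table of fewer incorrect predictions at stricter cutoffs.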