Table 6 Overall accuracy statistics for different tools

From: eCAMBer: efficient support for large-scale comparative analysis of multiple bacterial strains

 

| Statistic | PATRIC: Input | PATRIC: MA | PATRIC: eCAMBer | Prodigal: Input | Prodigal: GMV | Prodigal: MA | Prodigal: eCAMBer |
|---|---|---|---|---|---|---|---|
| # of incorrectly removed genes | NA | 0 | 1224 | NA | 0 | 0 | 388 |
| # of incorrectly added genes | NA | 1177 | 792 | NA | 0 | 344 | 331 |
| # of correctly removed genes | NA | 0 | 3993 | NA | 0 | 0 | 1185 |
| # of correctly added genes | NA | 410 | 701 | NA | 0 | 210 | 1447 |
| # of incorrect → correct TIS changes | NA | 4812 | 1591 | NA | 149 | 1015 | 290 |
| # of incorrect → incorrect TIS changes | NA | 2223 | 747 | NA | 28 | 1018 | 113 |
| # of correct → incorrect TIS changes | NA | 4279 | 669 | NA | 78 | 3618 | 170 |
| Precision for gene starts | 0.665 | 0.663 | 0.699 | 0.764 | 0.764 | 0.734 | 0.775 |
| Recall for gene starts | 0.695 | 0.702 | 0.703 | 0.752 | 0.753 | 0.727 | 0.765 |
| f1 for gene starts | 0.680 | 0.682 | 0.701 | 0.758 | 0.759 | 0.731 | 0.770 |
| Precision for gene ends | 0.892 | 0.882 | 0.920 | 0.931 | 0.931 | 0.928 | 0.940 |
| Recall for gene ends | 0.931 | 0.935 | 0.926 | 0.917 | 0.917 | 0.919 | 0.927 |
| f1 for gene ends | 0.911 | 0.908 | 0.923 | 0.924 | 0.924 | 0.923 | 0.934 |

Overall statistics for the accuracy of changes introduced by eCAMBer, Mugsy-Annotator (MA) and the GMV pipeline. The tools were run on a dataset of 20 E. coli strains, using annotations from the PATRIC database (columns 2 to 4) and annotations generated with Prodigal (columns 5 to 8). The correctness of the introduced changes was assessed by comparison with annotations from the Coliscope database. The Input columns correspond to the original annotations. “NA” stands for not applicable; TIS stands for translation initiation site. Each row reports one statistic for every tool.
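The f1 rows can be cross-checked against the precision and recall rows. The sketch below assumes that f1 is the standard harmonic mean of precision and recall, F1 = 2PR / (P + R); the table does not state the definition, so this is an assumption, and the names `f1_score`, `max_f1_deviation`, `COLUMNS` and `ROWS` are illustrative rather than part of any of the tools. The values are copied from the table. Because they are rounded to three decimals, a recomputed f1 can differ from the reported one by up to about 0.001.

```python
# Column order used in the table (PATRIC columns first, then Prodigal).
COLUMNS = [
    "PATRIC Input", "PATRIC MA", "PATRIC eCAMBer",
    "Prodigal Input", "Prodigal GMV", "Prodigal MA", "Prodigal eCAMBer",
]

# Precision, recall and reported f1, copied from the table.
ROWS = {
    "gene starts": {
        "precision": [0.665, 0.663, 0.699, 0.764, 0.764, 0.734, 0.775],
        "recall":    [0.695, 0.702, 0.703, 0.752, 0.753, 0.727, 0.765],
        "f1":        [0.680, 0.682, 0.701, 0.758, 0.759, 0.731, 0.770],
    },
    "gene ends": {
        "precision": [0.892, 0.882, 0.920, 0.931, 0.931, 0.928, 0.940],
        "recall":    [0.931, 0.935, 0.926, 0.917, 0.917, 0.919, 0.927],
        "f1":        [0.911, 0.908, 0.923, 0.924, 0.924, 0.923, 0.934],
    },
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of f1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def max_f1_deviation(rows):
    """Largest absolute gap between recomputed and reported f1 values."""
    worst = 0.0
    for stats in rows.values():
        for p, r, reported in zip(stats["precision"], stats["recall"], stats["f1"]):
            worst = max(worst, abs(f1_score(p, r) - reported))
    return worst


if __name__ == "__main__":
    for label, stats in ROWS.items():
        for name, p, r, reported in zip(
            COLUMNS, stats["precision"], stats["recall"], stats["f1"]
        ):
            print(f"{label:11s} {name:17s} "
                  f"recomputed={f1_score(p, r):.3f} reported={reported:.3f}")
    print(f"largest deviation: {max_f1_deviation(ROWS):.4f}")
```

Under this assumption, every reported f1 value agrees with the recomputed one to within the rounding allowance, which suggests the table uses the standard definition.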