
Table 1 Comparison of methods for document-level species annotation

From: Species identification for gene name normalization

  

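Columns 3 to 5 use MeSH terms as the gold standard (GS); columns 6 to 8 use UniProt references as the gold standard.

| Species | Method | P (GS: MeSH) | R (GS: MeSH) | F (GS: MeSH) | P (GS: UniProt) | R (GS: UniProt) | F (GS: UniProt) |
|---|---|---|---|---|---|---|---|
| Human | journal heuristic | 0.908 | 0.632 | 0.745 | (0.011) | 0.231 | (0.021) |
| Human | SVM | 0.710 | 0.775 | 0.741 | (0.024) | 0.781 | (0.046) |
| Human | Ali Baba [1] | 0.888 | 0.583 | 0.703 | (0.033) | 0.654 | (0.063) |
| Human | LINNAEUS [2] | 0.900 | 0.660 | 0.761 | (0.030) | 0.659 | (0.057) |
| Human | GNAT [3] (Ali Baba) | 0.878 | 0.318 | 0.467 | (0.056) | 0.618 | (0.103) |
| Human | GNAT (LINNAEUS) | 0.609 | 0.507 | 0.553 | (0.037) | 0.944 | (0.072) |
| Human | UniProt | 0.934 | 0.031 | 0.060 | (1.000) | (1.000) | (1.000) |
| E. coli | journal heuristic | 0.146 | 0.310 | 0.198 | (0.008) | 0.468 | (0.015) |
| E. coli | SVM | 0.217 | 0.289 | 0.248 | (0.010) | 0.387 | (0.019) |
| E. coli | Ali Baba | 0.654 | 0.605 | 0.628 | (0.031) | 0.829 | (0.059) |
| E. coli | LINNAEUS | 0.665 | 0.602 | 0.632 | (0.032) | 0.838 | (0.061) |
| E. coli | GNAT (Ali Baba) | 0.771 | 0.301 | 0.434 | (0.064) | 0.730 | (0.118) |
| E. coli | GNAT (LINNAEUS) | 0.058 | 0.415 | 0.102 | (0.004) | 0.847 | (0.008) |
| E. coli | UniProt | 0.946 | (0.032) | (0.063) | (1.000) | (1.000) | (1.000) |
| E. coli | RegulonDB [4] | 0.857 | (0.107) | (0.191) | (0.175) | 0.640 | (0.275) |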
Legend: GS = gold standard species labeling; P = precision, R = recall, F = F-measure. Only human and E. coli are shown for brevity. For comparison, we also provide the inter-gold-standard agreement between MeSH, UniProt and RegulonDB. With UniProt as the gold standard, only recall can be compared across corpora, because UniProt does not reference every paper that mentions a protein. For the same reason, when a database is used for prediction, only precision is comparable.
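
The F columns appear consistent with the balanced F-measure, the harmonic mean of precision and recall. The table itself does not state the formula, so the sketch below assumes that definition; the function name `f_measure` is illustrative and not taken from the article. It recomputes F for two of the Human rows evaluated against MeSH.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (assumed definition): harmonic mean of P and R."""
    if precision + recall == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# P and R values copied from the Human rows, GS: MeSH terms.
rows = {
    "journal heuristic": (0.908, 0.632),
    "SVM": (0.710, 0.775),
}

for method, (p, r) in rows.items():
    print(f"{method}: F = {f_measure(p, r):.3f}")
```

Because the published P and R are already rounded to three decimals, a recomputed F can differ from the printed value in the last digit for some rows.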