Table 4 Performance evaluation of LINNAEUS species tagging on different evaluation sets

From: LINNAEUS: A species name identification system for biomedical literature

| Set | Level | Main set | TP | FP | FN | Recall | Prec. |
|---|---|---|---|---|---|---|---|
| NCBI taxonomy | Doc. | MEDLINE | 6,888 | 10,032 | (1,807) | 0.7922 | (0.4071) |
| | | PMC OA abs | 15 | 20 | (6) | 0.7143 | (0.4286) |
| | | PMC OA full (abs) | 16 | 166 | (3) | 0.8421 | (0.0791) |
| | | PMC OA full (all) | 22 | 196 | (4) | 0.8462 | (0.1010) |
| MeSH | Doc. | MEDLINE | 5,073,147 | 4,577,293 | 2,315,811 | 0.6866 | 0.5257 |
| | | PMC OA abs | 36,641 | 49,151 | (14,797) | 0.7123 | (0.4271) |
| | | PMC OA full (abs) | 46,484 | 291,872 | (2,219) | 0.9544 | (0.1374) |
| | | PMC OA full (all) | 54,814 | 346,071 | (2,880) | 0.9201 | (0.1367) |
| Entrez gene | Doc. | MEDLINE | 346,989 | 171,001 | (139,702) | 0.7130 | (0.6699) |
| | | PMC OA abs | 6,946 | 4,110 | (2,357) | 0.7466 | (0.6283) |
| | | PMC OA full (abs) | 8,184 | 38,275 | (470) | 0.9457 | (0.1762) |
| | | PMC OA full (all) | 9,662 | 42,209 | (628) | 0.9390 | (0.1863) |
| EMBL | Doc. | MEDLINE | 158,462 | 183,950 | (235,745) | 0.4020 | (0.4627) |
| | | PMC OA abs | 4,807 | 4,360 | (7,902) | 0.3782 | (0.5244) |
| | | PMC OA full (abs) | 6,601 | 34,447 | (3,859) | 0.6311 | (0.1608) |
| | | PMC OA full (all) | 9,433 | 40,212 | (5,613) | 0.6269 | (0.1900) |
| PMC linkouts | Doc. | MEDLINE | (27,259) | (23,377) | (122,596) | (0.1819) | (0.5383) |
| | | PMC OA abs | (30,315) | (27,192) | (141,735) | (0.1762) | (0.5272) |
| | | PMC OA full (abs) | 110,288 | 156,012 | 61,656 | 0.6414 | 0.4141 |
| | | PMC OA full (all) | 112,069 | 163,052 | 61,671 | 0.6450 | 0.4073 |
| Whatizit-Organisms | Doc. | PMC OA abs | 64,686 | 29,222 | 12,930 | 0.8334 | 0.6888 |
| | | PMC OA full (abs) | 308,410 | 67,171 | 100,079 | 0.7550 | 0.8211 |
| | | PMC OA full (all) | 344,445 | 73,489 | 109,668 | 0.7585 | 0.8242 |
| | Mention | PMC OA abs | 139,077 | 147,426 | 39,351 | 0.7794 | 0.4854 |
| | | PMC OA full (xml) | 1,164,799 | 1,596,615 | 527,284 | 0.6883 | 0.4218 |
| | | PMC OA full (all) | 1,304,620 | 2,398,321 | 1,133,018 | 0.5352 | 0.3523 |
| Manual | Doc. | PMC OA abs | 101 | 0 | 3 | 0.9712 | 1.0 |
| | | PMC OA full (abs) | 421 | 46 | 9 | 0.9791 | 0.9015 |
| | | PMC OA full (all) | 462 | 49 | 9 | 0.9809 | 0.9041 |
| | Mention | PMC OA abs | 326 | 3 | 19 | 0.9449 | 0.9909 |
| | | PMC OA full (xml) | 3,190 | 92 | 222 | 0.9350 | 0.9720 |
| | | PMC OA full (all) | 3,973 | 120 | 241 | 0.9428 | 0.9707 |

  1. Values in parentheses are for comparisons between document sets of different type (for example, evaluation tag sets based on full text compared against species tags generated on abstracts) or when the evaluation set is likely to exclude a large number of species mentions. PMC OA full (all) shows accuracy for all full-text documents. PMC OA full (abs) shows accuracy for all full-text documents with an abstract that can be extracted, allowing comparison of document-level accuracy between full-text and abstract. PMC OA full (xml) shows accuracy for all full-text documents with XML abstract, allowing comparison of mention-level accuracy between full-text and abstracts.
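
The Recall and Prec. columns appear to follow the standard definitions, recall = TP / (TP + FN) and precision = TP / (TP + FP). The table does not state these formulas, so treat them as an assumption, although the sampled rows below reproduce the published values. The following minimal sketch recomputes both measures for two rows; the function names and the `rows` dictionary are illustrative and are not part of LINNAEUS.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of expected species tags that were found: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of reported species tags that were correct: TP / (TP + FP)."""
    return tp / (tp + fp)


# (TP, FP, FN) counts copied from two rows of the table.
# The row labels are only for display.
rows = {
    "NCBI taxonomy / Doc. / MEDLINE": (6888, 10032, 1807),
    "Manual / Mention / PMC OA abs": (326, 3, 19),
}

for name, (tp, fp, fn) in rows.items():
    print(f"{name}: recall={recall(tp, fn):.4f} precision={precision(tp, fp):.4f}")
```

Under this assumption, a parenthesized FN count carries over to the recall computed from it, and a parenthesized FP count to the precision, so it helps to keep the footnote's caveat in mind when comparing rows.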