Table 1 Species name tag sets for different evaluation corpora and LINNAEUS output

From: LINNAEUS: A species name identification system for biomedical literature

| Tag set            | Document set | Documents       | Species | Tags       |
|--------------------|--------------|-----------------|---------|------------|
| NCBI taxonomy      | MEDLINE      | 5,237           | 6,871   | 8,701      |
|                    | PMC OA abs   | 10              | 21      | 21         |
|                    | PMC OA       | 12              | 26      | 26         |
| MeSH               | MEDLINE      | 6,817,973       | 824     | 7,388,958  |
|                    | PMC OA abs   | 44,552          | 518     | 51,592     |
|                    | PMC OA       | 88,826          | 527     | 57,874     |
| Entrez gene        | MEDLINE      | 440,084         | 3,125   | 486,791    |
|                    | PMC OA abs   | 8,371           | 406     | 9,307      |
|                    | PMC OA       | 9,327           | 428     | 10,294     |
| EMBL               | MEDLINE      | 174,074         | 149,598 | 396,853    |
|                    | PMC OA abs   | 5,157           | 7,582   | 12,775     |
|                    | PMC OA       | 7,374           | 7,867   | 15,136     |
| PMC linkouts       | MEDLINE      | 35,534          | 29,351  | 248,222    |
|                    | PMC OA abs   | 41,054          | 41,070  | 286,998    |
|                    | PMC OA       | 42,910          | 32,187  | 289,411    |
| Whatizit-Organisms | MEDLINE      | 71,856          | 23,598  | 3,328,853  |
|                    | PMC OA abs   | 82,410 (64,228) | 25,375  | 3,791,412  |
|                    | PMC OA       | 94,289          | 26,557  | 4,075,644  |
| Manual             | MEDLINE      | 75              | 176     | 3,205      |
|                    | PMC OA abs   | 89 (76)         | 215     | 3,878      |
|                    | PMC OA       | 100             | 233     | 4,259      |
| LINNAEUS output    | MEDLINE      | 9,919,312       | 57,802  | 30,786,517 |
|                    | PMC OA abs   | 88,962 (65,739) | 5,114   | 303,146    |
|                    | PMC OA       | 105,106         | 18,943  | 4,189,681  |

  1. Numbers in parentheses show the portion of abstracts that can be extracted from the document XML files, enabling mention-level accuracy comparisons (see Methods for details).