Table 2 Statistics for the concept annotation classes used in the training (67-document) and evaluation (30-document) data sets and for those added as additional training data for concept normalization for all ontologies

From: Concept recognition as a machine translation problem

| Ontology | # training set annotation classes | Avg/median # training set annotation classes per article | # classes added to training set | # evaluation set annotation classes | Avg/median # evaluation set annotation classes per article |
|---|---|---|---|---|---|
| ChEBI | 1463 | 22/18 | 58,214 | 627 | 21/20 |
| ChEBI_EXT | 2852 | 43/38 | 58,439 | 1167 | 39/39 |
| CL | 581 | 9/7 | 2163 | 253 | 8/9 |
| CL_EXT | 651 | 10/8 | 2168 | 286 | 10/10 |
| GO_BP | 1586 | 24/21 | 29,213 | 682 | 23/23 |
| GO_BP_EXT | 2511 | 37/33 | 29,301 | 1090 | 36/37 |
| GO_CC | 677 | 10/9 | 4052 | 212 | 7/6 |
| GO_CC_EXT | 896 | 13/12 | 4086 | 296 | 10/9 |
| GO_MF | 49 | 1/1 | 10,951 | 19 | 1/1 |
| GO_MF_EXT | 738 | 11/11 | 10,031 | 377 | 13/12 |
| MOP | 85 | 1/1 | 3574 | 32 | 1/1 |
| MOP_EXT | 108 | 2/1 | 3578 | 40 | 1/1 |
| NCBITaxon | 690 | 10/9 | 1,175,661 | 315 | 11/9 |
| NCBITaxon_EXT | 757 | 11/10 | 1,175,682 | 346 | 12/10 |
| PR | 1278 | 19/18 | 213,371 | 466 | 16/16 |
| PR_EXT | 1534 | 23/22 | 213,531 | 588 | 20/19 |
| SO | 1216 | 18/18 | 2256 | 544 | 18/19 |
| SO_EXT | 3172 | 47/47 | 2405 | 1409 | 47/48 |
| UBERON | 2048 | 31/24 | 14,057 | 1040 | 35/31 |
| UBERON_EXT | 2409 | 36/29 | 14,113 | 1217 | 41/38 |

Avg: average