Table 1 Statistics for the concept annotations in the training (67-document) and evaluation (30-document) data sets for all ontologies

From: Concept recognition as a machine translation problem

| Ontology | # training set annotations | Avg/median # training set annotations per article | # evaluation set annotations | Avg/median # evaluation set annotations per article |
|---|---|---|---|---|
| ChEBI | 4548 | 68/45 | 2200 | 73/45 |
| ChEBI_EXT | 11,915 | 178/142 | 5248 | 175/142 |
| CL | 4043 | 60/32 | 1749 | 58/32 |
| CL_EXT | 6276 | 94/64 | 2872 | 96/64 |
| GO_BP | 9280 | 139/108 | 3681 | 123/108 |
| GO_BP_EXT | 13,954 | 208/158 | 5847 | 195/158 |
| GO_CC | 4075 | 61/33 | 1184 | 39/33 |
| GO_CC_EXT | 8495 | 127/91 | 3217 | 107/91 |
| GO_MF | 375 | 6/2 | 94 | 3/2 |
| GO_MF_EXT | 4070 | 61/34 | 1822 | 61/34 |
| MOP | 240 | 4/2 | 101 | 3/2 |
| MOP_EXT | 386 | 6/2 | 111 | 4/2 |
| NCBITaxon | 7362 | 110/90 | 3101 | 103/90 |
| NCBITaxon_EXT | 7592 | 113/97 | 3219 | 107/97 |
| PR | 17,038 | 254/198 | 6409 | 214/198 |
| PR_EXT | 19,862 | 296/246 | 7932 | 264/246 |
| SO | 8797 | 131/118 | 3446 | 115/118 |
| SO_EXT | 24,955 | 372/341 | 9136 | 305/341 |
| UBERON | 12,269 | 183/130 | 6551 | 218/130 |
| UBERON_EXT | 14,910 | 223/165 | 7416 | 247/165 |

  1. Avg: average.
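
The per-article averages in Table 1 appear to be the annotation total divided by the number of documents in the set (67 for training, 30 for evaluation), rounded to the nearest whole number. The sketch below checks that reading against a few rows copied from the table. The rounding convention is an assumption, since the source does not state it, and the names `per_article_average` and `check_rows` are hypothetical helpers introduced only for this illustration. The medians cannot be checked this way because they depend on per-document counts that the table does not give.

```python
# Document counts taken from the Table 1 caption.
TRAIN_DOCS = 67
EVAL_DOCS = 30

# (ontology, training total, reported training avg,
#  evaluation total, reported evaluation avg), copied from Table 1.
ROWS = [
    ("ChEBI", 4548, 68, 2200, 73),
    ("GO_MF", 375, 6, 94, 3),
    ("PR", 17038, 254, 6409, 214),
    ("UBERON", 12269, 183, 6551, 218),
]


def per_article_average(total, n_docs):
    """Average annotations per article, rounded to the nearest integer.

    Rounding to the nearest integer is an assumption; the source does
    not say how the reported averages were rounded.
    """
    return round(total / n_docs)


def check_rows(rows):
    """Return the rows whose reported averages differ from the recomputed ones."""
    mismatches = []
    for name, train_total, train_avg, eval_total, eval_avg in rows:
        recomputed = (
            per_article_average(train_total, TRAIN_DOCS),
            per_article_average(eval_total, EVAL_DOCS),
        )
        if recomputed != (train_avg, eval_avg):
            mismatches.append((name, recomputed, (train_avg, eval_avg)))
    return mismatches


if __name__ == "__main__":
    # An empty list means every sampled row matches the assumed convention.
    print(check_rows(ROWS))
```

An empty result for the sampled rows supports the total-divided-by-documents reading. It does not establish how the original authors computed the values.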