Concept recognition as a machine translation problem

BMC Bioinformatics

Table 1 Statistics for the concept annotations in the training (67-document) and evaluation (30-document) data sets for all ontologies

Ontology	# training set annotations	Avg/median # training set annotations per article	# evaluation set annotations	Avg/median # evaluation set annotations per article
ChEBI	4548	68/45	2200	73/45
ChEBI_EXT	11,915	178/142	5248	175/142
CL	4043	60/32	1749	58/32
CL_EXT	6276	94/64	2872	96/64
GO_BP	9280	139/108	3681	123/108
GO_BP_EXT	13,954	208/158	5847	195/158
GO_CC	4075	61/33	1184	39/33
GO_CC_EXT	8495	127/91	3217	107/91
GO_MF	375	6/2	94	3/2
GO_MF_EXT	4070	61/34	1822	61/34
MOP	240	4/2	101	3/2
MOP_EXT	386	6/2	111	4/2
NCBITaxon	7362	110/90	3101	103/90
NCBITaxon_EXT	7592	113/97	3219	107/97
PR	17,038	254/198	6409	214/198
PR_EXT	19,862	296/246	7932	264/246
SO	8797	131/118	3446	115/118
SO_EXT	24,955	372/341	9136	305/341
UBERON	12,269	183/130	6551	218/130
UBERON_EXT	14,910	223/165	7416	247/165
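
The per-article averages in Table 1 appear to be the annotation totals divided by the number of documents in each set (67 training, 30 evaluation), rounded to the nearest integer. The sketch below checks this for three rows; the names `TRAIN_DOCS`, `EVAL_DOCS`, `totals`, and `per_article_mean` are illustrative and do not come from the article, and the rounding rule is an assumption inferred from the reported values.

```python
# Document counts taken from the Table 1 caption.
TRAIN_DOCS = 67
EVAL_DOCS = 30

# ontology -> (training total, evaluation total), copied from Table 1.
# Only a subset of rows is shown for illustration.
totals = {
    "ChEBI": (4548, 2200),
    "GO_MF": (375, 94),
    "PR_EXT": (19862, 7932),
}


def per_article_mean(total, n_docs):
    """Mean annotations per article, rounded to the nearest integer (assumed rule)."""
    return round(total / n_docs)


# ontology -> (training mean, evaluation mean)
means = {
    name: (per_article_mean(train, TRAIN_DOCS), per_article_mean(evaluation, EVAL_DOCS))
    for name, (train, evaluation) in totals.items()
}

for name, (train_mean, eval_mean) in means.items():
    print(f"{name}: training {train_mean}, evaluation {eval_mean}")
```

The medians cannot be recovered this way, since they depend on the per-article counts rather than on the totals.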

ISSN: 1471-2105