Table 4 Quantification of discontinuous and overlapping words in all concept mentions

From: Concept recognition as a machine translation problem

| Ontology | Words in all concept mentions (n) | Words in discontinuous mentions (%) | Words between text spans of discontinuous mentions (%) | Words overlapping multiple mentions (%) |
|---|---|---|---|---|
| ChEBI | 5985 | 0.3 | 0.6 | 0.1 |
| CL | 6576 | 4.3 | 4.3 | 2.6 |
| GO_BP | 12,956 | 5.2 | 7.0 | 1.6 |
| GO_CC | 5864 | 1.5 | 2.1 | 0.5 |
| GO_MF | 376 | 0 | 0 | 0 |
| MOP | 257 | 0 | 0 | 0 |
| NCBITaxon | 7696 | 0.03 | 0.03 | 0.03 |
| PR | 23,261 | 0.5 | 0.2 | 0.9 |
| SO | 10,348 | 1.2 | 1.8 | 0.5 |
| UBERON | 15,681 | 2.0 | 2.3 | 0.8 |

  1. All numbers are based on the number of words, not concepts