
Table 2 Number of expected CUIs in the gold-standard annotations of the Quaero corpus and of the adapted Quaero corpus

From: SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes

 

               Quaero                         Adapted Quaero
Split          EMEA           MEDLINE         EMEA           MEDLINE
Dev            2261 (526)     2978 (1843)     1733 (425)     2465 (1477)
Test           2203 (474)     3093 (1907)     1710 (388)     2606 (1544)
Train          2695 (651)     2995 (1861)     2279 (541)     2491 (1488)

Each cell gives CUIs (uniq.): the total number of annotated CUIs, with the number of unique CUIs in parentheses.

  1. For the "uniq." statistic, only the first occurrence of each CUI is counted. In MEDLINE, each document is a title of 10–15 word forms on average, whereas each EMEA document is a full notice of several hundred word forms.
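The difference between the total count and the "uniq." count can be illustrated with a minimal sketch. It assumes, hypothetically, that the gold-standard annotations of one split are available as a flat list of CUI strings in document order, and that uniqueness is counted over the whole split; the function name `cui_counts` and the placeholder CUIs are illustrative only and do not come from the Quaero distribution.

```python
def cui_counts(annotations):
    """Return (total, unique) CUI counts for a list of CUI annotations.

    Hypothetical input: a flat list of CUI strings, one per gold-standard
    annotation, in document order. Only the first occurrence of a CUI
    contributes to the unique count.
    """
    total = len(annotations)
    seen = set()
    unique = 0
    for cui in annotations:
        if cui not in seen:  # first occurrence of this CUI
            seen.add(cui)
            unique += 1
    return total, unique


# Placeholder identifiers, not real UMLS concepts.
example = ["C0000001", "C0000002", "C0000001", "C0000003", "C0000002"]
total, unique = cui_counts(example)
print(f"{total} ({unique})")  # same "total (uniq.)" layout as the table: 5 (3)
```

Repeated concepts within long EMEA notices would inflate the total relative to the unique count, which is consistent with the larger gap between the two figures for EMEA than for MEDLINE in the table.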