
Table 2 Number of CUIs expected in the gold standard annotations of the Quaero corpus and of the adapted Quaero corpus

From: SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes

              Quaero                      Adapted Quaero
CUIs (uniq.)  EMEA          MEDLINE       EMEA          MEDLINE
Dev           2261 (526)    2978 (1843)   1733 (425)    2465 (1477)
Test          2203 (474)    3093 (1907)   1710 (388)    2606 (1544)
Train         2695 (651)    2995 (1861)   2279 (541)    2491 (1488)
  1. For the uniq. statistic, only the first occurrence of each CUI is counted. In MEDLINE, each document is a title of 10–15 word forms on average, whereas EMEA documents are full notices of several hundred word forms each.
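The following is a minimal sketch of the counting convention described in the note, assuming a corpus split is available as a flat list of annotated CUI strings in document order. The function name `cui_counts` and the sample identifiers are hypothetical and only illustrate how each table cell pairs a total count with a first-occurrence (uniq.) count.

```python
def cui_counts(annotations):
    """Return (total, unique) counts for a list of CUI strings.

    total counts every annotated mention; unique counts a CUI only at its
    first occurrence, so repeated mentions of the same concept add nothing.
    (Hypothetical helper for illustration, not the authors' code.)
    """
    seen = set()
    unique = 0
    for cui in annotations:
        if cui not in seen:  # first occurrence of this CUI
            seen.add(cui)
            unique += 1
    return len(annotations), unique


# Hypothetical annotations in document order (illustrative identifiers only).
example = ["C0011849", "C0020538", "C0011849", "C0027051", "C0020538"]
total, unique = cui_counts(example)
print(f"{total} ({unique})")  # prints "5 (3)", matching the table's "CUIs (uniq.)" layout
```

Because a long EMEA notice repeats the same concepts many times while a short MEDLINE title rarely does, this convention explains why the unique counts sit much further below the totals for EMEA than for MEDLINE.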