Table 5 Analysis of error causes in extracted data

From: Combining MEDLINE and publisher data to create parallel corpora for the automatic translation of biomedical text

| Extraction | Total number | Cause | Number |
|---|---|---|---|
| French | | | |
| Correct | 15 | No TIFR in MEDLINE | 6 |
| | | TIFR difference in MEDLINE vs. publisher | 9 |
| Incorrect | 90 | Inverted EN/FR | 60 |
| | | Incomplete title extraction | 18 |
| | | Keyword extraction instead of title | 8 |
| | | Incomplete abstract extraction | 4 |
| Spanish | | | |
| Correct | 59 | Title difference in MEDLINE vs. publisher | 9 |
| | | No TIES in MEDLINE | 49 |
| | | Abstract difference in MEDLINE vs. publisher | 1 |
| Incorrect | 174 | Title difference in MEDLINE vs. publisher | 7 |
| | | Incomplete abstract extraction | 66 |
| | | Erroneous ABEN extraction | 101 |