
Table 9 NER performance before and after typo correction in the SPR dataset

From: MLM-based typographical error correction of unstructured medical texts for named entity recognition

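| NER | Precision (Typos) | Precision (Typo correction) | Recall (Typos) | Recall (Typo correction) | f1-score (Typos) | f1-score (Typo correction) | Support |
|---|---|---|---|---|---|---|---|
| B-ORGAN | 0.97 | 1.00 | 0.31 | 0.82 | 0.47 | 0.90 | 1022 |
| I-ORGAN | 1.00 | 1.00 | 0.37 | 0.73 | 0.54 | 0.84 | 196 |
| B-LOCATION | 1.00 | 1.00 | 0.25 | 0.79 | 0.41 | 0.88 | 625 |
| I-LOCATION | 1.00 | 1.00 | 0.26 | 0.81 | 0.41 | 0.89 | 1338 |
| B-OPNAME | 0.90 | 0.98 | 0.54 | 0.97 | 0.68 | 0.97 | 822 |
| I-OPNAME | 1.00 | 1.00 | 0.46 | 0.96 | 0.63 | 0.98 | 803 |
| B-HISTOLOGIC DIAGNOSIS | 1.00 | 1.00 | 0.83 | 0.99 | 0.91 | 0.99 | 70 |
| I-HISTOLOGIC DIAGNOSIS | 1.00 | 1.00 | 0.90 | 1.00 | 0.95 | 1.00 | 114 |
| B-TUMOR_SIZE | 1.00 | 1.00 | 0.99 | 0.99 | 1.00 | 1.00 | 222 |

| | Typos | Typo correction | Support |
|---|---|---|---|
| Accuracy | 0.40 | 0.87 | 2047 |
| f1-score | 0.60 | 0.85 | |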

  1. The dataset contained 7% typos. There are seven tag values in total. Four entity types (ORGAN, LOCATION, OPNAME, and HISTOLOGIC DIAGNOSIS) appear with both B and I tags; TUMOR_SIZE appears only with the B tag.
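
As a reading aid, the per-tag f1-scores in Table 9 can be checked against the standard definition of F1 as the harmonic mean of precision and recall. The sketch below is not the authors' evaluation code; the names `f1_score`, `TYPOS_ROWS`, `CORRECTED_ROWS`, and `max_deviation` are hypothetical, and the values are copied from the table. Because the table reports precision and recall rounded to two decimals, the recomputed values can differ from the reported ones by about 0.01. The overall f1-score row is not recomputed here, since the averaging scheme behind it is not stated in the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (tag, precision, recall, reported f1) copied from the "Typos" columns.
TYPOS_ROWS = [
    ("B-ORGAN", 0.97, 0.31, 0.47),
    ("I-ORGAN", 1.00, 0.37, 0.54),
    ("B-LOCATION", 1.00, 0.25, 0.41),
    ("I-LOCATION", 1.00, 0.26, 0.41),
    ("B-OPNAME", 0.90, 0.54, 0.68),
    ("I-OPNAME", 1.00, 0.46, 0.63),
    ("B-HISTOLOGIC DIAGNOSIS", 1.00, 0.83, 0.91),
    ("I-HISTOLOGIC DIAGNOSIS", 1.00, 0.90, 0.95),
    ("B-TUMOR_SIZE", 1.00, 0.99, 1.00),
]

# Same layout, copied from the "Typo correction" columns.
CORRECTED_ROWS = [
    ("B-ORGAN", 1.00, 0.82, 0.90),
    ("I-ORGAN", 1.00, 0.73, 0.84),
    ("B-LOCATION", 1.00, 0.79, 0.88),
    ("I-LOCATION", 1.00, 0.81, 0.89),
    ("B-OPNAME", 0.98, 0.97, 0.97),
    ("I-OPNAME", 1.00, 0.96, 0.98),
    ("B-HISTOLOGIC DIAGNOSIS", 1.00, 0.99, 0.99),
    ("I-HISTOLOGIC DIAGNOSIS", 1.00, 1.00, 1.00),
    ("B-TUMOR_SIZE", 1.00, 0.99, 1.00),
]


def max_deviation(rows):
    """Largest absolute gap between recomputed and reported f1 in `rows`."""
    return max(abs(f1_score(p, r) - reported) for _, p, r, reported in rows)


if __name__ == "__main__":
    for label, rows in (("Typos", TYPOS_ROWS), ("Typo correction", CORRECTED_ROWS)):
        for tag, p, r, reported in rows:
            print(f"{label:16s} {tag:24s} recomputed={f1_score(p, r):.2f} reported={reported:.2f}")
        print(f"{label}: max deviation = {max_deviation(rows):.3f}")
```

Since precision is at or near 1.00 for almost every tag, the f1-score gains after typo correction come almost entirely from recall.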