Datasets | Replace typos | Delete typos | Transpose typos | Insert typos | No typo | Total |
---|---|---|---|---|---|---|
NCBI-disease | 1,014 | 1,073 | 911 | 946 | 20,553 | 24,497 |
SPRs | 965 | 948 | 877 | 827 | 46,051 | 49,668 |
- The NCBI-disease dataset contained 24,497 tokens in total. Of these, 3,944 tokens (16.10%) carried one of the four typo types (replace, delete, transpose, insert), and 20,553 tokens (83.90%) had no typo. The SPR dataset contained 49,668 tokens in total. Of these, 3,617 tokens (7.28%) carried one of the four typo types, and 46,051 tokens (92.72%) had no typo.
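
The totals and shares reported here follow directly from the per-type counts in the table, so they can be rechecked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary layout and the `summarize` helper are hypothetical names introduced for this check, not part of the original study, and the counts are copied by hand from the table.

```python
# Token counts copied from the table; the key names are assumptions made for this sketch.
counts = {
    "NCBI-disease": {"replace": 1014, "delete": 1073, "transpose": 911,
                     "insert": 946, "no_typo": 20553},
    "SPRs": {"replace": 965, "delete": 948, "transpose": 877,
             "insert": 827, "no_typo": 46051},
}


def summarize(row):
    """Return (typo tokens, total tokens, typo share in %, no-typo share in %)."""
    # Sum the four typo categories, leaving out the clean tokens.
    typo = sum(value for key, value in row.items() if key != "no_typo")
    total = typo + row["no_typo"]
    typo_pct = round(100 * typo / total, 2)
    clean_pct = round(100 * row["no_typo"] / total, 2)
    return typo, total, typo_pct, clean_pct


for name, row in counts.items():
    typo, total, typo_pct, clean_pct = summarize(row)
    print(f"{name}: {typo} of {total} tokens have a typo "
          f"({typo_pct:.2f}%), {clean_pct:.2f}% have none")
```

Because the two shares are computed from the same total, they should sum to 100% up to rounding, which makes this a quick consistency check on the reported figures.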