
Table 2 Overview of our results.

From: Automatic construction of rule-based ICD-9-CM coding systems

Model                                 Train   Test
45-class statistical                  88.20   86.69
Simple rule-based                     84.07   83.21
Rule-based with label-dependencies    85.57   84.85
Hybrid rule-based + C4.5              90.22   88.92
Hybrid rule-based + MaxEnt            90.26   88.93
CMC challenge best system             90.02   89.08
  1. All values are micro-averaged F-scores with β = 1 (F1).
  2. The 45-class statistical row reports a C4.5 classifier trained for single labels. The CMC challenge best system row reports the results of the best system submitted to the CMC challenge. All our models use the same algorithm to detect negation and speculative assertions; they were trained on the whole training set (the simple rule-based model needs no training) and evaluated on the training and the challenge test sets. The difference in performance between the 45-class statistical model and our best hybrid system (rule-based + MaxEnt) was statistically significant on both the training and test datasets according to McNemar's test at the p < 0.05 significance level. By contrast, the difference between our best hybrid model (constructed automatically) and our manually constructed ICD-9-CM coder (the CMC challenge best system) was not statistically significant on either set.
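
The two quantities named in the notes above, the micro-averaged F1 score and McNemar's test, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. The function names, the toy labels "A" to "D", and the choice of the exact (binomial) variant of McNemar's test are assumptions made for illustration; the source does not state which variant was used or what unit of comparison (for example, per document) the test was applied to.

```python
from math import comb


def micro_f1(gold, predicted):
    """Micro-averaged F1 for multi-label output.

    gold and predicted are parallel sequences of label sets, one per document.
    True positives, false positives and false negatives are pooled over all
    documents before the score is computed.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        g, p = set(g), set(p)
        tp += len(g & p)   # labels assigned correctly
        fp += len(p - g)   # labels assigned but not in the gold standard
        fn += len(g - p)   # gold labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test on paired correctness flags.

    correct_a[i] and correct_b[i] say whether system A and system B got
    item i right. Only discordant pairs (exactly one system right) count.
    Assumption: the exact binomial form is used here for simplicity.
    """
    b = sum(1 for a, o in zip(correct_a, correct_b) if a and not o)
    c = sum(1 for a, o in zip(correct_a, correct_b) if not a and o)
    n = b + c
    if n == 0:
        return 1.0  # the systems never disagree
    k = min(b, c)
    # Probability of a split at least this uneven under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical toy data, not taken from the CMC challenge corpus.
gold = [{"A"}, {"B", "C"}, {"D"}]
pred = [{"A"}, {"B"}, {"C"}]
score = micro_f1(gold, pred)

# Hypothetical paired outcomes: A alone is right 8 times, B alone once,
# and both are right on 20 items.
flags_a = [True] * 8 + [False] * 1 + [True] * 20
flags_b = [False] * 8 + [True] * 1 + [True] * 20
p_value = mcnemar_exact(flags_a, flags_b)
```

On the toy data, a difference would be called significant at the 0.05 level when `p_value < 0.05`. With many discordant pairs, the chi-squared approximation of McNemar's test is a common alternative to the exact form sketched here.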