Table 3 Cross-validation results using different methods

From: Using natural language processing and machine learning to identify breast cancer local recurrence

| Methods | P (SD) | R (SD) | F (SD) | AUC (SD) |
| --- | --- | --- | --- | --- |
| Filtered MetaMap + Pathology Report Count (4151) | 0.84 (0.04) | 0.76 (0.02) | 0.80 (0.02) | 0.93 (0.01) |
| Full MetaMap (17897) | 0.80 (0.06) | 0.48 (0.05) | 0.60 (0.05) | 0.83 (0.03) |
| Filtered MetaMap (4150) | 0.82 (0.03) | 0.67 (0.02) | 0.74 (0.02) | 0.90 (0.01) |
| Bag of Words (57612) | 0.69 (0.07) | 0.42 (0.062) | 0.52 (0.06) | 0.78 (0.03) |

  1. The number in parentheses in the first column is the number of features. The numbers in parentheses in the second through fifth columns are standard deviations
  2. Gray shading indicates baseline methods
  3. P stands for precision, R stands for recall, F stands for F-score, AUC stands for area under the receiver operating characteristic curve, and SD is standard deviation
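
The relationship between the P, R, and F columns can be illustrated with a short sketch. It assumes that F is the balanced F-score (F1, the harmonic mean of precision and recall), which the table does not state explicitly. The function name `f1_score` and the `rows` dictionary are illustrative only. Because the table reports means over cross-validation folds, an F value recomputed from the mean P and mean R is only an approximation of the mean per-fold F, although for these four rows it agrees with the reported values to two decimal places.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1).

    Assumption: the table's F column is the balanced F-score.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Mean precision and recall copied from Table 3 (feature counts omitted).
rows = {
    "Filtered MetaMap + Pathology Report Count": (0.84, 0.76),
    "Full MetaMap": (0.80, 0.48),
    "Filtered MetaMap": (0.82, 0.67),
    "Bag of Words": (0.69, 0.42),
}

for name, (p, r) in rows.items():
    # Recompute F from the mean P and R and print it to two decimals.
    print(f"{name}: F1 = {f1_score(p, r):.2f}")
```

A check of this kind is useful mainly as a sanity test on transcribed values; it does not replace the per-fold computation used in cross-validation.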