Table 2 Collocations found in redundant and non-redundant corpora

From: Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies

| | All informative (redundant) | Last informative (non-redundant) |
| --- | --- | --- |
| Word Types | 81,928 | 40,774 |
| Words | 3,641,031 | 545,231 |
| Collocations | 15,814 | 2,527 |
| Collocations/Word | 0.004 | 0.004 |
| Avg. number of patients per collocation | 18.2 | 66 |
| % collocations that appear in notes of 3 patients or less | 36 % | 1 % |

  1. Collocations were extracted using a stringent pointwise mutual information (PMI) cutoff of 0.001.
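
As a rough illustration of the quantities in this table, the sketch below recomputes the Collocations/Word ratio from the reported counts and shows a textbook PMI score for a word pair, reading PMI as pointwise mutual information. The source does not state the exact PMI variant, log base, or normalization behind the 0.001 cutoff, so the `pmi` function and its parameter names are assumptions for illustration only, not the authors' implementation.

```python
import math

# Counts copied from Table 2.
TABLE = {
    "All informative (redundant)": {"words": 3_641_031, "collocations": 15_814},
    "Last informative (non-redundant)": {"words": 545_231, "collocations": 2_527},
}


def collocations_per_word(collocations: int, words: int) -> float:
    """Collocation density: extracted collocations divided by corpus tokens."""
    return collocations / words


def pmi(pair_count: int, x_count: int, y_count: int, total: int) -> float:
    """Textbook pointwise mutual information, log base 2.

    ASSUMPTION: the table footnote does not specify the PMI variant used,
    so this is the standard definition log2(p(x, y) / (p(x) * p(y))).
    """
    p_xy = pair_count / total
    p_x = x_count / total
    p_y = y_count / total
    return math.log2(p_xy / (p_x * p_y))


# Unrounded ratios for both corpora.
ratios = {name: collocations_per_word(**row) for name, row in TABLE.items()}

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.4f} collocations per word")
    # Hypothetical counts, for illustration only (not taken from the corpora).
    print(f"example PMI: {pmi(10, 20, 20, 1000):.3f}")
```

Both ratios fall between 0.004 and 0.005, so collocation density is similar in the two corpora even though the redundant corpus contains roughly six times as many collocations in absolute terms.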