
Table 3 Collocation detection results in the different corpora

From: Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies

| | All informative (redundant), 8,557 notes | Last informative (non-redundant), 1,247 notes | Reduced redundancy, 3,970 notes |
|---|---|---|---|
| Collocations (TMI/PMI) | 5,649/15,814 | 2,082/2,527 | 3,590/6,034 |
| Avg. number of patients per collocation (TMI/PMI) | 32/18 | 74/66 | 48/37 |
| % of collocations that appear in notes of 3 or fewer patients (TMI/PMI) | 32%/36% | 1.2%/1% | 6.2%/5.8% |
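The two derived rows of the table (average number of patients per collocation, and the share of collocations confined to 3 or fewer patients) can be recomputed from any set of detected collocations. The sketch below is illustrative only and is not the authors' pipeline. It scores bigrams with pointwise mutual information (PMI) and, as an assumption, treats TMI as PMI weighted by the joint probability p(x,y); the exact TMI definition used in the paper is not stated in this table. The input format `(patient_id, tokens)` and the function names `bigram_scores` and `patient_spread` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def bigram_scores(notes):
    """Score every adjacent bigram in the corpus.

    notes: list of (patient_id, tokens) pairs (hypothetical input format).
    Returns {(x, y): (pmi, tmi)}, where tmi is an ASSUMED variant:
    PMI weighted by the joint probability p(x, y).
    """
    unigrams = Counter()
    bigrams = Counter()
    for _, tokens in notes:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        tmi = p_xy * pmi  # assumption: joint-probability-weighted PMI
        scores[(x, y)] = (pmi, tmi)
    return scores


def patient_spread(notes, collocations, max_patients=3):
    """Return (avg. patients per collocation, % of collocations seen in
    notes of at most `max_patients` distinct patients)."""
    if not collocations:
        return 0.0, 0.0
    wanted = set(collocations)
    patients = defaultdict(set)
    for pid, tokens in notes:
        for bigram in zip(tokens, tokens[1:]):
            if bigram in wanted:
                patients[bigram].add(pid)
    counts = [len(patients[c]) for c in collocations]
    avg = sum(counts) / len(counts)
    pct_small = 100 * sum(1 for k in counts if k <= max_patients) / len(counts)
    return avg, pct_small


if __name__ == "__main__":
    notes = [
        ("p1", ["chest", "pain", "today"]),
        ("p2", ["chest", "pain"]),
        ("p3", ["no", "chest", "pain"]),
    ]
    scores = bigram_scores(notes)
    # Rank by the assumed TMI score and keep the top 2 as "collocations".
    top = sorted(scores, key=lambda b: scores[b][1], reverse=True)[:2]
    print(top, patient_spread(notes, top))
```

Ranking by the weighted score favours frequent bigrams, while raw PMI can rank rare pairs highly. A collocation copied across many notes of a single patient inflates the bigram counts without increasing the patient count, which is the effect the patient-spread rows are designed to expose.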