
Table 6 Redundancy in same-patient note pairs

From: Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies

Corpus                                              Redundancy of in-corpus note pairs   Number of pairs in sample
--------------------------------------------------  -----------------------------------  -------------------------
All Informative                                     29%                                  2,000
Selective-Fingerprinting, maximum similarity 0.33   12.70%                               380
Selective-Fingerprinting, maximum similarity 0.25   9.80%                                305
Selective-Fingerprinting, maximum similarity 0.2    9.30%                                263

Amount of redundancy in a random sample of 2,000 same-patient note pairs within the corpora, using different similarity thresholds.
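To make the role of the "maximum similarity" threshold concrete, the following is a minimal sketch of threshold-based note selection, not the authors' Selective-Fingerprinting implementation. It assumes that each note is fingerprinted as a set of hashed word trigrams, that similarity between two notes is the Jaccard index of their fingerprints, and that a note is kept only when its similarity to every previously kept note of the same patient does not exceed the threshold. The function names (`fingerprint`, `similarity`, `select_notes`, `redundant_pair_fraction`) and the cutoff used to call a pair "redundant" are hypothetical choices made for illustration.

```python
import hashlib
from itertools import combinations


def fingerprint(text: str, n: int = 3) -> set[int]:
    """Hypothetical fingerprint: hash every word n-gram of a note into a set of ints."""
    words = text.lower().split()
    if not words:
        return set()
    # Notes shorter than n words yield a single n-gram covering the whole note.
    grams = (" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1)))
    return {int(hashlib.md5(g.encode()).hexdigest()[:8], 16) for g in grams}


def similarity(a: set[int], b: set[int]) -> float:
    """Jaccard index of two fingerprints (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_notes(notes: list[str], max_similarity: float) -> list[str]:
    """Greedily keep a note only if it is at most max_similarity to every kept note."""
    kept: list[str] = []
    kept_fps: list[set[int]] = []
    for note in notes:
        fp = fingerprint(note)
        if all(similarity(fp, k) <= max_similarity for k in kept_fps):
            kept.append(note)
            kept_fps.append(fp)
    return kept


def redundant_pair_fraction(notes: list[str], cutoff: float) -> float:
    """Share of note pairs whose similarity exceeds a hypothetical redundancy cutoff."""
    fps = [fingerprint(n) for n in notes]
    pairs = list(combinations(fps, 2))
    if not pairs:
        return 0.0
    return sum(similarity(a, b) > cutoff for a, b in pairs) / len(pairs)


notes = [
    "patient reports chest pain since morning",
    "patient reports chest pain since morning today",
    "follow up visit no complaints",
]
print(len(select_notes(notes, 0.33)))
print(redundant_pair_fraction(notes, 0.5))
```

Under these assumptions, lowering `max_similarity` can only shrink the kept set, which mirrors the direction of the table: each stricter Selective-Fingerprinting threshold leaves fewer note pairs in the sample and a lower measured redundancy.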