Table 7 EHR corpora descriptive statistics

From: Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies

| Corpus | # Patients | # Notes | # Words / # Unique Words | # Concepts / # Unique Concepts |
| --- | --- | --- | --- | --- |
| All Notes | 1,604 | 22,564 | 6,131,879 / 138,877 | 599,847 / 7,174 |
| All Informative Notes | 1,247 | 8,557 | 2,243,551 / 51,234 | 319,298 / 5,389 |
| Last Informative Note | 1,247 | 1,247 | 338,207 / 25,624 | 46,311 / 3,711 |