Table 1 Four medical informatics datasets used in experiments

From: Mapping biological entities using the longest approximately common prefix method

| # | Dataset | # of concepts | # of terms | Size in kilobytes |
|---|---------|---------------|------------|-------------------|
| D1 | The UMLS most frequent concepts from multiple sources | 100 | 4,979 | 369 |
| D2 | The SNOMED CT most frequent concepts | 155 | 5,000 | 281 |
| D3 | The UMLS concepts with longest terms (“longest concepts”) | 3,337 | 5,000 | 1,693 |
| D4 | The SNOMED CT longest concepts | 1,805 | 5,000 | 903 |