
Table 3 Performance of the 8 versions of the classifier, compared to the baseline

From: Automatic classification of registered clinical trials towards the Global Burden of Diseases taxonomy of diseases and injuries

 

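| Version | Word Sense Disambiguation | Expert-based enrichment | Priority to health condition field | Exact matching (%): all trials, N = 2,763 | Exact matching (%): one GBD category, N = 2,328 | Exact matching (%): two or more GBD categories, N = 28 | Exact matching (%): no GBD category, N = 407 | Weighted average sensitivity (%) across 28 GBD categories | Weighted average specificity (%) across 28 GBD categories |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Yes | Yes | Yes | 77.8 | 82.7 | 28.6 | 53.1 | 81.9 | 97.4 |
| 2 | Yes | Yes | No | 77.5 | 82.5 | 28.6 | 52.1 | 81.8 | 97.4 |
| 3 | Yes | No | Yes | 76.9 | 81.4 | 28.6 | 54.8 | 81.0 | 97.2 |
| 4 | Yes | No | No | 76.9 | 81.5 | 28.6 | 53.8 | 81.1 | 97.2 |
| 5 | No | Yes | Yes | 75.6 | 80.1 | 28.6 | 53.1 | 81.9 | 97.0 |
| 6 | No | Yes | No | 75.3 | 79.9 | 28.6 | 52.1 | 81.8 | 97.0 |
| 7 | No | No | Yes | 74.8 | 79.0 | 25.0 | 54.8 | 81.0 | 96.9 |
| 8 | No | No | No | 74.8 | 79.1 | 25.0 | 53.8 | 81.2 | 96.9 |
| Baseline: condition field | – | – | – | 48.7 | 40.5 | 10.7 | 98.5 | 49.3 | 91.4 |
| Baseline: public title | – | – | – | 38.1 | 27.6 | 7.1 | 100.0 | 38.2 | 89.6 |
| Baseline: official title | – | – | – | 38.0 | 27.6 | 7.1 | 99.3 | 38.2 | 89.6 |
| Baseline: three text fields | – | – | – | 51.4 | 43.7 | 17.9 | 97.8 | 52.3 | 92.0 |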

Exact-matching proportions and weighted average sensitivities and specificities of the 8 versions of the classifier across the 28 GBD categories, compared with the baselines. The exact-matching proportion is the percentage of trials for which the automatic GBD classification is correct. It was estimated over all trials (N = 2,763), trials concerning a single GBD category (N = 2,328), trials concerning two or more GBD categories (N = 28), and trials not relevant to the GBD (N = 407). The weighted average sensitivity and specificity are the weighted averages, across the GBD categories plus the “No GBD” category, of the per-category sensitivities and specificities (in %). The 8 versions correspond to the 2 × 2 × 2 combinations of using or not using (i) the Word Sense Disambiguation server during text annotation, (ii) the expert-based enrichment database, and (iii) priority to the health condition field as a prioritization rule. The baselines did not use the UMLS knowledge source; instead, a clinical trial record was classified to a GBD category if at least one of the disease names defining that category appeared verbatim in the condition field, the public title, or the scientific title (each considered separately), or in at least one of these three text fields.
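To make the two metric families in the caption concrete, here is a minimal sketch of how an exact-matching proportion and a weighted average sensitivity/specificity could be computed for multi-label GBD assignments. It assumes that "correct" means the predicted set of categories equals the reference set exactly, that a trial with no category is treated as belonging to a "No GBD" pseudo-category, and that each category is weighted by its number of reference-positive trials. The caption does not state the weighting scheme, so that choice is an assumption, and the function names (`exact_matching`, `weighted_sens_spec`) are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

NO_GBD = "No GBD"  # pseudo-category for trials not relevant to the GBD


def as_labels(categories: Set[str]) -> Set[str]:
    """Represent a trial with no GBD category by the 'No GBD' pseudo-category."""
    return set(categories) if categories else {NO_GBD}


def exact_matching(truth: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Percentage of trials whose predicted category set equals the reference set."""
    hits = sum(as_labels(t) == as_labels(p) for t, p in zip(truth, pred))
    return 100.0 * hits / len(truth)


def weighted_sens_spec(
    truth: Sequence[Set[str]], pred: Sequence[Set[str]], categories: List[str]
) -> Tuple[float, float]:
    """Weighted average sensitivity and specificity (in %) over the given
    categories plus 'No GBD'. ASSUMPTION: each category is weighted by its
    number of reference-positive trials (tp + fn)."""
    sens_sum = spec_sum = sens_w = spec_w = 0.0
    for cat in categories + [NO_GBD]:
        tp = fn = fp = tn = 0
        for t, p in zip(truth, pred):
            in_truth, in_pred = cat in as_labels(t), cat in as_labels(p)
            if in_truth and in_pred:
                tp += 1
            elif in_truth:
                fn += 1
            elif in_pred:
                fp += 1
            else:
                tn += 1
        weight = tp + fn
        if weight == 0:
            continue  # category absent from the reference standard: skip it
        sens_sum += weight * tp / (tp + fn)
        sens_w += weight
        if tn + fp > 0:  # specificity is undefined without negatives
            spec_sum += weight * tn / (tn + fp)
            spec_w += weight
    return 100.0 * sens_sum / sens_w, 100.0 * spec_sum / spec_w


if __name__ == "__main__":
    # Toy data with hypothetical categories "A" and "B"; set() means no GBD category.
    truth = [{"A"}, {"B"}, set(), {"A", "B"}]
    pred = [{"A"}, {"A"}, set(), {"A"}]
    print(exact_matching(truth, pred))                  # 50.0
    print(weighted_sens_spec(truth, pred, ["A", "B"]))  # (60.0, 80.0)
```

On the toy data, two of the four trials are matched exactly, hence 50.0. A trial assigned only part of its true categories (the last one) counts as a miss for exact matching but still contributes a true positive to category A's sensitivity, which is why exact matching on the "two or more GBD categories" subgroup can be much lower than the per-category averages. Note also that with weights equal to tp + fn, the weighted sensitivity reduces to total true positives over total reference positives.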
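The baselines described in the caption rely on verbatim string matching rather than UMLS concepts. A minimal sketch of that rule follows, under stated assumptions: matching is case-insensitive and restricted to whole terms (not specified in the caption), the record field keys (`condition`, `public_title`, `scientific_title`) are hypothetical, and the small disease-name dictionary in the usage is purely illustrative, not the dictionary used in the study.

```python
import re
from typing import Dict, Iterable, List, Set

# Field keys for the four baselines in Table 3 (key names are hypothetical).
BASELINE_FIELDS = {
    "Condition field": ["condition"],
    "Public title": ["public_title"],
    "Official title": ["scientific_title"],
    "Three text fields": ["condition", "public_title", "scientific_title"],
}


def mentions(text: str, name: str) -> bool:
    """True if `name` occurs in `text` as a whole term, ignoring case (assumption)."""
    # Lookarounds instead of \b so names starting/ending in punctuation still work.
    pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def baseline_classify(
    record: Dict[str, str], gbd_names: Dict[str, List[str]], fields: Iterable[str]
) -> Set[str]:
    """Assign every GBD category whose defining disease names appear verbatim
    in at least one of the selected fields; no UMLS lookup is involved."""
    found: Set[str] = set()
    for field in fields:
        text = record.get(field, "")  # each field is searched on its own
        for category, names in gbd_names.items():
            if any(mentions(text, name) for name in names):
                found.add(category)
    return found


if __name__ == "__main__":
    # Illustrative toy dictionary and record, not taken from the study.
    gbd = {
        "HIV/AIDS and tuberculosis": ["HIV", "tuberculosis"],
        "Neoplasms": ["breast cancer", "lung cancer"],
    }
    record = {
        "condition": "Pulmonary Tuberculosis",
        "public_title": "Screening study",
        "scientific_title": "Lung cancer screening in smokers",
    }
    for baseline, fields in BASELINE_FIELDS.items():
        print(baseline, sorted(baseline_classify(record, gbd, fields)))
```

Searching each field separately and taking the union means a disease name cannot be spuriously formed across the boundary between two fields. The sketch also shows why such baselines miss trials: a condition phrased as a synonym or abbreviation absent from the dictionary (e.g. "TB") is never matched, which is the gap that concept-level annotation is meant to close.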