
Table 3 The predictive performance of BICEPP by characteristics categories

From: BICEPP: an example-based statistical text mining method for predicting the binary characteristics of drugs

| Category | Best AUC | Algorithm | | | | |
|---|---|---|---|---|---|---|
| | | NB | IBk | SVM/L | SVM/RBF | Best of 4 |
| AMH major classes | > 0.80 | 20 (100) | 19 (95) | 19 (95) | 20 (100) | 20 (100) |
| | > 0.90 | 15 (75) | 16 (80) | 16 (80) | 16 (80) | 19 (95) |
| | > 0.95 | 10 (50) | 12 (60) | 11 (55) | 11 (55) | 12 (60) |
| AMH minor classes | > 0.80 | 98 (50) | 133 (68) | 130 (66) | 134 (68) | 135 (69) |
| | > 0.90 | 86 (44) | 121 (61) | 120 (61) | 117 (59) | 123 (62) |
| | > 0.95* | 73 (37) | 114 (58) | 102 (52) | 106 (54) | 114 (58) |
| AMH adverse events | > 0.80 | 134 (56) | 145 (61) | 114 (48) | 119 (50) | 159 (67) |
| | > 0.90 | 65 (27) | 76 (32) | 56 (24) | 63 (26) | 86 (36) |
| | > 0.95 | 30 (13) | 38 (16) | 30 (13) | 35 (15) | 41 (17) |
| PKIS perpetrator | > 0.80 | 3 (20) | 7 (47) | 3 (20) | 4 (27) | 7 (47) |
| | > 0.90 | 1 (7) | 4 (27) | 2 (13) | 3 (20) | 5 (33) |
| | > 0.95 | 0 (0) | 2 (13) | 2 (13) | 2 (13) | 2 (13) |
| Narrow therapeutic index drugs | > 0.80 | 8 (57) | 9 (64) | 8 (57) | 8 (57) | 9 (64) |
| | > 0.90 | 7 (50) | 8 (57) | 5 (36) | 7 (50) | 8 (57) |
| | > 0.95 | 3 (21) | 5 (36) | 3 (21) | 2 (14) | 5 (36) |
| Overall | > 0.80 | 263 (54) | 313 (65) | 274 (57) | 285 (59) | 330 (68) |
| | > 0.90 | 174 (36) | 225 (46) | 199 (41) | 206 (43) | 241 (50) |
| | > 0.95* | 116 (24) | 171 (35) | 148 (31) | 156 (32) | 174 (36) |

The numbers in this table indicate the number of characteristics (percentage) that achieved an AUC above the given threshold in stratified cross-validation evaluations. Performance is measured by AUC and can be interpreted as good (> 0.80), very good (> 0.90), or excellent (> 0.95). Overall, 68% of drug characteristics can be predicted with good AUC (numbers in boldface) and 36% of characteristics can be predicted very accurately (AUC > 0.95) with at least one classifier. The last column (Best of 4) shows how many characteristics achieved an AUC above the given threshold with any of the four algorithms. Pearson's chi-square test was applied to examine the homogeneity between algorithms. Asterisks (*) indicate the categories that were statistically significant at α = 0.05 (analysed as 4 × 1 tables with 3 d.f.). However, no category remained statistically significant after adjusting for the family-wise error rate using the Bonferroni method (n = 18). Abbreviations: AE: adverse events; AMH: Australian Medicines Handbook; IBk: k-nearest neighbour algorithm; NB: Naive Bayes; PKIS: PharmacoKinetic Interaction Screening database; SVM: support vector machine; SVM/L: linear SVM; SVM/RBF: support vector machine with radial basis function kernel.
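
The homogeneity test described in the footnote can be illustrated with a minimal sketch. It assumes that "4 × 1 tables with 3 d.f." means a Pearson chi-square test of the four per-algorithm counts against equal expected counts; the source does not spell out the exact procedure, so this is one plausible reading and not the authors' code. The counts are taken from the "Overall, > 0.95" row, and the function names are hypothetical. The closed-form tail probability used here holds only for 3 degrees of freedom.

```python
import math


def chi_square_equal_counts(counts):
    """Pearson chi-square statistic for a k x 1 table against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with exactly 3 d.f.

    Closed form valid only for 3 d.f.:
    P(X > x) = erfc(sqrt(x / 2)) + sqrt(2x / pi) * exp(-x / 2)
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# "Overall, > 0.95" row of the table: NB, IBk, SVM/L, SVM/RBF
counts = [116, 171, 148, 156]
alpha = 0.05
n_tests = 18  # number of category/threshold rows, as given in the footnote

stat = chi_square_equal_counts(counts)
p_value = chi2_sf_3df(stat)

# Unadjusted decision versus Bonferroni-adjusted decision (alpha / n_tests)
significant_unadjusted = p_value < alpha
significant_bonferroni = p_value < alpha / n_tests

print(f"chi-square = {stat:.3f}, p = {p_value:.4f}")
print(f"significant at alpha = {alpha}: {significant_unadjusted}")
print(f"significant after Bonferroni (alpha / {n_tests}): {significant_bonferroni}")
```

Under this reading, the row is significant at α = 0.05 but not at the Bonferroni-adjusted level of 0.05 / 18, which agrees with the asterisk on that row and with the footnote's statement about the adjusted result.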