
Table 3 Average performance metrics of different ML algorithms over 10 runs with different training and testing splits

From: Improved cytokine–receptor interaction prediction by exploiting the negative sample space

 

             SE (%)   SP (%)   ACC (%)   MCC     AUC     g-means (%)

Random (70% training, 30% testing)
 NB           63.7     53.5     58.4     0.178   0.595   56.8
 A1DE         70.7     57.5     62.2     0.255   0.674   63.4
 SMO-RBF      79.3     72.5     75.7     0.520   0.759   75.6
 SMO-PolyK    66.7     50.0     55.2     0.194   0.583   57.1
 SMO-PuK      74.8     77.8     76.3     0.529   0.763   76.0
 IBK          79.8     68.9     74.1     0.491   0.743   73.9
 Bagging      65.5     58.9     62.0     0.244   0.682   62.0
 RF           74.5     64.2     69.1     0.390   0.779   68.9

K-means (70% training, 30% testing)
 NB           79.7     50.0     63.6     0.299   0.742   62.3
 A1DE         77.6     87.2     82.9     0.659   0.857   82.4
 SMO-RBF      89.5     89.0     89.2     0.785   0.890   89.2
 SMO-PolyK    88.1     78.9     83.2     0.692   0.828   83.2
 SMO-PuK      82.9     85.1     84.1     0.683   0.845   83.8
 IBK          82.2     87.0     84.2     0.692   0.841   84.0
 Bagging      84.8     92.3     88.7     0.776   0.934   88.4
 RF           90.3     93.9     92.2     0.844   0.959   92.0

  1. The training and testing sets are generated through either random sampling (top) or K-means sampling (bottom). For the latter, in each run the 12,343 negative protein–receptor combinations are clustered by K-means with K = 203, and one negative sample is randomly drawn from each cluster. The resulting 203 negative samples are then randomly split into training (70%) and testing (30%) sets; the 203 positive samples are likewise randomly split into training (70%) and testing (30%) sets. SE sensitivity, SP specificity, ACC accuracy, MCC Matthews correlation coefficient, AUC area under the ROC curve, g-means geometric mean of SE and SP
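A minimal sketch of the K-means sampling step described in the note above, assuming the negative pairs are already encoded as a numeric feature matrix (the encoding and the names `X_neg`/`X_pos` are illustrative assumptions, not the authors' code):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

def kmeans_negative_sampling(X_neg, n_clusters=203, seed=0):
    """Cluster all negative samples and draw one representative per cluster.

    X_neg: feature matrix of the negative protein-receptor pairs
           (12,343 x n_features in the paper's setting).
    Returns the row indices of the n_clusters selected negatives.
    """
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X_neg)
    # One randomly chosen member from each cluster.
    return np.array([rng.choice(np.flatnonzero(labels == k))
                     for k in range(n_clusters)])

# Hypothetical usage, mirroring the 70/30 split in the table note,
# applied independently to positives and the 203 selected negatives:
# neg_idx = kmeans_negative_sampling(X_neg)
# Xn_tr, Xn_te = train_test_split(X_neg[neg_idx], test_size=0.30)
# Xp_tr, Xp_te = train_test_split(X_pos, test_size=0.30)
```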
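The per-run values behind each table entry can be derived from the confusion matrix and prediction scores; a sketch using scikit-learn (an illustration of the standard definitions, not the authors' implementation):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, matthews_corrcoef, roc_auc_score

def table_metrics(y_true, y_pred, y_score):
    """Compute the six Table 3 metrics for one train/test run.

    y_true: 0/1 ground-truth labels; y_pred: 0/1 predictions;
    y_score: continuous scores for the positive class (for AUC).
    """
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    se = tp / (tp + fn)                      # sensitivity (recall on positives)
    sp = tn / (tn + fp)                      # specificity (recall on negatives)
    acc = (tp + tn) / (tp + tn + fp + fn)    # overall accuracy
    mcc = matthews_corrcoef(y_true, y_pred)  # Matthews correlation coefficient
    auc = roc_auc_score(y_true, y_score)     # area under the ROC curve
    gmeans = np.sqrt(se * sp)                # geometric mean of SE and SP
    return 100 * se, 100 * sp, 100 * acc, mcc, auc, 100 * gmeans
```

Because the table averages these per-run values over the 10 splits, the reported g-means need not equal the geometric mean of the reported (averaged) SE and SP.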