
Table 3 Average performance metrics of different ML algorithms over 10 runs with independently sampled training and testing sets

From: Improved cytokine–receptor interaction prediction by exploiting the negative sample space

Method       SE (%)  SP (%)  ACC (%)  MCC    AUC    g-means (%)

Random (70% training, 30% testing)
 NB           63.7    53.5    58.4    0.178  0.595  56.8
 A1DE         70.7    57.5    62.2    0.255  0.674  63.4
 SMO-RBF      79.3    72.5    75.7    0.520  0.759  75.6
 SMO-PolyK    66.7    50.0    55.2    0.194  0.583  57.1
 SMO-PuK      74.8    77.8    76.3    0.529  0.763  76.0
 IBK          79.8    68.9    74.1    0.491  0.743  73.9
 Bagging      65.5    58.9    62.0    0.244  0.682  62.0
 RF           74.5    64.2    69.1    0.390  0.779  68.9

K-means (70% training, 30% testing)
 NB           79.7    50.0    63.6    0.299  0.742  62.3
 A1DE         77.6    87.2    82.9    0.659  0.857  82.4
 SMO-RBF      89.5    89.0    89.2    0.785  0.890  89.2
 SMO-PolyK    88.1    78.9    83.2    0.692  0.828  83.2
 SMO-PuK      82.9    85.1    84.1    0.683  0.845  83.8
 IBK          82.2    87.0    84.2    0.692  0.841  84.0
 Bagging      84.8    92.3    88.7    0.776  0.934  88.4
 RF           90.3    93.9    92.2    0.844  0.959  92.0
The different training and testing sets are generated through either random sampling (top) or K-means sampling (bottom). For the latter, in each run the 12,343 negative protein–receptor combinations are clustered by K-means with K = 203, and one negative sample is chosen at random from each cluster. The resulting 203 negative samples are then randomly split into training (70%) and testing (30%) sets; the 203 positive samples are split the same way. SE, sensitivity; SP, specificity; ACC, accuracy; MCC, Matthews correlation coefficient; AUC, area under the ROC curve; g-means, geometric mean of SE and SP. SE, SP, ACC, and g-means are given in percent.
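The K-means undersampling procedure described in the note can be illustrated in a few lines. The sketch below is an assumption-laden illustration, not the authors' code: it assumes the positive and negative pairs are already encoded as feature matrices X_pos (203 × d) and X_neg (12,343 × d), uses scikit-learn's KMeans with default settings (the paper's exact clustering configuration is not given in the note), and draws one negative per cluster before a 70/30 split. The g_means helper uses the standard definition of the geometric mean of sensitivity and specificity, which matches the table's g-means column up to per-run averaging.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def kmeans_undersample(X_neg, n_clusters=203):
    """Cluster the negative sample space; keep one random negative per cluster."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X_neg)
    picks = [rng.choice(np.flatnonzero(labels == k)) for k in range(n_clusters)]
    return X_neg[picks]

def one_run(X_pos, X_neg):
    """One sampling run: balance the classes, then split 70/30.

    stratify=y keeps the 70/30 proportion within each class, which is
    equivalent to the separate per-class splits described in the table note.
    """
    X_neg_s = kmeans_undersample(X_neg)
    X = np.vstack([X_pos, X_neg_s])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg_s))])
    return train_test_split(X, y, test_size=0.30, stratify=y, random_state=0)

def g_means(y_true, y_pred):
    """Geometric mean of sensitivity and specificity, in percent."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    se = tp / (tp + fn)  # sensitivity (SE)
    sp = tn / (tn + fp)  # specificity (SP)
    return 100.0 * np.sqrt(se * sp)

# Example with synthetic features as stand-ins for the real descriptors:
X_pos = rng.normal(size=(203, 50))
X_neg = rng.normal(size=(12343, 50))
X_tr, X_te, y_tr, y_te = one_run(X_pos, X_neg)

Repeating one_run 10 times with different random seeds and averaging the resulting test-set metrics would reproduce the kind of averages reported in the table.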