
Table 3 Average performance metrics of different ML algorithms over 10 runs with different training and testing splits

From: Improved cytokine–receptor interaction prediction by exploiting the negative sample space

 

             SE (%)   SP (%)   ACC (%)   MCC     AUC     g-means (%)

Random (70% training, 30% testing)
 NB           63.7     53.5     58.4     0.178   0.595   56.8
 A1DE         70.7     57.5     62.2     0.255   0.674   63.4
 SMO-RBF      79.3     72.5     75.7     0.520   0.759   75.6
 SMO-PolyK    66.7     50.0     55.2     0.194   0.583   57.1
 SMO-PuK      74.8     77.8     76.3     0.529   0.763   76.0
 IBK          79.8     68.9     74.1     0.491   0.743   73.9
 Bagging      65.5     58.9     62.0     0.244   0.682   62.0
 RF           74.5     64.2     69.1     0.390   0.779   68.9

K-means (70% training, 30% testing)
 NB           79.7     50.0     63.6     0.299   0.742   62.3
 A1DE         77.6     87.2     82.9     0.659   0.857   82.4
 SMO-RBF      89.5     89.0     89.2     0.785   0.890   89.2
 SMO-PolyK    88.1     78.9     83.2     0.692   0.828   83.2
 SMO-PuK      82.9     85.1     84.1     0.683   0.845   83.8
 IBK          82.2     87.0     84.2     0.692   0.841   84.0
 Bagging      84.8     92.3     88.7     0.776   0.934   88.4
 RF           90.3     93.9     92.2     0.844   0.959   92.0

  1. The training and testing sets are generated through either random sampling (top) or K-means sampling (bottom). For the latter, in each run the 12,343 negative protein–receptor combinations are clustered by K-means with K = 203, and one negative sample is randomly drawn from each cluster. The resulting 203 negative samples are then randomly split into training (70%) and testing (30%) sets; the 203 positive samples are likewise randomly split into training (70%) and testing (30%) sets. SE sensitivity, SP specificity, ACC accuracy, MCC Matthews correlation coefficient, AUC area under the ROC curve, g-means geometric mean of SE and SP
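A minimal sketch of the K-means sampling step described in the note above, assuming the negative pairs are already encoded as a numeric feature matrix (the encoding and the names `X_neg`/`X_pos` are illustrative assumptions, not the authors' code):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

def kmeans_negative_sampling(X_neg, n_clusters=203, seed=0):
    """Cluster all negative samples and draw one representative per cluster.

    X_neg: feature matrix of the negative protein-receptor pairs
           (12,343 x n_features in the paper's setting).
    Returns the row indices of the n_clusters selected negatives.
    """
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X_neg)
    # One randomly chosen member from each cluster.
    return np.array([rng.choice(np.flatnonzero(labels == k))
                     for k in range(n_clusters)])

# Hypothetical usage, mirroring the 70/30 split in the table note,
# applied independently to positives and the 203 selected negatives:
# neg_idx = kmeans_negative_sampling(X_neg)
# Xn_tr, Xn_te = train_test_split(X_neg[neg_idx], test_size=0.30)
# Xp_tr, Xp_te = train_test_split(X_pos, test_size=0.30)
```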
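The per-run values behind each table entry can be derived from the confusion matrix and prediction scores; a sketch using scikit-learn (an illustration of the standard definitions, not the authors' implementation):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, matthews_corrcoef, roc_auc_score

def table_metrics(y_true, y_pred, y_score):
    """Compute the six Table 3 metrics for one train/test run.

    y_true: 0/1 ground-truth labels; y_pred: 0/1 predictions;
    y_score: continuous scores for the positive class (for AUC).
    """
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    se = tp / (tp + fn)                      # sensitivity (recall on positives)
    sp = tn / (tn + fp)                      # specificity (recall on negatives)
    acc = (tp + tn) / (tp + tn + fp + fn)    # overall accuracy
    mcc = matthews_corrcoef(y_true, y_pred)  # Matthews correlation coefficient
    auc = roc_auc_score(y_true, y_score)     # area under the ROC curve
    gmeans = np.sqrt(se * sp)                # geometric mean of SE and SP
    return 100 * se, 100 * sp, 100 * acc, mcc, auc, 100 * gmeans
```

Because the table averages these per-run values over the 10 splits, the reported g-means need not equal the geometric mean of the reported (averaged) SE and SP.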