
Table 2 Performance parameters of 100 runs using various ratios of training/test sets

From: Predicting substrates of the human breast cancer resistance protein using a support vector machine method

Training/Test ratio  Category  ACC (%)  SE (%)  SP (%)  MCC
0.5/0.5              Training  83.8     94.6    65.8    0.649
0.5/0.5              Test      67.8     82.2    44.0    0.288
0.5/0.5              External  69.5     78.3    54.7    0.344
0.6/0.4              Training  85.1     94.7    69.1    0.678
0.6/0.4              Test      69.0     83.2    45.5    0.315
0.6/0.4              External  70.8     79.0    57.1    0.372
0.7/0.3              Training  83.4     93.3    67.1    0.642
0.7/0.3              Test      71.1     84.2    49.1    0.360
0.7/0.3              External  70.8     78.4    58.1    0.375
0.75/0.25            Training  84.5     93.8    69.1    0.665
0.75/0.25            Test      69.7     83.3    47.2    0.332
0.75/0.25            External  70.9     77.7    59.6    0.382
0.8/0.2              Training  83.6     93.5    67.1    0.644
0.8/0.2              Test      70.4     83.6    48.7    0.35
0.8/0.2              External  70.6     77.5    59.1    0.376
0.85/0.15            Training  82.9     93.5    65.1    0.627
0.85/0.15            Test      70.3     85.1    46.4    0.347
0.85/0.15            External  70.9     78.5    58.1    0.376

1. The training and test data sets together contained 223 molecules; the external validation data set contained 40 molecules. ACC, accuracy (overall prediction accuracy); SE, sensitivity (prediction accuracy for the substrates); SP, specificity (prediction accuracy for the non-substrates); MCC, Matthews correlation coefficient (a more balanced performance measure than ACC). The external data set was used only to validate the predictive power of the constructed models and was not used for model selection.
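The footnote defines ACC, SE, SP and MCC only in words. The sketch below uses the standard confusion-matrix formulas: ACC = (TP + TN)/(TP + TN + FP + FN), SE = TP/(TP + FN), SP = TN/(TN + FP), and MCC = (TP·TN − FP·FN)/sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). It assumes that substrates are the positive class (label 1) and that ACC, SE and SP are reported as percentages, as in the table. The `repeated_split` helper shows one way to average test-set metrics over repeated random training/test splits. Its `fit_predict` callable, the rounding of the training-set size, and the seeding are hypothetical choices made for illustration and are not the authors' SVM procedure.

```python
import math
import random
from statistics import mean


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN, treating label 1 ('substrate') as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def performance(tp, tn, fp, fn):
    """Return ACC, SE, SP (as percentages) and MCC."""
    total = tp + tn + fp + fn
    acc = 100.0 * (tp + tn) / total
    # SE/SP are undefined (NaN) if a class is absent from the evaluated set.
    se = 100.0 * tp / (tp + fn) if tp + fn else float("nan")
    sp = 100.0 * tn / (tn + fp) if tn + fp else float("nan")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): MCC = 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, se, sp, mcc


def repeated_split(X, y, train_ratio, fit_predict, runs=100, seed=0):
    """Average test-set (ACC, SE, SP, MCC) over `runs` random splits.

    `fit_predict(X_train, y_train, X_test)` is a hypothetical callable that
    trains a classifier (e.g. an SVM) and returns predicted test labels.
    """
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_train = round(train_ratio * len(y))  # rounding rule is an assumption
    results = []
    for _ in range(runs):
        rng.shuffle(idx)  # a fresh random partition for each run
        tr, te = idx[:n_train], idx[n_train:]
        y_pred = fit_predict([X[i] for i in tr], [y[i] for i in tr],
                             [X[i] for i in te])
        counts = confusion_counts([y[i] for i in te], y_pred)
        results.append(performance(*counts))
    # Column-wise mean of the per-run metrics.
    return tuple(mean(col) for col in zip(*results))
```

For example, `performance(tp=8, tn=6, fp=2, fn=4)` gives ACC = 70.0%, SE ≈ 66.7%, SP = 75.0% and MCC ≈ 0.408. These are toy counts, not data from Table 2. The example also shows why MCC is described as more balanced than ACC: it uses all four confusion-matrix cells, so it stays low when one class (here, the non-substrates, whose SP values in the table are much lower than the SE values) is predicted poorly.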