
Table 2 Performance parameters of 100 runs using various ratios of training/test sets

From: Predicting substrates of the human breast cancer resistance protein using a support vector machine method

Training/Test ratio  Category  ACC (%)  SE (%)  SP (%)  MCC
0.5/0.5              Training  83.8     94.6    65.8    0.649
                     Test      67.8     82.2    44.0    0.288
                     External  69.5     78.3    54.7    0.344
0.6/0.4              Training  85.1     94.7    69.1    0.678
                     Test      69.0     83.2    45.5    0.315
                     External  70.8     79.0    57.1    0.372
0.7/0.3              Training  83.4     93.3    67.1    0.642
                     Test      71.1     84.2    49.1    0.360
                     External  70.8     78.4    58.1    0.375
0.75/0.25            Training  84.5     93.8    69.1    0.665
                     Test      69.7     83.3    47.2    0.332
                     External  70.9     77.7    59.6    0.382
0.8/0.2              Training  83.6     93.5    67.1    0.644
                     Test      70.4     83.6    48.7    0.35
                     External  70.6     77.5    59.1    0.376
0.85/0.15            Training  82.9     93.5    65.1    0.627
                     Test      70.3     85.1    46.4    0.347
                     External  70.9     78.5    58.1    0.376
  1. The total number of molecules in the training and test data sets was 223; the external validation data set contained 40 molecules. ACC, accuracy (overall prediction accuracy); SE, sensitivity (prediction accuracy for the substrates); SP, specificity (prediction accuracy for the non-substrates); MCC, Matthews correlation coefficient (a more balanced prediction parameter than ACC). The external data set was used only to validate the predictive power of the constructed models and was not used for model selection.
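
For readers who want to reproduce the four parameters defined in the footnote, the following is a minimal sketch using the standard confusion-matrix definitions of accuracy, sensitivity, specificity and the Matthews correlation coefficient. It assumes substrates are the positive class and non-substrates the negative class. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper, and the sketch does not reproduce the averaging over 100 runs reported in the table.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return ACC, SE, SP (in percent) and MCC from confusion-matrix counts.

    tp: substrates predicted as substrates
    fn: substrates predicted as non-substrates
    tn: non-substrates predicted as non-substrates
    fp: non-substrates predicted as substrates
    """
    total = tp + tn + fp + fn
    acc = 100.0 * (tp + tn) / total   # overall prediction accuracy
    se = 100.0 * tp / (tp + fn)       # accuracy on the substrates
    sp = 100.0 * tn / (tn + fp)       # accuracy on the non-substrates

    # MCC uses all four cells, so it is less sensitive to class imbalance.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0

    return {"ACC": acc, "SE": se, "SP": sp, "MCC": mcc}


# Hypothetical counts for a 40-molecule set (not the paper's actual data).
example = classification_metrics(tp=20, fn=5, tn=9, fp=6)
print(example)
```

With these illustrative counts, ACC is 72.5, SE is 80.0 and SP is 60.0. The gap between SE and SP mirrors the pattern in the table, where the models recognise substrates more reliably than non-substrates, which is why MCC is reported alongside ACC.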