No. | Name | Abbr. | Formula | Description |
---|---|---|---|---|
1 | Accuracy | ACC | (TP + TN)/(TP + TN + FP + FN) | The number of correctly classified instances divided by the total number of instances [6]. |
2 | Area under ROC curve | AUC | - | It measures the discriminating ability of the model, taking values between 0.5 for a random classifier and 1.0 for a perfect classifier [6]. |
3 | Enrichment Factor | EF | [CS/(CS + WS)]/[S/(S + I)] | EF is especially suitable for unbalanced datasets [27]. CS: number of correctly classified soluble proteins; WS: number of soluble proteins wrongly classified as insoluble; S: total number of soluble proteins; I: total number of insoluble proteins. |
4 | False Negative | FN | - | The number of incorrectly predicted negatives [10]. |
5 | False Positive | FP | - | The number of incorrectly predicted positives [10]. |
6 | F-Score | FS | 2 × Precision × Recall/(Precision + Recall) | The harmonic mean of recall and precision [10]. |
7 | Gain | GAIN | Precision/proportion of the given class in the full data set | An important performance measure that quantifies how much better the classifier is than a random draw of instances [6]. |
8 | Matthews Correlation Coefficient | MCC | (TP × TN − FP × FN)/√((TP + FP)(TP + FN)(TN + FP)(TN + FN)) | It indicates the correlation between the classifier assignments and the actual classes in the two-class case, and it is a good measure of classifier performance even when classes are unbalanced [6]. The MCC ranges between −1 and 1, and a larger positive value indicates a better prediction [10]. |
9 | Precision (Selectivity) | PRC | TP/(TP + FP) or TN/(TN + FN) | The ratio of the number of correctly classified positive or negative instances to the number of all instances classified as positive or negative, respectively [6]. |
10 | ROC Curve | ROC | Plot of the TP-rate against the FP-rate as the decision threshold is varied from 0 to 1.0 in 0.01 increments. | The receiver operating characteristic curve shows the trade-off between the true-positive and false-positive rates of a classifier [48]. A larger area under the curve indicates a more robust prediction method [10]. |
11 | Recall (Sensitivity, True Positive Rate, TP-rate) | REC | TP/(TP + FN) | The ratio of the number of correctly classified positive instances to the number of all instances from the positive class [6]. |
12 | Specificity (True Negative Rate, TN-rate) | SPC | TN/(TN + FP) | The ratio of the number of correctly classified negative instances to the number of all negative instances [6]. |
13 | True Positive | TP | - | The number of correctly predicted positives [10]. |
14 | True Negative | TN | - | The number of correctly predicted negatives [10]. |
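The confusion-matrix metrics in the table can be computed directly from the four counts. The sketch below shows one possible implementation in Python; the function names, the example counts, and the EF argument names (`cs`, `ws`, `s`, `i`, matching the table's CS, WS, S, I) are illustrative, not taken from the cited works.

```python
from math import sqrt

def classification_metrics(tp, tn, fp, fn):
    """Compute ACC, PRC, REC, SPC, FS, and MCC from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)      # Accuracy
    prc = tp / (tp + fp)                       # Precision for the positive class
    rec = tp / (tp + fn)                       # Recall / Sensitivity / TP-rate
    spc = tn / (tn + fp)                       # Specificity / TN-rate
    fs = 2 * prc * rec / (prc + rec)           # F-Score: harmonic mean of PRC and REC
    # MCC: note the square root over the product of the four marginal sums.
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"ACC": acc, "PRC": prc, "REC": rec, "SPC": spc, "FS": fs, "MCC": mcc}

def enrichment_factor(cs, ws, s, i):
    """EF = [CS/(CS + WS)] / [S/(S + I)], as defined in the table."""
    return (cs / (cs + ws)) / (s / (s + i))

# Hypothetical example: 40 TP, 45 TN, 5 FP, 10 FN out of 100 instances.
m = classification_metrics(40, 45, 5, 10)
print(m["ACC"])  # 0.85
```

Guarding against division by zero (e.g. a classifier that predicts no positives) is left out for brevity; a production implementation would need it.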