Table 2 Comparison of ML and PolyBayes on test data set

From: Application of machine learning in SNP discovery

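| Measure | Decision Tree | Production Rules | PolyBayes |
| --- | --- | --- | --- |
| TP | 1153 | 1202 | 1435 |
| TN | 16,748 | 16,706 | NA |
| FP | 207 | 249 | 16,955 |
| FN | 282 | 233 | NA |
| Accuracy (%) | 97.3 | 97.4 | 7.8 |
| Sensitivity (%) | 80.3 | 83.8 | 100 (Set) |
| Specificity (%) | 98.7 | 98.5 | NA |
| Positive Predictive Value (%) | 84.8 | 82.8 | 7.8 |
| Negative Predictive Value (%) | 98.3 | 98.6 | NA |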

  1. The following terms are used to contrast ML performance with PolyBayes. A SNP prediction program produces a true positive (TP) when it predicts a SNP that the expert judges to be true. Likewise, a false positive (FP) is a predicted SNP that the expert judges to be false, a true negative (TN) is a prediction of a non-SNP that agrees with the expert, and a false negative (FN) is a failure to identify a SNP that the expert identified. The following parameters measure the performance of the ML output: accuracy (the fraction of candidate SNPs correctly classified), sensitivity (the fraction of true SNPs correctly identified), specificity (the fraction of non-SNPs correctly identified), positive predictive value (the fraction of predicted SNPs that are true) and negative predictive value (the fraction of predicted non-SNPs that are truly non-SNPs).
  2. Accuracy = (TP + TN)/total, where total = TP + TN + FP + FN
  3. Sensitivity = TP/(TP + FN)
  4. Specificity = TN/(FP + TN)
  5. Positive Predictive Value (PPV) = TP/(TP + FP)
  6. Negative Predictive Value (NPV) = TN/(TN + FN)
  7. Applying the machine learning programs reduces the number of false positives from 16,955 with PolyBayes to 207 (decision tree) and 249 (production rules). Accuracy and positive predictive value also improve markedly, from 7.8% for PolyBayes to above 97% and above 82%, respectively.
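
As a cross-check of the footnote formulas, the following minimal Python sketch recomputes the percentage measures from the TP, TN, FP and FN counts reported in the table. The names `ConfusionCounts` and `metrics` are hypothetical helpers introduced here for illustration and are not part of the original study. The results should agree with the tabulated values to within about 0.1 percentage point of rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of expert-judged outcomes (hypothetical helper, not from the paper)."""
    tp: int  # predicted SNP, judged true by the expert
    tn: int  # predicted non-SNP, expert agrees
    fp: int  # predicted SNP, judged false by the expert
    fn: int  # SNP identified by the expert but missed by the program

    @property
    def total(self) -> int:
        return self.tp + self.tn + self.fp + self.fn


def metrics(c: ConfusionCounts) -> dict:
    """Return the five measures from the footnotes, expressed as percentages."""
    return {
        "accuracy": 100 * (c.tp + c.tn) / c.total,
        "sensitivity": 100 * c.tp / (c.tp + c.fn),
        "specificity": 100 * c.tn / (c.fp + c.tn),
        "ppv": 100 * c.tp / (c.tp + c.fp),
        "npv": 100 * c.tn / (c.tn + c.fn),
    }


# Counts copied from the table above.
decision_tree = ConfusionCounts(tp=1153, tn=16748, fp=207, fn=282)
production_rules = ConfusionCounts(tp=1202, tn=16706, fp=249, fn=233)

# PolyBayes reports no TN or FN, so only its PPV can be recomputed here.
polybayes_ppv = 100 * 1435 / (1435 + 16955)

if __name__ == "__main__":
    for name, counts in [("Decision Tree", decision_tree),
                         ("Production Rules", production_rules)]:
        row = ", ".join(f"{k}={v:.1f}" for k, v in metrics(counts).items())
        print(f"{name}: {row}")
    print(f"PolyBayes: ppv={polybayes_ppv:.1f}")
```

Both ML classifiers are evaluated on the same 18,390 candidates, which is also the number of PolyBayes predictions (1435 + 16,955), so the rows are directly comparable.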