Table 7 Prediction accuracies of T4 mutant classifiers

From: 3D deep convolutional neural networks for amino acid environment similarity analysis

Features              4-fold cross-validation
                      Lasso Train   Lasso Test   SVM Train   SVM Test
6-Feature-S_freq      0.90          0.825        0.967       0.825
6-Feature-S_dot       0.875         0.85         0.933       0.825
6-Feature-S_BLOSUM    0.883         0.85         0.967       0.775
6-Feature-S_PAM       0.883         0.875        0.958       0.775
6-Feature-S_WAC       0.85          0.825        0.925       0.775
3-Feature-S_freq      0.808         0.775        0.833       0.825
3-Feature-S_dot       0.800         0.825        0.858       0.775
3-Feature-S_BLOSUM    0.817         0.800        0.858       0.825
3-Feature-S_PAM       0.867         0.850        0.917       0.80
3-Feature-S_WAC       0.825         0.825        0.858       0.825
1-Feature-S_freq      0.725         0.675        0.724       0.7
1-Feature-S_dot       0.708         0.675        0.742       0.725
1-Feature-S_BLOSUM    0.633         0.575        0.667       0.475
1-Feature-S_PAM       0.525         0.525        0.708       0.4
1-Feature-S_WAC       0.525         0.525        0.633       0.5
  1. The performance of Lasso and SVM models built with the 1-Feature, 3-Feature, and 6-Feature sets from 5 different matrices is compared. The 6-Feature set comprises the substitution scores indexed by the six pairs of true and predicted classes for the wild type and mutant microenvironments. Specifically, 6-Feature set = [S(WT,WP), S(WT,MT), S(WT,MP), S(WP,MT), S(WP,MP), S(MT,MP)], where S(i,j) is the similarity score taken from the (i,j) element of a score matrix, and WT, WP, MT and MP denote the wild type true label, wild type predicted label, mutant true label, and mutant predicted label, respectively. The 3-Feature set is composed of [S(WT,WP), S(WT,MT), S(WP,MT)], and the 1-Feature set contains only [S(WT,MT)].
  2. The S_freq and S_dot matrices show a significant advantage with the 1-Feature set (highlighted in boldface), where only the wild type true label and the mutant true label are known. Models using the 3-Feature and 6-Feature sets achieved better prediction accuracies than models using the 1-Feature set alone. Significant performance gains from the 3-Feature set over the 1-Feature set are observed for models built with the BLOSUM and PAM matrices. The additional information from the predicted label for the wild type structure supplies key information that is not captured by sequence-derived matrices.
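
The feature-set definitions in note 1 can be sketched in code. This is a minimal illustration, not the authors' implementation: the function name `feature_sets`, the nested-dictionary layout of the score matrix, and the toy scores are all assumptions made for this example. Scores are looked up as S(i,j) in the order given in note 1, so the sketch also works if a matrix is not symmetric.

```python
from itertools import combinations


def feature_sets(score, wt, wp, mt, mp):
    """Build the 1-, 3- and 6-Feature sets for one wild type/mutant pair.

    score: nested mapping where score[i][j] is the similarity S(i, j)
           (hypothetical layout chosen for this sketch).
    wt, wp: wild type true and predicted labels.
    mt, mp: mutant true and predicted labels.
    """
    labels = {"WT": wt, "WP": wp, "MT": mt, "MP": mp}

    def pair_scores(names):
        # combinations() keeps input order, so for WT, WP, MT, MP it yields
        # (WT,WP), (WT,MT), (WT,MP), (WP,MT), (WP,MP), (MT,MP).
        return [score[labels[a]][labels[b]] for a, b in combinations(names, 2)]

    return {
        "6-Feature": pair_scores(["WT", "WP", "MT", "MP"]),
        "3-Feature": pair_scores(["WT", "WP", "MT"]),
        "1-Feature": pair_scores(["WT", "MT"]),
    }


# Toy similarity matrix over four residue types. The values are arbitrary
# and are NOT taken from any of the five matrices compared in the table.
toy_score = {
    "A": {"A": 1.0, "G": 0.2, "L": 0.1, "V": 0.3},
    "G": {"A": 0.2, "G": 1.0, "L": 0.0, "V": 0.1},
    "L": {"A": 0.1, "G": 0.0, "L": 1.0, "V": 0.6},
    "V": {"A": 0.3, "G": 0.1, "L": 0.6, "V": 1.0},
}

# Hypothetical case: wild type is L (predicted as V), mutant is A (predicted as G).
features = feature_sets(toy_score, wt="L", wp="V", mt="A", mp="G")
for name, values in features.items():
    print(name, values)
```

Each returned list would form one row of the input to a classifier such as Lasso or SVM. The 1-Feature set uses only the two true labels, while the 3-Feature set adds the two scores that involve the wild type predicted label.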