Predicting changes in protein thermostability brought about by single- or multi-site mutations

BMC Bioinformatics

Table 1 Classification and regression performance of Prethermut on the M-dataset

Method^a	Number of mutations	n^b	MCC	Q2 (%)	Sensitivity (%)	Specificity (%)	r
RF	1	2765	0.46	77.3	71.3	79.7	0.70
RF	2	441	0.66	84.8	81.0	86.5	0.79
RF	3	93	0.86	96.8	84.6	98.8	0.87
RF	≥4	67	0.92	97.0	93.8	98.0	0.86
RF	≥1	3366	0.50	79.7	73.6	81.1	0.72
SVM	1	2765	0.39	79.8	41.2	92.1	0.64
SVM	2	441	0.59	83.0	51.1	97.4	0.74
SVM	3	93	0.45	89.7	23.1	100.0	0.79
SVM	≥4	67	0.66	88.1	50.0	100.0	0.78
SVM	≥1	3366	0.43	79.7	42.7	93.2	0.67

All results were obtained by 10-fold cross-validation on the M-dataset. See Methods for definitions of overall accuracy (Q2), Matthews correlation coefficient (MCC), sensitivity, specificity, and Pearson correlation coefficient (r). ^aThe number of trees in the random forests (RF) method was 10000; the parameters for the support vector machine (SVM) method were gamma (g) = 2, cost (c) = 8, and weight for the positive samples (w) = 3. ^bn is the number of mutant proteins in the sample; the total number of mutant proteins in the M-dataset was 3366.
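
For readers who want to reproduce the kind of figures reported in Table 1, the following is a minimal sketch of how the five measures are commonly computed. It assumes the standard textbook definitions of Q2, MCC, sensitivity, specificity, and Pearson's r; the paper's Methods section is the authoritative source for the exact definitions used. The function names and the confusion-matrix counts in the usage lines are hypothetical and are not taken from the M-dataset.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return Q2, sensitivity, specificity (all in %) and MCC.

    Assumes the standard definitions; tp/tn/fp/fn are counts of true
    positives, true negatives, false positives, and false negatives.
    """
    total = tp + tn + fp + fn
    q2 = 100.0 * (tp + tn) / total
    # Guard against empty classes so the ratios stay defined.
    sensitivity = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    specificity = 100.0 * tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "Q2": q2,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "MCC": mcc,
    }


def pearson_r(xs, ys):
    """Pearson correlation between predicted and observed values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical counts, for illustration only (not from the paper).
example = classification_metrics(tp=8, tn=9, fp=1, fn=2)
```

MCC is less sensitive to class imbalance than Q2, which is one reason a method can show similar Q2 values but clearly different MCC values when its sensitivity and specificity are far apart.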

ISSN: 1471-2105