Table 2 Summary of the R² and RMSE values obtained from predictions on the full set of protein sequences and after an 80/20 split into a training set and a validation set

From: Application of fourier transform and proteochemometrics principles to protein engineering

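| Set | Partition | cvR² | cvRMSE |
|---|---|---|---|
| Cyt P450 (thermostability) | Full set (10-fold CV) | 0.96 | 1.19 |
| Cyt P450 (thermostability) | Train set (80%) (10-fold CV) | 0.93 | 1.33 |
| Cyt P450 (thermostability) | Validation set (20%) | 0.92 | 1.72 |
| Enterotoxins (thermostability) | Full set (LOOCV) | 0.95 | 1.58 |
| Enterotoxins (thermostability) | Train set (80%) (LOOCV) | 0.85 | 2.58 |
| Enterotoxins (thermostability) | Validation set (20%) | 0.99 | 0.59 |
| TNF (relative binding affinities) | Full set (LOOCV) | 0.85 | 0.31 |
| TNF (relative binding affinities) | Train set (80%) (LOOCV) | 0.86 | 0.33 |
| TNF (relative binding affinities) | Validation set (20%) | 0.92 | 0.20 |
| GLP-2 Potency (fold-increase in cAMP) | Full set (LOOCV) | 0.42 | 2.05 |
| GLP-2 Potency (fold-increase in cAMP) | Train set (80%) (LOOCV) | 0.75 | 1.39 |
| GLP-2 Potency (fold-increase in cAMP) | Validation set (20%) | 0.71 | 1.44 |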

  1. For the full set and the train set (80%), the cvR² and cvRMSE values (RMSE is in the same units as the activity) were evaluated using either a leave-one-out cross-validation (LOOCV) or a 10-fold cross-validation scheme.
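
The evaluation scheme described in the footnote can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the regression model is a placeholder (ordinary least squares on a single made-up descriptor), the data are synthetic, and R² is assumed here to be 1 - SS_res/SS_tot computed on out-of-fold predictions, which may differ from the definition used in the article. All function names are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (placeholder model, an assumption)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def cross_val_predict(xs, ys, k):
    """Return out-of-fold predictions for k folds; k == len(xs) gives LOOCV."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        test_idx = list(range(fold, n, k))  # interleaved fold assignment
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        a, b = fit_line(train_x, train_y)
        for i in test_idx:
            preds[i] = a + b * xs[i]
    return preds


def r2_and_rmse(ys, preds):
    """R² as 1 - SS_res/SS_tot (assumed definition) and RMSE in the units of ys."""
    n = len(ys)
    my = sum(ys) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Synthetic data standing in for (descriptor, activity) pairs.
rng = random.Random(0)
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 3.0) for x in xs]

# Full set, 10-fold CV.
full_r2, full_rmse = r2_and_rmse(ys, cross_val_predict(xs, ys, 10))

# 80/20 split: LOOCV on the train set, then score the held-out validation set.
idx = list(range(len(xs)))
rng.shuffle(idx)
cut = int(0.8 * len(idx))
tr_x, tr_y = [xs[i] for i in idx[:cut]], [ys[i] for i in idx[:cut]]
va_x, va_y = [xs[i] for i in idx[cut:]], [ys[i] for i in idx[cut:]]

train_r2, train_rmse = r2_and_rmse(tr_y, cross_val_predict(tr_x, tr_y, len(tr_x)))
a, b = fit_line(tr_x, tr_y)  # refit on the whole train set
val_r2, val_rmse = r2_and_rmse(va_y, [a + b * x for x in va_x])

print("full set (10-fold CV):", full_r2, full_rmse)
print("train set (LOOCV):", train_r2, train_rmse)
print("validation set:", val_r2, val_rmse)
```

In this sketch the validation-set figures come from a single model fitted on the whole train set, so they are not cross-validated, which matches the footnote restricting the cross-validation schemes to the full set and train set rows.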