Table 1 Performance of classification models based on different features and training algorithms

From: Distinguishing crystallographic from biological interfaces in protein complexes: role of intermolecular contacts and energetics for classification

 

| Set | Training Features | Bagging | Random Forest | Adaptive Boosting | Gradient Boosting | Neural Network | Average |
|---|---|---|---|---|---|---|---|
| S1 | BSA | 0.74 (0.51) | 0.74 (0.51) | 0.81 (0.43) | 0.81 (0.41) | 0.55 (0.50) | 0.73 (0.47) |
| S2 | RCs | 0.86 (0.50) | 0.86 (0.50) | 0.85 (0.51) | 0.86 (0.50) | 0.85 (0.54) | 0.86 (0.51) |
| S3 | CC, CP, CA, PP, AP, AA | 0.89 (0.67) | 0.90 (0.70) | 0.89 (0.69) | 0.89 (0.67) | 0.89 (0.67) | 0.89 (0.68) |
| S4 | CC, CP, CA, PP, AP, AA, ANIS, CNIS, PNIS | 0.90 (0.69) | 0.90 (0.69) | 0.89 (0.66) | 0.89 (0.67) | 0.89 (0.67) | 0.89 (0.68) |
| S5 | CC, CP, CA, PP, AP, AA, LD, G, A, L, M, F, W, K, Q, E, S, P, V, I, C, Y, H, R, N, D, T | 0.92 (0.74) | 0.92 (0.73) | 0.91 (0.74) | 0.92 (0.71) | 0.91 (0.77) | 0.92 (0.74) |
| S6 | CC, CP, CA, PP, AP, AA, ANIS, CNIS, PNIS, LD, G, A, L, M, F, W, K, Q, E, S, P, V, I, C, Y, H, R, N, D, T | 0.92 (0.73) | 0.92 (0.75) | 0.91 (0.74) | 0.93 (0.70) | 0.92 (0.76) | 0.92 (0.74) |
| E1 | HS | 0.76 (0.59) | 0.76 (0.59) | 0.83 (0.62) | 0.82 (0.62) | 0.82 (0.59) | 0.80 (0.60) |
| E2 | Eelec, Evdw, Edes | 0.87 (0.64) | 0.87 (0.61) | 0.87 (0.62) | 0.87 (0.62) | 0.85 (0.68) | 0.87 (0.63) |
| C | CC, CP, CA, PP, AP, AA, ANIS, CNIS, PNIS, LD, G, A, L, M, F, W, K, Q, E, S, P, V, I, C, Y, H, R, N, D, T, Eelec, Evdw, Edes | 0.92 (0.72) | 0.93 (0.73) | 0.92 (0.74) | 0.93 (0.72) | 0.90 (0.77) | 0.92 (0.74) |

  1. Accuracy values calculated according to Eq. 2 in “Methods”
  2. Predictive accuracies are reported for the classification models tested. Nine sets of features were used to train predictive models, based on structural properties (S1, S2, S3, S4, S5, S6), energetics (E1, E2) and a combination of structure and energetics (C). For each set of training features, five machine learning algorithms were used for training (Bagging, Random Forest, Adaptive Boosting, Gradient Boosting and Neural Network). For each trained model, the accuracy on the Many dataset [34], reported as the average over a 10-fold cross validation, and the accuracy on the independent DC dataset [15] (numbers in brackets) are given; a code sketch of this evaluation protocol follows below
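The evaluation protocol described in note 2 can be sketched as follows. This is a minimal illustration using scikit-learn with default hyperparameters, not the authors' actual code: the `evaluate` helper, the feature matrices and the classifier settings are assumptions for demonstration only.

```python
# Minimal sketch of the protocol in note 2, assuming scikit-learn defaults.
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

MODELS = {
    "Bagging": BaggingClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Adaptive Boosting": AdaBoostClassifier(),
    "Gradient Boosting": GradientBoostingClassifier(),
    "Neural Network": MLPClassifier(max_iter=1000),
}

def evaluate(X_many, y_many, X_dc, y_dc):
    """X_*: per-interface feature matrices (e.g. the S6 or C feature sets);
    y_*: binary labels (biological vs. crystallographic interface)."""
    for name, clf in MODELS.items():
        # Accuracy on the Many dataset: mean over 10-fold cross-validation.
        cv_acc = cross_val_score(clf, X_many, y_many, cv=10,
                                 scoring="accuracy").mean()
        # Accuracy on the DC dataset (bracketed values in Table 1):
        # train on the full Many set, then test on the independent DC set.
        dc_acc = clf.fit(X_many, y_many).score(X_dc, y_dc)
        print(f"{name}: Many CV = {cv_acc:.2f}, DC = {dc_acc:.2f}")
```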