
Table 1: The 20 input features used by Sigma-RF, listed with their importance estimates

From: Sigma-RF: prediction of the variability of spatial restraints in template-based modeling by random forest

| Index | Feature | Importance |
|---|---|---|
| F1 | \(\lvert I-J\rvert\) | 7.51 |
| F2 | \(\lvert I-J\rvert/N_{\textit{res,target}}\) | 2.91 |
| F3 | \(d_{\textit{template}}\) | 9.43 |
| F4 | \(m_{I,K}\, m_{J,L}\) | 2.55 |
| F5 | \(\sum _{\substack {I \leq i \leq J\\ K \leq j \leq L}} m_{i,j}\delta (i,j)/\sum _{\substack {I \leq i \leq J\\ K \leq j \leq L}}\delta (i,j)\) | 16.81 |
| F6 | \(N_{\textit {gap}}^{IJ}\) | 1.91 |
| F7 | \(N_{\textit {gap}}^{IJ}/\lvert I-J\rvert\) | 1.36 |
| F8 | \(1/\lvert I'-I\rvert\) | 0.12 |
| F9 | \(1/\lvert J'-J\rvert\) | 0.20 |
| F10 | \(N_{\textit {gap}}^{KL}\) | 0.37 |
| F11 | \(N_{\textit {gap}}^{KL}/\lvert K-L\rvert\) | 0.32 |
| F12 | \(1/\lvert K'-K\rvert\) | 0.23 |
| F13 | \(1/\lvert L'-L\rvert\) | 0.49 |
| F14 | \(\sum _{s=H,E,C} p(s)\delta (s_{I},s_{K})\) | 0.16 |
| F15 | \(\sum _{s=H,E,C} p(s)\delta (s_{J},s_{L})\) | 0.88 |
| F16 | \(\sum _{acc=B,E} p(acc)\delta (acc_{I},acc_{K})\) | 0.53 |
| F17 | \(\sum _{acc=B,E} p(acc)\delta (acc_{J},acc_{L})\) | 0.58 |
| F18 | \(F_{4}\, F_{14}\, F_{15}\, F_{16}\, F_{17}\) | 3.62 |
| F19 | \(\frac {F_{18}}{1+F_{6}+F_{10}}\) | 3.02 |
| F20 | \(\frac {F_{19}}{1+F_{8}+F_{9}+F_{12}+F_{13}}\) | 4.22 |

1. I and J (J > I) denote residue indices in the target sequence, and K and L (L > K) denote residue indices in the template sequence. When the two residue pairs (I, K) and (J, L) are aligned, the distance \(d_{\textit{template}}\) between the corresponding two atoms is extracted from the template. \(N_{\textit{res,target}}\) is the chain length of the target sequence. \(m_{I,K}\) is the match score of the aligned pair (I, K). In F5, δ(i,j) = 1 if residues i and j are aligned, and δ(i,j) = 0 otherwise. \(N^{IJ}_{\textit{gap}}\) is the number of gaps between I and J in the target sequence, and \(N^{KL}_{\textit{gap}}\) is the number of gaps between K and L in the template sequence. I′, J′, K′ and L′ are the residue indices of the gaps closest to I, J, K and L, respectively. p(s) is the PSI-PRED score for the secondary structure state s: helix (H), strand (E) or coil (C). p(acc) is the SANN score for the solvent accessibility state acc: buried (B) or exposed (E).
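Because each feature is defined directly from the alignment, a short sketch can make the definitions concrete. The Python below is a hypothetical illustration, not Sigma-RF's own code, and it rests on several assumptions not stated in the table: the function and argument names are invented; residues are 1-based; the alignment is a dict mapping aligned target residues to template residues; a "gap" is taken to mean a residue of that sequence left unaligned (with 1/|I′−I| set to 0 when a sequence has no gap); F14 to F17 are read as the predicted probability of the template residue's observed state; and the juxtaposed terms in F18 are read as a product.

```python
from math import prod


def _inv_nearest_gap(pos, gaps):
    # 1/|pos' - pos|, where pos' is the gap closest to pos. Returning 0.0 when
    # the sequence has no gap is a hypothetical convention, not from the paper.
    return 1.0 / min(abs(g - pos) for g in gaps) if gaps else 0.0


def _state_agreement(prob, observed_state):
    # Sum over states s of p(s) * delta(s, observed_state): one assumed reading
    # of F14-F17, giving the predicted probability of the observed state.
    return sum(p for s, p in prob.items() if s == observed_state)


def sigma_rf_features(I, J, K, L, aln, match, n_target, n_template,
                      d_template, ss_prob, ss_tmpl, acc_prob, acc_tmpl):
    """Hypothetical sketch returning {"F1": ..., ..., "F20": ...}.

    aln      -- dict: aligned target residue -> template residue (1-based)
    match    -- dict: (target, template) aligned pair -> match score m
    ss_prob  -- dict: target residue -> {"H"|"E"|"C": PSI-PRED probability}
    ss_tmpl  -- dict: template residue -> observed state "H"|"E"|"C"
    acc_prob -- dict: target residue -> {"B"|"E": SANN probability}
    acc_tmpl -- dict: template residue -> observed state "B"|"E"
    """
    if not (J > I and L > K and aln.get(I) == K and aln.get(J) == L):
        raise ValueError("expected aligned pairs (I, K), (J, L) with J > I, L > K")
    f = {}
    f["F1"] = J - I
    f["F2"] = (J - I) / n_target
    f["F3"] = d_template
    f["F4"] = match[(I, K)] * match[(J, L)]
    # F5: mean match score over aligned pairs inside the I..J x K..L window.
    window = [match[(i, j)] for i, j in aln.items() if I <= i <= J and K <= j <= L]
    f["F5"] = sum(window) / len(window)
    # Assumption: a "gap" is a residue of that sequence left unaligned.
    target_gaps = [i for i in range(1, n_target + 1) if i not in aln]
    aligned_tmpl = set(aln.values())
    template_gaps = [k for k in range(1, n_template + 1) if k not in aligned_tmpl]
    f["F6"] = sum(1 for g in target_gaps if I < g < J)
    f["F7"] = f["F6"] / (J - I)
    f["F8"] = _inv_nearest_gap(I, target_gaps)
    f["F9"] = _inv_nearest_gap(J, target_gaps)
    f["F10"] = sum(1 for g in template_gaps if K < g < L)
    f["F11"] = f["F10"] / (L - K)
    f["F12"] = _inv_nearest_gap(K, template_gaps)
    f["F13"] = _inv_nearest_gap(L, template_gaps)
    f["F14"] = _state_agreement(ss_prob[I], ss_tmpl[K])
    f["F15"] = _state_agreement(ss_prob[J], ss_tmpl[L])
    f["F16"] = _state_agreement(acc_prob[I], acc_tmpl[K])
    f["F17"] = _state_agreement(acc_prob[J], acc_tmpl[L])
    # F18: the juxtaposed terms in the table, read here as a product.
    f["F18"] = prod(f[k] for k in ("F4", "F14", "F15", "F16", "F17"))
    f["F19"] = f["F18"] / (1 + f["F6"] + f["F10"])
    f["F20"] = f["F19"] / (1 + f["F8"] + f["F9"] + f["F12"] + f["F13"])
    return f
```

Keying the result by the table's own indices keeps each value traceable to its row in Table 1. In this sketch the caller supplies \(d_{\textit{template}}\), already measured from the template coordinates.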