
Table 1 Parameters used for Indel annotation with their defined values and the source of the data

From: A comprehensive study of small non-frameshift insertions/deletions in proteins and prediction of their phenotypic effects by a machine learning method (KD4i)

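| Class | Parameter | In final method? | Values | Source |
| --- | --- | --- | --- | --- |
| Conservation | Conserved residue | Yes | Yes/No | MACSIMS via SM2PH |
| | Block | Yes | Yes/No | MACSIMS via SM2PH |
| Functional | Pfam domain | Yes | Yes/No | MACSIMS via SM2PH |
| | Prosite motif | No | Yes/No | MACSIMS via SM2PH |
| | Uniprot domain | Yes | Yes/No | MACSIMS via SM2PH |
| Physico-chemical properties (average) | Volume | No | See table 2 | In-house |
| | Hydrophobicity | No | See table 2 | In-house |
| | Polarity | Yes | See table 2 | In-house |
| | Charge | No | See table 2 | In-house |
| Physico-chemical properties (total) | Volume | Yes | See table 2 | In-house |
| | Hydrophobicity | Yes | See table 2 | In-house |
| | Polarity | No | See table 2 | In-house |
| | Charge | No | See table 2 | In-house |
| Local perturbation in site (average) | Volume | Yes | −2 to +2* | In-house |
| | Hydrophobicity | Yes | −2 to +2* | In-house |
| | Polarity | No | −2 to +2* | In-house |
| | Charge | No | −2 to +2* | In-house |
| Local perturbation in environment (average) | Volume | No | −2 to +2* | In-house |
| | Hydrophobicity | No | −2 to +2* | In-house |
| | Polarity | No | −2 to +2* | In-house |
| | Charge | No | −2 to +2* | In-house |
| Local perturbation in region (average) | Volume | No | −2 to +2* | In-house |
| | Hydrophobicity | No | −2 to +2* | In-house |
| | Polarity | No | −2 to +2* | In-house |
| | Charge | No | −2 to +2* | In-house |
| Local perturbation in site (total) | Volume | Yes | −2 to +2* | In-house |
| | Hydrophobicity | Yes | −2 to +2* | In-house |
| | Polarity | No | −2 to +2* | In-house |
| | Charge | No | −2 to +2* | In-house |
| Local perturbation in environment (total) | Volume | No | −2 to +2* | In-house |
| | Hydrophobicity | No | −2 to +2* | In-house |
| | Polarity | No | −2 to +2* | In-house |
| | Charge | No | −2 to +2* | In-house |
| Local perturbation in region (total) | Volume | No | −2 to +2* | In-house |
| | Hydrophobicity | No | −2 to +2* | In-house |
| | Polarity | Yes | −2 to +2* | In-house |
| | Charge | No | −2 to +2* | In-house |
| Structural | Disorder | Yes | Structured (probability of disorder P < 0.4); Semi-disorder (0.4 < P < 0.7); Disorder (P > 0.7) | Spine-D |
| | RSA | Yes | Fully buried (RSA value Rv < 30); Buried (30 < Rv < 60); Intermediate (60 < Rv < 90); Exposed (90 < Rv < 120); Fully exposed (Rv > 120) | Spine-D |
| | Secondary structure | Yes | Coil; Helix; Strand; Two (if NFS-Indel is in the transition zone between a strand/helix and coil) | Spine-D |
| Others | Relative Indel Position | Yes | N-terminal; Middle; C-terminal | In-house |
| | Indel length | Yes | One; More than one | In-house |
| | Presence of Proline | No | Yes/No | In-house |
| | Presence of Glycine | No | Yes/No | In-house |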
The column ‘In final method?’ indicates whether the parameter is used in the final ILP rule set for prediction of deleterious NFS-Indels.
*The numerical values range from −5 to +5 but, to reduce computational cost, we regrouped values beyond ±2 into the semantic categories ‘two or more’ and ‘two or less’.
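
The thresholds in the table and the regrouping described in the footnote can be illustrated with a short Python sketch. This is not the KD4i implementation: the function names, the labels used for the intermediate scores (−1, 0, +1), and the handling of values that fall exactly on a boundary (assigned here to the upper category, since the table uses strict inequalities on both sides) are all assumptions made for illustration.

```python
from bisect import bisect_right

# Category boundaries taken from the table; labels follow the table wording.
DISORDER_EDGES = (0.4, 0.7)
DISORDER_LABELS = ("Structured", "Semi-disorder", "Disorder")
RSA_EDGES = (30, 60, 90, 120)
RSA_LABELS = ("Fully buried", "Buried", "Intermediate", "Exposed", "Fully exposed")


def regroup_perturbation(score: int) -> str:
    """Collapse a raw perturbation score in [-5, +5] into the -2..+2 categories.

    Scores of +2 and above share one category, as do scores of -2 and below.
    The labels for -1, 0 and +1 are hypothetical.
    """
    if not -5 <= score <= 5:
        raise ValueError(f"score out of range: {score}")
    if score >= 2:
        return "two or more"
    if score <= -2:
        return "two or less"
    return f"{score:+d}" if score else "0"


def disorder_class(p: float) -> str:
    """Map a disorder probability to a category (assumption: a boundary
    value such as exactly 0.4 goes to the upper category)."""
    return DISORDER_LABELS[bisect_right(DISORDER_EDGES, p)]


def rsa_class(rv: float) -> str:
    """Map an RSA value to a burial category (same boundary assumption)."""
    return RSA_LABELS[bisect_right(RSA_EDGES, rv)]


if __name__ == "__main__":
    print(regroup_perturbation(4), "|", regroup_perturbation(-1))
    print(disorder_class(0.55), "|", rsa_class(75))
```

Keeping the boundaries in tuples and looking them up with `bisect_right` keeps each threshold in one place, so a different boundary convention only requires swapping in `bisect_left`.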