Table 7 Rule-sets obtained by BioHEL for CN and RSA predictions using the full AA alphabet.

From: Automated Alphabet Reduction for Protein Datasets

| Rules for CN prediction | Rules for RSA prediction |
| --- | --- |
| 1: If AA-4 {E,L,M,N,R,X}, AA-3 {D,E,N,H,R,F,W,Y,X}, AA-2 {E,F,W,N,S,P}, AA-1 {D,E,F,G,H,K,N,Q}, AA {C,I,L,M,V}, AA1 {D,E,K,R,N,Q,S,P}, AA2 {H,R,M,P,T,N,W,X}, AA3 {A,C,I,L,M,V,F,G,H,X}, AA4 {A,C,L,M,G,H,F,W} then CN is high | 1: If AA-4 {G,I,L,V,X,F,Y}, AA-3 {G,Q,F,W}, AA-2 {C,N,P}, AA-1 {A,I,V,Q,Y}, AA {K}, AA1 {F,I,L,M,V,N,T,P}, AA2 {N,Q,S,P}, AA3 {C,I,L,R,W}, AA4 {A,C,I,L,R,S} then RSA is high |
| 2: If AA-4 {E,H,K,R,N,Q,P,W,X}, AA-3 {D,E,K,R,M,N,T,P,Y}, AA-2 {D,N,S}, AA-1 {D,E,G,K,N,P}, AA {A,C,I,L,M,W}, AA1 {D,E,G,K,P,N,Q,S,T}, AA2 {C,I,D,G,P,S,X,Y}, AA3 {D,E,G,K,R,N,Q,S,P,X}, AA4 {A,C,I,L,M,V,F,G,T} then CN is high | 2: If AA-4 {A,I,L,V,G,W,F}, AA-3 {C,I,M,V,G,P,S,T,Y,F}, AA-2 {C,H,R,F,W}, AA-1 {F,H,I}, AA {E,K}, AA1 {I,M,V,N,S}, AA2 {C,D,H,N,S}, AA3 {A,C,I,L,V,H,N,W,Y,F}, AA4 {G,H,I,L,M,P,F,W,Y} then RSA is high |
| ... | ... |
| 32: If AA-4 {E,F,P,K,R,S,X}, AA-3 {A,C,I,L,V,G,F,W,X,Y}, AA-2 {F,H,I,M,P,N,Q,X}, AA-1 {C,I,D,E,G,P,K,R,N,S}, AA {A,C,I,L,M,V,F,W,Y}, AA1 {G,P,N,Q,T,X}, AA2 {D,G,N,Q,S}, AA3 {D,K,P,Q,W}, AA4 {A,I,M,R,X} then CN is high | 57: If AA-4 {Q}, AA-3 {G,H,I,V,P,Y}, AA-2 {G,H,T,M,V,W,Y,F}, AA-1 {G,I,M,V,X,Y}, AA {D,E,G,H,K,P,Q,S}, AA1 {E,F,W}, AA2 {D,E,G,H,K,N,S,T,P,X}, AA3 {G,K,F,W}, AA4 {L,M,R,W} then RSA is high |
| 33: Default class: CN is low | 58: Default class: RSA is low |
  1. The rule set on the left is for CN prediction; the rule set on the right is for RSA prediction. AA±n denotes the AA type of the residue at position ±n relative to the target residue. X denotes the end of a chain, used when a window position extends past either end of the chain.
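
As a reading aid, the following minimal Python sketch shows one way such a rule set could be applied to a sequence. It rests on two assumptions that the table does not state: each listed set is taken as the residue types accepted at that window position, and the rules are tried in order, with the first match deciding the class and the default class applying otherwise. The names `window`, `predict` and `CN_RULE_1` are hypothetical and are not part of BioHEL.

```python
from typing import Dict, List, Set, Tuple

# Offsets of the 9-residue window around the target residue (AA-4 .. AA4).
OFFSETS = range(-4, 5)

# A rule: one residue set per window offset, plus the class it predicts.
Rule = Tuple[Dict[int, Set[str]], str]


def window(seq: str, i: int) -> Dict[int, str]:
    """Residue type at each offset from position i; 'X' marks positions past a chain end."""
    return {k: (seq[i + k] if 0 <= i + k < len(seq) else "X") for k in OFFSETS}


def predict(rules: List[Rule], default: str, seq: str, i: int) -> str:
    """Return the class of the first rule whose sets all contain the window residues.

    Assumption: set membership and first-match-wins ordering (see lead-in).
    """
    w = window(seq, i)
    for conditions, label in rules:
        if all(w[k] in allowed for k, allowed in conditions.items()):
            return label
    return default


# CN rule 1 transcribed from the table, under the membership assumption.
CN_RULE_1: Rule = (
    {
        -4: set("ELMNRX"),
        -3: set("DENHRFWYX"),
        -2: set("EFWNSP"),
        -1: set("DEFGHKNQ"),
        0: set("CILMV"),
        1: set("DEKRNQSP"),
        2: set("HRMPTNWX"),
        3: set("ACILMVFGHX"),
        4: set("ACLMGHFW"),
    },
    "high",
)

# Only rule 1 is included here; the remaining rules would be appended in table order.
cn_rules: List[Rule] = [CN_RULE_1]
```

Calling `predict(cn_rules, "low", seq, i)` for a sequence string `seq` and residue index `i` returns either the class of the first matching rule or the default class, mirroring the last row of the table.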