Automated Alphabet Reduction for Protein Datasets

Bacardit, Jaume; Stout, Michael; Hirst, Jonathan D; Valencia, Alfonso; Smith, Robert E; Krasnogor, Natalio

doi:10.1186/1471-2105-10-6

BMC Bioinformatics

Table 7 Rule-sets obtained by BioHEL for CN and RSA predictions using the full AA alphabet.

Rules for CN prediction

1: If AA_-4 ∉ {E,L,M,N,R,X}, AA_-3 ∉ {D,E,N,H,R,F,W,Y,X}, AA_-2 ∉ {E,F,W,N,S,P}, AA_-1 ∉ {D,E,F,G,H,K,N,Q}, AA ∉ {C,I,L,M,V}, AA₁ ∉ {D,E,K,R,N,Q,S,P}, AA₂ ∉ {H,R,M,P,T,N,W,X}, AA₃ ∉ {A,C,I,L,M,V,F,G,H,X}, AA₄ ∉ {A,C,L,M,G,H,F,W} then CN is high
2: If AA_-4 ∉ {E,H,K,R,N,Q,P,W,X}, AA_-3 ∉ {D,E,K,R,M,N,T,P,Y}, AA_-2 ∉ {D,N,S}, AA_-1 ∉ {D,E,G,K,N,P}, AA ∈ {A,C,I,L,M,W}, AA₁ ∉ {D,E,G,K,P,N,Q,S,T}, AA₂ ∉ {C,I,D,G,P,S,X,Y}, AA₃ ∉ {D,E,G,K,R,N,Q,S,P,X}, AA₄ ∈ {A,C,I,L,M,V,F,G,T} then CN is high
...
32: If AA_-4 ∉ {E,F,P,K,R,S,X}, AA_-3 ∈ {A,C,I,L,V,G,F,W,X,Y}, AA_-2 ∉ {F,H,I,M,P,N,Q,X}, AA_-1 ∉ {C,I,D,E,G,P,K,R,N,S}, AA ∈ {A,C,I,L,M,V,F,W,Y}, AA₁ ∉ {G,P,N,Q,T,X}, AA₂ ∉ {D,G,N,Q,S}, AA₃ ∉ {D,K,P,Q,W}, AA₄ ∉ {A,I,M,R,X} then CN is high
33: Default class: CN is low

Rules for RSA prediction

1: If AA_-4 ∉ {G,I,L,V,X,F,Y}, AA_-3 ∉ {G,Q,F,W}, AA_-2 ∉ {C,N,P}, AA_-1 ∉ {A,I,V,Q,Y}, AA ∈ {K}, AA₁ ∉ {F,I,L,M,V,N,T,P}, AA₂ ∉ {N,Q,S,P}, AA₃ ∉ {C,I,L,R,W}, AA₄ ∉ {A,C,I,L,R,S} then RSA is high
2: If AA_-4 ∉ {A,I,L,V,G,W,F}, AA_-3 ∉ {C,I,M,V,G,P,S,T,Y,F}, AA_-2 ∉ {C,H,R,F,W}, AA_-1 ∉ {F,H,I}, AA ∈ {E,K}, AA₁ ∉ {I,M,V,N,S}, AA₂ ∉ {C,D,H,N,S}, AA₃ ∉ {A,C,I,L,V,H,N,W,Y,F}, AA₄ ∉ {G,H,I,L,M,P,F,W,Y} then RSA is high
...
57: If AA_-4 ∉ {Q}, AA_-3 ∉ {G,H,I,V,P,Y}, AA_-2 ∉ {G,H,T,M,V,W,Y,F}, AA_-1 ∉ {G,I,M,V,X,Y}, AA ∈ {D,E,G,H,K,P,Q,S}, AA₁ ∉ {E,F,W}, AA₂ ∈ {D,E,G,H,K,N,S,T,P,X}, AA₃ ∉ {G,K,F,W}, AA₄ ∉ {L,M,R,W} then RSA is high
58: Default class: RSA is low

The first rule set is for CN prediction; the second is for RSA prediction. AA_{±n} denotes the amino acid (AA) type of the residue at position ±n relative to the target residue. X denotes the end of the chain and is used when a position in the window falls beyond either end of the chain.
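
To make the notation concrete, the following is a minimal Python sketch of how one rule of this form could be evaluated on a nine-residue window. It is not BioHEL code. It assumes that rules are tried in order and that the first matching rule decides the class, with the default class used otherwise; this reading is suggested by the numbered rules and the final default class, but the table itself does not state it. The names `window_at`, `matches`, `predict`, and `RSA_RULE_1`, and the test sequence, are hypothetical; only the residue sets of RSA rule 1 are taken from the table.

```python
from typing import Dict, FrozenSet, List, Tuple

# A condition is (is_member, residues): True encodes "∈", False encodes "∉".
Condition = Tuple[bool, FrozenSet[str]]
# A rule maps window offsets (-4..+4) to conditions and carries a class label.
Rule = Tuple[Dict[int, Condition], str]

WINDOW = 4  # residues on each side of the target, i.e. AA_-4 .. AA_4


def window_at(sequence: str, target: int) -> Dict[int, str]:
    """Residue at each offset from the target; 'X' marks positions past a chain end."""
    return {
        off: sequence[target + off] if 0 <= target + off < len(sequence) else "X"
        for off in range(-WINDOW, WINDOW + 1)
    }


def matches(conditions: Dict[int, Condition], window: Dict[int, str]) -> bool:
    """A rule matches only if every one of its position conditions holds."""
    return all(
        (window[off] in residues) == is_member
        for off, (is_member, residues) in conditions.items()
    )


def predict(rules: List[Rule], default: str, sequence: str, target: int) -> str:
    """Assumed semantics: first matching rule wins, otherwise the default class."""
    window = window_at(sequence, target)
    for conditions, label in rules:
        if matches(conditions, window):
            return label
    return default


# RSA rule 1 from the table, transcribed position by position.
RSA_RULE_1: Rule = (
    {
        -4: (False, frozenset("GILVXFY")),
        -3: (False, frozenset("GQFW")),
        -2: (False, frozenset("CNP")),
        -1: (False, frozenset("AIVQY")),
        0: (True, frozenset("K")),
        1: (False, frozenset("FILMVNTP")),
        2: (False, frozenset("NQSP")),
        3: (False, frozenset("CILRW")),
        4: (False, frozenset("ACILRS")),
    },
    "high",
)

# Hypothetical sequence: a lysine (K) flanked by aspartates (D).
print(predict([RSA_RULE_1], "low", "DDDDKDDDD", 4))
print(predict([RSA_RULE_1], "low", "DDDDKDDDD", 0))
```

In the second call the target is the first residue of the chain, so positions -4 to -1 are filled with X. Because rule 1 excludes X at AA_-4 and requires K at the target, the rule does not match and the default class is returned.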

ISSN: 1471-2105
