Table 8 The HI rules learnt to identify 1CPC (C-Phycocyanin), shown first in their original Prolog form and then in English translation. Two sets of rules are shown: those learnt using HIall and those learnt using HIseq. All numeric attributes were discretised into 10 levels for ease of symbolic induction (1 = low, 10 = high).

From: Homology Induction: the use of machine learning to improve sequence similarity searches

PDB 1CPC C-Phycocyanin

HI all

Prolog

homologous(A) :-
    desc(A,chain),
    amino_acid_ratio_rule(A,h,1).

homologous(A) :-
    keyword(A,phycobilisome).

English

A protein is homologous if

a1    it has the word 'chain' in its SWISS-PROT description line and
      it has a level 1 histidine content in the residue chain,

a2    or it has the word 'phycobilisome' as a SWISS-PROT keyword.

HI seq

Prolog

homologous(A) :-
    amino_acid_ratio_rule(A,w,1),
    amino_acid_ratio_rule(A,h,1),
    amino_acid_pair_ratio_rule(A,l,r,10).

homologous(A) :-
    mol_wt_rule(A,3),
    sec_struc_distribution_rule(A,a,10).

English

A protein is homologous if

s1    it has a level 1 tryptophan content and
      it has a level 1 histidine content and
      it has a level 10 leucine-arginine pair content,

s2    or it has a level 3 molecular weight and
      it has a level 10 predicted α-helix content.
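
To make the rules in the table concrete, the following Python sketch shows one way they could be evaluated against a protein whose attributes have already been mapped to levels 1 to 10. It is illustrative only and is not the authors' implementation. The record layout (`desc_words`, `keywords`, `aa_level`, `aa_pair_level`, `mol_wt_level`, `sec_struc_level`) and all function names are hypothetical. The sketch assumes equal-width binning, which the caption does not specify, and it reads each rule literal as an exact match on the stated level. The two toy records contain invented values and are not taken from SWISS-PROT.

```python
from typing import Any, Dict

Protein = Dict[str, Any]  # hypothetical record of already-discretised features


def discretise(value: float, lo: float, hi: float, levels: int = 10) -> int:
    """Map a value in [lo, hi] to a level 1..levels using equal-width bins.

    Equal-width binning is an assumption; the table caption does not say
    how the 10 levels were derived.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    frac = (value - lo) / (hi - lo)
    # Clamp so that out-of-range values and value == hi stay within 1..levels.
    return max(1, min(levels, int(frac * levels) + 1))


def rule_a1(p: Protein) -> bool:
    # desc(A,chain), amino_acid_ratio_rule(A,h,1)
    return "chain" in p.get("desc_words", set()) and p.get("aa_level", {}).get("h") == 1


def rule_a2(p: Protein) -> bool:
    # keyword(A,phycobilisome)
    return "phycobilisome" in p.get("keywords", set())


def rule_s1(p: Protein) -> bool:
    # amino_acid_ratio_rule(A,w,1), amino_acid_ratio_rule(A,h,1),
    # amino_acid_pair_ratio_rule(A,l,r,10)
    aa = p.get("aa_level", {})
    pairs = p.get("aa_pair_level", {})
    return aa.get("w") == 1 and aa.get("h") == 1 and pairs.get(("l", "r")) == 10


def rule_s2(p: Protein) -> bool:
    # mol_wt_rule(A,3), sec_struc_distribution_rule(A,a,10)
    return p.get("mol_wt_level") == 3 and p.get("sec_struc_level", {}).get("a") == 10


def homologous_hi_all(p: Protein) -> bool:
    # Separate Prolog clauses with the same head act as alternatives (logical OR).
    return rule_a1(p) or rule_a2(p)


def homologous_hi_seq(p: Protein) -> bool:
    return rule_s1(p) or rule_s2(p)


# Toy records with invented values, for illustration only.
toy_a: Protein = {
    "desc_words": {"chain"},
    "keywords": set(),
    "aa_level": {"h": 1, "w": 4},
    "aa_pair_level": {("l", "r"): 6},
    "mol_wt_level": 3,
    "sec_struc_level": {"a": 10},
}
toy_b: Protein = {
    "desc_words": {"kinase"},
    "keywords": {"transferase"},
    "aa_level": {"h": 5, "w": 1},
    "aa_pair_level": {("l", "r"): 10},
    "mol_wt_level": 7,
    "sec_struc_level": {"a": 2},
}
```

In this sketch `toy_a` satisfies rule a1 and rule s2, so both rule sets classify it as homologous, while `toy_b` satisfies none of the four rules. Modelling each clause as its own function keeps the labels a1, a2, s1 and s2 from the English translation visible in the code.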