Table 8 The HI rules learnt to identify 1CPC (C-Phycocyanin), shown first in their original Prolog form and then in English translation. Two sets of rules are shown: those learnt using HIall and those learnt using HIseq. All numbers were discretised into 10 levels for ease of symbolic induction (1 = low, 10 = high).

From: Homology Induction: the use of machine learning to improve sequence similarity searches

PDB 1CPC C-Phycocyanin
HI all  
Prolog  
  homologous(A) :-
      desc(A,chain),
      amino_acid_ratio_rule(A,h,1).
  homologous(A) :-
      keyword(A,phycobilisome).
English  
  A protein is homologous if
a1    it has the word 'chain' in its SWISS-PROT description line and
     it has a level 1 histidine content in the residue chain
a2 or
     it has the word 'phycobilisome' as a SWISS-PROT keyword.
HI seq  
Prolog  
  homologous(A) :-
      amino_acid_ratio_rule(A,w,1),
      amino_acid_ratio_rule(A,h,1),
      amino_acid_pair_ratio_rule(A,l,r,10).
  homologous(A) :-
      mol_wt_rule(A,3),
      sec_struc_distribution_rule(A,a,10).
English  
  A protein is homologous if
s1    it has a level 1 tryptophan content and
     it has a level 1 histidine content and
     it has a level 10 leucine-arginine pair content.
s2 or
     it has a level 3 molecular weight and
     it has a level 10 predicted α-helix content.
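
To make the logic of the rules above concrete, the following is a minimal Python sketch of how the four clauses (a1, a2, s1, s2) could be evaluated against a single protein record. It is not the authors' implementation, which is the Prolog shown in the table. The `Protein` class, its field names, and the example values are hypothetical and chosen only for illustration. The sketch assumes that the features have already been discretised into levels 1 to 10, and that a rule such as `amino_acid_ratio_rule(A,h,1)` matches the stated level exactly.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Protein:
    """Hypothetical record of already-discretised features (levels 1-10)."""
    description: str = ""                  # SWISS-PROT description line
    keywords: frozenset = frozenset()      # SWISS-PROT keywords, lower case
    aa_level: dict = field(default_factory=dict)       # e.g. {"h": 1}
    aa_pair_level: dict = field(default_factory=dict)  # e.g. {("l", "r"): 10}
    mol_wt_level: int = 0                  # 0 means "not available"
    helix_level: int = 0                   # predicted alpha-helix content level


def has_word(text, word):
    """True if `word` occurs as a whole word in `text` (case-insensitive)."""
    return word in re.findall(r"[a-z]+", text.lower())


def hi_all(p):
    """Rules a1 and a2 from the HIall block of the table."""
    # a1: 'chain' in the description line AND level 1 histidine content
    a1 = has_word(p.description, "chain") and p.aa_level.get("h") == 1
    # a2: 'phycobilisome' is one of the keywords
    a2 = "phycobilisome" in p.keywords
    return a1 or a2


def hi_seq(p):
    """Rules s1 and s2 from the HIseq block of the table."""
    # s1: level 1 tryptophan, level 1 histidine, level 10 Leu-Arg pair content
    s1 = (
        p.aa_level.get("w") == 1
        and p.aa_level.get("h") == 1
        and p.aa_pair_level.get(("l", "r")) == 10
    )
    # s2: level 3 molecular weight AND level 10 predicted alpha-helix content
    s2 = p.mol_wt_level == 3 and p.helix_level == 10
    return s1 or s2


# Made-up example values, NOT measured data for 1CPC or any real entry.
example = Protein(
    description="C-phycocyanin alpha chain",
    keywords=frozenset({"phycobilisome"}),
    aa_level={"h": 1, "w": 2},
    aa_pair_level={("l", "r"): 10},
    mol_wt_level=3,
    helix_level=10,
)

print(hi_all(example), hi_seq(example))
```

Each Prolog clause maps to one conjunction, and the two clauses for `homologous(A)` in each block are combined with `or`, mirroring how alternative clauses for the same predicate behave in Prolog.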