Table 2 Initial set of features extracted from amino acid sequences

From: Predictability of gene ontology slim-terms from primary structure information in Embryophyta plant proteins

Nature                          Description                        Number of features
Physical-chemical               Sequence length                    1
                                Molecular weight                   1
                                Positively charged residues (%)    1
                                Negatively charged residues (%)    1
                                Isoelectric point                  1
Primary structure statistics    Amino acid frequencies             20
                                Amino acid dimer frequencies       400
Secondary structure statistics  Structure frequencies              3
                                Structural dimer frequencies       9
                                Total                              438
  1. Features are divided into three broad categories: physical-chemical features, primary structure composition statistics and secondary structure composition statistics.
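
As an illustration of how the composition features in the table could be computed, the following is a minimal Python sketch, not the authors' implementation. It makes several assumptions that the table does not state: positively charged residues are taken to be K and R, negatively charged residues D and E; the secondary structure is supplied as a per-residue string over a three-state alphabet (H, E, C); and dimer frequencies are counted over overlapping adjacent pairs. The function names are hypothetical. Molecular weight and isoelectric point are left out because they depend on residue-mass and pKa tables not given here, so the sketch returns 435 of the listed features.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
SS_STATES = "HEC"  # assumed 3-state alphabet: helix, strand, coil


def frequencies(seq, alphabet):
    """Relative frequency of each symbol of `alphabet` in `seq`."""
    return [seq.count(symbol) / len(seq) for symbol in alphabet]


def dimer_frequencies(seq, alphabet):
    """Relative frequency of each ordered pair over overlapping adjacent positions."""
    counts = {a + b: 0 for a, b in product(alphabet, repeat=2)}
    total = len(seq) - 1  # number of adjacent pairs
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard symbols are ignored
            counts[pair] += 1
    return [count / total for count in counts.values()]


def composition_features(seq, ss):
    """Build a 435-element vector: 3 + 20 + 400 + 3 + 9 features."""
    if len(seq) < 2 or len(seq) != len(ss):
        raise ValueError("seq and ss must have equal length of at least 2")
    # Assumption: K/R count as positive, D/E as negative.
    positive = 100.0 * sum(seq.count(r) for r in "KR") / len(seq)
    negative = 100.0 * sum(seq.count(r) for r in "DE") / len(seq)
    return (
        [float(len(seq)), positive, negative]
        + frequencies(seq, AMINO_ACIDS)
        + dimer_frequencies(seq, AMINO_ACIDS)
        + frequencies(ss, SS_STATES)
        + dimer_frequencies(ss, SS_STATES)
    )


# Toy example; both strings are made up for illustration.
features = composition_features("MKDEAR", "CHHHEC")
```

The vector is ordered as in the table: length and charge percentages first, then the 20 amino acid frequencies, the 400 amino acid dimer frequencies, the 3 structure frequencies and the 9 structural dimer frequencies.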