
Table 3 Biological categories for the interpretation of functional annotations. The interpretation of extracted annotations is based on the automatic assignment of semantic labels to the arguments of a PAS. Because a comprehensive ontology is not available, two categorisation schemes are tested in this study. The first is a scheme (MAN) designed from an analysis of relevant MEDLINE sentences for residue annotation (bottom-up approach). Alternatively, the categories in the feature table of UniProtKb (FEAT) can be reused (top-down approach). Both categorisation schemes reflect concepts of biological interest. However, the bottom-up approach has the advantage that its categories are data-driven, whereas in a top-down approach examples of the listed categories may not occur in natural language text, or categories needed for the text may be missing from the scheme.

From: Annotation of protein residues based on a literature analysis: cross-validation against UniProtKb

| MAN category | MAN definition | FEAT category | FEAT definition |
|---|---|---|---|
| STR_COMP | Structure component. Class denoting concepts that represent pieces and parts of the protein structure. | DOMAIN | Extent of a domain, which is defined as a specific combination of secondary structures organised into a characteristic three-dimensional structure or fold. |
| | | MOTIF | Short (up to 20 amino acids) sequence motif of biological interest. |
| | | TOPO_DOM | Topological domain. |
| | | CHAIN | Extent of a polypeptide chain in the mature protein. |
| | | TRANSMEM | Extent of a transmembrane region. |
| | | COILED | Extent of a coiled-coil region. |
| CHEM_MOD | Chemical modification. Class denoting changes to the protein sequence and the chemical composition. | VARIANT | Authors report that sequence variants exist. |
| | | MOD_RES | Posttranslational modification of a residue. |
| | | PEPTIDE | Extent of a released active peptide. |
| | | VAR_SEQ | Description of sequence variants produced by alternative splicing, alternative promoter usage, alternative initiation and ribosomal frameshifting. |
| | | LIPID | Covalent binding of a lipid moiety. |
| | | CARBOHYD | Glycosylation site. |
| STR_MOD | Structural modification. Class denoting changes to the protein structure without changes to the chemical composition. | REGION | Extent of a region of interest in the sequence. |
| | | SITE | Any interesting single amino-acid site on the sequence that is not defined by another feature key. |
| BINDING | Binding type. Class denoting different physico-chemical forces leading to bond formation between a protein structure component and a chemical entity. | BINDING | Binding site for any chemical group (co-enzyme, prosthetic group, etc.). |
| | | METAL | Binding site for a metal ion. |
| | | DISULFID | Disulfide bond. |
| | | CROSSLNK | Posttranslationally formed amino acid bonds. |
| | | DNA_BIND | Extent of a DNA-binding region. |
| | | NP_BIND | Extent of a nucleotide phosphate-binding region. |
| | | ZN_FING | Extent of a zinc finger region. |
| | | CA_BIND | Extent of a calcium-binding region. |
| ENZ_ACT | Enzymatic activity. Types of enzymatic reactions as a subpart of protein functions. | ACT_SITE | Amino acid(s) involved in the activity of an enzyme. |
| CELL | Cellular phenotype. Class denoting different cellular phenotypes that can be affected by structural or compositional changes of a protein. | N/A | |
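
For readers who want to use the table programmatically, the row grouping can be written down as a lookup structure. The following is a minimal sketch, not code from the study: it assumes that each FEAT key listed alongside a MAN category in the table corresponds to that category, and the names `MAN_TO_FEAT`, `FEAT_TO_MAN` and `man_category` are hypothetical, introduced only for illustration.

```python
# Sketch only: MAN category -> FEAT feature keys, transcribed from the table
# rows. Reading the row grouping as a mapping is an assumption of this example.
MAN_TO_FEAT = {
    "STR_COMP": ["DOMAIN", "MOTIF", "TOPO_DOM", "CHAIN", "TRANSMEM", "COILED"],
    "CHEM_MOD": ["VARIANT", "MOD_RES", "PEPTIDE", "VAR_SEQ", "LIPID", "CARBOHYD"],
    "STR_MOD": ["REGION", "SITE"],
    "BINDING": ["BINDING", "METAL", "DISULFID", "CROSSLNK",
                "DNA_BIND", "NP_BIND", "ZN_FING", "CA_BIND"],
    "ENZ_ACT": ["ACT_SITE"],
    "CELL": [],  # listed as N/A in the table: no FEAT counterpart
}

# Reverse index: FEAT feature key -> MAN category.
FEAT_TO_MAN = {
    feat: man for man, feats in MAN_TO_FEAT.items() for feat in feats
}


def man_category(feat_key):
    """Return the MAN category for a FEAT key, or None if the key is not in the table."""
    return FEAT_TO_MAN.get(feat_key.upper())


if __name__ == "__main__":
    for key in ("METAL", "MOD_RES", "HELIX"):
        print(key, "->", man_category(key))
```

FEAT keys that do not appear in the table (for example `HELIX`) return `None`, and `CELL` maps to an empty list, which reflects the caption's point that the top-down scheme can lack categories that the bottom-up scheme contains.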