Table 3 Biological categories for the interpretation of functional annotations. The interpretation of extracted annotations is based on the automatic assignment of semantic labels to the arguments of a PAS. Because a comprehensive ontology is not available, two categorisation schemes are tested in this study. The first (MAN) is a scheme designed from an analysis of relevant MEDLINE sentences for residue annotation (bottom-up approach). Alternatively, the categories in the feature table of UniProtKb (FEAT) can be reused (top-down approach). Both categorisation schemes reflect concepts of biological interest. However, the bottom-up approach has the advantage that the proposed categories are data-driven, whereas in a top-down approach examples of the listed categories may not be present in natural language text, or other categories may be missing from the scheme.

From: Annotation of protein residues based on a literature analysis: cross-validation against UniProtKb

| MAN category | MAN definition | FEAT category | FEAT definition |
| --- | --- | --- | --- |
| STR_COMP | Structure component. Class denoting concepts that represent pieces and parts of the protein structure. | DOMAIN | Extent of a domain, which is defined as a specific combination of secondary structures organised into a characteristic three-dimensional structure or fold. |
| | | MOTIF | Short (up to 20 amino acids) sequence motif of biological interest. |
| | | TOPO_DOM | Topological domain. |
| | | CHAIN | Extent of a polypeptide chain in the mature protein. |
| | | TRANSMEM | Extent of a transmembrane region. |
| | | COILED | Extent of a coiled-coil region. |
| CHEM_MOD | Chemical modification. Class denoting changes to the protein sequence and the chemical composition. | VARIANT | Authors report that sequence variants exist. |
| | | MOD_RES | Posttranslational modification of a residue. |
| | | PEPTIDE | Extent of a released active peptide. |
| | | VAR_SEQ | Description of sequence variants produced by alternative splicing, alternative promoter usage, alternative initiation and ribosomal frameshifting. |
| | | LIPID | Covalent binding of a lipid moiety. |
| | | CARBOHYD | Glycosylation site. |
| STR_MOD | Structural modification. Class denoting changes to the protein structure without changes to the chemical composition. | REGION | Extent of a region of interest in the sequence. |
| | | SITE | Any interesting single amino-acid site on the sequence that is not defined by another feature key. |
| BINDING | Binding type. Class denoting different physico-chemical forces leading to bond formation between a protein structure component and a chemical entity. | BINDING | Binding site for any chemical group (co-enzyme, prosthetic group, etc.). |
| | | METAL | Binding site for a metal ion. |
| | | DISULFID | Disulfide bond. |
| | | CROSSLNK | Posttranslationally formed amino acid bonds. |
| | | DNA_BIND | Extent of a DNA-binding region. |
| | | NP_BIND | Extent of a nucleotide phosphate-binding region. |
| | | ZN_FING | Extent of a zinc finger region. |
| | | CA_BIND | Extent of a calcium-binding region. |
| ENZ_ACT | Enzymatic activity. Types of enzymatic reactions as a subpart of protein functions. | ACT_SITE | Amino acid(s) involved in the activity of an enzyme. |
| CELL | Cellular phenotype. Class denoting different cellular phenotypes that can be affected by structural or compositional changes of a protein. | N/A | |
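
The correspondence between the two schemes in the table can be written down as a small lookup structure. The following is only an illustrative sketch: the names `MAN_TO_FEAT`, `FEAT_TO_MAN` and `man_category_for` are hypothetical and not taken from the study, and the grouping simply mirrors the rows of the table rather than any implementation the authors describe.

```python
from typing import Dict, List, Optional

# MAN category -> FEAT feature keys listed alongside it in the table.
# (Hypothetical names; the grouping is copied from the table rows.)
MAN_TO_FEAT: Dict[str, List[str]] = {
    "STR_COMP": ["DOMAIN", "MOTIF", "TOPO_DOM", "CHAIN", "TRANSMEM", "COILED"],
    "CHEM_MOD": ["VARIANT", "MOD_RES", "PEPTIDE", "VAR_SEQ", "LIPID", "CARBOHYD"],
    "STR_MOD": ["REGION", "SITE"],
    "BINDING": ["BINDING", "METAL", "DISULFID", "CROSSLNK",
                "DNA_BIND", "NP_BIND", "ZN_FING", "CA_BIND"],
    "ENZ_ACT": ["ACT_SITE"],
    "CELL": [],  # no FEAT counterpart (N/A in the table)
}

# Inverted index: FEAT feature key -> MAN category.
FEAT_TO_MAN: Dict[str, str] = {
    feat: man for man, feats in MAN_TO_FEAT.items() for feat in feats
}


def man_category_for(feat_key: str) -> Optional[str]:
    """Return the MAN category paired with a FEAT key, or None if unlisted."""
    return FEAT_TO_MAN.get(feat_key.upper())
```

For example, `man_category_for("METAL")` looks up the MAN category that the table pairs with the `METAL` feature key, while a key that does not appear in the table yields `None`.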