Annotation of protein residues based on a literature analysis: cross-validation against UniProtKb

Nagel, Kevin; Jimeno-Yepes, Antonio; Rebholz-Schuhmann, Dietrich

doi:10.1186/1471-2105-10-S8-S4

BMC Bioinformatics

Table 3 Biological categories for the interpretation of functional annotations. The interpretation of extracted annotations is based on the automatic assignment of semantic labels to the arguments of a predicate-argument structure (PAS). Because no comprehensive ontology is available, two categorisation schemes are tested in this study. The first is a scheme (MAN) designed from an analysis of MEDLINE sentences relevant to residue annotation (bottom-up approach). Alternatively, the categories in the UniProtKb feature table (FEAT) can be reused (top-down approach). Both categorisation schemes reflect concepts of biological interest. However, the bottom-up approach has the advantage that its categories are data-driven, whereas in the top-down approach some listed categories may have no examples in natural-language text, and other relevant categories may be missing from the scheme.

MAN category	MAN definition	FEAT category	FEAT definition
STR_COMP	Structure component. Class denoting concepts that represent pieces and parts of the protein structure.	DOMAIN	Extent of a domain, which is defined as a specific combination of secondary structures organised into a characteristic three-dimensional structure or fold.
		MOTIF	Short (up to 20 amino acids) sequence motif of biological interest.
		TOPO_DOM	Topological domain.
		CHAIN	Extent of a polypeptide chain in the mature protein.
		TRANSMEM	Extent of a transmembrane region.
		COILED	Extent of a coiled-coil region.
CHEM_MOD	Chemical modification. Class denoting changes to the protein sequence and the chemical composition.	VARIANT	Authors report that sequence variants exist.
		MOD_RES	Posttranslational modification of a residue.
		PEPTIDE	Extent of a released active peptide.
		VAR_SEQ	Description of sequence variants produced by alternative splicing, alternative promoter usage, alternative initiation and ribosomal frameshifting.
		LIPID	Covalent binding of a lipid moiety.
		CARBOHYD	Glycosylation site.
STR_MOD	Structural modification. Class denoting the changes to the protein structure without changes to the chemical composition.	REGION	Extent of a region of interest in the sequence.
		SITE	Any interesting single amino-acid site on the sequence that is not defined by another feature key.
BINDING	Binding type. Class denoting different physico-chemical forces leading to a bond formation between a protein structure component and a chemical entity.	BINDING	Binding site for any chemical group (co-enzyme, prosthetic group, etc.).
		METAL	Binding site for a metal ion.
		DISULFID	Disulfide bond.
		CROSSLNK	Posttranslationally formed amino acid bonds.
		DNA_BIND	Extent of a DNA-binding region.
		NP_BIND	Extent of a nucleotide phosphate-binding region.
		ZN_FING	Extent of a zinc finger region.
		CA_BIND	Extent of a calcium-binding region.
ENZ_ACT	Enzymatic activity. Types of enzymatic reactions as a subpart of protein functions.	ACT_SITE	Amino acid(s) involved in the activity of an enzyme.
CELL	Cellular phenotype. Class denoting different cellular phenotypes that can be affected by structural or compositional changes of a protein.	N/A
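
The row alignment of Table 3 can be read as a grouping of UniProtKb feature keys (FEAT) under the MAN categories. The Python sketch below encodes that reading. Treating the alignment as a strict one-to-many mapping is an assumption made for illustration, and the helper names (man_category, group_by_man, unmatched_man_categories) are hypothetical. Feature keys that do not appear in the table, such as CONFLICT, fall into an UNMAPPED bucket, and CELL comes out as the one MAN category without a FEAT counterpart, which mirrors the caption's point that a top-down scheme can miss categories found in text.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Grouping of UniProtKb feature keys under the MAN categories, read off the
# row alignment of Table 3. Treating it as a strict one-to-many mapping is an
# illustrative assumption, not a mapping stated by the authors.
MAN_TO_FEAT: Dict[str, List[str]] = {
    "STR_COMP": ["DOMAIN", "MOTIF", "TOPO_DOM", "CHAIN", "TRANSMEM", "COILED"],
    "CHEM_MOD": ["VARIANT", "MOD_RES", "PEPTIDE", "VAR_SEQ", "LIPID", "CARBOHYD"],
    "STR_MOD": ["REGION", "SITE"],
    "BINDING": ["BINDING", "METAL", "DISULFID", "CROSSLNK", "DNA_BIND",
                "NP_BIND", "ZN_FING", "CA_BIND"],
    "ENZ_ACT": ["ACT_SITE"],
    "CELL": [],  # no FEAT counterpart (N/A in the table)
}

# Invert the grouping so each FEAT key points to exactly one MAN category.
FEAT_TO_MAN: Dict[str, str] = {
    feat: man for man, feats in MAN_TO_FEAT.items() for feat in feats
}


def man_category(feature_key: str) -> Optional[str]:
    """Return the MAN category for a feature key, or None if it is not in Table 3."""
    return FEAT_TO_MAN.get(feature_key)


def unmatched_man_categories() -> List[str]:
    """List MAN categories that have no FEAT counterpart (bottom-up only)."""
    return [man for man, feats in MAN_TO_FEAT.items() if not feats]


def group_by_man(feature_keys: List[str]) -> Dict[str, List[str]]:
    """Group feature keys (e.g. those of one entry) by MAN category."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in feature_keys:
        groups[man_category(key) or "UNMAPPED"].append(key)
    return dict(groups)


if __name__ == "__main__":
    print(man_category("MOD_RES"))     # CHEM_MOD
    print(unmatched_man_categories())  # ['CELL']
    print(group_by_man(["METAL", "DOMAIN", "CONFLICT"]))
    # {'BINDING': ['METAL'], 'STR_COMP': ['DOMAIN'], 'UNMAPPED': ['CONFLICT']}
```

The FEAT-to-MAN direction is lossy by design: several feature keys collapse into one MAN category, so going back from MAN to FEAT should use the original MAN_TO_FEAT table rather than the inverted dictionary.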

ISSN: 1471-2105