Table 1 Rule-based classifiers

From: Multi-stage gene normalization for full-text articles with context-based species filtering for dynamic dictionary entry selection

| Classifier | Definition |
| --- | --- |
| Species (a) | S(id) refers to the species keywords of id |
| Cell (b) | C(id) refers to the cell line keywords of id |
| | PPI(id) refers to the interaction partner of id |
| Full name/Acronym | FN(id) refers to the gene mention's full name (its identifier is id) |
| Tissue (c) | T(id) refers to the tissue keywords of id |
| Domain (d) | D(id) refers to the domain keywords of id |
| Family (d) | F(id) refers to the family keywords of id |
| | M(id) refers to the MASS of id |
| Gene Ontology | GO(id) refers to the GO terms of id |
| Chromosome Location (e) | CL(id) refers to the chromosome locations of id |
| Sequence Length (d) | SL(id) refers to the sequence lengths of id |
| RS Number (d) | R(id) refers to the RS number of id |

The id refers to an identifier from the ambiguous list.
The nid refers to a successfully normalized identifier stored in the metadata.
a Information collected from NCBI Taxonomy
b Information collected from Cell Bank [23], HyperCLDB [24] and Invitrogen [25]
c Information collected from Human Protein Reference Database (HPRD)
d Information collected from UniProt database
e Information collected from EntrezGene database
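
The lookup notation in this table, such as S(id), C(id) and T(id), can be read as keyword sets attached to each candidate identifier. The following is a minimal sketch of that reading, not the authors' implementation. The `GeneEntry` class, the function names, the placeholder identifiers and the scoring scheme (count how many keyword sets have a keyword in the article text) are all assumptions made for illustration; the table itself does not state how the rules are evaluated or combined.

```python
from dataclasses import dataclass, field


@dataclass
class GeneEntry:
    """Hypothetical dictionary entry for one candidate identifier."""
    species: set = field(default_factory=set)     # stands in for S(id)
    cell_lines: set = field(default_factory=set)  # stands in for C(id)
    tissues: set = field(default_factory=set)     # stands in for T(id)


def matched_rules(entry, text):
    """Return the names of keyword sets with at least one keyword in text.

    Assumption: a rule "fires" on a case-insensitive substring match.
    """
    lowered = text.lower()
    rules = {
        "species": entry.species,
        "cell": entry.cell_lines,
        "tissue": entry.tissues,
    }
    return [
        name
        for name, keywords in rules.items()
        if any(keyword.lower() in lowered for keyword in keywords)
    ]


def rank_candidates(ambiguous, text):
    """Order ambiguous identifiers by number of fired rules, most first.

    Ties are broken alphabetically by identifier (an arbitrary choice).
    """
    scored = [(len(matched_rules(entry, text)), gene_id)
              for gene_id, entry in ambiguous.items()]
    return [gene_id for _, gene_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Placeholder identifiers and keywords, invented for this example only.
ambiguous = {
    "id_A": GeneEntry(species={"human"}, cell_lines={"HeLa"}),
    "id_B": GeneEntry(species={"mouse"}),
}
article_text = "HeLa cells from human donors were used."
ranking = rank_candidates(ambiguous, article_text)
```

Here `ranking` places `id_A` first, because both its species and cell line keywords occur in the text, while `id_B` matches nothing. Substring matching is deliberately crude; a real system would more likely match on token boundaries.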