Table 1 Rule-based classifiers

From: Multi-stage gene normalization for full-text articles with context-based species filtering for dynamic dictionary entry selection

Classifier | Notation
Species (a) | S(id) refers to the species keywords of id
Cell (b) | C(id) refers to the cell line keywords of id
PPI (c) | PPI(id) refers to the interaction partner of id
History |
Full name/Acronym | FN(id) refers to the gene mention's full name (its identifier is id)
Tissue (c) | T(id) refers to the tissue keywords of id
Domain (d) | D(id) refers to the domain keywords of id
Family (d) | F(id) refers to the family keywords of id
MASS (d) | M(id) refers to the MASS of id
Gene Ontology | GO(id) refers to the GO terms of id
Chromosome Location (e) | CL(id) refers to the chromosome locations of id
Sequence Length (d) | SL(id) refers to the sequence lengths of id
RS Number (d) | R(id) refers to the RS number of id

The id refers to an identifier from the ambiguous list.

The nid refers to a successfully normalized identifier stored in the metadata.

(a) Information collected from NCBI Taxonomy
(b) Information collected from Cell Bank[23], HyperCLDB[24] and Invitrogen[25]
(c) Information collected from Human Protein Reference Database (HPRD)
(d) Information collected from UniProt database
(e) Information collected from EntrezGene database
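
The notation in Table 1 can be read as a set of lookup functions over identifiers. The Python sketch below is illustrative only and is not the authors' implementation: the table as reproduced here does not show the rule for each classifier, so the overlap tests in `keyword_rule` and `partner_rule` are assumptions, and the identifiers, keywords, and the `ANNOTATIONS` structure are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical per-identifier annotations, keyed by the notation of Table 1
# (S = species keywords, T = tissue keywords, PPI = interaction partners).
# All identifiers and keywords here are made up for illustration.
ANNOTATIONS: Dict[str, Dict[str, Set[str]]] = {
    "id_1": {"S": {"human"}, "T": {"liver"}, "PPI": {"nid_9"}},
    "id_2": {"S": {"mouse"}, "T": {"brain"}, "PPI": set()},
}


def lookup(feature: str, gene_id: str) -> Set[str]:
    """Return e.g. S(id) or T(id); an empty set if nothing is recorded."""
    return ANNOTATIONS.get(gene_id, {}).get(feature, set())


def keyword_rule(feature: str, ambiguous_ids: List[str],
                 context_terms: Set[str]) -> List[str]:
    """Assumed rule: keep each id whose keywords for `feature`
    overlap the terms found in the article context."""
    return [i for i in ambiguous_ids if lookup(feature, i) & context_terms]


def partner_rule(ambiguous_ids: List[str],
                 normalized_ids: Set[str]) -> List[str]:
    """Assumed rule: keep each id with an interaction partner among the
    successfully normalized identifiers (the nid values in the metadata)."""
    return [i for i in ambiguous_ids if lookup("PPI", i) & normalized_ids]


if __name__ == "__main__":
    candidates = ["id_1", "id_2"]
    print(keyword_rule("S", candidates, {"human", "liver"}))
    print(partner_rule(candidates, {"nid_9"}))
```

Here each classifier narrows the ambiguous list independently. How the results of several classifiers are combined or ordered is not specified in the table, so the sketch leaves that out.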