
Table 4 Features used by our system. Most are based on Tokenization2 except when specified

From: Extracting biomedical events from pairs of text entities

  Features Examples
Candidate entity features Base form (stem) of the head token. regul for tokens "regul*" (e.g. "regulation" in Figure 3).
  Base form of the head token without '-' or '/' before or after. depend for token "-dependent"
  Sub-string after '-' in the head token. dependent for token "-dependent"
  POS of the head token. VBZ for token "requires" in Figure 3
  First token of the entity is after '-' or '/'. -First for entity "-independent pathways"
  Last token of the entity is before '-' or '/'. Last- for entity "phorbol ester-"
  Head token has a special prefix: "over", "up", "down", "co". up for "upregulation"
  Concat. of base form and POS of parents of the head token in dependency parse. NSUBJ←requir/VBZ for "regulation" in Figure 3
  Concat. of base form and POS of children of the head token in dependency parse. NSUBJ→regul/NN, DOBJ→recruit/NN for "requires" in Figure 3
  Base forms of k neighboring tokens around the entity. Base forms from the 2nd previous token to the 2nd next token are PROT, promot, PROT, PROT for "requires" in Figure 3
  POS of k neighboring tokens around the entity. POS from the 2nd previous token to the 2nd next token are JJ, JJ, IN, DT for "regulation" in Figure 3
  Neighborhood of the entity has '-' or '/'. Features from the 2nd previous token to the 2nd next token are NONE, NONE, hyphen, hyphen for "requires" in Figure 3
  Sentence has "mRNA". True if "mRNA" exists in any position of the sentence
  Entity is connected with another string using Tokenization1. PROT-expression and PROT-express for token "Tax-expression"
Argument features Argument is a protein. True if the argument entity overlaps any protein
  POS of the head token. NN for "NF-kappa" in Figure 3
  Features extracted from IntAct when the argument is a protein. association, physical association for protein name "c-Rel"
  Base forms of k neighboring tokens around the argument. Base forms from the 1st previous token to the 1st next token are requir, PROT for "NF-kappa" in Figure 3
  POS of k neighboring tokens around the argument. POS from the 1st previous token to the 1st next token are VBZ, IN for "NF-kappa" in Figure 3
  Concat. of base form and POS of parents of the head token in dependency parse. NSUBJ←requir/VBZ for "regulation" in Figure 3
Joint features Token sequence between candidate and argument has proteins. [PROT] ... PROT ... [trigger] and [PROT] ... PROT ... [recruit] for ("NF-kappaB", "recruitment") in Figure 3
  V-walk features between candidate and argument with base forms. regul PREP_OF promot, promot NN PROT for example in Figure 4
  E-walk features between candidate and argument with base forms. START regul PREP_OF, PREP_OF promot NN, NN PROT END for example in Figure 4
  V-walk features between candidate and argument with POS. NN PREP_OF NN, NN NN PROT for example in Figure 4
  E-walk features between candidate and argument with POS. START NN PREP_OF, PREP_OF NN NN, NN PROT END for example in Figure 4
  Candidate and the argument share a token using Tokenization1. ARG-express for "Tax" and "expression" in "Tax-expression"
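
The V-walk and E-walk rows in the joint features can be illustrated with a minimal sketch. It assumes, based only on the examples given in the table, that a V-walk is a vertex-edge-vertex triple and an E-walk is an edge-vertex-edge triple taken along the dependency path between candidate and argument, with START and END padding the two ends of the path. The function names `v_walks` and `e_walks` and the list-based path representation are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Walk = Tuple[str, str, str]


def v_walks(vertices: List[str], edges: List[str]) -> List[Walk]:
    """Return vertex-edge-vertex triples along a path.

    Assumption: edges[i] is the dependency label linking
    vertices[i] and vertices[i + 1].
    """
    if len(edges) != len(vertices) - 1:
        raise ValueError("a path with n vertices needs n - 1 edges")
    return [(vertices[i], edges[i], vertices[i + 1]) for i in range(len(edges))]


def e_walks(vertices: List[str], edges: List[str]) -> List[Walk]:
    """Return edge-vertex-edge triples along a path.

    Assumption: START and END stand in for the missing edges
    before the first vertex and after the last vertex.
    """
    if len(edges) != len(vertices) - 1:
        raise ValueError("a path with n vertices needs n - 1 edges")
    padded = ["START"] + edges + ["END"]
    return [(padded[i], vertices[i], padded[i + 1]) for i in range(len(vertices))]


# Path read off the base-form V-walk example in the table:
# regul --PREP_OF-- promot --NN-- PROT
path_vertices = ["regul", "promot", "PROT"]
path_edges = ["PREP_OF", "NN"]

for walk in v_walks(path_vertices, path_edges):
    print("V-walk:", " ".join(walk))
for walk in e_walks(path_vertices, path_edges):
    print("E-walk:", " ".join(walk))
```

Replacing each base form on the path with its POS tag (here NN, NN, PROT) and calling the same two functions would give the POS variants of these features under the same assumptions.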