Table 4 Features used by our system. Most are based on Tokenization2 except when specified

From: Extracting biomedical events from pairs of text entities

 

Features

Examples

Candidate entity features

Base form (stem) of the head token.

regul for tokens "regul*" (e.g. "regulation" in Figure 3).

 

Base form of the head token without a leading or trailing '-' or '/'.

depend for token "-dependent"

 

Sub-string after '-' in the head token.

dependent for token "-dependent"

 

POS of the head token.

VBZ for token "requires" in Figure 3

 

First token of the entity is after '-' or '/'.

-First for entity "-independent pathways"

 

Last token of the entity is before '-' or '/'.

Last- for entity "phorbol ester-"

 

Head token has a special prefix: "over", "up", "down", "co"

up for "upregulation"

 

Concat. of base form and POS of parents of the head token in dependency parse.

NSUBJ←requir/VBZ for "regulation" in Figure 3

 

Concat. of base form and POS of children of the head token in dependency parse.

NSUBJ→regul/NN, DOBJ→recruit/NN for "requires" in Figure 3

 

Base forms of k neighboring tokens around the entity.

Base forms from the 2nd previous token to the 2nd next token are PROT, promot, PROT, PROT for "requires" in Figure 3

 

POS of k neighboring tokens around the entity.

POS from the 2nd previous token to the 2nd next token are JJ, JJ, IN, DT for "regulation" in Figure 3

 

Neighborhood of the entity has '-' or '/'.

Features from the 2nd previous token to the 2nd next token are NONE, NONE, hyphen, hyphen for "requires" in Figure 3

 

Sentence has "mRNA".

True if "mRNA" exists in any position of the sentence

 

Entity is connected with another string using Tokenization1.

PROT-expression and PROT-express for token "Tax-expression"

Argument features

Argument is a protein.

True if the argument entity overlaps any protein

 

POS of the head token.

NN for "NF-kappa" in Figure 3

 

Features extracted from IntAct when the argument is a protein.

association, physical association for protein name "c-Rel"

 

Base forms of k neighboring tokens around the argument.

Base forms from the 1st previous token to the 1st next token are requir, PROT for "NF-kappa" in Figure 3

 

POS of k neighboring tokens around the argument.

POS from the 1st previous token to the 1st next token are VBZ, IN for "NF-kappa" in Figure 3

 

Concat. of base form and POS of parents of the head token in dependency parse.

NSUBJ←requir/VBZ for "regulation" in Figure 3

Joint features

Token sequence between candidate and argument has proteins.

[PROT] ... PROT ... [trigger] and [PROT] ... PROT ... [recruit] for ("NF-kappaB", "recruitment") in Figure 3

 

V-walk features between candidate and argument with base forms.

regul PREP_OF promot, promot NN PROT for example in Figure 4

 

E-walk features between candidate and argument with base forms.

START regul PREP_OF, PREP_OF promot NN, NN PROT END for example in Figure 4

 

V-walk features between candidate and argument with POS.

NN PREP_OF NN, NN NN PROT for example in Figure 4

 

E-walk features between candidate and argument with POS.

START NN PREP_OF, PREP_OF NN NN, NN PROT END for example in Figure 4

 

Candidate and the argument share a token using Tokenization1.

ARG-express for "Tax" and "expression" in "Tax-expression"

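Several of the features in the table can be illustrated with a short sketch. The Python below is not the authors' implementation: all function names are hypothetical, stemming is left out, and the walk definitions (a V-walk as vertex-edge-vertex, an E-walk as edge-vertex-edge padded with START and END) are an assumption inferred from the examples in the table. The token list used in the usage lines is made up for illustration and is not the sentence from Figure 3.

```python
# Hypothetical helpers illustrating a few features from the table.
# Names and behavior are assumptions, not the paper's code.

SPECIAL_PREFIXES = ("over", "up", "down", "co")


def special_prefix(token):
    """Return the first listed prefix the token starts with, else None.

    Naive string matching: a word such as "control" would also match "co".
    """
    lowered = token.lower()
    for prefix in SPECIAL_PREFIXES:
        if lowered.startswith(prefix) and len(lowered) > len(prefix):
            return prefix
    return None


def substring_after_hyphen(token):
    """Return the text after the last '-' in the token, or None."""
    if "-" not in token:
        return None
    return token.rsplit("-", 1)[1] or None


def neighbor_window(items, start, end, k, pad="NONE"):
    """Collect k items before and k items after the span items[start:end].

    `items` can hold base forms or POS tags; positions outside the
    sentence are filled with `pad`.
    """
    left = [items[i] if i >= 0 else pad for i in range(start - k, start)]
    right = [items[i] if i < len(items) else pad for i in range(end, end + k)]
    return left + right


def v_walks(vertices, edges):
    """Vertex-edge-vertex triples along a dependency path."""
    if len(edges) != len(vertices) - 1:
        raise ValueError("a path with n vertices needs n - 1 edges")
    return [(vertices[i], edges[i], vertices[i + 1]) for i in range(len(edges))]


def e_walks(vertices, edges, start="START", end="END"):
    """Edge-vertex-edge triples, padding the path ends with START and END."""
    if len(edges) != len(vertices) - 1:
        raise ValueError("a path with n vertices needs n - 1 edges")
    padded = [start] + list(edges) + [end]
    return [(padded[i], vertices[i], padded[i + 1]) for i in range(len(vertices))]


# Usage with the path from the V-walk row: regul -PREP_OF- promot -NN- PROT
path_vertices = ["regul", "promot", "PROT"]
path_edges = ["PREP_OF", "NN"]
walks_v = v_walks(path_vertices, path_edges)
walks_e = e_walks(path_vertices, path_edges)

# Made-up token list; the entity is the single token at index 2.
example_tokens = ["PROT", "promoter", "requires", "PROT", "recruitment"]
window = neighbor_window(example_tokens, 2, 3, k=2)
```

Representing each walk as a tuple keeps the triples hashable, so they can be used directly as keys in a sparse feature dictionary. Swapping the base forms in `path_vertices` for POS tags would give the POS variants of the same walks.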