From: Extracting biomedical events from pairs of text entities
| Feature group | Features | Examples |
|---|---|---|
| Candidate entity features | Base form (stem) of the head token. | regul for tokens "regul*" (e.g. "regulation" in Figure 3). |
| | Base form of the head token without a leading or trailing '-' or '/'. | depend for token "-dependent" |
| | Substring after '-' in the head token. | dependent for token "-dependent" |
| | POS of the head token. | VBZ for token "requires" in Figure 3 |
| | First token of the entity is after '-' or '/'. | -First for entity "-independent pathways" |
| | Last token of the entity is before '-' or '/'. | Last- for entity "phorbol ester-" |
| | Head token has a special prefix: "over", "up", "down", "co". | up for "upregulation" |
| | Concatenation of base form and POS of the parents of the head token in the dependency parse. | NSUBJ←requir/VBZ for "regulation" in Figure 3 |
| | Concatenation of base form and POS of the children of the head token in the dependency parse. | NSUBJ→regul/NN, DOBJ→recruit/NN for "requires" in Figure 3 |
| | Base forms of k neighboring tokens around the entity. | Base forms from the 2nd previous token to the 2nd next token are PROT, promot, PROT, PROT for "requires" in Figure 3 |
| | POS of k neighboring tokens around the entity. | POS from the 2nd previous token to the 2nd next token are JJ, JJ, IN, DT for "regulation" in Figure 3 |
| | Neighborhood of the entity has '-' or '/'. | Features from the 2nd previous token to the 2nd next token are NONE, NONE, hyphen, hyphen for "requires" in Figure 3 |
| | Sentence has "mRNA". | True if "mRNA" occurs anywhere in the sentence |
| | Entity is connected with another string using Tokenization1. | PROT-expression and PROT-express for token "Tax-expression" |
| Argument features | Argument is a protein. | True if the argument entity overlaps any protein |
| | POS of the head token. | NN for "NF-kappa" in Figure 3 |
| | Features extracted from IntAct when the argument is a protein. | association, physical association for protein name "c-Rel" |
| | Base forms of k neighboring tokens around the argument. | Base forms from the 1st previous token to the 1st next token are requir, PROT for "NF-kappa" in Figure 3 |
| | POS of k neighboring tokens around the argument. | POS from the 1st previous token to the 1st next token are VBZ, IN for "NF-kappa" in Figure 3 |
| | Concatenation of base form and POS of the parents of the head token in the dependency parse. | NSUBJ←requir/VBZ for "regulation" in Figure 3 |
| Joint features | Token sequence between candidate and argument has proteins. | [PROT] ... PROT ... [trigger] and [PROT] ... PROT ... [recruit] for ("NF-kappaB", "recruitment") in Figure 3 |
| | V-walk features between candidate and argument with base forms. | regul promot, promot PROT for example in Figure 4 |
| | E-walk features between candidate and argument with base forms. | regul , , promot PROT for example in Figure 4 |
| | V-walk features between candidate and argument with POS. | NN, NN, NN, PROT for example in Figure 4 |
| | E-walk features between candidate and argument with POS. | NN NN , PROT for example in Figure 4 |
| | Candidate and the argument share a token using Tokenization1. | ARG-express for "Tax" and "expression" in "Tax-expression" |
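
A few of the surface-level candidate features in the table can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `PAD` value for positions outside the sentence, the `slash` flag value, and the choice to split at the first hyphen are all assumptions made for illustration. Stemming, POS tagging, and dependency parsing are left out because the table does not say which tools produce them.

```python
# Illustrative sketch only; names and conventions below are assumptions,
# not taken from the paper.

SPECIAL_PREFIXES = ("over", "up", "down", "co")


def strip_separators(token):
    """Head token without a leading or trailing '-' or '/'."""
    return token.strip("-/")


def substring_after_hyphen(token):
    """Substring after the first '-' (assumed), or None if there is none."""
    _, sep, tail = token.partition("-")
    return tail if sep and tail else None


def special_prefix(token):
    """Return the matching special prefix, or None.

    This is a naive string test: it would also fire on words such as
    "complex", so a real system would likely need a stricter check.
    """
    lowered = token.lower()
    return next((p for p in SPECIAL_PREFIXES if lowered.startswith(p)), None)


def sentence_has_mrna(sentence):
    """True if "mRNA" occurs anywhere in the sentence."""
    return "mRNA" in sentence


def neighbor_tokens(tokens, start, end, k):
    """k tokens before index `start` and k tokens after index `end`.

    Positions outside the sentence are filled with "PAD" (an assumption).
    """
    positions = list(range(start - k, start)) + list(range(end + 1, end + 1 + k))
    return [tokens[i] if 0 <= i < len(tokens) else "PAD" for i in positions]


def separator_flags(tokens, start, end, k):
    """Per-neighbor flag: "hyphen", "slash" (assumed name), or "NONE"."""
    def flag(token):
        if "-" in token:
            return "hyphen"
        if "/" in token:
            return "slash"
        return "NONE"
    return [flag(t) for t in neighbor_tokens(tokens, start, end, k)]


if __name__ == "__main__":
    print(strip_separators("-dependent"))        # dependent
    print(substring_after_hyphen("-dependent"))  # dependent
    print(special_prefix("upregulation"))        # up
    toy = ["PROT", "-", "dependent", "regulation", "requires", "PROT", "/", "PROT"]
    print(neighbor_tokens(toy, 4, 4, 2))
    print(separator_flags(toy, 4, 4, 2))
```

The toy token list is invented for the demonstration and does not reproduce the sentence in Figure 3. The same window helper could feed the base-form and POS neighborhood features once a stemmer and tagger are chosen.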