Extracting biomedical events from pairs of text entities

Liu, Xiao; Bordes, Antoine; Grandvalet, Yves

doi:10.1186/1471-2105-16-S10-S8

BMC Bioinformatics

Table 4. Features used by our system. Most are based on Tokenization2 unless otherwise specified.


Feature group	Features	Examples
Candidate entity features	Base form (stem) of the head token.	regul for tokens "regul*" (e.g. "regulation" in Figure 3).
	Base form of the head token without '-' or '/' before or after.	depend for token "-dependent"
	Sub-string after '-' in the head token.	dependent for token "-dependent"
	POS of the head token.	VBZ for token "requires" in Figure 3
	First token of the entity is after '-' or '/'.	-First for entity "-independent pathways"
	Last token of the entity is before '-' or '/'.	Last- for entity "phorbol ester-"
	Head token has a special prefix: "over", "up", "down", "co"	up for "upregulation"
	Concat. of base form and POS of parents of the head token in dependency parse.	NSUBJ←requir/VBZ for "regulation" in Figure 3
	Concat. of base form and POS of children of the head token in dependency parse.	NSUBJ→regul/NN, DOBJ→recruit/NN for "requires" in Figure 3
	Base forms of k neighboring tokens around the entity.	Base forms from the 2nd previous token to the 2nd next token are PROT, promot, PROT, PROT for "requires" in Figure 3
	POS of k neighboring tokens around the entity.	POS from the 2nd previous token to the 2nd next token are JJ, JJ, IN, DT for "regulation" in Figure 3
	Neighborhood of the entity has '-' or '/'.	Features from the 2nd previous token to the 2nd next token are NONE, NONE, hyphen, hyphen for "requires" in Figure 3
	Sentence has "mRNA".	True if "mRNA" exists in any position of the sentence
	Entity is connected with another string using Tokenization1.	PROT-expression and PROT-express for token "Tax-expression"
Argument features	Argument is a protein.	True if the argument entity overlaps any protein
	POS of the head token.	NN for "NF-kappa" in Figure 3
	Features extracted from IntAct when the argument is a protein.	association, physical association for protein name "c-Rel"
	Base forms of k neighboring tokens around the argument.	Base forms from the 1st previous token to the 1st next token are requir, PROT for "NF-kappa" in Figure 3
	POS of k neighboring tokens around the argument.	POS from the 1st previous token to the 1st next token are VBZ, IN for "NF-kappa" in Figure 3
	Concat. of base form and POS of parents of the head token in dependency parse.	NSUBJ←requir/VBZ for "regulation" in Figure 3
Joint features	Token sequence between candidate and argument has proteins.	[PROT] ... PROT ... [trigger] and [PROT] ... PROT ... [recruit] for ("NF-kappaB", "recruitment") in Figure 3
	V-walk features between candidate and argument with base forms.	regul $\underset{\leftarrow}{PREP\_OF}$ promot, promot $\underset{\to}{NN}$ PROT for the example in Figure 4
	E-walk features between candidate and argument with base forms.	$\underset{\to}{START}$ regul $\underset{\to}{PREP\_OF}$, $\underset{\to}{PREP\_OF}$ promot $\underset{\to}{NN}$, $\underset{\to}{NN}$ PROT $\underset{\to}{END}$ for the example in Figure 4
	V-walk features between candidate and argument with POS.	NN $\underset{\leftarrow}{PREP\_OF}$ NN, NN $\underset{\to}{NN}$ PROT for the example in Figure 4
	E-walk features between candidate and argument with POS.	$\underset{\to}{START}$ NN $\underset{\to}{PREP\_OF}$, $\underset{\to}{PREP\_OF}$ NN $\underset{\to}{NN}$, $\underset{\to}{NN}$ PROT $\underset{\to}{END}$ for the example in Figure 4
	Candidate and the argument share a token using Tokenization1.	ARG-express for "Tax" and "expression" in "Tax-expression"
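
Several of the token-level candidate features in the table (special prefix, sub-string after '-', neighboring tokens) can be illustrated with a short sketch. This is not the authors' implementation: the function names, the "NONE" padding value, and the choice to split on the last hyphen are assumptions made for illustration, and stemming to base forms is omitted.

```python
# Illustrative sketch only; names and padding conventions are hypothetical.
SPECIAL_PREFIXES = ("over", "up", "down", "co")  # prefixes listed in the table


def special_prefix(token):
    """Return the first listed prefix that the token starts with, else None."""
    lowered = token.lower()
    for prefix in SPECIAL_PREFIXES:
        # Require at least one character after the prefix.
        if lowered.startswith(prefix) and len(lowered) > len(prefix):
            return prefix
    return None


def substring_after_hyphen(token):
    """Return the text after the last '-' in the token, or None if absent."""
    if "-" not in token:
        return None
    tail = token.rsplit("-", 1)[1]
    return tail or None


def neighbor_window(items, start, end, k, pad="NONE"):
    """Collect the k items before index `start` and the k items after index
    `end` (the entity spans start..end inclusive), padding past the edges."""
    before = [items[i] if i >= 0 else pad for i in range(start - k, start)]
    after = [items[i] if i < len(items) else pad
             for i in range(end + 1, end + 1 + k)]
    return before + after


print(special_prefix("upregulation"))            # up
print(substring_after_hyphen("-dependent"))      # dependent
print(neighbor_window(["a", "b", "c", "d", "e"], 2, 2, 2))
```

A plain string-prefix test like this one also fires on unrelated words that happen to begin with "co" or "up", so a real system would likely need an additional filter; the table does not say how the authors handle that case.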

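
The V-walk and E-walk examples in the table suggest that a V-walk is a vertex-edge-vertex triple and an E-walk is an edge-vertex-edge triple along the dependency path between candidate and argument, with START and END closing the two ends. The sketch below follows that reading. It is an assumption-laden illustration rather than the authors' code: the function names are hypothetical, the path is supplied by hand instead of being read from a parser, and edge directions are left out for brevity.

```python
# Illustrative sketch only; the path representation is an assumption.
def v_walks(vertices, edges):
    """Vertex-edge-vertex triples along a path with len(edges) == len(vertices) - 1."""
    if len(vertices) != len(edges) + 1:
        raise ValueError("a path needs exactly one more vertex than edges")
    return [(vertices[i], edges[i], vertices[i + 1]) for i in range(len(edges))]


def e_walks(vertices, edges, start="START", end="END"):
    """Edge-vertex-edge triples, padding both path ends with sentinel edges."""
    if len(vertices) != len(edges) + 1:
        raise ValueError("a path needs exactly one more vertex than edges")
    padded = [start] + list(edges) + [end]
    return [(padded[i], vertices[i], padded[i + 1]) for i in range(len(vertices))]


# Path taken from the table's Figure 4 example, using base forms.
base_forms = ["regul", "promot", "PROT"]
labels = ["PREP_OF", "NN"]

print(v_walks(base_forms, labels))
print(e_walks(base_forms, labels))
```

Passing the POS sequence `["NN", "NN", "PROT"]` in place of the base forms yields the POS variants of the same walks.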


ISSN: 1471-2105
