Sieve-based relation extraction of gene regulatory networks from biological literature

Žitnik, Slavko; Žitnik, Marinka; Zupan, Blaž; Bajec, Marko

doi:10.1186/1471-2105-16-S16-S1

BMC Bioinformatics

Table 2. Description of the feature function generators.

Name	Description	Options	Observable data
Prefix value	Prefix of the mention at the given offset from the current mention.	string length: {2, 3}; offset: [−5, 5]	text
Suffix value	Suffix of the mention at the given offset from the current mention.	string length: {2, 3}; offset: [−5, 5]	text
Consequent value	A combination of the values of two consecutive mentions at the given offset from the current mention, e.g., PDT/NNS.	offset: [−4, 4]	text, part-of-speech, lemma, entity type, coreference
Current value	Value of the mention at the given offset from the current mention, e.g., NNS.	offset: [−4, 4]	text, part-of-speech, lemma, entity type, coreference
Context value	Jaccard coefficient between character n-grams of the specified length, taken from the selected range of words around the current and previous mentions. The match score is discretized into eight levels. Separate feature functions are generated for the context to the left and to the right of each mention, between the two mentions, outside the two mentions, and for the union of all contexts.	range: 5; ngram: 3	text
Previous / next value combination	A combination of token values at the selected distance from the current and the previous mentions.	distance: {−2, 2}	text, part-of-speech, lemma
Left / right / between value	Token values to the left of, to the right of, or in between the two mentions, within the selected distance.	distance: [1, 5]	text, part-of-speech, lemma
Split to values	Split the current mention into tokens at the selected delimiter and output the first N tokens.	N: 2, delimiter: '	text, lemma
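
The "Context value" row can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the lowercasing step, the equal-width binning into eight levels, and the choice of the left context are all assumptions made for illustration. Only the character n-gram length (3), the word range (5), the Jaccard coefficient, and the eight levels come from the table.

```python
def char_ngrams(words, n=3):
    """Collect the set of character n-grams from a list of words."""
    grams = set()
    for word in words:
        w = word.lower()  # lowercasing is an assumption, not stated in the table
        grams.update(w[i:i + n] for i in range(len(w) - n + 1))
    return grams


def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def discretize(score, levels=8):
    """Map a score in [0, 1] to a bin 0..levels-1 (equal-width bins are an assumption)."""
    return min(int(score * levels), levels - 1)


def left_context_feature(tokens, prev_idx, curr_idx, rng=5, n=3):
    """Compare the words left of the previous mention with those left of the current one."""
    left_prev = tokens[max(0, prev_idx - rng):prev_idx]
    left_curr = tokens[max(0, curr_idx - rng):curr_idx]
    score = jaccard(char_ngrams(left_prev, n), char_ngrams(left_curr, n))
    return f"CTX_LEFT={discretize(score)}"


# Hypothetical sentence with two gene mentions at token positions 1 and 4.
tokens = ["The", "sigB", "gene", "activates", "sigF", "gene", "expression"]
feature = left_context_feature(tokens, prev_idx=1, curr_idx=4)
```

Analogous functions for the right, between, outside, and union contexts would differ only in which token slices are compared.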

Depending on its implementation, options, and observable values, each generator produces specific feature functions in a single scan over the training data. The feature functions are used by all CRF-based sieves for all selected skip-mention CRF models. All extracted features are modeled as both unigram and bigram features, except prefix and suffix features, which are unigram only. Unigram features are used for the current-label factor, and bigram features are used for the transition factor between two labels.
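
As a rough sketch of how a generator might expand its options into concrete feature functions, the Python below builds prefix functions (unigram only) and current-value functions (unigram and bigram), then collects the features that fire in one pass over a labeled sequence. All function names, the feature string format, the `START` label, and the example labels are hypothetical. For brevity the sketch applies every function to one list of text values, whereas the table lists several observable data types.

```python
from itertools import product


def make_prefix_function(length, offset):
    """Unigram feature: prefix of the mention at `offset`, paired with the current label."""
    def feature(values, i, prev_label, label):
        j = i + offset
        if not 0 <= j < len(values):
            return None
        return f"PREFIX{length}[{offset}]={values[j][:length]}|{label}"
    return feature


def make_current_value_function(offset, bigram):
    """Current-value feature; the bigram variant also conditions on the previous label."""
    def feature(values, i, prev_label, label):
        j = i + offset
        if not 0 <= j < len(values):
            return None
        label_part = f"{prev_label}>{label}" if bigram else label
        return f"CURRENT[{offset}]={values[j]}|{label_part}"
    return feature


def generate_functions():
    """Expand the option ranges from the table into individual feature functions."""
    functions = [make_prefix_function(n, o)
                 for n, o in product((2, 3), range(-5, 6))]
    functions += [make_current_value_function(o, b)
                  for o, b in product(range(-4, 5), (False, True))]
    return functions


def scan(values, labels, functions):
    """Single pass over a labeled sequence, collecting every feature that fires."""
    observed = set()
    prev_label = "START"  # hypothetical start-of-sequence label
    for i, label in enumerate(labels):
        for f in functions:
            fired = f(values, i, prev_label, label)
            if fired is not None:
                observed.add(fired)
        prev_label = label
    return observed


functions = generate_functions()
observed = scan(["sigB", "sigF"], ["O", "ACTIVATES"], functions)
```

In a CRF, the unigram strings would parameterize the factor over the current label, and the strings containing `prev>current` would parameterize the transition factor.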

ISSN: 1471-2105