Table 1 Feature functions description.

From: Sieve-based relation extraction of gene regulatory networks from biological literature

| Name | Description | Options |
| --- | --- | --- |
| Target label distribution | Distribution of target labels. | -- |
| Starts upper | Does a mention start with an upper-case letter? | current, previous mention |
| Starts upper twice | Do two consecutive mentions start with an upper-case letter? | current, previous mention |
| Hearst co-occurrence [58] | Does the text between the two mentions follow some predefined rules, e.g., m_i such as m_j? | -- |
| Mention token distance | Distance between the two mentions in number of mentions. | -- |
| Parse tree mention depth | Depth of the mention within the parse tree. | -- |
| Parse tree parent value | Parse tree value of the mention's parent node at length l. | l ∈ {1, 2, 3} |
| Parse tree path | Path values between the two mentions in a parse tree, e.g., DT/NP/NNS/.../NP/NP/VBG. | up to three tokens from every mention |
| BSubtilis | If the two mentions are known B. subtilis genes, what is the probability of a protein-protein interaction according to STRING data [29], i.e., very low, low, medium, high, or very high? | -- |
| IsBSubtilis | Is the current mention a known B. subtilis gene? | -- |
| IsBsubtilisPair | Which of the two consecutive mentions are known B. subtilis genes, i.e., left, right, both, or none? | -- |

  1. The feature functions are used by all CRF-based sieves for all selected skip-mention CRF models. All extracted features are modeled as both unigram and bigram features. Unigram features are used for the current label factor, and bigram features are used for the transition factor between two labels.
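
To make the table and its note more concrete, the following is a minimal sketch of how a few of the surface-level feature functions could be computed and then expanded into unigram and bigram features. It is not the authors' implementation. The `Mention` class, the helper names, the placeholder gene lexicon, the single "such as" rule, the feature string format, and the label names `Interaction` and `O` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Mention:
    text: str   # surface form of the mention
    start: int  # character offset of the first character in the sentence
    end: int    # character offset just past the last character


# Placeholder lexicon (assumption); a real system would load B. subtilis gene names.
KNOWN_GENES = {"sigk", "gere", "spoiiid"}

# One illustrative Hearst-style rule, "m_i such as m_j" (assumption: the
# full rule set is not listed in the table).
SUCH_AS = re.compile(r"^\s*,?\s*such as\s*$")


def find_mention(sentence, text):
    """Locate the first occurrence of `text` and wrap it as a Mention."""
    start = sentence.index(text)
    return Mention(text, start, start + len(text))


def starts_upper(mention):
    """'Starts upper': does the mention start with an upper-case letter?"""
    return mention.text[:1].isupper()


def bsubtilis_pair(left, right):
    """'IsBsubtilisPair': which of two consecutive mentions is in the lexicon."""
    left_known = left.text.lower() in KNOWN_GENES
    right_known = right.text.lower() in KNOWN_GENES
    if left_known and right_known:
        return "both"
    if left_known:
        return "left"
    if right_known:
        return "right"
    return "none"


def pair_observations(sentence, mentions, i):
    """Observations for the current mention i and the previous mention i - 1."""
    if i < 1:
        raise ValueError("a previous mention is required")
    prev, cur = mentions[i - 1], mentions[i]
    between = sentence[prev.end:cur.start]  # text separating the two mentions
    return {
        "starts_upper": starts_upper(cur),
        "starts_upper_prev": starts_upper(prev),
        "starts_upper_twice": starts_upper(prev) and starts_upper(cur),
        "hearst": bool(SUCH_AS.match(between)),
        "bsubtilis_pair": bsubtilis_pair(prev, cur),
    }


def unigram_features(obs, y):
    """Pair each observation with the current label (current label factor)."""
    return [f"{name}={value}|y={y}" for name, value in obs.items()]


def bigram_features(obs, y_prev, y):
    """Pair each observation with a label transition (transition factor)."""
    return [f"{name}={value}|y_prev={y_prev}|y={y}" for name, value in obs.items()]


sentence = "Proteins such as SigK activate gerE."
mentions = [find_mention(sentence, t) for t in ("Proteins", "SigK", "gerE")]
obs = pair_observations(sentence, mentions, 1)
unigrams = unigram_features(obs, "Interaction")
bigrams = bigram_features(obs, "O", "Interaction")
```

In this sketch every observation yields one unigram feature tied to the current label and one bigram feature tied to the pair of adjacent labels, mirroring the note above. The parse tree and STRING-based features are omitted because they depend on an external parser and database.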