Table 5 Features used for IMT

From: Simple and efficient machine learning frameworks for identifying protein-protein interaction relevant articles and experimental methods used to study the interactions

| Feature | Feature type | Description |
| --- | --- | --- |
| Perfect match (2 features) | Binary | For each node, checks whether (1) the concept name or (2) any synonym appears in the article |
| Term match (4 features) | Binary | For each node, checks whether any unigram/bigram in the node’s (1, 2) concept name or (3, 4) synonyms appears in the article |
| Term match ratio (4 features) | Continuous | For each node, the ratio of unigrams/bigrams in the node’s (1, 2) concept name or (3, 4) synonyms that appear in the article |
| Matched terms mutual information sum (4 features) | Continuous | Sum of the mutual information scores of the matching unigrams/bigrams in the node’s (1, 2) concept name or (3, 4) synonyms |
| Matched terms chi-squared sum (4 features) | Continuous | Sum of the chi-squared values of the matching unigrams/bigrams in the node’s (1, 2) concept name or (3, 4) synonyms |
| Node popularity | Integer | The number of times this node is annotated in the training data |
| Regex annotation | Binary | Checks whether the regular expression-based annotator provided by the organizers of BioCreative III annotates the current article-ontology node pair |
| Keyword presence | Binary | Checks whether the keyword for the ontology node appears in the article |
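
As an illustration only, the perfect match, term match, and term match ratio features from Table 5 could be computed for one article-node pair roughly as in the sketch below. This is not the authors' implementation. The tokenizer, the function names (`tokenize`, `ngrams`, `match_features`), the feature keys, and the reading of "ratio" as matched n-grams divided by all distinct n-grams of the node are assumptions made for this example. The mutual information and chi-squared sums are left out because they depend on corpus-level statistics that the table does not specify.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase and keep alphanumeric runs (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: List[str], n: int) -> Set[str]:
    """Return the set of space-joined n-grams of a token list."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def match_features(article: str, concept_name: str, synonyms: List[str]) -> Dict[str, float]:
    """Compute 2 perfect match, 4 term match, and 4 term match ratio features."""
    art_tokens = tokenize(article)
    art_text = " ".join(art_tokens)
    art_grams = {1: ngrams(art_tokens, 1), 2: ngrams(art_tokens, 2)}

    def phrase_in_article(phrase: str) -> bool:
        # Whole-phrase match on token boundaries after normalization.
        normalized = " ".join(tokenize(phrase))
        return bool(normalized) and f" {normalized} " in f" {art_text} "

    features: Dict[str, float] = {
        "perfect_match_name": float(phrase_in_article(concept_name)),
        "perfect_match_synonym": float(any(phrase_in_article(s) for s in synonyms)),
    }

    # (1, 2): unigrams/bigrams of the concept name; (3, 4): of the synonyms.
    sources = {"name": [concept_name], "synonym": synonyms}
    for source, phrases in sources.items():
        for n in (1, 2):
            node_grams: Set[str] = set()
            for phrase in phrases:
                node_grams |= ngrams(tokenize(phrase), n)
            matched = node_grams & art_grams[n]
            features[f"term_match_{source}_{n}gram"] = float(bool(matched))
            # Assumed definition: matched n-grams over all distinct node n-grams.
            features[f"term_match_ratio_{source}_{n}gram"] = (
                len(matched) / len(node_grams) if node_grams else 0.0
            )
    return features


# Hypothetical article sentence and ontology node, for illustration only.
feats = match_features(
    "We used a yeast two-hybrid screen to detect the interaction.",
    "two hybrid",
    ["2 hybrid", "yeast two hybrid array"],
)
print(feats["perfect_match_name"])              # 1.0
print(feats["term_match_ratio_synonym_1gram"])  # 0.6
print(feats["term_match_ratio_synonym_2gram"])  # 0.5
```

Pooling all synonyms into one n-gram set per node is one possible design. Scoring each synonym separately and taking the maximum would be another reasonable reading of the feature descriptions.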