Table 5 Features used for IMT

From: Simple and efficient machine learning frameworks for identifying protein-protein interaction relevant articles and experimental methods used to study the interactions


Feature type

Description


Perfect match (2 features)


For each node, checks if (1) the concept name or (2) any synonym name appears in the article

Term match (4 features)


For each node, checks if any unigram/bigram in the node’s (1, 2) concept name or (3, 4) synonyms appears in the article

Term match ratio (4 features)


For each node, the ratio of unigrams/bigrams in the node’s (1, 2) concept name or (3, 4) synonyms that appear in the article

Matched terms mutual information sum (4 features)


Sum of the mutual information scores of the matching unigrams/bigrams in the node’s (1, 2) concept name or (3, 4) synonyms

Matched term chi-squared sum (4 features)


Sum of the chi-squared values of the matching unigrams/bigrams in the node’s (1, 2) concept name or (3, 4) synonyms

Node popularity


The number of times this node is annotated in the training data

Regex annotation


Checks if the regular-expression-based annotator provided by the organizers of BioCreative III annotates the current article-ontology node pair

Keyword presence


Checks if the keyword for the ontology node appears in the article
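
The string-matching features in the table can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the function names (`tokenize`, `ngrams`, `contains_phrase`, `node_features`), the feature keys, and the `term_scores` table are all hypothetical. The sketch assumes that the ratio is the number of matched n-grams divided by the number of distinct n-grams in the node's name or synonyms, and that per-term mutual information or chi-squared scores have been precomputed elsewhere and are passed in as a dictionary.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return the set of n-grams (joined by single spaces) in a token list."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contains_phrase(phrase, article_tokens):
    """True if the tokenized phrase occurs as a contiguous token run in the article."""
    padded = " " + " ".join(article_tokens) + " "
    normalized = " ".join(tokenize(phrase))
    return bool(normalized) and f" {normalized} " in padded


def node_features(concept_name, synonyms, article, term_scores=None):
    """Compute match features for one article-ontology node pair.

    term_scores is a hypothetical precomputed mapping from an n-gram to its
    mutual information or chi-squared score; missing terms count as 0.
    """
    term_scores = term_scores or {}
    article_tokens = tokenize(article)

    # Perfect match (2 features): full concept name, or any full synonym.
    feats = {
        "perfect_name": contains_phrase(concept_name, article_tokens),
        "perfect_synonym": any(contains_phrase(s, article_tokens) for s in synonyms),
    }

    sources = {"name": [concept_name], "synonym": list(synonyms)}
    for source, strings in sources.items():
        for n, label in ((1, "unigram"), (2, "bigram")):
            article_terms = ngrams(article_tokens, n)
            node_terms = set()
            for s in strings:
                node_terms |= ngrams(tokenize(s), n)
            matched = node_terms & article_terms
            key = f"{source}_{label}"
            # Term match: does any n-gram of the node appear in the article?
            feats[f"match_{key}"] = bool(matched)
            # Term match ratio: assumed to be matched / distinct node n-grams.
            feats[f"ratio_{key}"] = len(matched) / len(node_terms) if node_terms else 0.0
            # Score sum: total precomputed score of the matched n-grams.
            feats[f"score_sum_{key}"] = sum(term_scores.get(t, 0.0) for t in matched)
    return feats
```

Calling `node_features` once with a mutual information table and once with a chi-squared table would yield the two sum-based feature groups. The node popularity, regex annotation, and keyword presence features depend on training annotations and organizer-provided resources, so they are left out of this sketch.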