Table 5 Features used for IMT

From: Simple and efficient machine learning frameworks for identifying protein-protein interaction relevant articles and experimental methods used to study the interactions

| Feature | Feature type | Description |
| --- | --- | --- |
| Perfect match (2 features) | Binary | For each node, checks if (1) the concept name or (2) any synonym name appears in the article |
| Term match (4 features) | Binary | For each node, checks if any unigram/bigram in the node's (1, 2) concept name or (3, 4) synonyms appears in the article |
| Term match ratio (4 features) | Continuous | For each node, the ratio of unigrams/bigrams in the node's (1, 2) concept name or (3, 4) synonyms that appear in the article |
| Matched terms mutual information sum (4 features) | Continuous | Sum of the mutual information scores of each matching unigram/bigram in the node's (1, 2) concept name or (3, 4) synonyms |
| Matched term chi-squared sum (4 features) | Continuous | Sum of the chi-squared values of each matching unigram/bigram in the node's (1, 2) concept name or (3, 4) synonyms |
| Node popularity | Integer | The number of times this node is annotated in the training data |
| Regex annotation | Binary | Checks if the regular expression-based annotator provided by the organizers of BioCreative III annotates the current article-ontology node pair |
| Keyword presence | Binary | Checks if the keyword for the ontology node appears in the article |
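
As an illustration of the first three feature groups in the table, the following is a minimal Python sketch and not the authors' implementation. The function names (`tokenize`, `ngrams`, `contains_phrase`, `match_features`), the lowercase alphanumeric tokenizer, and the reading of "ratio" as matched n-grams divided by all n-grams of the node are assumptions made for this example only.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return the set of n-grams in a token list, as space-joined strings."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contains_phrase(article_tokens, phrase):
    """True if the tokenized phrase occurs contiguously in the article."""
    phrase_tokens = tokenize(phrase)
    if not phrase_tokens:
        return False
    return " ".join(phrase_tokens) in ngrams(article_tokens, len(phrase_tokens))


def match_features(article, concept_name, synonyms):
    """Compute perfect match, term match, and term match ratio for one node."""
    tokens = tokenize(article)
    article_grams = {1: ngrams(tokens, 1), 2: ngrams(tokens, 2)}

    # Perfect match (2 features): full concept name, or any full synonym.
    features = {
        "perfect_match_name": contains_phrase(tokens, concept_name),
        "perfect_match_synonym": any(
            contains_phrase(tokens, s) for s in synonyms
        ),
    }

    # Term match and term match ratio (4 features each):
    # {concept name, synonyms} x {unigram, bigram}.
    sources = {"name": [concept_name], "synonym": list(synonyms)}
    for source, phrases in sources.items():
        for n, label in ((1, "unigram"), (2, "bigram")):
            node_grams = set()
            for phrase in phrases:
                node_grams |= ngrams(tokenize(phrase), n)
            matched = node_grams & article_grams[n]
            features[f"term_match_{source}_{label}"] = bool(matched)
            # Assumed definition: fraction of the node's n-grams found.
            features[f"term_match_ratio_{source}_{label}"] = (
                len(matched) / len(node_grams) if node_grams else 0.0
            )
    return features


if __name__ == "__main__":
    example = match_features(
        "We used a yeast two hybrid screen to detect the interaction.",
        "two hybrid",
        ["2 hybrid", "yeast two-hybrid system"],
    )
    for name, value in example.items():
        print(name, value)
```

The mutual information and chi-squared sums would follow the same pattern, replacing the count of matched n-grams with a sum of per-term scores estimated from the training data; those scores are not reproduced here because the table does not specify how they were computed.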