Table 2 The feature combinations used for submitted runs on the article classification task

From: Classifying protein-protein interaction articles using word and syntactic features

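| | BC3 Dev Set | Multi-word: UNI | Multi-word: BI | Multi-word: TRI | MeSH Term | Stemmed GRs | Feature Cut | Higher Order |
|---|---|---|---|---|---|---|---|---|
| Run 1 | | X | X | | X | | | |
| Run 2 | X | X | X | | X | | | |
| Run 3 | | X | X | X | X | X | X | |
| Run 4 | X | X | X | X | X | X | X | |
| Run 5 | | X | X | X | X | X | X | X |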
  1. The training data used in the official submissions includes all examples from previous BioCreative PPI article tasks. The BioCreative III development set (BC3 Dev Set), by contrast, was added to the training data only in some runs. Unigrams (UNI), bigrams (BI), and trigrams (TRI) were used as multi-word features. The MeSH Term feature consists of unigrams and bigrams drawn from MeSH terms. For grammar relations (GRs), stemming was performed in Runs 3 through 5. Feature cut was performed with a frequency threshold of four.
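
The multi-word features and the feature cut described in the footnote can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`ngrams`, `extract_features`, `feature_cut`) and the `min_count` parameter are hypothetical. The sketch also assumes that frequency is counted as total occurrences over the training documents and that a feature is kept when its count is at least four; the source states only that the threshold is four.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(tokens, orders=(1, 2, 3)):
    """Collect unigrams, bigrams, and trigrams (orders 1, 2, 3) from one document."""
    features = []
    for n in orders:
        features.extend(ngrams(tokens, n))
    return features


def feature_cut(docs, orders=(1, 2, 3), min_count=4):
    """Return the set of features whose corpus count reaches min_count.

    Assumption: counts are total occurrences across all documents, and the
    cut keeps features with count >= min_count. The source gives only the
    threshold value (four), not the exact comparison.
    """
    counts = Counter()
    for tokens in docs:
        counts.update(extract_features(tokens, orders))
    return {feature for feature, count in counts.items() if count >= min_count}
```

Under these assumptions, a run that uses only unigrams and bigrams, such as Run 1 or Run 2 in the table, would correspond to `orders=(1, 2)`, and a run with all three multi-word features to the default `orders=(1, 2, 3)`.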