
Table 2 The feature combinations used for submitted runs on the article classification task

From: Classifying protein-protein interaction articles using word and syntactic features

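| Run | BC3 Dev Set | Multi-word: UNI | Multi-word: BI | Multi-word: TRI | MeSH Term | Stemmed GRs | Feature Cut | Higher Order |
|---|---|---|---|---|---|---|---|---|
| Run 1 | | X | X | | X | | | |
| Run 2 | X | X | X | | X | | | |
| Run 3 | | X | X | X | X | X | X | |
| Run 4 | X | X | X | X | X | X | X | |
| Run 5 | | X | X | X | X | X | X | X |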

The training data used in the official submissions includes all examples from the previous BioCreative PPI article tasks; the BioCreative III (BC3) development set was added to the training data only in selected runs. Unigrams (UNI), bigrams (BI), and trigrams (TRI) were used as multi-word features. MeSH features consist of unigrams and bigrams drawn from MeSH terms. Stemming of grammatical relations (GRs) was applied in Run 3 through Run 5. Feature cut was performed using a frequency threshold of four.
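The word-level features and the feature cut described in the note can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: text is already tokenized, tokens and MeSH terms are lowercased and split on whitespace, the prefixes `W:` and `M:` are hypothetical labels that keep word and MeSH features apart, and "frequency threshold four" is read as keeping features whose total corpus frequency is at least four (the note does not say whether the count is over occurrences or documents, or whether the cut is inclusive). Stemmed GRs and higher-order features are not sketched.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return contiguous n-grams of `tokens`, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text_tokens: list[str], mesh_terms: list[str],
                     orders: tuple[int, ...] = (1, 2, 3)) -> list[str]:
    """Build multi-word features from the text plus MeSH uni/bigrams.

    Assumption: lowercasing and whitespace splitting of MeSH terms;
    the 'W:' and 'M:' prefixes are hypothetical feature namespaces.
    """
    tokens = [t.lower() for t in text_tokens]
    feats: list[str] = []
    # Multi-word features: one pass per n-gram order (UNI, BI, TRI).
    for n in orders:
        feats.extend("W:" + g for g in ngrams(tokens, n))
    # MeSH features: unigrams and bigrams of each MeSH term.
    for term in mesh_terms:
        mesh_tokens = term.lower().split()
        for n in (1, 2):
            feats.extend("M:" + g for g in ngrams(mesh_tokens, n))
    return feats


def feature_cut(docs_features: list[list[str]], threshold: int = 4):
    """Drop features whose total occurrence count is below `threshold`.

    Assumption: counts are summed over every occurrence in the corpus,
    and features with count >= threshold are kept.
    """
    counts = Counter(f for feats in docs_features for f in feats)
    keep = {f for f, c in counts.items() if c >= threshold}
    filtered = [[f for f in feats if f in keep] for feats in docs_features]
    return filtered, keep
```

Under these assumptions, Runs 1 and 2 would correspond to `orders=(1, 2)` without calling `feature_cut`, while Runs 3 to 5 would use `orders=(1, 2, 3)` followed by `feature_cut(..., threshold=4)`.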