
Table 1 The corpus information used in our experiments.

From: Classifying protein-protein interaction articles using word and syntactic features

Corpus Name                      Positive Examples  Negative Examples  Total Examples
BioCreative II                                3874               2298            6172
BioCreative II.5                               124               1066            1190
BioCreative III Training Set                  1140               1140            2280
BioCreative III Development Set                682               3318            4000
Total Training Set                            5820               7822           13642
BioCreative III Test Set                       910               5090            6000
  1. The BioCreative II, BioCreative II.5, BioCreative III training, and BioCreative III development sets were combined to form the training corpus for the BioCreative III Article Classification Task (ACT). This combined corpus is roughly balanced (5820 positive vs. 7822 negative examples), whereas the BioCreative III test set is strongly imbalanced, with about 5.6 times as many negative examples as positive ones (5090 vs. 910). The BioCreative III development set has a similar skew (about 4.9 negatives per positive), so for the official submission, system parameters were tuned on that set.
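As a quick check on the class-imbalance argument in the note above, the following Python sketch recomputes the pooled training totals and the negative-to-positive ratio of each set from the counts in Table 1. The dictionary layout and the helper names (`CORPORA`, `TRAINING`, `pooled_counts`, `neg_to_pos_ratio`) are illustrative assumptions, not part of the authors' system.

```python
# Counts copied from Table 1: (positive, negative) examples per corpus.
CORPORA = {
    "BioCreative II": (3874, 2298),
    "BioCreative II.5": (124, 1066),
    "BioCreative III Training Set": (1140, 1140),
    "BioCreative III Development Set": (682, 3318),
    "BioCreative III Test Set": (910, 5090),
}

# Corpora pooled into the ACT training corpus (hypothetical grouping name).
TRAINING = [
    "BioCreative II",
    "BioCreative II.5",
    "BioCreative III Training Set",
    "BioCreative III Development Set",
]


def neg_to_pos_ratio(pos: int, neg: int) -> float:
    """Number of negative examples per positive example."""
    return neg / pos


def pooled_counts(names: list[str]) -> tuple[int, int]:
    """Sum positive and negative counts over the named corpora."""
    pos = sum(CORPORA[n][0] for n in names)
    neg = sum(CORPORA[n][1] for n in names)
    return pos, neg


if __name__ == "__main__":
    train_pos, train_neg = pooled_counts(TRAINING)
    print(f"Total training set: {train_pos} pos, {train_neg} neg, "
          f"{train_pos + train_neg} total, ratio "
          f"{neg_to_pos_ratio(train_pos, train_neg):.2f}")
    for name in ("BioCreative III Development Set", "BioCreative III Test Set"):
        pos, neg = CORPORA[name]
        print(f"{name}: ratio {neg_to_pos_ratio(pos, neg):.2f}")
```

Running it reproduces the "Total Training Set" row (5820, 7822, 13642) and prints ratios of about 1.34 for the pooled training corpus, 4.87 for the development set, and 5.59 for the test set, which is why the development set is the closer proxy for the test distribution.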