
Table 1 The corpus information used in our experiments.

From: Classifying protein-protein interaction articles using word and syntactic features

| Corpus Name | Positive Examples | Negative Examples | Total Examples |
|---|---|---|---|
| BioCreative II | 3874 | 2298 | 6172 |
| BioCreative II.5 | 124 | 1066 | 1190 |
| BioCreative III Training Set | 1140 | 1140 | 2280 |
| BioCreative III Development Set | 682 | 3318 | 4000 |
| Total Training Set | 5820 | 7822 | 13642 |
| BioCreative III Test Set | 910 | 5090 | 6000 |

  1. The BioCreative II, BioCreative II.5, and BioCreative III training and development sets were combined to form the training corpus for the ACT competition. While the combined training corpus is roughly balanced (5820 positive vs. 7822 negative examples), the BioCreative III test set is imbalanced, with about six times as many negative examples as positive ones (5090 vs. 910). Hence, for the official submission, system parameters were tuned on the BioCreative III development set, which is also skewed toward negative examples (682 positive vs. 3318 negative).
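
The totals and imbalance ratios quoted in the note can be checked with a few lines of arithmetic. The following is only an illustrative sketch, not part of the authors' system: the counts are copied from Table 1, and the names `TRAINING_PARTS`, `TEST_SET`, `combine`, and `neg_to_pos_ratio` are hypothetical helpers introduced here.

```python
# Illustrative check of Table 1; all names below are hypothetical.
# Each entry holds (positive examples, negative examples) as listed in the table.
TRAINING_PARTS = {
    "BioCreative II": (3874, 2298),
    "BioCreative II.5": (124, 1066),
    "BioCreative III Training Set": (1140, 1140),
    "BioCreative III Development Set": (682, 3318),
}
TEST_SET = (910, 5090)


def combine(parts):
    """Sum the positive and negative counts over several corpora."""
    positives = sum(pos for pos, _ in parts.values())
    negatives = sum(neg for _, neg in parts.values())
    return positives, negatives


def neg_to_pos_ratio(counts):
    """Return how many negative examples there are per positive example."""
    positives, negatives = counts
    return negatives / positives


total_training = combine(TRAINING_PARTS)
dev_set = TRAINING_PARTS["BioCreative III Development Set"]

print(total_training, sum(total_training))          # (5820, 7822) 13642
print(round(neg_to_pos_ratio(total_training), 2))   # 1.34
print(round(neg_to_pos_ratio(dev_set), 2))          # 4.87
print(round(neg_to_pos_ratio(TEST_SET), 2))         # 5.59
```

By these counts the combined training corpus has about 1.3 negatives per positive, whereas the development and test sets have roughly 4.9 and 5.6 respectively, so the development set is the part of the training data whose class distribution is closest to that of the test set.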