From: Exploiting likely-positive and unlabeled data to improve the identification of protein-protein interaction articles
Dataset                     Size (# of abstracts)
Training
  True positive (TP)        3,536
  True negative (TN)        1,959
  Likely-positive (LP)      18,930
  Unlabeled (U)             105,000
Test
  Positive                  338
  Negative                  339