Figure 2

Comparison of entity count features for ABNER protein mentions in abstracts in training set D (top), and CHEBI compound names in full-text documents in training data DPMC (bottom). The horizontal axis represents the number of mentions x, and the vertical axis the probability that a document has at least x mentions. The green lines denote this probability for documents labeled relevant, $p_P(n_\pi \geq x)$, while the red lines denote it for documents labeled irrelevant, $p_N(n_\pi \geq x)$; the blue lines denote the absolute difference between the green and red lines, $|p_P - p_N|$.
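As an illustration of how curves of this kind could be computed, the following is a minimal sketch, not the authors' implementation. It assumes each document is reduced to a single integer mention count; the function names (`tail_probability`, `separation_curve`) and the toy counts are hypothetical and chosen only for this example.

```python
from typing import List, Sequence


def tail_probability(counts: Sequence[int], x: int) -> float:
    """Fraction of documents whose mention count is at least x."""
    if not counts:
        raise ValueError("counts must be non-empty")
    return sum(1 for n in counts if n >= x) / len(counts)


def separation_curve(
    relevant: Sequence[int], irrelevant: Sequence[int], max_x: int
) -> List[float]:
    """Absolute difference |p_P - p_N| for each threshold x = 0..max_x."""
    return [
        abs(tail_probability(relevant, x) - tail_probability(irrelevant, x))
        for x in range(max_x + 1)
    ]


# Toy mention counts (made up for illustration, not taken from the paper).
relevant_counts = [0, 2, 3, 5]    # documents labeled relevant
irrelevant_counts = [0, 0, 1, 2]  # documents labeled irrelevant

curve = separation_curve(relevant_counts, irrelevant_counts, max_x=3)
print(curve)  # [0.0, 0.25, 0.5, 0.5]
```

In this sketch, a threshold x where the difference is large would be one at which the mention count separates the two classes well.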