Figure 3
From: A linear classifier based on entity recognition tools and a statistical approach to method extraction in the protein-protein interaction literature
Comparison of entity count features for NLProt protein and OSCAR compound mentions in abstracts in training set D (top), and ABNER protein mentions in figure captions and PSI-MI method mentions in full-text documents in training data DPMC (bottom). The horizontal axis represents the number of mentions x, and the vertical axis represents the probability that a document has at least x mentions. The green line denotes this probability for documents labeled relevant, p_P(n_π ≥ x), and the red line denotes it for documents labeled irrelevant, p_N(n_π ≥ x). The blue line denotes the absolute difference between the green and red lines, |p_P − p_N|.
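
The three curves described in the caption can be sketched as follows. This is a minimal illustration, assuming each document is reduced to a single mention count for one entity type; the function names and the toy counts are hypothetical and do not come from the article.

```python
from typing import List, Sequence, Tuple


def tail_probability(counts: Sequence[int], x: int) -> float:
    """Fraction of documents with at least x mentions (0.0 for an empty set)."""
    if not counts:
        return 0.0
    return sum(1 for n in counts if n >= x) / len(counts)


def count_feature_curves(
    relevant_counts: Sequence[int],
    irrelevant_counts: Sequence[int],
) -> List[Tuple[int, float, float, float]]:
    """Return rows (x, p_P, p_N, |p_P - p_N|) for x = 0 .. largest observed count."""
    max_x = max(list(relevant_counts) + list(irrelevant_counts), default=0)
    rows = []
    for x in range(max_x + 1):
        p_pos = tail_probability(relevant_counts, x)    # green line in the figure
        p_neg = tail_probability(irrelevant_counts, x)  # red line in the figure
        rows.append((x, p_pos, p_neg, abs(p_pos - p_neg)))  # blue line
    return rows


# Toy mention counts, invented purely for illustration.
relevant = [3, 5, 2, 8, 4]
irrelevant = [0, 1, 2, 0, 3]

rows = count_feature_curves(relevant, irrelevant)
for x, p_pos, p_neg, diff in rows:
    print(f"x={x}  p_P={p_pos:.2f}  p_N={p_neg:.2f}  |p_P - p_N|={diff:.2f}")
```

A large value on the difference curve at some x suggests that a threshold of "at least x mentions" separates relevant from irrelevant documents well for that entity type.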