A linear classifier based on entity recognition tools and a statistical approach to method extraction in the protein-protein interaction literature

Table 8 IMT Runs on the training set (after code correction)

Run	Precision	Recall	F-Score	MCC	AUC iP/R	Total Docs Evaluated
All	2.38%	94.80%	0.0465	0.0937	0.2032	2002
Top 40	4.54%	85.16%	0.0864	0.1598	0.2063	2002
RScore ≥6	26.30%	58.72%	0.3633	0.3806	0.1997	1947
RScore ≥7	29.14%	50.25%	0.3689	0.3711	0.1816	1871

The table shows the results of running our corrected program on the BC 3 training set. The measures reported are precision, recall, F-score, Matthews Correlation Coefficient (MCC), area under the interpolated precision/recall curve (AUC iP/R), and the total number of articles evaluated by our program.
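As a quick consistency check on the table, the F-score column can be approximately recomputed from the precision and recall columns. The sketch below assumes that the reported F-score is the balanced F-score (the harmonic mean of precision and recall); the function name `f_score` and the dictionary `table8_runs` are illustrative and do not come from the original program.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from Table 8, converted from percentages to fractions.
table8_runs = {
    "All": (0.0238, 0.9480),
    "Top 40": (0.0454, 0.8516),
    "RScore >=6": (0.2630, 0.5872),
    "RScore >=7": (0.2914, 0.5025),
}

for run, (p, r) in table8_runs.items():
    print(f"{run}: F = {f_score(p, r):.4f}")
```

Under this assumption the recomputed values agree with the reported F-scores to within about 0.0002. Whether the small residual differences in the first two rows come from rounding of the percentages or from how the scores were averaged is not stated here.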
The rows reflect four different runs: the first based on pattern matching of methods to the text alone (All); the second scoring the sentence-method associations and reporting the 40 top-scoring methods (Top 40); the third reporting the top-scoring methods whose raw score was at least 6 (RScore ≥6); and the fourth reporting the top-scoring methods whose raw score was at least 7 (RScore ≥7).
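The four run configurations can be pictured as different filters over the same scored candidates. The following is a minimal sketch, not the original program: the input layout (a mapping from document ID to `(method, raw score)` pairs), the function `select_methods`, and the toy data are all hypothetical. It also assumes that a document with no method passing the filter is left out of the output, which is one possible reading of why the "Total Docs Evaluated" column shrinks for the thresholded runs.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input: document ID -> candidate (method name, raw score) pairs.
Candidates = Dict[str, List[Tuple[str, float]]]


def select_methods(
    candidates: Candidates,
    top_n: Optional[int] = None,
    min_score: Optional[float] = None,
) -> Dict[str, List[str]]:
    """Return the methods reported per document under a given run configuration."""
    selected: Dict[str, List[str]] = {}
    for doc_id, scored in candidates.items():
        # Rank candidates from highest to lowest raw score.
        ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
        if min_score is not None:
            ranked = [pair for pair in ranked if pair[1] >= min_score]
        if top_n is not None:
            ranked = ranked[:top_n]
        # Assumption: documents with nothing left to report are dropped.
        if ranked:
            selected[doc_id] = [method for method, _ in ranked]
    return selected


# Toy data with made-up method names and scores.
toy = {
    "doc1": [("method_A", 9.0), ("method_B", 6.5), ("method_C", 2.0)],
    "doc2": [("method_D", 3.0)],
}

run_all = select_methods(toy)                  # analogous to "All"
run_top = select_methods(toy, top_n=40)        # analogous to "Top 40"
run_r6 = select_methods(toy, min_score=6)      # analogous to "RScore >=6"
run_r7 = select_methods(toy, min_score=7)      # analogous to "RScore >=7"
print(len(run_all), len(run_top), len(run_r6), len(run_r7))
```

In this toy example, raising the threshold removes low-scoring methods (which tends to raise precision and lower recall) and removes `doc2` entirely, mirroring the trend in the table without reproducing its numbers.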

ISSN: 1471-2105