A sentence sliding window approach to extract protein annotations from biomedical articles

BMC Bioinformatics

Table 1 Result summary for task 2a. The table shows the percentages of evaluated evidences, organized by precision category for proteins (rows) versus precision category for GO-terms (columns). The labels are as follows. High: correct prediction. General: for GO-terms, a prediction that is not entirely wrong but is too general to be useful for protein annotation; for proteins, the specific protein is not present, but a homologue from another organism or a reference to the protein family is. Low: essentially wrong. Total gives the marginal totals for each entity extraction (protein or GO-term), and None marks cases that were not evaluated.

Entity evaluations	GO Low	GO General	GO High	GO None	Total
Protein High	221 (21.05%)	69 (6.57%)	303 (28.85%)	1 (0%)	594 (56.47%)
Protein General	47 (4.48%)	24 (2.28%)	112 (10.67%)	0 (0%)	183 (17.43%)
Protein Low	127 (12.10%)	43 (4.10%)	86 (8.19%)	0 (0%)	256 (24.39%)
Protein None	1 (0.10%)	0 (0%)	0 (0%)	17 (1.61%)	18 (1.71%)
Total	396 (37.73%)	136 (12.95%)	501 (47.71%)	17 (1.61%)	1050 (100%)
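
The percentages in Table 1 appear to be each count divided by the 1050 evaluated evidences. The following is a minimal Python sketch of that arithmetic under this assumption; the function name `cell_percentage` and the dictionary layout are illustrative and do not come from the article.

```python
TOTAL_EVIDENCES = 1050  # grand total reported in the bottom-right cell of Table 1


def cell_percentage(count, total=TOTAL_EVIDENCES):
    """Return a count as a percentage of all evaluated evidences (2 decimals)."""
    return round(100.0 * count / total, 2)


# Column totals copied from the "Total" row of Table 1.
go_totals = {"GO General": 136, "GO High": 501}

# Recompute the share of evidences in each GO-term precision category.
percentages = {label: cell_percentage(n) for label, n in go_totals.items()}

for label, pct in percentages.items():
    print(f"{label}: {pct:.2f}%")
```

Under this assumption, the two recomputed values match the 12.95% and 47.71% reported in the Total row.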

ISSN: 1471-2105