Gene/protein name recognition based on support vector machine using dictionary as features

BMC Bioinformatics

Table 8 Effect of each feature on system performance I. The base column shows the values obtained when all features were used in SVM learning: word, part of speech (POS), orthography (orth.), prefix (pre.), suffix (suf.), dictionary matching (dic.), and preceding class (pc.). Each of the other columns shows the values obtained when one feature was removed from the learning. Parenthesized values are the p-values for the difference from the base value, computed with the two-sided Wilcoxon signed-rank test. A difference is considered statistically significant when p < 0.05.

	base	-word	-POS	-orth.	-pre.	-suf.	-dic.	-pc.
Precision	0.8189	0.8076 (0.002)	0.8239 (0.037)	0.8210 (0.375)	0.8051 (0.002)	0.8000 (0.002)	0.8143 (0.105)	0.7009 (0.002)
Recall	0.7661	0.7619 (0.008)	0.7658 (0.945)	0.7590 (0.020)	0.7574 (0.004)	0.7233 (0.002)	0.7478 (0.002)	0.7508 (0.002)
Balanced f-score	0.7916	0.7840 (0.002)	0.7937 (0.131)	0.7887 (0.160)	0.7805 (0.002)	0.7597 (0.002)	0.7796 (0.004)	0.7250 (0.002)
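The balanced F-score in Table 8 is the harmonic mean of precision and recall, F = 2PR/(P + R). The short Python sketch below recomputes it from the tabulated precision and recall as a consistency check. The recomputed values agree with the reported ones to within about 1e-4. The small residual is presumably due to rounding, or to the F-scores having been averaged per fold rather than computed from the averaged precision and recall. That is an assumption, since the table does not say. `TABLE8`, `balanced_f`, and `check_table` are illustrative names, not taken from the paper.

```python
# Table 8 values: (precision, recall, reported balanced F-score) per configuration.
TABLE8 = {
    "base":   (0.8189, 0.7661, 0.7916),
    "-word":  (0.8076, 0.7619, 0.7840),
    "-POS":   (0.8239, 0.7658, 0.7937),
    "-orth.": (0.8210, 0.7590, 0.7887),
    "-pre.":  (0.8051, 0.7574, 0.7805),
    "-suf.":  (0.8000, 0.7233, 0.7597),
    "-dic.":  (0.8143, 0.7478, 0.7796),
    "-pc.":   (0.7009, 0.7508, 0.7250),
}


def balanced_f(precision: float, recall: float) -> float:
    """Balanced F-score (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def check_table(table=TABLE8):
    """Return {config: (recomputed_f, reported_f, abs_difference)}.

    Assumption (hypothetical helper): the reported F-score may have been
    averaged per fold, so only approximate agreement is expected.
    """
    result = {}
    for name, (p, r, f_reported) in table.items():
        f = balanced_f(p, r)
        result[name] = (f, f_reported, abs(f - f_reported))
    return result


if __name__ == "__main__":
    for name, (f, f_rep, diff) in check_table().items():
        print(f"{name:7s} recomputed={f:.4f} reported={f_rep:.4f} diff={diff:.5f}")
```

Because F1 weights precision and recall equally, the large precision drop without the preceding-class feature (0.8189 to 0.7009) dominates its F-score change even though recall falls only slightly.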
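The significance test in the caption compares paired scores from the full system and from each ablated system. The following Python sketch computes an exact two-sided Wilcoxon signed-rank p-value by enumerating every sign assignment of the ranked differences with a subset-sum count. It assumes the pairs are per-fold scores from cross-validation. The number of folds is not stated in this table, and `signed_rank_exact_p` is a hypothetical name. With n = 10 pairs whose differences all have the same sign, the exact p-value is 2/2^10 ≈ 0.00195, which rounds to 0.002. That would explain why 0.002 recurs so often in Table 8, if ten paired folds were indeed used (again an assumption).

```python
def signed_rank_exact_p(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples x, y.

    Assumption: x and y are paired per-fold scores (e.g. cross-validation
    folds). Zero differences are dropped (Wilcoxon's convention); tied |d|
    receive average ranks. The null distribution enumerates all 2**n sign
    assignments of the observed ranks via a subset-sum count.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        return 1.0

    # Rank |d| with average ranks for ties; store doubled ranks so they stay integers.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        # Positions i..j share ranks i+1..j+1; their doubled average is i + j + 2.
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2
        i = j + 1

    # Observed statistic: doubled sum of ranks of positive differences (W+).
    w_obs = sum(r for r, di in zip(ranks2, d) if di > 0)

    # counts[s] = number of sign assignments whose doubled W+ equals s.
    total = sum(ranks2)
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]

    n_assign = 2 ** n
    lower = sum(counts[: w_obs + 1]) / n_assign  # P(W+ <= observed)
    upper = sum(counts[w_obs:]) / n_assign       # P(W+ >= observed)
    return min(1.0, 2 * min(lower, upper))
```

For real analyses, `scipy.stats.wilcoxon` provides a maintained implementation; the sketch above only illustrates where the exact p-values come from.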

ISSN: 1471-2105