Protein Function Prediction using Text-based Features extracted from the Biomedical Literature: The CAFA Challenge

Table 6 Prediction performance on biological process classes, over the dataset of textless proteins.

Function	# Test Proteins	Text-KNN (Textless)			Text-KNN (Cross-validation)
		P	R	F	P	R	F
GO:0065007	19	0.28	0.47	0.35	0.23	*0.52*	0.31
GO:0032502	18	0.19	0.22	0.21	*0.22*	0.19	0.20
GO:0009987	8	0.04	0.13	0.06	*0.24*	*0.29*	*0.26*
GO:0050896	20	0.38	0.30	0.33	0.25	0.16	0.19
GO:0008152	7	0.29	0.29	0.29	0.23	0.14	0.17
GO:0051234	9	0.33	0.33	0.33	0.32	0.20	0.25
GO:0016043	6	0.00	0.00	0.00	*0.13*	*0.05*	*0.07*
GO:0023052	3	0.00	0.00	0.00	*0.18*	*0.11*	*0.14*
GO:0032501	9	0.00	0.00	0.00	*0.12*	*0.02*	*0.04*
GO:0022414	7	0.00	0.00	0.00	*0.51*	*0.15*	*0.24*
GO:0051704	1	0.00	0.00	0.00	0.00	0.00	0.00
GO:0040011	3	0.00	0.00	0.00	0.00	0.00	0.00
GO:0002376	1	0.00	0.00	0.00	0.00	0.00	0.00

Prediction performance of Text-KNN on proteins that have no associated text is shown in the Text-KNN (Textless) columns. For comparison only, the average cross-validation results, obtained over the whole cross-validation dataset, are shown in the Text-KNN (Cross-validation) columns. The columns P, R, and F refer, respectively, to the Precision, Recall, and F-measure of the classifier over individual GO categories. Precision and recall values of 0 for a class indicate that all the proteins belonging to that class were misclassified into another class.
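
As an illustration of how the F column relates to the P and R columns, the following minimal sketch computes the F-measure from a precision and a recall value. It assumes the balanced F-measure (the harmonic mean of precision and recall), which the table does not state explicitly. The function name `f_measure` is hypothetical, and returning 0 when both inputs are 0 is a convention chosen here to match the 0.00 rows.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the table reports the balanced (F1) variant.
    When both precision and recall are 0, return 0.0 by convention.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# GO:0065007, Text-KNN (Textless): P = 0.28, R = 0.47
print(round(f_measure(0.28, 0.47), 2))  # 0.35

# Rows where P = R = 0 (for example GO:0016043, Textless)
print(f_measure(0.0, 0.0))  # 0.0
```

Because the table reports P and R rounded to two decimals, and the cross-validation values are averages, recomputing F from the printed P and R may differ from the printed F by about 0.01 in some rows (for example, GO:0032502 in the Textless columns gives 0.20 rather than 0.21).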

ISSN: 1471-2105