Protein Function Prediction using Text-based Features extracted from the Biomedical Literature: The CAFA Challenge

Table 5 Prediction performance on molecular function classes, over the dataset of textless proteins.

Function	# Textless Proteins	Text-KNN (Textless)			Text-KNN (Cross-validation)
		P	R	F	P	R	F
GO:0005488	58	0.82	0.47	0.59	0.65	*0.88*	*0.75*
GO:0003824	9	0.29	0.56	0.38	*0.52*	0.23	0.32
GO:0030528	1	0.04	1.00	0.08	*0.44*	0.24	*0.31*
GO:0005215	5	0.50	0.20	0.29	*0.59*	*0.38*	*0.46*
GO:0060089	7	0.44	0.57	0.50	0.39	0.16	0.22
GO:0005198	2	0.00	0.00	0.00	0.04	0.01	0.01

The Text-KNN (Textless) columns show the prediction performance of Text-KNN on proteins that have no associated text. For reference only, the Text-KNN (Cross-validation) columns show the average results obtained over the whole cross-validation dataset. The columns P, R, and F denote, respectively, the Precision, Recall, and F-measure of the classifier over individual GO categories. Precision and recall values of 0 for a class indicate that all the proteins belonging to that class were misclassified into another class.
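As an illustration of how the P, R, and F columns relate to one another, the following minimal sketch computes the three measures for a single class from raw counts. It assumes that the F-measure is the balanced harmonic mean of precision and recall (F1), which the legend does not state explicitly. The function name `precision_recall_f` and the counts in the usage example are hypothetical: the counts are not reported in the table and were chosen only because they are consistent with a class of 5 proteins with P = 0.50 and R = 0.20.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) for one class.

    tp: proteins correctly assigned to the class
    fp: proteins wrongly assigned to the class
    fn: proteins of the class assigned elsewhere
    Assumption: F-measure is the balanced F1 score.
    """
    # Guard against empty denominators, e.g. a class that is never predicted.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        # No true positives: every measure is 0.
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts for a class with 5 member proteins (illustrative only).
p, r, f = precision_recall_f(tp=1, fp=1, fn=4)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=0.50 R=0.20 F=0.29

# A class in which no member protein is recovered scores 0 on all measures.
print(precision_recall_f(tp=0, fp=3, fn=2))  # (0.0, 0.0, 0.0)
```

The second call mirrors the situation described in the legend: when every protein of a class is assigned to some other class, there are no true positives, so precision, recall, and F-measure are all 0.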

ISSN: 1471-2105