Table 6 Prediction performance on biological process classes, over the dataset of textless proteins.

From: Protein Function Prediction using Text-based Features extracted from the Biomedical Literature: The CAFA Challenge

Function     # Test Proteins   Text-KNN (Textless)    Text-KNN (Cross-validation)
                               P      R      F        P      R      F
GO:0065007   19                0.28   0.47   0.35     0.23   0.52   0.31
GO:0032502   18                0.19   0.22   0.21     0.22   0.19   0.20
GO:0009987   8                 0.04   0.13   0.06     0.24   0.29   0.26
GO:0050896   20                0.38   0.30   0.33     0.25   0.16   0.19
GO:0008152   7                 0.29   0.29   0.29     0.23   0.14   0.17
GO:0051234   9                 0.33   0.33   0.33     0.32   0.20   0.25
GO:0016043   6                 0.00   0.00   0.00     0.13   0.05   0.07
GO:0023052   3                 0.00   0.00   0.00     0.18   0.11   0.14
GO:0032501   9                 0.00   0.00   0.00     0.12   0.02   0.04
GO:0022414   7                 0.00   0.00   0.00     0.51   0.15   0.24
GO:0051704   1                 0.00   0.00   0.00     0.00   0.00   0.00
GO:0040011   3                 0.00   0.00   0.00     0.00   0.00   0.00
GO:0002376   1                 0.00   0.00   0.00     0.00   0.00   0.00
  1. The Text-KNN (Textless) column shows the prediction performance of Text-KNN on proteins that have no associated text. For reference, the Text-KNN (Cross-validation) column shows the average results obtained over the whole cross-validation dataset. The columns P, R, and F refer, respectively, to the Precision, Recall, and F-measure of the classifier over individual GO categories. Precision and recall values of 0 for a class indicate that all the proteins belonging to that class were misclassified into another class.
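
To make the P, R, and F columns concrete, the following minimal sketch computes per-class precision, recall, and F-measure from prediction counts. It assumes that the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the table does not state explicitly. The function name `precision_recall_f` and the counts in the usage lines are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) for a single class.

    tp: proteins correctly assigned to the class
    fp: proteins wrongly assigned to the class
    fn: proteins of the class assigned elsewhere
    Assumption: F-measure is F1, the harmonic mean of precision and recall.
    """
    # Guard against division by zero when the class is never predicted
    # or has no members; the score is then reported as 0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical counts for a class with 19 test proteins (tp + fn = 19).
p, r, f = precision_recall_f(tp=9, fp=23, fn=10)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")

# A class whose proteins are all assigned to other classes scores 0 throughout.
print(precision_recall_f(tp=0, fp=0, fn=3))
```

Under this definition, a class with no correctly assigned proteins has precision, recall, and F-measure all equal to 0, which is the situation the note describes for the all-zero rows.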