
Table 4 Prediction performance on biological process classes, over the cross-validation dataset.

From: Protein Function Prediction using Text-based Features extracted from the Biomedical Literature: The CAFA Challenge

Function  # Training Proteins  # Test Proteins  Text-KNN (P R F)  Base-Prior (P R F)  Base-Seq (P R F)
GO:0065007 3626 906 0.23 0.52 0.31 0.20 0.24 0.22 0.32 0.48 0.38
GO:0032502 3338 835 0.22 0.19 0.20 0.12 0.17 0.14 0.22 0.24 0.23
GO:0009987 1790 447 0.24 0.29 0.26 0.17 0.14 0.15 0.26 0.27 0.27
GO:0050896 1780 445 0.25 0.16 0.19 0.10 0.10 0.10 0.16 0.09 0.11
GO:0008152 1658 415 0.23 0.14 0.17 0.08 0.06 0.07 0.28 0.34 0.31
GO:0051234 1204 301 0.32 0.20 0.25 0.05 0.05 0.05 0.44 0.45 0.45
GO:0016043 1145 286 0.13 0.05 0.07 0.06 0.05 0.06 0.15 0.12 0.13
GO:0023052 965 241 0.18 0.11 0.14 0.05 0.04 0.04 0.30 0.28 0.29
GO:0032501 606 151 0.12 0.02 0.04 0.04 0.03 0.04 0.24 0.11 0.16
GO:0022414 346 86 0.51 0.15 0.24 0.02 0.02 0.02 0.14 0.03 0.05
GO:0051704 272 68 0.29 0.09 0.14 0.01 0.01 0.01 0.09 0.04 0.05
GO:0040011 170 42 0.13 0.01 0.01 0.01 0.01 0.01 1.00 0.05 0.09
GO:0040007 165 41 0.01 0.01 0.01 0.01 0.01 0.01 0.00 0.00 0.00
GO:0051179 151 38 0.03 0.01 0.01 0.01 0.01 0.01 0.00 0.00 0.00
GO:0022610 128 32 0.07 0.02 0.03 0.01 0.01 0.01 0.00 0.00 0.00
GO:0008283 118 29 0.01 0.01 0.01 0.01 0.01 0.01 0.00 0.00 0.00
GO:0000003 96 24 0.00 0.00 0.00 0.01 0.01 0.01 0.00 0.00 0.00
GO:0002376 74 19 0.06 0.03 0.04 0.00 0.00 0.00 0.00 0.00 0.00
GO:0016265 64 16 0.00 0.00 0.00 0.01 0.01 0.01 0.00 0.00 0.00
GO:0071554 46 11 0.38 0.08 0.13 0.01 0.00 0.00 0.00 0.00 0.00
GO:0048511 43 11 0.31 0.06 0.10 0.00 0.00 0.00 0.00 0.00 0.00
GO:0023046 35 9 0.01 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00
GO:0044085 16 4 0.01 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00
GO:0043473 13 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
  1. The text-based classifier, Text-KNN, is compared with two baselines: Base-Prior and Base-Seq. The columns P, R, and F refer, respectively, to the Precision, Recall, and F-measure of the classifier over individual GO categories. Precision and recall values of 0 on a class indicate that all the proteins belonging to that class are misclassified into another class.
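
As an illustration of how the P, R, and F columns relate, the following minimal sketch computes per-class precision, recall, and F-measure from raw counts. It assumes that F is the balanced F-measure (the harmonic mean of precision and recall), which the table does not state explicitly. The function names and the count arguments (`tp`, `fp`, `fn`) are hypothetical and are not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (harmonic mean); assumed definition, not confirmed by the source."""
    if precision + recall == 0:
        # No true positives: define F as 0 rather than dividing by zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Per-class scores from hypothetical counts of true/false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f_measure(precision, recall)


# Base-Seq on GO:0065007 reports P = 0.32 and R = 0.48.
print(round(f_measure(0.32, 0.48), 2))  # 0.38

# A class with no correctly predicted proteins scores 0 on all three measures.
print(precision_recall_f(tp=0, fp=5, fn=10))  # (0.0, 0.0, 0.0)
```

Because the table reports values rounded to two decimals, recomputing F from the printed P and R may differ from the printed F in the last digit for some rows.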