Table 3 Prediction performance on molecular function classes, over the cross-validation dataset.

From: Protein Function Prediction using Text-based Features extracted from the Biomedical Literature: The CAFA Challenge

Function    # Training  # Test     Text-KNN          Base-Prior        Base-Seq
            Proteins    Proteins   P     R     F     P     R     F     P     R     F
GO:0005488  10720       2680       0.65  0.88  0.75  0.63  0.64  0.63  0.67  0.75  0.71
GO:0003824  2943        736        0.52  0.23  0.32  0.16  0.15  0.15  0.38  0.29  0.33
GO:0030528  1276        319        0.44  0.24  0.31  0.07  0.07  0.07  0.49  0.37  0.42
GO:0005215  782         196        0.59  0.38  0.46  0.04  0.04  0.04  0.50  0.43  0.46
GO:0060089  738         184        0.39  0.16  0.22  0.04  0.04  0.04  0.26  0.27  0.27
GO:0030234  485         121        0.43  0.05  0.08  0.03  0.03  0.03  0.16  0.09  0.12
GO:0005198  334         84         0.04  0.01  0.01  0.02  0.02  0.02  0.11  0.11  0.11
GO:0016247  58          14         0.60  0.24  0.35  0.01  0.01  0.01  0.00  0.00  0.00
GO:0009055  54          14         0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
GO:0045182  21          5          0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  1. The text-based classifier, Text-KNN, is compared with two baselines: Base-Prior and Base-Seq. The columns P, R, and F refer, respectively, to the Precision, Recall, and F-measure of the classifier over individual GO categories. Precision and recall values of 0 on a class indicate that all the proteins belonging to that class are misclassified into another class.
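
As a rough illustration of how the F column relates to P and R, the sketch below assumes that the F-measure is the balanced harmonic mean of precision and recall (commonly called F1). The table itself does not state which variant was used, and the function name `f_measure` is hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: this is the F1 variant; the source only says "F-measure".
    """
    if precision + recall == 0:
        # Both values are 0 (e.g. GO:0009055 in the table): define F as 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Text-KNN on GO:0005488, using P and R as printed in the table.
print(round(f_measure(0.65, 0.88), 2))  # prints 0.75
```

Because the table prints P and R rounded to two decimals, recomputing F from them may differ from the printed F in the last digit for some rows.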