Table 8 Prediction performance for biological process classes over the CAFA evaluation dataset. The number of proteins in each class is shown in parentheses with each function name.

From: Protein Function Prediction using Text-based Features extracted from the Biomedical Literature: The CAFA Challenge

Confidence thresholds: Text-KNN = 0.75; CAFA-Prior = 0.01; CAFA-Seq = 0.95; GOtcha = 0.14.

| Function | Text-KNN P | Text-KNN R | Text-KNN S | CAFA-Prior P | CAFA-Prior R | CAFA-Prior S | CAFA-Seq P | CAFA-Seq R | CAFA-Seq S | GOtcha P | GOtcha R | GOtcha S |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| biological regulation (114 proteins) | 0.5 | 0.009 | 0.997 | 0.261 | 1 | 0 | 0.632 | 0.105 | 0.978 | 0.404 | 0.351 | 0.817 |
| multi-organism process (29 proteins) | 0.00 | 0.00 | 0.939 | 0.067 | 1 | 0 | 0.00 | 0.00 | 0.99 | 0.286 | 0.069 | 0.988 |
| localization (60 proteins) | 0.2 | 0.017 | 0.989 | 0.138 | 1 | 0 | 0.44 | 0.067 | 0.976 | 0.297 | 0.317 | 0.88 |
| establishment of localization (38 proteins) | 0.25 | 0.026 | 0.992 | 0.087 | 1 | 0 | 0.5 | 0.105 | 0.99 | 0.263 | 0.395 | 0.894 |
| response to stimulus (106 proteins) | 0.125 | 0.009 | 0.979 | 0.243 | 1 | 0 | 0.5 | 0.047 | 0.985 | 0.39 | 0.302 | 0.848 |
| developmental process (83 proteins) | 0.00 | 0.00 | 0.997 | 0.19 | 1 | 0 | 0.556 | 0.06 | 0.989 | 0.263 | 0.181 | 0.881 |
| multicellular organismal process (87 proteins) | 0.069 | 0.023 | 0.923 | 0.2 | 1 | 0 | 0.625 | 0.115 | 0.983 | 0.343 | 0.264 | 0.874 |
| signalling (33 proteins) | 0.5 | 0.03 | 0.998 | 0.076 | 1 | 0 | 0.25 | 0.061 | 0.985 | 0.077 | 0.061 | 0.94 |
| biological adhesion (52 proteins) | 0.00 | 0.00 | 0.971 | 0.06 | 1 | 0 | 0.00 | 0.00 | 0.998 | 0.00 | 0.00 | 0.993 |
| cellular component organization (64 proteins) | 0.00 | 0.00 | 0.997 | 0.147 | 1 | 0 | 0.286 | 0.031 | 0.987 | 0.192 | 0.156 | 0.887 |
| cellular process (368 proteins) | 0.857 | 0.016 | 0.985 | 0.844 | 1 | 0 | 0.867 | 0.071 | 0.941 | 0.866 | 0.829 | 0.309 |
| metabolic process (213 proteins) | 0.00 | 0.00 | 0.991 | 0.489 | 1 | 0 | 0.588 | 0.047 | 0.969 | 0.633 | 0.559 | 0.691 |
| reproduction (25 proteins) | 0.083 | 0.08 | 0.946 | 0.057 | 1 | 0 | 0.00 | 0.00 | 0.995 | 0.214 | 0.12 | 0.973 |
| reproductive process (25 proteins) | 0.083 | 0.08 | 0.946 | 0.057 | 1 | 0 | 0.00 | 0.00 | 0.995 | 0.273 | 0.12 | 0.981 |

  1. The text-based classifier, Text-KNN, is compared with baseline results provided by the CAFA challenge: CAFA-Prior, CAFA-Seq, and GOtcha. The confidence threshold used for each classifier is shown alongside its name. The confidence thresholds for Text-KNN, GOtcha, and CAFA-Prior are set at 0.75, 0.14, and 0.01, respectively, because these classifiers make no predictions for over 75% of the classes at higher confidence thresholds.
  2. The columns P, R, and S refer, respectively, to the Precision, Recall, and Specificity of the classifier over individual classes. Precision and recall values of 0 for a class indicate that none of the proteins belonging to that class is correctly assigned to it (at the respective confidence level). CAFA-Prior always has a specificity of 0 because it assigns every protein to every class, so the number of true negatives is always 0.
  3. A specificity value that is close to 1, for a class whose precision and recall are both 0, indicates that most proteins in the dataset are not in the class (true negatives) and are indeed not assigned to the class. A few proteins from other classes are misclassified into the class (false positives), hence the specificity is slightly less than 1.
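
The relationships described in notes 2 and 3 can be checked with a small sketch. It assumes the standard definitions of precision, recall, and specificity in terms of per-class confusion counts; the function name `class_metrics`, the toy dataset of 100 proteins, and all counts are hypothetical and are not taken from the CAFA data. Returning 0 when a denominator is empty is also an assumed convention, not something the source specifies.

```python
def class_metrics(tp, fp, fn, tn):
    """Return (precision, recall, specificity) for one class.

    tp, fp, fn, tn are the true positive, false positive, false negative,
    and true negative counts for that class. An empty denominator yields
    0.0 (an assumed convention).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return precision, recall, specificity


# Hypothetical toy dataset: 100 proteins, 25 of which belong to the class.

# A prior-like baseline that assigns every protein to the class:
# all 25 members are found (recall 1), the 75 non-members are all
# false positives, and there are no true negatives (specificity 0).
prior_like = class_metrics(tp=25, fp=75, fn=0, tn=0)

# A classifier that misses every member and wrongly assigns 3 non-members:
# precision and recall are 0, yet specificity stays close to 1.
all_missed = class_metrics(tp=0, fp=3, fn=25, tn=72)

print(prior_like)
print(all_missed)
```

Under these definitions, the precision of the prior-like baseline equals the fraction of proteins that belong to the class, so precision varies by class while recall stays at 1 and specificity at 0.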