Table 7. Prediction performance for molecular function classes over the CAFA evaluation dataset. (The number of proteins in each class is shown in parentheses with each function name.)

From: Protein Function Prediction using Text-based Features extracted from the Biomedical Literature: The CAFA Challenge

Function                           | Text-KNN            | CAFA-Prior          | CAFA-Seq            | GOtcha
                                   | (confidence = 0.95) | (confidence = 0.01) | (confidence = 0.95) | (confidence = 0.95)
                                   | P      R      S     | P      R      S     | P      R      S     | P      R      S
binding (212 proteins)             | 0.643  0.17   0.87  | 0.579  1      0.00  | 0.9    0.085  0.987 | 0.723  0.16   0.916
transporter activity (28 proteins) | 0.00   0.00   0.97  | 0.077  1      0.00  | 0.5    0.036  0.997 | 0.714  0.179  0.994
catalytic activity (165 proteins)  | 0.312  0.03   0.95  | 0.451  1      0.00  | 0.714  0.03   0.990 | 0.917  0.067  0.995
  1. The text-based classifier, Text-KNN, is compared with baseline results provided by the CAFA challenge: CAFA-Prior, CAFA-Seq, and GOtcha. The confidence threshold used for each classifier is shown under its name in the respective column. A confidence threshold of 0.01 is used for CAFA-Prior because the classifier does not make any predictions for the 'transporter activity' class at higher confidence thresholds.
  2. The columns P, R, and S refer, respectively, to the Precision, Recall, and Specificity of the classifiers over individual classes. Precision and recall values of 0 for a class indicate that all the proteins belonging to that class are misclassified at the 0.95 confidence threshold. CAFA-Prior always has a specificity of 0 because it assigns every protein to every class, so the number of true negatives is always 0.
  3. A specificity value that is close to 1, for a class whose precision and recall are both 0, indicates that most proteins in the dataset are not in the class (true negatives) and are indeed not assigned to the class. A few proteins from other classes are misclassified into the class (false positives), hence the specificity is slightly less than 1.
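
The relationships described in the notes above can be checked with a short calculation. The following is a minimal sketch, assuming the standard confusion-matrix definitions of precision, recall, and specificity. The `Counts` class, the helper functions, and all counts are hypothetical and chosen only for illustration; they are not taken from the CAFA evaluation dataset. Returning 0 when a denominator is 0 is also an assumed convention.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Per-class confusion-matrix counts (hypothetical container)."""
    tp: int  # proteins in the class that are assigned to it
    fp: int  # proteins outside the class that are assigned to it
    tn: int  # proteins outside the class that are not assigned to it
    fn: int  # proteins in the class that are not assigned to it


def ratio(numerator: int, denominator: int) -> float:
    # Assumed convention: report 0.0 when the denominator is zero.
    return numerator / denominator if denominator else 0.0


def precision(c: Counts) -> float:
    return ratio(c.tp, c.tp + c.fp)


def recall(c: Counts) -> float:
    return ratio(c.tp, c.tp + c.fn)


def specificity(c: Counts) -> float:
    return ratio(c.tn, c.tn + c.fp)


# Illustrative dataset: 100 proteins, 25 of which belong to the class.

# A classifier that assigns every protein to the class.
# There are no negative predictions, so TN = 0 and FN = 0.
assign_all = Counts(tp=25, fp=75, tn=0, fn=0)

# A classifier that misses every protein in the class and
# wrongly assigns 3 proteins from other classes to it.
miss_all = Counts(tp=0, fp=3, tn=72, fn=25)

for name, c in [("assign_all", assign_all), ("miss_all", miss_all)]:
    print(name, precision(c), recall(c), specificity(c))
```

With these illustrative counts, the assign-everything classifier has a recall of 1 and a specificity of 0, and its precision equals the fraction of proteins in the class (25 of 100). The second classifier has precision and recall of 0, while its specificity stays close to 1 (72 of 75) because most proteins outside the class are correctly left unassigned.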