| Function | Text-KNN (confidence = 0.95) | | | CAFA-Prior (confidence = 0.01) | | | CAFA-Seq (confidence = 0.95) | | | GOtcha (confidence = 0.95) | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | P | R | S | P | R | S | P | R | S | P | R | S |
| binding (212 proteins) | 0.643 | 0.17 | 0.87 | 0.579 | 1 | 0.00 | 0.9 | 0.085 | 0.987 | 0.723 | 0.16 | 0.916 |
| transporter activity (28 proteins) | 0.00 | 0.00 | 0.97 | 0.077 | 1 | 0.00 | 0.5 | 0.036 | 0.997 | 0.714 | 0.179 | 0.994 |
| catalytic activity (165 proteins) | 0.312 | 0.03 | 0.95 | 0.451 | 1 | 0.00 | 0.714 | 0.03 | 0.990 | 0.917 | 0.067 | 0.995 |

- The text-based classifier, Text-KNN, is compared with three baselines provided by the CAFA challenge: CAFA-Prior, CAFA-Seq, and GOtcha. The confidence threshold used for each classifier is shown next to its name in the column header. A threshold of 0.01 is used for CAFA-Prior because, at higher thresholds, it makes no predictions for the 'transporter activity' class.
- The columns P, R, and S give the Precision, Recall, and Specificity of each classifier for the individual classes. Precision and recall values of 0 for a class indicate that every protein belonging to that class is misclassified (at a confidence threshold of 0.95). CAFA-Prior always has a specificity of 0 because it assigns every protein to every class, so the number of true negatives is always 0.
- A specificity value close to 1, for a class whose precision and recall are both 0, indicates that most of the proteins that are not in the class are indeed not assigned to it (true negatives). A few proteins from other classes are misclassified into the class (false positives), which is why the specificity is slightly less than 1.
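
The two edge cases described in the notes above can be reproduced with a minimal sketch that computes the three measures from per-protein labels. The function name `class_metrics`, the toy data, and the convention of reporting 0 when a denominator is 0 are assumptions made for illustration; they are not taken from the CAFA evaluation code.

```python
def class_metrics(true_labels, predicted_labels):
    """Return (precision, recall, specificity) for a single class.

    Both arguments are sequences of booleans, one entry per protein:
    True means the protein belongs to (or was assigned to) the class.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives

    # Assumed convention: a ratio with a zero denominator is reported as 0.0.
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return precision, recall, specificity


# Hypothetical toy data: 10 proteins, 4 of which belong to the class.
truth = [True] * 4 + [False] * 6

# A prior-style baseline that assigns every protein to the class:
# recall is 1, specificity is 0, and precision equals the class frequency.
assign_all = [True] * 10
prior_like = class_metrics(truth, assign_all)

# A classifier that misses every class member and wrongly assigns one
# non-member: precision and recall are 0, specificity is slightly below 1.
missed = [False] * 4 + [True] + [False] * 5
missed_metrics = class_metrics(truth, missed)
```

With the toy data, the assign-everything baseline has no true negatives, so its specificity is 0 regardless of class size, while the second classifier's specificity is driven entirely by the single false positive among the six non-members.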