Table 3 Prediction performance on molecular function classes, over the cross-validation dataset.

From: Protein Function Prediction using Text-based Features extracted from the Biomedical Literature: The CAFA Challenge

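| Function | # Training Proteins | # Test Proteins | Text-KNN P | Text-KNN R | Text-KNN F | Base-Prior P | Base-Prior R | Base-Prior F | Base-Seq P | Base-Seq R | Base-Seq F |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GO:0005488 | 10720 | 2680 | 0.65 | 0.88 | 0.75 | 0.63 | 0.64 | 0.63 | 0.67 | 0.75 | 0.71 |
| GO:0003824 | 2943 | 736 | 0.52 | 0.23 | 0.32 | 0.16 | 0.15 | 0.15 | 0.38 | 0.29 | 0.33 |
| GO:0030528 | 1276 | 319 | 0.44 | 0.24 | 0.31 | 0.07 | 0.07 | 0.07 | 0.49 | 0.37 | 0.42 |
| GO:0005215 | 782 | 196 | 0.59 | 0.38 | 0.46 | 0.04 | 0.04 | 0.04 | 0.50 | 0.43 | 0.46 |
| GO:0060089 | 738 | 184 | 0.39 | 0.16 | 0.22 | 0.04 | 0.04 | 0.04 | 0.26 | 0.27 | 0.27 |
| GO:0030234 | 485 | 121 | 0.43 | 0.05 | 0.08 | 0.03 | 0.03 | 0.03 | 0.16 | 0.09 | 0.12 |
| GO:0005198 | 334 | 84 | 0.04 | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 | 0.11 | 0.11 | 0.11 |
| GO:0016247 | 58 | 14 | 0.60 | 0.24 | 0.35 | 0.01 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 |
| GO:0009055 | 54 | 14 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GO:0045182 | 21 | 5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |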

  1. The text-based classifier, Text-KNN, is compared with two baselines, Base-Prior and Base-Seq. The columns P, R, and F refer, respectively, to the precision, recall, and F-measure of the classifier over individual GO categories. Precision and recall values of 0 on a class indicate that all the proteins belonging to that class are misclassified into another class.
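
As a rough illustration of how the F column relates to P and R, the sketch below assumes the F-measure is the balanced harmonic mean of precision and recall (F1). The table does not state the exact formula, so this is an assumption, although it is consistent with the reported values. The function name `f_measure` is hypothetical, and returning 0 when both inputs are 0 is a convention chosen here to match the all-zero rows.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the table's F column uses this balanced form.
    By convention, return 0.0 when precision and recall are both 0,
    which avoids a division by zero for classes with no correct predictions.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Text-KNN on GO:0005488: P = 0.65, R = 0.88 (the table reports F = 0.75)
print(round(f_measure(0.65, 0.88), 2))

# A class where every protein is misclassified: P = R = 0
print(f_measure(0.0, 0.0))
```

Because the table reports P and R rounded to two decimals, recomputing F from those rounded values can differ from the published F by about 0.01 on some rows.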