Table 2 Evaluation of catalytic site predictions.¹

From: How accurate and statistically robust are catalytic site predictions based on closeness centrality?

Avg. #/PDB  Total accuracy²  Per PDB accuracy³  p-value⁴  TP & FP rate⁵  TP:FP ratio  1 correct per PDB⁶  1 correct expected⁷
(a.) Raw CC values (no filter)
1.3         6.0              2.7 (10.8)         2.7E-09   2.1/0.4        6.0          7.6                 1.1
2.4         6.8              4.2 (11.6)         2.8E-22   4.9/0.7        7.2          15.0                2.0
3.6         6.5              4.5 (10.6)         2.4E-30   7.0/1.0        6.9          19.9                3.1
4.6         6.3              4.7 (10.0)         2.4E-37   8.8/1.3        6.9          23.4                3.9
5.7         6.3              4.9 (9.6)          9.4E-47   11.0/1.6       6.9          27.6                4.8
(b.) Solvent accessibility filter
1.1         14.2             7.5 (17.4)         2.8E-42   5.3/0.3        18.7         15.9                1.0
2.2         13.0             9.2 (16.8)         7.5E-72   9.7/0.6        16.9         25.4                1.9
3.3         11.1             8.7 (14.7)         6.8E-82   12.2/0.9       14.2         29.3                2.9
4.4         10.8             8.9 (13.2)         4.5E-103  15.8/1.2       13.7         36.7                3.9
5.4         10.4             8.9 (12.4)         2.7E-120  18.8/1.4       13.2         41.3                4.8
(c.) Residue identity filter
1.1         22.4             11.3 (21.0)        3.8E-83   8.3/0.3        32.6         23.0                1.0
2.2         19.6             13.5 (19.8)        8.8E-134  14.5/0.5       27.6         35.7                1.9
3.2         17.9             13.8 (18.2)        0.0       19.2/0.8       24.7         42.8                2.8
4.3         17.6             14.3 (17.0)        0.0       25.0/1.0       24.1         50.5                3.7
5.2         16.5             13.9 (156.3)       0.0       29.3/1.3       22.4         56.2                4.7
(d.) Combination filter (solvent accessibility + residue identity)
1.1         25.2             12.9 (21.8)        0.0       18.6/0.5       39.0         26.1                1.0
2.1         20.7             14.4 (20.7)        0.0       31.0/1.0       30.8         36.7                1.9
3.1         17.9             13.5 (17.1)        0.0       39.9/1.5       26.2         44.2                2.7
4.1         15.9             12.8 (14.6)        0.0       45.4/2.1       21.8         49.8                3.6
5.2         13.9             11.7 (13.1)        0.0       50.0/2.7       18.7         53.0                4.6
¹ Statistics describing the accuracy of the accessibility-filtered prediction on the SCOP superfamily dataset.
² Accuracy is defined as the percentage of correct catalytic residue predictions out of the total number of predictions for the entire collapsed dataset. In all cases, the random expectation is 0.9%.
³ Average value (and standard deviation) of accuracy calculated on a per-protein basis.
⁴ The probability that the null hypothesis is correct, calculated from the binomial distribution.
⁵ The true positive rate is the percentage of catalytic residues within the CSA that are correctly predicted; similarly, the false positive rate is the percentage of noncatalytic residues that are incorrectly predicted as catalytic.
⁶ The percentage of proteins with at least one correct prediction.
⁷ The expected percentage of proteins with at least one correct prediction, assuming a random model.
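
The definitions in footnotes 2, 4 and 5 can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names and the counts in the usage example are hypothetical, and treating the p-value as a one-sided upper-tail binomial probability is an assumption, since the footnote states only that it is calculated from the binomial distribution.

```python
from math import exp, lgamma, log, log1p

# Random expectation of accuracy (0.9%), as stated in footnote 2.
RANDOM_EXPECTATION = 0.009


def accuracy(n_correct: int, n_predicted: int) -> float:
    """Percentage of predictions that are annotated catalytic residues."""
    return 100.0 * n_correct / n_predicted


def tp_fp_rates(n_correct: int, n_predicted: int,
                n_catalytic: int, n_noncatalytic: int) -> tuple[float, float]:
    """True and false positive rates (in percent), following footnote 5."""
    tp_rate = 100.0 * n_correct / n_catalytic
    fp_rate = 100.0 * (n_predicted - n_correct) / n_noncatalytic
    return tp_rate, fp_rate


def binomial_upper_tail(n_correct: int, n_predicted: int,
                        p: float = RANDOM_EXPECTATION) -> float:
    """P(X >= n_correct) for X ~ Binomial(n_predicted, p).

    Assumption: a one-sided test against the random expectation.
    Terms are evaluated in log space so large counts do not overflow.
    """
    total = 0.0
    for k in range(n_correct, n_predicted + 1):
        log_pmf = (lgamma(n_predicted + 1) - lgamma(k + 1)
                   - lgamma(n_predicted - k + 1)
                   + k * log(p) + (n_predicted - k) * log1p(-p))
        total += exp(log_pmf)
    return total


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    n_correct, n_predicted = 6, 100
    n_catalytic, n_noncatalytic = 300, 30000
    print(accuracy(n_correct, n_predicted))
    print(tp_fp_rates(n_correct, n_predicted, n_catalytic, n_noncatalytic))
    print(binomial_upper_tail(n_correct, n_predicted))
```

With double-precision arithmetic, a tail probability below roughly 1E-308 evaluates to exactly 0.0 in this sketch.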