Table 3 Number of classification matches at various rates of false positives in the 'borderline' DC1.1993 dataset

From: Improving classification in protein structure databases using text mining

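Column groups: Errors (False Positive Rate, Number of errors); CATH superfamily classification matches (TP) for the TEXT, SSAP and SSAP + TEXT classifiers (Coverage and Cutoff for each).

| False Positive Rate | Number of errors | TEXT Coverage | TEXT Cutoff | SSAP Coverage | SSAP Cutoff | SSAP + TEXT Coverage | SSAP + TEXT Cutoff |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 10^-5 | 31 | 8; 0.04 | 77.70 | 16; 0.09 | 79.94 | 98; 0.58 | 0.9808 |
| 10^-4 | 306 | 96; 0.57 | 48.86 | 229; 1.36 | 79.40 | 585; 3.48 | 0.6792 |
| 10^-3 | 3060 | 707; 4.21 | 20.75 | 1677; 10.00 | 76.66 | 2571; 15.33 | 0.2982 |
| 10^-2 | 30598 | 3036; 18.10 | 7.83 | 5808; 34.64 | 71.24 | 6901; 41.16 | 0.0706 |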
  1. Coverage is the fraction of true classification matches recovered at the given cutoff, shown as an absolute count and as a percentage of total TP (count; %). Scores range from 1 to 100 for the TEXT classifier, from 30 to 80 for SSAP, and from 0 to 1 for SSAP + TEXT. Total comparisons: 3076606; positive matches: 16765.
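
The error counts and coverage percentages in the table can be cross-checked from the two totals in the footnote. The sketch below is not taken from the article: it assumes that the number of errors equals the false positive rate times the number of negative comparisons (total comparisons minus positive matches), and that percentages are truncated, not rounded, to two decimals, which is what the tabulated values appear to reflect. The function names are illustrative.

```python
import math

# Totals quoted in the table footnote.
TOTAL_COMPARISONS = 3_076_606
POSITIVE_MATCHES = 16_765

# Assumption: every comparison that is not a positive match is a negative.
NEGATIVE_COMPARISONS = TOTAL_COMPARISONS - POSITIVE_MATCHES


def errors_at(false_positive_rate: float) -> int:
    """Expected number of false positives at a given false positive rate."""
    return round(false_positive_rate * NEGATIVE_COMPARISONS)


def coverage_percent(true_positives: int) -> float:
    """Coverage as a percentage of all positive matches, truncated to 2 decimals."""
    exact = 100.0 * true_positives / POSITIVE_MATCHES
    return math.floor(exact * 100) / 100


if __name__ == "__main__":
    for rate in (1e-5, 1e-4, 1e-3, 1e-2):
        print(rate, errors_at(rate))
    # SSAP + TEXT true-positive counts from the table.
    for tp in (98, 585, 2571, 6901):
        print(tp, coverage_percent(tp))
```

Under these assumptions the computed error counts and coverage percentages agree with the tabulated ones, which suggests that the "Number of errors" column is derived from the false positive rate instead of being counted independently.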