
Table 3 The overlap of the pairs that are the most difficult and the easiest to classify correctly by the collection of kernels using cross-validation (CV) and cross-learning (CL) settings

From: A detailed error analysis of 13 kernel methods for protein-protein interaction extraction

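| Difficulty | GT | Class/setting | AIMed | BioInfer | HPRD50 | IEPA | LLL | Total # | Total % |
|---|---|---|---|---|---|---|---|---|---|
| difficult | unknown | D CV | 537 | 1077 | 41 | 82 | 39 | 1776 | 10.4 |
| | | D CL | 628 | 1003 | 35 | 99 | 37 | 1802 | 10.6 |
| | | D = D_CV ∩ D_CL | 105 | 530 | 8 | 28 | 0 | 671 | 3.9 |
| | | p-value | 10^-10 | 10^-281 | 10^-2 | 10^-8 | 1.0 | | |
| | positive | PD CV | 162 | 281 | 20 | 32 | 17 | 512 | 12.2 |
| | | PD CL | 142 | 319 | 15 | 26 | 16 | 518 | 12.3 |
| | | PD = PD_CV ∩ PD_CL | 61 | 111 | 2 | 9 | 7 | 190 | 4.5 |
| | | p-value | 10^-60 | 10^-95 | 10^-1 | 10^-7 | 10^-6 | | |
| | negative | ND CV | 463 | 610 | 37 | 50 | 39 | 1199 | 9.3 |
| | | ND CL | 557 | 644 | 32 | 37 | 28 | 1298 | 10.1 |
| | | ND = ND_CV ∩ ND_CL | 184 | 295 | 12 | 19 | 11 | 521 | 4.0 |
| | | p-value | 10^-76 | 10^-204 | 10^-6 | 10^-15 | 10^-4 | | |
| easy | unknown | E CV | 2137 | 1870 | 85 | 83 | 36 | 4211 | 24.7 |
| | | E CL | 777 | 2563 | 45 | 95 | 73 | 3558 | 20.8 |
| | | E = E_CV ∩ E_CL | 464 | 1017 | 23 | 20 | 4 | 1528 | 8.9 |
| | | p-value | 10^-45 | 10^-184 | 10^-7 | 10^-3 | 1.0 | | |
| | positive | PE CV | 104 | 301 | 26 | 48 | 36 | 515 | 12.3 |
| | | PE CL | 115 | 364 | 29 | 27 | 22 | 557 | 13.3 |
| | | PE = PE_CV ∩ PE_CL | 49 | 147 | 6 | 10 | 7 | 219 | 5.2 |
| | | p-value | 10^-59 | 10^-136 | 10^-3 | 10^-7 | 10^-2 | | |
| | negative | NE CV | 2105 | 1752 | 59 | 94 | 23 | 4033 | 31.3 |
| | | NE CL | 593 | 2548 | 32 | 87 | 21 | 3281 | 25.5 |
| | | NE = NE_CV ∩ NE_CL | 440 | 1014 | 21 | 27 | 8 | 1510 | 11.7 |
| | | p-value | 10^-88 | 10^-215 | 10^-12 | 10^-7 | 10^-5 | | |
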
  1. We also indicate the size of each set, because the sizes vary with the size of the success-level classes. The abbreviations D, E, PD, ND, PE, and NE refer to the sets of difficult (unknown class label), easy (unknown class label), positive difficult, negative difficult, positive easy, and negative easy pairs, respectively; GT means ground truth. We highlighted with bold the number of pairs in the intersection of the CV and CL settings. We show the p-value of Fisher’s independence χ²-test rounded to the nearest power of 10. Bold typesetting indicates that the size of the overlap is too low.
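
To make the overlap p-values more concrete, the following is a minimal sketch of how the significance of an intersection such as D_CV ∩ D_CL could be assessed. It is not the authors' procedure: it swaps in a one-sided hypergeometric (Fisher exact) tail probability in place of the χ²-based test named in the footnote, and the function name `overlap_p_value`, its parameters, and the corpus size of 330 pairs used for LLL are assumptions made for illustration, since the table does not list per-corpus totals.

```python
from math import comb


def overlap_p_value(n_total: int, n_cv: int, n_cl: int, n_overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    X is the overlap size when a set of n_cv pairs and a set of n_cl pairs
    are drawn independently at random from n_total pairs.
    Hypothetical helper, not taken from the paper.
    """
    denom = comb(n_total, n_cl)
    largest_possible = min(n_cv, n_cl)
    # Sum the probability mass of every overlap size at least as large
    # as the observed one; comb() returns 0 for impossible configurations.
    tail = sum(
        comb(n_cv, k) * comb(n_total - n_cv, n_cl - k)
        for k in range(n_overlap, largest_possible + 1)
    )
    return tail / denom


# LLL row of the "difficult" block: |D_CV| = 39, |D_CL| = 37, overlap = 0.
# The corpus size 330 is an assumption; with an observed overlap of 0 the
# tail covers every possible outcome, so the result is 1.0 for any size.
print(overlap_p_value(330, 39, 37, 0))  # 1.0
```

An exact tail probability is used here because it needs only the standard library and stays well defined for small sets such as HPRD50 and LLL, where a χ² approximation is less reliable. For the larger corpora, the table's values would additionally require the true corpus sizes and the test variant the authors actually applied.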