Table 5 To measure which of two methods better predicts the individual class values assigned by a test judge, we apply the signed rank test. We also count the query-document pairs for which each method gives the larger predicted probability to the judge's class value, as well as ties. An asterisk marks the better result when the difference has a p-value less than 0.05 by the signed rank test. The optimal parameters are the single-parameter optimizations of Table 1.

From: Improving a gold standard: treating human relevance judgments of MEDLINE document pairs

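M 4 vs M 5

| Judge | M 4 | M 5 | = (ties) |
|---|---:|---:|---:|
| 0 | 1992 | 3008* | 0 |
| 1 | 2546 | 2454* | 0 |
| 2 | 2864* | 2136 | 0 |
| 3 | 2598 | 2402* | 0 |
| 4 | 2148 | 2851* | 1 |
| 5 | 2247 | 2753* | 0 |
| 6 | 2527 | 2473* | 0 |
| 7 | 3392* | 1608 | 0 |
| 8 | 3798* | 1202 | 0 |
| 9 | 2676 | 2324* | 0 |
| 10 | 2802* | 2198 | 0 |
| 11 | 2084 | 2916* | 0 |
| 12 | 2938* | 2062 | 0 |
| Total | 34612 | 30387 | 1 |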
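The comparison described in the caption can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code: it assumes each method supplies, for every query-document pair, the predicted probability of the class value the test judge actually assigned, and it uses the two-sided Wilcoxon signed-rank test with a normal approximation (zero differences dropped, average ranks for tied magnitudes, no continuity correction). The function names `count_wins`, `wilcoxon_signed_rank` and `compare_methods` are hypothetical.

```python
import math
from typing import Optional, Sequence


def count_wins(p_a: Sequence[float], p_b: Sequence[float]) -> tuple[int, int, int]:
    """Count pairs where method A (or B) gives the judge's class value the larger
    predicted probability, plus ties. Hypothetical helper, for illustration only."""
    if len(p_a) != len(p_b):
        raise ValueError("both methods must score the same query-document pairs")
    a_wins = sum(1 for a, b in zip(p_a, p_b) if a > b)
    b_wins = sum(1 for a, b in zip(p_a, p_b) if b > a)
    return a_wins, b_wins, len(p_a) - a_wins - b_wins


def wilcoxon_signed_rank(p_a: Sequence[float], p_b: Sequence[float]) -> tuple[float, float]:
    """Two-sided Wilcoxon signed-rank test, normal approximation (assumption).
    Returns (z, p); z > 0 means A tends to give the larger probability."""
    if len(p_a) != len(p_b):
        raise ValueError("both methods must score the same query-document pairs")
    diffs = [a - b for a, b in zip(p_a, p_b) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    mags = [abs(d) for d in diffs]
    order = sorted(range(n), key=lambda k: mags[k])
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied magnitudes
    i = 0
    while i < n:
        j = i
        # Extend j over the run of equal magnitudes starting at position i.
        while j + 1 < n and mags[order[j + 1]] == mags[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p


def compare_methods(p_a: Sequence[float], p_b: Sequence[float], alpha: float = 0.05) -> dict:
    """Per-judge summary in the spirit of one Table 5 row (hypothetical helper)."""
    z, p = wilcoxon_signed_rank(p_a, p_b)
    better: Optional[str] = None  # the starred method, if any
    if p < alpha:
        better = "A" if z > 0 else "B"
    return {"counts": count_wins(p_a, p_b), "z": z, "p_value": p, "better": better}
```

Because the signed-rank test weights each difference by the rank of its magnitude, the method that wins more pairs need not be the one the test favors; this may account for rows such as judge 1 or judge 9, where M 4 wins more pairs but M 5 is starred.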