Table 2. Log of Probability Measures for all the methods using rigorous values. The best performance in each row is marked with an asterisk.

From: Improving a gold standard: treating human relevance judgments of MEDLINE document pairs

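| Judge | M 1 | M 2 | M 3 | M 23 | M 4 | M 5 |
|-------|-------|-------|--------|--------|--------|--------|
| 0 | -8897 | -8884 | -8384 | -8704 | -8501 | -8202* |
| 1 | -7103 | -7085 | -7006 | -6843 | -6940 | -6690* |
| 2 | -6900 | -6884 | -6889 | -6687 | -6701 | -6371* |
| 3 | -6806 | -6729 | -6699 | -6493 | -6734 | -6192* |
| 4 | -7694 | -7637 | -7501* | -7560 | -8121 | -9350 |
| 5 | -7131 | -7045 | -6912 | -6872* | -7259 | -7514 |
| 6 | -7044 | -6993 | -6884 | -6814* | -7026 | -7237 |
| 7 | -7110 | -7149 | -7035 | -6876 | -6557 | -6446* |
| 8 | -7354 | -7521 | -7374 | -7266 | -6559* | -6838 |
| 9 | -7122 | -7040 | -7100 | -6911* | -7004 | -7125 |
| 10 | -8032 | -8128 | -7862 | -7881 | -7576 | -7545* |
| 11 | -7281 | -7123 | -7071 | -7008* | -7450 | -7593 |
| 12 | -8153 | -8305 | -8044 | -8047 | -7694* | -8056 |
| Ave | -7433 | -7425 | -7289 | -7228 | -7240 | -7320 |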
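
The asterisks and the Ave row can be rechecked directly from the tabulated values. The following is a minimal sketch, not the authors' code. It assumes that "best" means the highest (least negative) log probability, which is what the asterisks in the table suggest, and that Ave is the plain mean over the 13 judges rounded to the nearest integer. The names `METHODS`, `SCORES`, `best_method`, and `column_averages` are hypothetical.

```python
# Values transcribed from Table 2; columns follow the table's method order.
METHODS = ["M 1", "M 2", "M 3", "M 23", "M 4", "M 5"]
SCORES = {
    0: [-8897, -8884, -8384, -8704, -8501, -8202],
    1: [-7103, -7085, -7006, -6843, -6940, -6690],
    2: [-6900, -6884, -6889, -6687, -6701, -6371],
    3: [-6806, -6729, -6699, -6493, -6734, -6192],
    4: [-7694, -7637, -7501, -7560, -8121, -9350],
    5: [-7131, -7045, -6912, -6872, -7259, -7514],
    6: [-7044, -6993, -6884, -6814, -7026, -7237],
    7: [-7110, -7149, -7035, -6876, -6557, -6446],
    8: [-7354, -7521, -7374, -7266, -6559, -6838],
    9: [-7122, -7040, -7100, -6911, -7004, -7125],
    10: [-8032, -8128, -7862, -7881, -7576, -7545],
    11: [-7281, -7123, -7071, -7008, -7450, -7593],
    12: [-8153, -8305, -8044, -8047, -7694, -8056],
}


def best_method(row):
    """Return the method with the highest (least negative) value in a row.

    Assumption: higher log probability means better performance.
    """
    best_index = max(range(len(row)), key=row.__getitem__)
    return METHODS[best_index]


def column_averages(scores):
    """Mean of each method's column over all judges, rounded to an integer.

    Assumption: the Ave row is an unweighted mean over the judges.
    """
    n_judges = len(scores)
    return [
        round(sum(row[i] for row in scores.values()) / n_judges)
        for i in range(len(METHODS))
    ]


best = {judge: best_method(row) for judge, row in SCORES.items()}
averages = column_averages(SCORES)

for judge, method in best.items():
    print(f"Judge {judge}: best method is {method}")
print(dict(zip(METHODS, averages)))
```

Under these assumptions the recomputed column means agree with the Ave row, and the highest average belongs to M 23, although the Ave row itself carries no asterisk in the table.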