Data set | BBQ: χ² p-value | BBQ: calibrated points | BBQ: range | Proposed method: χ² p-value | Proposed method: calibrated points | Proposed method: range |
---|---|---|---|---|---|---|
Lung Cancer | 0.087 | 2 | 0.27 | 0.038 | 3 | 0.62 |
SPECT | <0.001 | 4 | 0.75 | <0.001 | 5 | 0.79 |
Parkinsons | 0.544 | 2 | 0.11 | 0.006 | 3 | 0.28 |
Arcene | 0.032 | 3 | 0.61 | 0.623 | 5 | 0.60 |
Suicide | 0.497 | 2 | 0.05 | 0.724 | 4 | 0.34 |
Arrhythmia | 0.389 | 2 | 0.26 | 0.012 | 4 | 0.43 |
Breast Cancer | <0.001 | 3 | 0.96 | <0.001 | 8 | 0.98 |
Contraception | 0.867 | 1 | 0.003 | 0.380 | 4 | 0.52 |

- The data sets with large overlaps in the score distributions are emphasized in boldface. The proposed method consistently achieves a larger number of calibrated points and, for every data set except Arcene (0.60 versus 0.61), a wider dynamic range. Note that under BBQ the Contraception data set has a single calibrated point on the reliability diagram but a nonzero range. This is because the number of calibrated points is counted from the number of (binned) points in the reliability diagram.
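
The distinction between the two counts can be illustrated with a minimal sketch. The source does not specify how the reliability diagram is binned or how the range is computed, so the following assumes equal-width bins on [0, 1] (ten of them) and takes the range to be the maximum minus the minimum calibrated score. The function name `reliability_summary` and the example scores are hypothetical and are not taken from the paper.

```python
def reliability_summary(scores, n_bins=10):
    """Count occupied reliability-diagram bins and the spread of the scores.

    Assumptions (not from the source): equal-width bins on [0, 1],
    and range defined as max(score) - min(score).
    """
    occupied = set()
    for s in scores:
        # Map the score to a bin index; a score of exactly 1.0 goes in the last bin.
        occupied.add(min(int(s * n_bins), n_bins - 1))
    return len(occupied), max(scores) - min(scores)


# Illustrative scores only: all of them fall inside the same bin, [0.3, 0.4),
# so the diagram shows one point even though the scores are not identical.
n_points, score_range = reliability_summary([0.301, 0.302, 0.304])
print(n_points, round(score_range, 3))
```

Under these assumptions, scores that differ slightly but share a bin collapse into one diagram point while still spanning a small, nonzero range, which is the pattern the footnote describes for Contraception.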