| Data set | BBQ: χ² p-value | BBQ: calibrated points | BBQ: range | Proposed method: χ² p-value | Proposed method: calibrated points | Proposed method: range |
|---|---|---|---|---|---|---|
| Lung Cancer | 0.087 | 2 | 0.27 | 0.038 | 3 | 0.62 |
| SPECT | <0.001 | 4 | 0.75 | <0.001 | 5 | 0.79 |
| Parkinsons | 0.544 | 2 | 0.11 | 0.006 | 3 | 0.28 |
| Arcene | 0.032 | 3 | 0.61 | 0.623 | 5 | 0.60 |
| Suicide | 0.497 | 2 | 0.05 | 0.724 | 4 | 0.34 |
| Arrhythmia | 0.389 | 2 | 0.26 | 0.012 | 4 | 0.43 |
| Breast Cancer | <0.001 | 3 | 0.96 | <0.001 | 8 | 0.98 |
| Contraception | 0.867 | 1 | 0.003 | 0.380 | 4 | 0.52 |
- The data sets with large overlaps in the score distributions are emphasized in boldface. The proposed method yields more calibrated points on every data set and a wider range on all but Arcene (0.60 versus 0.61 for BBQ). Note that under BBQ the Contraception data set has only one calibrated point on the reliability diagram but a nonzero range (0.003). This is because the number of calibrated points is counted from the (binned) points in the reliability diagram.
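The Contraception case can be illustrated with a minimal sketch. It rests on assumptions not stated in the table: that the point count is the number of non-empty, equal-width bins in the reliability diagram, and that the range is taken over the unbinned calibrated probabilities. The function name `reliability_summary`, the choice of 10 bins, and the sample values are hypothetical and are not taken from the experiments.

```python
def reliability_summary(probs, n_bins=10):
    """Return (number of non-empty bins, max(probs) - min(probs)).

    Assumption: bins are equal-width over [0, 1]; the source does not
    specify the binning scheme actually used.
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    occupied = set()
    for p in probs:
        # Clamp so that p == 1.0 falls into the last bin.
        occupied.add(min(int(p * n_bins), n_bins - 1))
    return len(occupied), max(probs) - min(probs)


# Hypothetical calibrated outputs that all land in the [0.4, 0.5) bin.
probs = [0.420, 0.421, 0.423]
n_points, value_range = reliability_summary(probs)
print(n_points, round(value_range, 3))
```

Under these assumptions, outputs that differ only slightly collapse into a single binned point while their spread stays small but nonzero, which matches the pattern of one calibrated point with a range of 0.003.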