Figure 1 | BMC Bioinformatics

Figure 1

From: Improved variance estimation of classification performance via reduction of bias caused by small sample size

Repeated random sampling with different test set sizes. Results from repeated random sampling in which the design set size was fixed at 30% of the total sample size and the test set size was varied from 5% to 70%. For each test set size, the data were randomly divided into design and test sets 1,000 times, with the class proportions kept constant. The endpoints (dotted) of the two-sided 95% CIs, each based on a histogram of the 1,000 estimates, are displayed for the different values of the test set size, N_t. The widths of the empirical CIs decrease as N_t increases. Also displayed are the estimated averages as a function of the test set size (solid). Because each CI is read from the histogram rather than computed from estimates of the average and variance, the CIs are asymmetric with respect to the estimated average.
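
The sampling procedure described in the caption can be sketched roughly as follows. This is only an illustration under stated assumptions: the caption does not say which classifier or data set was used, so the nearest-centroid classifier, the synthetic two-class Gaussian data, and all function names (`stratified_split`, `nearest_centroid_error`, `empirical_ci`) are hypothetical stand-ins. Only the 30% design fraction, the 1,000 repetitions, the constant class proportions, and the histogram-based (percentile) 95% CI are taken from the caption.

```python
import numpy as np

def stratified_split(y, design_frac, test_frac, rng):
    """Draw disjoint design and test index sets, keeping class proportions constant."""
    design_idx, test_idx = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_design = int(round(design_frac * idx.size))
        n_test = int(round(test_frac * idx.size))
        design_idx.append(idx[:n_design])
        test_idx.append(idx[n_design:n_design + n_test])
    return np.concatenate(design_idx), np.concatenate(test_idx)

def nearest_centroid_error(X, y, design_idx, test_idx):
    """Assumed stand-in classifier: assign each test point to the nearest class mean."""
    Xd, yd = X[design_idx], y[design_idx]
    labels = np.unique(yd)
    centroids = np.stack([Xd[yd == c].mean(axis=0) for c in labels])
    dists = np.linalg.norm(X[test_idx][:, None, :] - centroids[None, :, :], axis=2)
    predictions = labels[dists.argmin(axis=1)]
    return float(np.mean(predictions != y[test_idx]))

def empirical_ci(X, y, test_frac, design_frac=0.30, n_repeats=1000, seed=0):
    """Return (average, lower, upper) of the error estimates over repeated splits.

    The CI endpoints are the 2.5th and 97.5th percentiles of the estimates,
    so they are not forced to be symmetric about the average.
    """
    rng = np.random.default_rng(seed)
    errors = np.empty(n_repeats)
    for r in range(n_repeats):
        d, t = stratified_split(y, design_frac, test_frac, rng)
        errors[r] = nearest_centroid_error(X, y, d, t)
    lower, upper = np.percentile(errors, [2.5, 97.5])
    return float(errors.mean()), float(lower), float(upper)

# Synthetic data (an assumption): two Gaussian classes, 100 samples each.
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(0.0, 1.0, size=(100, 2)),
               data_rng.normal(2.0, 1.0, size=(100, 2))])
y = np.repeat([0, 1], 100)

for test_frac in (0.05, 0.20, 0.70):
    avg, lo, hi = empirical_ci(X, y, test_frac)
    print(f"test fraction {test_frac:.2f}: average {avg:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Taking percentiles of the 1,000 estimates, rather than the average plus or minus a multiple of the standard deviation, is what allows the interval to sit asymmetrically around the average, as the caption notes.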