Table 1 Algorithm to simulate the null cumulative distribution functions

From: Variable selection for binary classification using error rate p-values applied to metabolomics data

• Generate N IIUD[0,1] \( u_n \)'s (independent draws, identically uniformly distributed on [0,1])

• Assign the first \( N_0 \) \( y_n \)'s as 0 and the remaining \( N_1 = N - N_0 \) as 1

• Minimize \( \frac{w_0}{N_0}\sum_{n=1}^{N}\left(1-y_n\right)I\left(u_n>b\right)+\frac{w_1}{N_1}\sum_{n=1}^{N}y_n I\left(u_n\le b\right) \) by varying b over the midpoints of the \( u_n \)'s sorted in increasing order to obtain \( {\widehat{er}}_{up}^{*} \)

• Repeat these steps M times to build up a file of iid copies of \( {\widehat{er}}_{up}^{*} \), say \( {\widehat{er}}_{up}^{*}(m),\;m=1,\dots, M \), whose empirical distribution function provides a simulation approximation of the null CDF

• If T of the \( {\widehat{er}}_{up}^{*}(m) \)'s fall below an actually observed \( {\widehat{er}}_{up}^{*} \), its associated p-value is approximately T/M. Approximations are more accurate for large M.
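
The steps in the table can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names (`min_weighted_error`, `simulate_null_error_rates`, `p_value`), the default weights `w0 = w1 = 0.5`, and the `seed` parameter are all assumptions introduced here, since the table does not fix the weights or any software interface. "Fall below" is read as strictly less than.

```python
import random


def min_weighted_error(u, y, w0=0.5, w1=0.5):
    """Minimum of the weighted error over thresholds b at the midpoints of sorted u.

    A class-0 point counts as an error when u > b, a class-1 point when u <= b.
    """
    n = len(u)
    n0 = sum(1 for label in y if label == 0)
    n1 = n - n0
    pairs = sorted(zip(u, y))  # order the u_n's increasingly, keeping their labels
    zeros_above = n0  # class-0 points with u > b
    ones_below = 0    # class-1 points with u <= b
    best = float("inf")
    # Moving b to the midpoint between sorted positions k-1 and k puts
    # exactly the first k points at or below the threshold.
    for k in range(1, n):
        if pairs[k - 1][1] == 0:
            zeros_above -= 1
        else:
            ones_below += 1
        err = w0 * zeros_above / n0 + w1 * ones_below / n1
        best = min(best, err)
    return best


def simulate_null_error_rates(n0, n1, m, w0=0.5, w1=0.5, seed=None):
    """Return m simulated copies of the minimized error rate under the null."""
    rng = random.Random(seed)
    y = [0] * n0 + [1] * n1  # first N0 labels are 0, the remainder 1
    rates = []
    for _ in range(m):
        u = [rng.random() for _ in range(n0 + n1)]  # uniform draws, unrelated to y
        rates.append(min_weighted_error(u, y, w0, w1))
    return rates


def p_value(observed, null_rates):
    """Approximate p-value T/M, where T counts simulated rates below the observed one."""
    t = sum(1 for rate in null_rates if rate < observed)
    return t / len(null_rates)


if __name__ == "__main__":
    null_rates = simulate_null_error_rates(n0=20, n1=20, m=2000, seed=0)
    print(p_value(0.2, null_rates))
```

Updating the two error counts while sweeping the sorted values avoids re-evaluating both sums at each midpoint, so each replicate costs one sort plus a linear pass.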