Number of selected genes. The figure shows how the number of selected genes affects aRand for the variance-based gene selection method STD in each of the six data sets: Alizadeh, Finak, Galland, Herschkowitz, Jones and Ye. The distributions represented by the boxplots are based on 80 cluster analyses for gene selections of 500 genes or fewer, and on 64 cluster analyses for selections of 1000 genes or more. The cluster analyses consist of combinations of the following sub-processes: normalizations norm.pt, norm.pt.bkg, norm.glob and norm.glob.bkg; standardization and no standardization; missing value imputation by ROW and SVD; clustering methods hclust.corr.ward, hclust.eucl.ward, hclust.manh.ward, kmeans and Mclust (Mclust only for 500 genes or fewer). The horizontal lines show the median (dashed line) and 95th percentile (dotted line) of the distribution of aRand values for random classifications.
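
The counts of 80 and 64 cluster analyses follow from crossing the listed sub-processes. The short Python sketch below enumerates these combinations as a check on the arithmetic. It is illustrative only: the function name `pipelines` and the labels for the standardization options are hypothetical, and the rule that Mclust is dropped above 500 genes is taken from the caption rather than from the original analysis code.

```python
from itertools import product

# Sub-processes as named in the caption.
normalizations = ["norm.pt", "norm.pt.bkg", "norm.glob", "norm.glob.bkg"]
standardizations = ["standardized", "not standardized"]  # hypothetical labels
imputations = ["ROW", "SVD"]
clusterings = [
    "hclust.corr.ward",
    "hclust.eucl.ward",
    "hclust.manh.ward",
    "kmeans",
    "Mclust",
]


def pipelines(n_genes):
    """Return all sub-process combinations used for a given gene-selection size.

    Assumption from the caption: Mclust is applied only when the selection
    contains 500 genes or fewer.
    """
    methods = [m for m in clusterings if m != "Mclust" or n_genes <= 500]
    return list(product(normalizations, standardizations, imputations, methods))


if __name__ == "__main__":
    print(len(pipelines(500)))   # 4 * 2 * 2 * 5 = 80
    print(len(pipelines(1000)))  # 4 * 2 * 2 * 4 = 64
```

Each boxplot therefore summarizes one aRand value per combination: 80 values for selections of 500 genes or fewer and 64 values for selections of 1000 genes or more.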