From: A comprehensive sensitivity analysis of microarray breast cancer classification under feature variability

Impact of preprocessing variability on feature selection for the Affymetrix datasets. Comparison of the top-100 ranked feature lists F_{100,m,k} and F_{100,m',k}, obtained using different preprocessing strategies m and m', for different splits k. A) Percentage of the top half of one list that is found in the other list, and vice versa. Each boxplot represents the distribution of these percentages over 50 splits for a specific pair (m, m') (indicated at the top of the figure). For each split, we determine the percentage of F_{50,m,k} found in F_{100,m',k} and the percentage of F_{50,m',k} found in F_{100,m,k}. Each distribution thus contains 50·2 = 100 points. All boxplots corresponding to the same preprocessing pair are colored similarly. In total there are 15 distinct pairs. The pairs are ordered by the observed median overlap over all six datasets. B) Distributions of the relative strength scores for top-ranked feature lists corresponding to the various preprocessing pairs. C) Relative strength of the top-100 multi-ranked gene lists with respect to the original rankings, for each preprocessing method and each Affymetrix dataset.
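
The overlap measure in panel A can be illustrated with a minimal sketch. It assumes each feature list is an ordered sequence of feature identifiers, best-ranked first. The function name `top_half_overlap` and the toy identifiers are hypothetical and are not taken from the original study.

```python
from typing import Sequence, Tuple


def top_half_overlap(
    list_m: Sequence[str],
    list_m_prime: Sequence[str],
    n: int = 100,
) -> Tuple[float, float]:
    """Return the two overlap percentages for one split k.

    First value: percentage of the top n/2 features of list_m that
    appear in the top n features of list_m_prime.
    Second value: the same with the roles of the lists swapped.
    """
    half = n // 2
    top_m = list(list_m[:n])
    top_mp = list(list_m_prime[:n])

    # Top half of each list, compared against the full top-n of the other.
    pct_m_in_mp = 100.0 * len(set(top_m[:half]) & set(top_mp)) / half
    pct_mp_in_m = 100.0 * len(set(top_mp[:half]) & set(top_m)) / half
    return pct_m_in_mp, pct_mp_in_m


# Toy example with n = 4 (so the "top half" is the first two features).
list_a = ["g1", "g2", "g3", "g4"]
list_b = ["g2", "g5", "g1", "g6"]
result = top_half_overlap(list_a, list_b, n=4)
print(result)  # (100.0, 50.0)
```

Applying such a function to every split of a given pair (m, m') would yield two values per split, which matches the 50·2 = 100 points per boxplot described in the caption.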

