Volume 13 Supplement 3
Win percentage: a novel measure for assessing the suitability of machine classifiers for biological problems
© Parry et al.; licensee BioMed Central Ltd. 2012
Published: 21 March 2012
Selecting an appropriate classifier for a particular biological application poses a difficult problem for researchers and practitioners alike. In particular, choosing a classifier depends heavily on the features selected. For high-throughput biomedical datasets, feature selection is often a preprocessing step that gives an unfair advantage to the classifiers built with the same modeling assumptions. In this paper, we seek classifiers that are suitable to a particular problem independent of feature selection. We propose a novel measure, called "win percentage", for assessing the suitability of machine classifiers to a particular problem. We define win percentage as the probability a classifier will perform better than its peers on a finite random sample of feature sets, giving each classifier equal opportunity to find suitable features.
First, we illustrate the difficulty in evaluating classifiers after feature selection. We show that several classifiers can each perform statistically significantly better than their peers given the right feature set among the top 0.001% of all feature sets. We illustrate the utility of win percentage using synthetic data, and evaluate six classifiers in analyzing eight microarray datasets representing three diseases: breast cancer, multiple myeloma, and neuroblastoma. After initially using all Gaussian gene-pairs, we show that precise estimates of win percentage (within 1%) can be achieved using a smaller random sample of all feature pairs. We show that for these data no single classifier can be considered the best without knowing the feature set. Instead, win percentage captures the non-zero probability that each classifier will outperform its peers based on an empirical estimate of performance.
Fundamentally, we illustrate that the selection of the most suitable classifier (i.e., one that is more likely to perform better than its peers) not only depends on the dataset and application but also on the thoroughness of feature selection. In particular, win percentage provides a single measurement that could assist users in eliminating or selecting classifiers for their particular application.
Machine classifiers and feature selection algorithms have been proposed for clinical diagnosis and prediction based on favorable comparisons to competing methods [1, 2]. For high-throughput biomedical data, feature selection is a necessary preprocessing or embedded step that can bias the comparison of classifiers. In an effort to compare classifiers fairly, we introduce the idea of classifier "suitability" to a particular application. Every classifier is more or less suited to model particular feature relationships. For example, linear classifiers anticipate modeling features exhibiting a mean shift between classes, whereas nonlinear classifiers can model more complex corner shapes or quadratic curves. Classifier suitability depends on two key aspects: (1) how frequently the feature relationships it models discriminate between classes in the data and (2) how thoroughly we explore the feature space to find those relationships. We propose "win percentage" as the probability that a classifier will perform better than its peers on a finite random sample of feature sets.
As an analytical tool to aid in our estimation of win percentage, we design a Monte Carlo wrapper (MCW) algorithm for feature selection that gives each classifier equal opportunity to find informative feature sets. MCW succeeds when its best-performing feature set is among a top-performing fraction of all possible feature sets. This fraction, combined with a tolerated failure rate, defines the number of random samples that MCW must explore. We show that the most suitable classifier for an application depends on how thoroughly we explore the feature space, and apply win percentage in the analysis of eight biomedical gene expression classification problems (see note a).
Determining the most suited classifier to a particular problem has applications in many domains but we are most interested in the translation of machine learning algorithms for clinical diagnosis and prediction. Ideally, an exhaustive search of all feature sets could identify the optimal feature set for each classifier. However, for high-throughput biomedical data many thousands of features make this infeasible and necessitate the use of feature selection methods. A multitude of computationally efficient, yet suboptimal, feature selection methods have been proposed [5, 6] but these have made comparing the resulting learning machines more difficult and potentially exclude otherwise suitable feature sets. Often, the same feature selection method precedes the comparison of all classifiers using cross-validation. However, the performance of a classifier depends on the feature selection method that precedes it. One way to deal with this inherent dependency is to consider a combinatorial approach of feature selection methods and classifiers, selecting combinations of both that perform well on cross-validation [7, 8]. Another way is to attempt to find a feature selection method that performs well for a variety of typical datasets. We simplify both approaches by considering a single unbiased feature selection method that gives every classifier equal chance to perform well. Instead of finding a classifier that performs well for a given feature selection method, we attempt to identify classifiers that fit the problem.
Feature selection methods can be categorized into filter- and wrapper-based approaches. Filter-based methods rank genes based on some measure of utility such as the difference between class means (e.g., t-test p-value or fold-change). This emphasis on class means favors linear classifiers that consider the mean as the single distinguishing characteristic among classes (e.g., nearest centroid). However, nonlinear classifiers have been shown to perform well for a variety of problems and deserve equal treatment when it comes to feature selection. Wrapper-based feature selection attempts to find feature sets that perform well for a particular classifier using that classifier as a black box. Several heuristic wrapper-based feature selection methods are commonly used for nonlinear classifiers, such as sequential forward selection or backward elimination. However, these suffer from a nesting structure that causes all explored feature sets to contain highly overlapping feature membership. One way to give each classifier an equal chance of finding a suitable feature set is to conduct a randomized search of the feature space using a wrapper-based approach. The classification performance of each candidate classifier determines the quality of a feature set.
Randomized algorithms come in two basic varieties: those that provide the correct answer for a given input every time (Las Vegas), and those that may give different answers to the same problem on multiple runs (Monte Carlo). Las Vegas algorithms have been proposed for feature selection [5, 12] but have fallen out of favor, perhaps due to the relative success of faster heuristic methods. Monte Carlo feature selection has been used to select features that commonly appear in different cross-validation runs. Stochastic algorithms such as simulated annealing and genetic algorithms offer a compromise in that previous results guide the search but maintain randomness to avoid local optima.
Regardless of feature selection method, the utility of a particular classifier depends not only on its performance on a carefully selected feature set but also on the difficulty in discovering that feature set. That is, depending on computational resources and time, the most suitable classifier may change. By randomly sampling feature sets, we remove classifier bias and separate the comparison of classifiers from feature selection. Although this approach requires significantly more computational resources than heuristic methods, it provides a foundation for a fair comparison between classifiers.
First, we motivate our study by illustrating that each classifier appears to perform better than its peers for each dataset given the right feature set. Therefore, the difficulty in finding the right feature set must play a role in determining the suitability of a classifier. Second, we show that win percentage accomplishes this goal in a simple example, and demonstrate the correspondence between the continuous version of our win percentage and the discrete version. After demonstrating the utility of our approach using synthetic datasets from known distributions, we apply it to analyze datasets from the FDA MAQC-II Project.
Demonstrating the utility of each classifier for each dataset
Estimated classifier performance for breast cancer, pathological complete response
An illustrative example of win percentage
This example shows the fundamental difference between classifiers that we are modeling; some classifiers perform well on a wide variety of feature sets, whereas other classifiers perform well on the right set of features. A fair way to compare them is to consider how thoroughly we can explore the feature space given practical computing limitations.
To explore a wide variety of probability densities, we repeat the previous example 100 times and compare the theoretical and discrete estimates of the win percentage in Equations 7 and 11, respectively. We varied p(x|c) by drawing means from N(0.5, 0.1), standard deviations from |N(0, 0.1)|, and p(c) uniformly. The root mean square error (RMSE) across 100 random distributions, 100 repeated trials of drawing M = 1,000 samples, and 1 ≤ N ≤ 40 was 4.2%. Increasing M to 10,000 reduces the error to 1.0%. These results show a clear correspondence between the ideal case with known continuous distributions and the more practical discrete distributions. We next consider cases where the underlying distributions are not parametric.
Gene expression datasets
We apply the win percentage analysis on clinical gene expression microarray data. We constrain the feature set space to contain all Gaussian feature pairs, corresponding to our Gaussian candidate classifiers. Evaluating all pairs for all datasets required approximately 100 days of computation using MATLAB on 1.95 GHz servers with 20 GB of RAM. We use a discrete distribution of p(x,c) to compute win percentage.
The win percentage at N = 10^10 reveals the top classifier considering all feature sets. However, these results are not statistically significant because they are based on the performance estimate from only one "best" feature set. Under the null hypothesis that all classifiers have equal chance to perform better, a repeat performance estimate would likely identify a different classifier. Focusing on win percentages outside the shaded statistically insignificant region, we find significant win percentages for smaller N. If we are content with a feature set performing among the top 0.05% of all feature sets 99% of the time, we may focus our attention on N = 10^4. Exploring feature sets at this level of thoroughness, UDA performs near the top on five of the six non-control datasets.
For panels A, B, C, D, G, and H linear classifiers appear to perform better when exploring a small number of feature sets. For panels A, B, C, E, F, G, and H nonlinear classifiers perform significantly well for larger N. These data suggest nonlinear classifiers perform better when we explore the feature space more thoroughly, and linear classifiers perform better when we do not. On the other hand, the neuroblastoma data in panels E and F show that nonlinear classifiers also have significantly high win percentage for smaller values of N. The positive control in panel G shows the most striking result: LDA has significantly high win percentage for N ≤ 10^5. Surprisingly, the negative control has statistically significant win percentages for small N. This suggests that the null hypothesis that every classifier has equal chance to perform better than its peers on a given feature set is not true. Most striking is N = 1. In this case, millions of feature sets are analyzed and, under the null distribution, we would expect every classifier to perform better than its peers almost exactly 1/6 of the time. This is clearly not the case, suggesting that knowing the top performing classifier for one feature set may influence our expectation for other feature sets. We revisit this discrepancy in the discussion.
Insignificantly high win percentage helps eliminate some classifiers from consideration. For example for N > 1, UDA for dataset B; SDA for dataset C; QDA for dataset D; NC and SDA for dataset E; NC, DLDA, and SDA for dataset F; DLDA, SDA, and for dataset G; and DLDA and SDA for dataset H do not show a significantly high win percentage. Some classifiers clearly fail based on our significance test and may be considered unsuitable for some combinations of dataset and N.
Multiple sampling of microarray data
[Table: Random features required for ε and p. Surviving entries: 1 × 10^-6 in each of six rows.]
The suitability of a classifier for a dataset cannot be determined after feature selection. We show that given the right feature set, any of the six classifiers examined here could be judged as suitable. However, if we consider the difficulty of finding a good feature set for a classifier we may evaluate a classifier for a dataset rather than for a particular feature set. These ideas motivate our proposed win percentage measure for comparing the relative suitability of a classifier to a dataset. However, as an initial investigation there are several points that bear consideration for future study.
We provide examples to illustrate the potential usefulness of win percentage for analyzing and comparing classifier performance. Eventually, we would like to use win percentage to inform the model building process. One approach that seems promising is to use win percentage to assist practitioners in selecting or eliminating classifiers from consideration. After determining a suitable classifier, we could choose a tailored feature selection method within cross-validation to estimate its performance.
One key aspect of our approach is that we do not attempt to model the absolute performance of each classifier across the feature space. Win percentage only compares classifiers and does not comment on their absolute performance. In general, we would expect these classifiers to perform near chance on the random labels. However, we observe that the mean of X appears to exceed 0.5 on every dataset including the negative control. This bias can be explained by the selection of one best performance among the six candidate classifiers. The expected value of the largest sample among six random samples from a Gaussian distribution is μ + 1.27σ [14, 16]. Therefore, it is reasonable to expect the observed mean shifts. Future work might incorporate whether the top classifier performs better than chance on the dataset.
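The factor 1.27 quoted above can be checked numerically. The sketch below assumes six independent standard normal draws and integrates the density of their maximum with a simple trapezoid rule; the function name and integration limits are illustrative choices, not taken from the paper.

```python
from statistics import NormalDist

def expected_max_standard_normal(n, lo=-8.0, hi=8.0, steps=16000):
    """Approximate E[max of n i.i.d. N(0, 1) draws] by trapezoid integration.

    The density of the maximum is n * pdf(x) * cdf(x)**(n - 1).
    """
    nd = NormalDist()
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end points
        total += weight * x * n * nd.pdf(x) * nd.cdf(x) ** (n - 1)
    return total * h

# Largest of six draws, in units of sigma above the mean
shift_for_six = expected_max_standard_normal(6)
print(round(shift_for_six, 3))
```

For a Gaussian with mean μ and standard deviation σ, the expected maximum is then μ plus this value times σ.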
In estimating the performance of each classifier for a feature set, we use two iterations of three-fold cross-validation. Such a method is itself a randomized algorithm and multiple trials produce different results. In particular, our notion of "best" may be extended to include those classifiers that perform insignificantly differently from the best or "among the better" classifiers during cross-validation. This improvement would likely move all win percentage curves closer to 1/6 in Figure 5 and reduce the apparent significance of all results. In particular, it would partly address the apparent significance of the negative control (randomly labeled) dataset.
Intuitively, we would expect the negative control to exhibit win percentages that are likely to be drawn from the null distribution. For N = 1 in panel H of Figure 5, this is obviously not the case. Another contributing factor could be that the feature sets are not independent of each other. If a classifier performs better than its peers on a single feature, it would stand to reason that it is more likely to perform better than its peers on all feature sets containing that feature. If this is the case, it reduces the number of independent observations used to compute the null distribution. In the extreme, the win percentage computed from individual features is also exhibited by all feature pairs. In this case, the number of independent observations is reduced from C(F,2) combinations to merely F. We can easily adjust our critical win percentages by reducing M to F in Equation 10 and using Equations 12-14. By doing this, none of the win percentages for the random endpoint are significant. However, estimating the actual redundancy among feature sets for an arbitrary dataset proves difficult as does adjusting M. Future work could estimate the null distribution empirically by computing win percentage using multiple permutations of the class labels for each dataset. This computationally expensive approach could lend insight into the true null distribution and the effective number of independent feature sets implied by M in our theoretical null distribution.
Although we focused on a pair-wise analysis of the feature space, our proposed approach easily extends to higher dimensions. Whereas it is often impractical to estimate the performance of all feature triplets or quadruplets, these data suggest that sampling only 750N of these higher dimensional feature sets may be useful in comparing classifiers that explore N random feature sets. As the feature sets become larger, it may also be useful to define the probability of selecting each feature set. For example, one can favor features based on a preferred ranking criterion. Whereas heuristic methods quickly find local minima, the randomness in this approach makes a more thorough exploration of the feature space possible.
We propose a novel way to compare classifiers based on the probability that they will outperform their peers (win percentage) on a random sample of the feature sets. Unlike cross-validation that estimates classifier performance using random subsets of all samples, win percentage estimates classifier suitability using random subsets of all feature sets. First, we illustrate the utility of this approach using all Gaussian feature pairs. Then, we show that precise estimates (within 1%) can be achieved using a smaller random sample of all feature pairs.
We show that win percentage performs as expected on synthetic datasets and then apply it to real microarray data. We observe that the selection of the most suitable classifier does not only depend on the dataset but also on the thoroughness of feature selection. In addition, the results suggest that nonlinear classifiers perform better when the feature space is explored more thoroughly and linear classifiers perform better when it is not. Using a theoretical null distribution, we can exclude some classifiers from consideration because their win percentage falls within a statistically insignificant region.
In an effort to assess the suitability of a classifier to a dataset, we first attempt to find feature sets for which each classifier performs better than its peers. In order to compare multiple classifiers across all feature sets, we propose estimating the probability that each will perform better than its peers given an incomplete sample of feature sets. We refer to this probability as "win percentage." If a classifier performs well on one feature set and poorly on all others, the likelihood of winning will depend on the certainty in selecting that feature set. However, a classifier that performs well on a large variety of feature sets is more likely to win even when only a small group of feature sets is considered.
Randomized feature selection
Pseudocode for a Monte Carlo wrapper-based feature selection algorithm
1. x_out = -∞
2. For i = 1 to N:
   S_i = randomSubset(S)
   (x_i, C_i) = performance(S_i)
   If x_i > x_out:
      S_out = S_i, x_out = x_i, c_out = randomElement(C_i)
3. Output S_out, x_out, c_out
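The pseudocode above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the `performance` callback, the toy score table, and the classifier labels attached to it are hypothetical stand-ins for a cross-validated evaluation of all candidate classifiers on one feature set.

```python
import random
from itertools import combinations

def monte_carlo_wrapper(feature_sets, performance, n_iter, rng=None):
    """Return (S_out, x_out, c_out) after n_iter random draws with replacement."""
    rng = rng or random.Random()
    s_out, x_out, c_out = None, float("-inf"), None
    for _ in range(n_iter):
        s_i = rng.choice(feature_sets)       # randomSubset(S)
        x_i, c_i = performance(s_i)          # best score and the tied winners
        if x_i > x_out:                      # strict test: first best is kept
            s_out, x_out = s_i, x_i
            c_out = rng.choice(sorted(c_i))  # randomElement(C_i)
    return s_out, x_out, c_out

# Hypothetical toy scores: feature pair -> (best performance, tied winners)
toy_scores = {
    (0, 1): (0.70, {"LDA"}),
    (0, 2): (0.85, {"QDA", "UDA"}),
    (1, 2): (0.60, {"NC"}),
}
pairs = list(combinations(range(3), 2))
result = monte_carlo_wrapper(pairs, toy_scores.__getitem__, n_iter=50,
                             rng=random.Random(0))
print(result)
```

Passing the random generator in explicitly keeps a run repeatable, which helps when win percentage is estimated from many repeated MCW runs.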
Selecting the number of iterations for MCW
where we use the first-order Maclaurin approximation ln(1-p) ≈ -p. This indicates a simple inverse relationship between the number of random samples and the size of the fraction of top feature sets.
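The equation this sentence refers to did not survive in this copy. One reading consistent with the surrounding text is that N independent draws all miss the top fraction p with probability (1-p)^N, and that this failure probability is set equal to the tolerated rate ε. The sketch below computes N under that assumption, both exactly and with the approximation ln(1-p) ≈ -p; the function and parameter names are illustrative.

```python
import math

def iterations_needed(top_fraction, failure_rate):
    """Number of random feature sets so that a draw from the top fraction
    is missed with probability at most failure_rate.

    Assumes independent draws with replacement, so P(miss) = (1 - p) ** N.
    Returns (exact, approximate) iteration counts.
    """
    exact = math.log(failure_rate) / math.log(1.0 - top_fraction)
    approx = -math.log(failure_rate) / top_fraction  # uses ln(1 - p) ~ -p
    return math.ceil(exact), math.ceil(approx)

# Top 0.05% of feature sets, found 99% of the time
n_exact, n_approx = iterations_needed(0.0005, 0.01)
print(n_exact, n_approx)
```

Both counts come out a little above nine thousand, which agrees with the order of magnitude N = 10^4 quoted in the results for a top 0.05% feature set found 99% of the time.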
Theoretical win percentage
where the integrand is the probability that classifier c performs best for a feature set with performance x. In terms of MCW, win(c) is the probability that MCW will output c_out = c. Given a reasonable approximation to p(x,c) we can estimate which classifiers are more likely to win without needing to run MCW.
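The integral itself is missing from this copy. As a hedged sketch only, and not necessarily the paper's Equation 7, one form consistent with the description follows from treating the N draws as independent samples from the joint density p(x,c) of best performance x and winning classifier c. Here P(x) denotes the cumulative distribution of x marginalized over classifiers, a symbol introduced for this sketch.

```latex
\mathrm{win}(c) \;=\; \int_{-\infty}^{\infty} N \, P(x)^{N-1} \, p(x, c) \, dx,
\qquad
P(x) \;=\; \sum_{c'} \int_{-\infty}^{x} p(t, c') \, dt .
```

The factor N P(x)^{N-1} is the chance that one of the N draws lands at x while the other N-1 fall below it. For N = 1 the expression reduces to p(c), the fraction of feature sets on which classifier c is best.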
Discrete win percentage
Again, win(c) is the probability that MCW will output c_out = c, and C_i is the set of classifiers with equal performance, x_i. Win percentage provides the exact fraction of times classifier c performs better than its peers among all possible subsets S_N used by MCW.
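The discrete formula is also missing here, so the sketch below is a reconstruction under stated assumptions rather than the paper's Equation 11. It assumes MCW draws N feature sets uniformly with replacement from M evaluated samples (x_i, C_i), keeps the first draw that reaches the maximum, and breaks ties among classifiers uniformly. Under those assumptions the maximum equals a given score with probability F(score)^N minus F(just below score)^N, where F is the empirical distribution of scores.

```python
from collections import defaultdict

def discrete_win_percentage(samples, n_draws):
    """samples: list of (x_i, C_i) pairs; returns {classifier: win probability}."""
    m = len(samples)
    by_score = defaultdict(list)
    for x_i, c_i in samples:
        by_score[x_i].append(c_i)
    win = defaultdict(float)
    below = 0  # feature sets scoring strictly less than the current score
    for score in sorted(by_score):
        tied_sets = by_score[score]
        upto = below + len(tied_sets)
        # Probability that the best of n_draws draws has exactly this score
        p_max = (upto / m) ** n_draws - (below / m) ** n_draws
        for c_i in tied_sets:
            # Tied feature sets are equally likely to be kept; tied
            # classifiers within a feature set share its probability.
            share = p_max / len(tied_sets) / len(c_i)
            for c in c_i:
                win[c] += share
        below = upto
    return dict(win)

# Hypothetical samples for illustration only
toy = [(0.70, {"LDA"}), (0.85, {"QDA", "UDA"}),
       (0.60, {"NC"}), (0.70, {"LDA", "NC"})]
win_1 = discrete_win_percentage(toy, 1)
win_200 = discrete_win_percentage(toy, 200)
print(win_1, win_200)
```

With a single draw the result is simply each classifier's share of wins across feature sets; as the number of draws grows, the probability mass concentrates on the classifiers that win the top-scoring feature set.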
Although other randomized algorithms could be used to estimate a related definition of win percentage, we chose MCW because of its simplicity. For example, a Las Vegas algorithm continues to explore feature sets until a convergence criterion is met, resulting in an unpredictable N and complicating the resulting mathematical formulation.
Statistical significance of win percentage
Win percentage provides a statistic that potentially reveals which classifiers are more or less suited to a particular problem. We introduce a method for significance testing that helps determine which values for win percentage are unlikely to occur by chance. Specifically, we estimate the null distribution for win percentage and use it to calculate a p-value for each classifier's win percentage.
Using a desired false positive rate of 5%, we use a Bonferroni adjusted significance level of 0.01 (five degrees of freedom for six classifiers). Then, we compute the critical win percentages at the 0.5th and 99.5th percentile of the null distribution. We consider the win percentages between the two critical values to be statistically insignificant.
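The null distribution itself is not reproduced in this copy, so the sketch below covers only the simplest case, N = 1, and should be read as an illustration of the Bonferroni adjustment and the two critical percentiles rather than as the paper's procedure. It assumes that under the null hypothesis each of M independent feature sets is won by each of six classifiers with probability 1/6, and it uses a normal approximation to the resulting binomial proportion. All names are illustrative.

```python
import math
from statistics import NormalDist

def critical_win_percentages_n1(n_feature_sets, n_classifiers=6,
                                family_alpha=0.05):
    """Two-sided critical win percentages for N = 1 under an equal-chance null.

    Assumes independent feature sets and a normal approximation to the
    binomial proportion with success probability 1 / n_classifiers.
    """
    alpha = family_alpha / (n_classifiers - 1)   # Bonferroni: 0.05 / 5 = 0.01
    p0 = 1.0 / n_classifiers
    sd = math.sqrt(p0 * (1.0 - p0) / n_feature_sets)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # 99.5th percentile
    return p0 - z * sd, p0 + z * sd              # 0.5th and 99.5th percentiles

low, high = critical_win_percentages_n1(4_700_000)
print(low, high)
```

With millions of independent feature sets the insignificant band around 1/6 is very narrow, so any dependence between feature sets that lowers the effective M would widen it considerably.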
Classifiers and performance metric
Given a dataset of features and labeled samples, we estimate the distribution of best classifier performance using cross-validation. Each feature set produces one sample (x_i, C_i), where x_i is the maximum cross-validation performance among candidate classifiers and C_i is the set of classifiers achieving the highest performance. Although a number of methods for cross-validation could be used, we chose two iterations of three-fold cross-validation for its efficiency compared to more iterations or more folds. This allows us to explore a much larger feature space than more computationally complex performance estimates.
Relationships between candidate classifiers
Nearest centroid (NC, 5 d.f.)
Diagonal linear discriminant analysis (DLDA, 6 d.f.)
Linear discriminant analysis (LDA, 7 d.f.)
Spherical discriminant analysis (SDA, 6 d.f.)
Uncorrelated discriminant analysis (UDA, 8 d.f.)
Quadratic discriminant analysis (QDA, 10 d.f.)
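The degrees of freedom listed above are consistent with counting the class means plus the covariance parameters of a Gaussian model for two features and two classes. The sketch below reproduces the counts under that reading. The covariance structure assigned to each classifier (shared or per class; spherical, diagonal, or full) is an assumption inferred from the classifier names and the listed numbers, not a statement from the paper.

```python
def degrees_of_freedom(classifier, n_features=2, n_classes=2):
    """Count class means plus covariance parameters for a Gaussian classifier."""
    d, k = n_features, n_classes
    full = d * (d + 1) // 2          # free entries of a symmetric d x d matrix
    covariance_params = {
        "NC": 1,                     # one spherical variance shared by classes
        "DLDA": d,                   # one diagonal covariance shared by classes
        "LDA": full,                 # one full covariance shared by classes
        "SDA": k,                    # one spherical variance per class
        "UDA": k * d,                # one diagonal covariance per class
        "QDA": k * full,             # one full covariance per class
    }
    return k * d + covariance_params[classifier]

counts = {name: degrees_of_freedom(name)
          for name in ("NC", "DLDA", "LDA", "SDA", "UDA", "QDA")}
print(counts)
```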
When comparing classifier performance, the fraction of correctly classified test samples (accuracy) is a very common performance metric. Usually, the samples for training and testing are selected to have equal proportions in each class. In this work, we use the average of sensitivity and specificity, also known as binary AUC (area under the receiver operating characteristic curve using binary labels). When the class proportions are equal, this measure is equivalent to accuracy. However, in many biomedical applications including those in this paper, the class proportions are skewed. For these data, the average of sensitivity and specificity represents a class-balanced accuracy, i.e., the expected accuracy if the class proportions were balanced. In practical applications of biomedical classification, it may be desirable to favor sensitivity over specificity (or vice versa), justifying a weighted average of the two.
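As a small illustration of the metric described above, the sketch below computes the average of sensitivity and specificity from binary labels, with an optional weight for the weighted variant mentioned at the end of the paragraph. The function and parameter names are illustrative, and the code assumes both classes are present in the true labels.

```python
def balanced_accuracy(y_true, y_pred, positive=1, sensitivity_weight=0.5):
    """Weighted average of sensitivity and specificity for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    sensitivity = tp / (tp + fn)     # fraction of positives recovered
    specificity = tn / (tn + fp)     # fraction of negatives recovered
    return sensitivity_weight * sensitivity + (1.0 - sensitivity_weight) * specificity

# Skewed classes: predicting the majority class everywhere
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(plain_accuracy, balanced)
```

In this skewed example plain accuracy rewards the trivial majority-class prediction, while the class-balanced score stays at chance level.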
Datasets and feature space
Dataset: feature pairs evaluated
Breast cancer, pathological complete response: 5.4 × 10^7
Breast cancer, estrogen receptor status: 5.2 × 10^7
Multiple myeloma, overall survival: 1.4 × 10^8
Multiple myeloma, event-free survival: 1.3 × 10^8
Neuroblastoma, overall survival: 4.8 × 10^6
Neuroblastoma, event-free survival: 4.9 × 10^6
Neuroblastoma, positive control (gender): 4.6 × 10^6
Neuroblastoma, negative control (random): 4.7 × 10^6
Although our approach is extensible to any probabilistic sampling of the feature space, for this investigation we limit ourselves to feature sets containing exactly two features. This allows us to compute an exact win percentage using the complete feature space and compare it to win percentage computed using only a fraction of the complete space. We could equally apply this methodology to sets of three or more features; however, we believe feature pairs illustrate the principle. We explore all features that exhibit a prescribed level of normality based on the standard error of the kurtosis of each feature. Thus, we explore the complete feature space of Gaussian feature pairs. Table 5 also lists the number of features passing the Gaussian test for each dataset and the total number of feature pairs evaluated.
Illustrating the utility of each classifier for each dataset
We propose win percentage as a way to assess the suitability of a classifier to a dataset. If the choice were obvious, there would be no need for such a measure. To illustrate the utility of each classifier for each dataset, we attempt to find gene pairs for which each classifier performs statistically significantly better than its peers. We use the results of the coarse cross-validation (two iterations of three-fold) to rank feature sets by top classifier performance. Then, we reanalyze the top 100 feature sets using a finer-grained 20 iterations of five-fold cross-validation in order to more precisely estimate performance and to find statistically significant differences. Because we use the same folds for every classifier, we use a paired t-test to compare the mean performance of the top classifier to each of the remaining classifiers. For each classifier, we select one feature set that demonstrates its utility, provide a scatter plot to show how the classifier fits the data, and report performance results as well as significance.
Sampling with replacement
In this work, we chose to draw random samples with replacement. Alternatively, we could draw randomly without replacement so that no feature set is drawn more than once within MCW. Although this is a slightly more efficient way to explore the feature space, sampling with replacement allows a simpler presentation and mathematical representation. The difference is subtle even when randomly sampling a subset of size M from a total set of size M. In this case, the expected fraction of unique samples is 63.2%. Using MCW, we typically draw subsets much smaller than the total feature space. The expected fraction of unique samples in a subset of size N from a set of size M is (1-(1-1/M)^N)M/N. Conservative estimates of M = 1,000,000 and N = 100,000 result in 95% unique samples, suggesting that we would need to draw ~5% more samples to achieve the same feature space coverage as a random sample without replacement.
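The two figures quoted above follow directly from the stated expression and can be checked with a few lines of Python. The function name is illustrative.

```python
def expected_unique_fraction(total_sets, n_draws):
    """Expected fraction of distinct items among n_draws uniform draws
    with replacement from total_sets items: (1 - (1 - 1/M)^N) * M / N."""
    return (1.0 - (1.0 - 1.0 / total_sets) ** n_draws) * total_sets / n_draws

same_size = expected_unique_fraction(1_000_000, 1_000_000)   # N equal to M
mcw_case = expected_unique_fraction(1_000_000, 100_000)      # N much smaller than M
print(round(same_size, 3), round(mcw_case, 3))
```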
a This work is based on an earlier work: Win percentage: A novel measure for assessing the suitability of machine classifiers for biological problems, in ACM International Conference on Bioinformatics and Computational Biology, (Aug. 1-3, 2011) © ACM, 2011.
The authors would like to thank Dr. Richard Moffitt and Dr. Todd Stokes for their insightful feedback and discussion about this work. This work was supported in part by grants from National Institutes of Health (Bioengineering Research Partnership R01CA108468, Center for Cancer Nanotechnology Excellence U54CA119338, 1RC2CA148265), and Georgia Cancer Coalition (Distinguished Cancer Scholar Award to Professor MD Wang), Microsoft Research and Hewlett Packard.
This article has been published as part of BMC Bioinformatics Volume 13 Supplement 3, 2012: ACM Conference on Bioinformatics, Computational Biology and Biomedicine 2011. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/13/S3.
1. Altiparmak F, Gibas M, Ferhatosmanoglu H: Relationship preserving feature selection for unlabelled clinical trials time-series. First ACM International Conference on Bioinformatics and Computational Biology: 2-4 August 2010; Niagara Falls. 2010, ACM, 7-16.
2. Teng S, Luo H, Wang L: Random forest-based prediction of protein sumoylation sites from sequence features. First ACM International Conference on Bioinformatics and Computational Biology: 2-4 August 2010; Niagara Falls. 2010, ACM, 120-126.
3. Hua J, Tembe WD, Dougherty ER: Performance of feature-selection methods in the classification of high-dimension data. Pattern Recognition. 2009, 42: 409-424. 10.1016/j.patcog.2008.08.001.
4. Parry RM, Phan JH, Wang MD: Win percentage: a novel measure for assessing the suitability of machine classifiers for biological problems. ACM International Conference on Bioinformatics and Computational Biology; Chicago. 2011, 29-38.
5. Dash M, Liu H: Feature selection for classification. Intelligent Data Analysis. 1997, 1: 131-156. 10.1016/S1088-467X(97)00008-5.
6. Guyon I, Elisseeff A: An introduction to variable and feature selection. The Journal of Machine Learning Research. 2003, 3: 1157-1182.
7. Chandra B, Gupta M: An efficient statistical feature selection approach for classification of gene expression data. J Biomed Inform. 2011, 44: 529-535. 10.1016/j.jbi.2011.01.001.
8. Gutkin M, Shamir R, Dror G, Rattray M: SlimPLS: a method for feature selection in gene expression-based disease classification. PloS One. 2009, 4: e6416. 10.1371/journal.pone.0006416.
9. Parry RM, Jones W, Stokes TH, Phan JH, Moffitt RA, Fang H, Shi L, Oberthuer A, Fischer M, Tong W, Wang MD: k-Nearest neighbor models for microarray gene expression analysis and clinical outcome prediction. Pharmacogenomics J. 2010, 10: 292-309. 10.1038/tpj.2010.56.
10. Kohavi R, John GH: Wrappers for feature subset selection. Artificial Intelligence. 1997, 97: 273-324. 10.1016/S0004-3702(97)00043-X.
11. Horowitz E, Sahni S, Rajasekaran S: Computer Algorithms. 1998, New York: Computer Science Press.
12. Liu H, Setiono R: Feature selection and classification: a probabilistic wrapper approach. Industrial and Engineering Applications of Artificial Intelligence and Expert Systems. 1996, 419-424.
13. Dramiński M, Rada-Iglesias A, Enroth S, Wadelius C, Koronacki J, Komorowski J: Monte Carlo feature selection for supervised classification. Bioinformatics. 2008, 24: 110-117. 10.1093/bioinformatics/btm486.
14. Miller BL, Goldberg DE: Genetic algorithms, selection schemes, and the varying effects of noise. Evol Comput. 1996, 4: 113-131. 10.1162/evco.1996.4.2.113.
15. Shi L, Campbell G, Jones WD, Campagne F, Wen Z, Walker SJ, Su Z, Chu TM, Goodsaid FM, Pusztai L: The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol. 2010, 28: 827-838. 10.1038/nbt.1665.
16. Harter HL: Expected values of normal order statistics. Biometrika. 1961, 48: 151-165.
17. Miller BL, Goldberg DE: Genetic algorithms, tournament selection, and the effects of noise. Complex Systems. 1995, 9: 193-212.
18. Gong Y, Yan K, Lin F, Anderson K, Sotiriou C, Andre F, Holmes FA, Valero V, Booser D, Pippen JE: Determination of oestrogen-receptor status and ERBB2 status of breast carcinoma: a gene-expression profiling study. Lancet Oncol. 2007, 8: 203-211. 10.1016/S1470-2045(07)70042-6.
19. Shaughnessy JD, Zhan F, Burington BE, Huang Y, Colla S, Hanamura I, Stewart JP, Kordsmeier B, Randolph C, Williams DR: A validated gene expression model of high-risk multiple myeloma is defined by deregulated expression of genes mapping to chromosome 1. Blood. 2007, 109: 2276-2284. 10.1182/blood-2006-07-038430.
20. Oberthuer A, Berthold F, Warnat P, Hero B, Kahlert Y, Spitz R, Ernestus K, Konig R, Haas S, Eils R: Customized oligonucleotide microarray gene expression-based classification of neuroblastoma patients outperforms current clinical risk stratification. J Clin Oncol. 2006, 24: 5070-5078. 10.1200/JCO.2006.06.1879.
21. Efron B, Tibshirani R: Improvements on cross-validation: the .632+ bootstrap method. J Am Stat Assoc. 1997, 92: 548-560. 10.2307/2965703.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.