Building gene expression profile classifiers with a simple and efficient rejection option in R
BMC Bioinformatics volume 12, Article number: S3 (2011)
Abstract
Background
The collection of gene expression profiles from DNA microarrays and their analysis with pattern recognition algorithms is a powerful technology applied to several biological problems. Common pattern recognition systems classify samples by assigning them to one of a set of known classes. However, in a clinical diagnostics setup, novel and unknown classes (new pathologies) may appear, and one must be able to reject those samples that do not fit the trained model. The problem of implementing a rejection option in a multiclass classifier has not been widely addressed in the statistical literature. Gene expression profiles represent a critical case study, since they suffer from the curse of dimensionality, which undermines the reliability both of traditional rejection models and of more recent approaches such as one-class classifiers.
Results
This paper presents a set of empirical decision rules that can be used to implement a rejection option in a set of multiclass classifiers widely used for the analysis of gene expression profiles. In particular, we focus on the classifiers implemented in the R Language and Environment for Statistical Computing (R for short in the remainder of this paper). The main contribution of the proposed rules is their simplicity, which enables easy integration with available data analysis environments. Since tuning the parameters of a rejection model is often a complex and delicate task, we exploit an evolution strategy to automate this process. This allows the final user to maximize the rejection accuracy with minimum manual intervention.
Conclusions
This paper shows how simple decision rules can ease the use of complex machine learning algorithms in real experimental setups. The proposed approach is almost completely automated and is therefore a good candidate for integration into data analysis flows in labs where the machine learning expertise required to tune traditional classifiers might not be available.
Background
Microarrays are one of the latest breakthroughs in experimental molecular biology. They make it possible to simultaneously monitor the expression levels of tens of thousands of genes. Arrays coupled with pattern recognition methods have been applied to studies in gene expression, genome mapping, transcription factor activity, toxicity, pathogen identification, and many other applications [1–10].
In standard classification problems one has to classify a sample and assign it to one of a set of known classes. In a clinical diagnostics setup, however, where some classes (phenotypes) may be known but novel, unknown classes (new phenotypes) may appear as well, one must be able to reject those samples that do not fit the trained model.
In this paper, we present a set of empirical decision rules designed to implement a rejection option in a set of multiclass classifiers widely used for the analysis of gene expression profiles. In particular, we will focus on the R Language and Environment for Statistical Computing (R for short in the remainder of this paper) [11].
The problem of implementing a rejection option in a multiclass classifier has not been widely addressed in the statistical literature, with the exception of a few publications [12–15]. Chow [12] put forth the decision-theoretic framework for rejection in pattern recognition. The overall idea is to estimate the class conditional probabilities for a sample and to reject it if the maximum probability is below a given threshold. This simple rejection rule is optimal when the class conditional probabilities can be estimated without errors, which contrasts with many real setups [16]. Gene expression profiles suffer from the curse of dimensionality [17], which undermines the reliability of probability estimators. The number of available classes and the correct setup of the threshold are additional constraints that limit the reliability of this approach. An attempt to set up per-class thresholds has been proposed by Fumera et al. [18] to mitigate errors in probability estimation. However, the computational effort and the complexity of tuning the resulting classification system increase, limiting widespread application in laboratory setups. Recently, one-class classifiers have gained attention in the implementation of rejection systems for gene expression profiles [19–22]. These algorithms base the prediction model on the concept of distance among samples rather than on the estimation of class conditional probabilities. They therefore overcome the limited reliability of available class probability estimators. However, large numbers of classes, high-dimensional feature spaces such as those of microarray datasets, noisy features, and often insufficient samples still limit their accuracy.
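For concreteness, Chow's single-threshold rule amounts to comparing the top class probability estimate against a cut-off. A minimal sketch, in Python for illustration (the function name and example probabilities are ours, not from the paper):

```python
def chow_reject(class_probs, threshold):
    """Chow's rule: accept the top-ranked class only if its estimated
    posterior probability reaches the threshold; otherwise reject."""
    best_class = max(class_probs, key=class_probs.get)
    if class_probs[best_class] < threshold:
        return None          # rejected: no known class is likely enough
    return best_class

# A sample dominated by one class is accepted...
print(chow_reject({"ALL": 0.85, "CLL": 0.10, "FL": 0.05}, 0.5))  # -> ALL
# ...while a flat probability estimate is rejected.
print(chow_reject({"ALL": 0.35, "CLL": 0.33, "FL": 0.32}, 0.5))  # -> None
```

The rule's weakness, as noted above, is that it is only optimal when the probability estimates themselves are reliable.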
In this paper we build a set of rejection rules able to work with the very simple, and often unreliable, class probability estimators provided with the multiclass classifiers implemented in R (see the Methods section for further details). The main contribution of the proposed rules is their simplicity, which enables easy integration with available data analysis environments while maintaining good classification performance. Since tuning the parameters of a rejection model is often a complex and delicate task, we exploit an evolution strategy to automate this process. This allows the final user to maximize the rejection accuracy with minimum manual intervention.
A complete experimental setup is presented to validate the proposed model on a challenging dataset of blood diseases. A set of three multiclass classifiers widely adopted in the analysis of gene expression profiles, all available in R, has been considered. Results are compared to those obtained by building rejection options based on one-class classifiers [23]. They show that the proposed decision rules can be efficiently used as a powerful rejection method, outperforming most of the considered one-class classifiers.
Results and discussion
Experimental setup
The results of this paper have been validated on a dataset of gene expression profiles from complementary DNA (cDNA) microarrays related to very similar phenotypes. Only a reduced subset of genes allows for discrimination (Table 1). This peculiarity increases the complexity of the classification, allowing us to better validate the proposed method. It is worth mentioning that, in all experiments, the training set does not include any sample from the test set. This requirement avoids over-optimistic results and therefore allows an honest evaluation of the classifiers' performance.
The dataset includes a total of 7 phenotypes. Samples have been downloaded from the Stanford cDNA Microarray database [24]. All genes without a valid Unigene ID have been discarded. The expression level of each gene is measured as the log-ratio between the Cy5 and the Cy3 channels of the array: $\log_2(\mathrm{Cy5}/\mathrm{Cy3})$.
Four sets of samples have been downloaded from a large set of experiments aimed at lymphoma classification [25, 26]:

- Diffuse Large B-Cell Lymphoma (DLBCL): a non-Hodgkin lymphoma,
- B-Cell Chronic Lymphocytic Leukemia Wait&Watch (CLLww),
- B-Cell Chronic Lymphocytic Leukemia (CLL), and
- Follicular Lymphoma (FL): independent lymph node samples on LymphoChip microarrays [27].
The remaining three phenotypes in the dataset are:

- Acute Lymphoblastic Leukemia (ALL),
- Core Binding Factor Acute Myeloid Leukemia (CBF-AML): subgroups characterized by shorter overall survival [28], and
- Acute Myeloid Leukemia 2 dataset (AML2): peripheral-blood or bone marrow samples of intermediate-risk AML with a normal karyotype.
Three multiclass classifiers often used in gene expression profile analysis have been considered in this study: k-Nearest Neighbors (kNN), feed-forward Neural NETwork with a single hidden layer (NNET), and Random Forests (RF). All algorithms are available in R. A detailed description of how data have been processed and how the prediction models for the different classifiers have been trained is available in the Methods section.
Class probability estimates analysis and decision rules
The process of detecting samples to reject in a multiclass classification system can be modeled as a binary classification test discriminating between samples that belong to one of the known classes (target samples) and samples that do not belong to any of them (reject samples). The outcome of the test is measured in terms of:

- true positives (TP): target samples correctly accepted,
- true negatives (TN): reject samples correctly rejected,
- false positives (FP): reject samples erroneously accepted, and
- false negatives (FN): target samples erroneously rejected.
The numbers of TP, TN, FP, and FN add up to 100% of the dataset. The accuracy with which target samples are assigned to the corresponding class is outside the scope of this work and depends on the accuracy of the selected multiclass classifier.
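From these four counts, the sensitivity, specificity and accuracy figures used throughout the paper follow directly. A minimal sketch in Python (the function name is ours):

```python
def rejection_metrics(tp, tn, fp, fn):
    """Standard binary-test indicators for the accept/reject decision.
    tp: targets accepted, tn: rejects rejected,
    fp: rejects accepted, fn: targets rejected."""
    total = tp + tn + fp + fn
    return {
        "sensitivity": tp / (tp + fn),   # fraction of targets accepted
        "specificity": tn / (tn + fp),   # fraction of rejects caught
        "accuracy": (tp + tn) / total,   # overall correct decisions
    }

m = rejection_metrics(tp=40, tn=35, fp=15, fn=10)
print(m)  # sensitivity 0.8, specificity 0.7, accuracy 0.75
```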
The multiclass classifiers considered in this paper (RF, NNET and kNN) do not natively implement a rejection option. Discarding reject samples by setting a single threshold on the class probability estimates is inaccurate, since these estimates show only small differences between target and reject samples (refer to the Methods section for specific details on how class probability estimates have been computed). However, this information can still be used for discrimination if coupled with well-tuned decision rules.
In order to perform a preliminary qualitative analysis of how class probability estimates change between target and reject samples, we performed a set of multiclass classification experiments generating different splits of the considered dataset (in terms of target/reject samples and test/training data). For each split, the multiclass classifiers have been trained on a subset of the considered phenotypes, using the remaining data as a set of samples to reject. Figure 1 reports, for each classifier, two density plots showing how the class probability estimates of target and reject samples are distributed in the performed experiments. In the MAX plot the considered random variable is the highest class probability estimate of each classified sample, split into target samples (solid line) and reject samples (dashed line); in the DIFF plot the considered random variable is the difference between the two highest probability estimates of each sample, again considering target and reject samples. The density functions have been estimated from the experimental data by performing a Gaussian kernel density estimation using the density() command of R.
Although the plots of Figure 1 may seem to suggest a strong overlap between the distributions of target samples (solid lines) and reject samples (dashed lines), a certain amount of separation is still visible. This is particularly evident in the case of RF, which shows a clearly visible distinction both in the MAX and in the DIFF plots. In particular, in the DIFF plot of RF, target samples (solid line) peak around 0.8, far from the peak of reject samples (dashed line), which falls around 0.1. This means that, for a target sample, the difference between the two top-rated classes is very high (around 0.8 in most cases). Reject samples, instead, show a very low difference between the probability estimates of the two top-ranked classes, revealing the inability of the classifier to clearly select a target class. kNN and NNET show smaller separation; however, experimental results will show that a partial discrimination is still possible.
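For intuition, densities like those in the MAX plot can be reproduced with a hand-rolled Gaussian kernel density estimate. The sketch below is Python with invented probability values (R's density() additionally selects the bandwidth automatically):

```python
import math

def gaussian_kde(samples, bandwidth):
    """One-dimensional Gaussian kernel density estimate, along the lines
    of what R's density() computes for the MAX and DIFF plots."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)
    return density

# Hypothetical top-probability estimates: targets peak high, rejects low.
target_max = [0.92, 0.88, 0.95, 0.90, 0.85]
reject_max = [0.40, 0.35, 0.45, 0.38, 0.50]
f_t = gaussian_kde(target_max, bandwidth=0.05)
f_r = gaussian_kde(reject_max, bandwidth=0.05)
# Near 0.9 the target density dominates; near 0.4 the reject density does.
print(f_t(0.9) > f_r(0.9), f_r(0.4) > f_t(0.4))  # True True
```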
From this preliminary analysis, it seems reasonable that, for the three considered classifiers, the class probability estimates provided by R could be used for detecting reject samples. The idea exploited in this paper to design a set of decision rules for detecting reject samples is to split the MAX plot into three distinct areas: (i) the max area, (ii) the decision area, and (iii) the reject area, delimited by two rejection thresholds T_max and T_rej (with T_max > T_rej), as shown in Figure 2 for RF.
Figure 3 describes the overall decision process applied to each sample that must be classified. max_1 and max_2 denote the two highest class probability estimates for the considered sample.
If the highest class probability estimate (max_1) is lower than T_rej, the sample falls in the reject area and is rejected, maximizing the number of TN (Rule R1, Figure 3, rows 1-2). Similarly, if max_1 is higher than T_max, the sample falls in the max area and can be accepted, maximizing the number of TP; the class with probability estimate equal to max_1 is predicted (Rule R2, Figure 3, rows 4-5). This first part of the decision process is very similar to the single-threshold method proposed in [12].
Whenever neither R1 nor R2 is satisfied, i.e., max_1 falls between T_rej and T_max (decision area), there are two possible conditions based on the analysis of the difference between max_1 and max_2 (DIFF plot of Figure 2):

1. if max_1 – max_2 > T_diff, the sample can be accepted and the class with probability estimate equal to max_1 is predicted (Rule R3.1, Figure 3, rows 7-8). T_diff is the minimum difference between the probability estimates of the two top-ranked classes that allows max_1 to be used for a reliable classification;

2. if max_1 – max_2 < T_diff, the value of max_2 is considered. Two cases are possible:

   - if max_2 is higher than T_rej, i.e., both max_1 and max_2 fall in the decision area, the prediction is considered uncertain (Rule R3.2, Figure 3, rows 10-11). In this case the classifier does not produce any classification result. This rule prevents providing a result when the distinction between two classes is not sufficient for correct discrimination. Alternatively, multiple classification results can be provided to alert the user that the confidence in the prediction is low;

   - if max_2 is lower than T_rej, the sample is rejected (Rule R3.3, Figure 3, row 13). This rule mitigates the noise in those samples that fall at the border of the reject area.
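Taken together, rules R1-R3.3 amount to a short function. A Python sketch with made-up threshold values (in the paper the thresholds come from the automated tuning procedure):

```python
def classify_with_rejection(probs, t_max, t_rej, t_diff):
    """Decision rules R1-R3.3 applied to one sample's class probability
    estimates. probs: dict class -> probability estimate.
    Returns a predicted class, 'REJECT', or 'UNCERTAIN'.
    Assumes t_max > t_rej."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    (cls1, max1), (_, max2) = ranked[0], ranked[1]
    if max1 < t_rej:                 # R1: reject area
        return "REJECT"
    if max1 > t_max:                 # R2: max area, accept top class
        return cls1
    # decision area
    if max1 - max2 > t_diff:         # R3.1: clear winner, accept
        return cls1
    if max2 > t_rej:                 # R3.2: two plausible classes
        return "UNCERTAIN"
    return "REJECT"                  # R3.3: borderline, reject

thresholds = dict(t_max=0.7, t_rej=0.3, t_diff=0.2)
print(classify_with_rejection({"A": 0.8, "B": 0.1, "C": 0.1}, **thresholds))   # A (R2)
print(classify_with_rejection({"A": 0.2, "B": 0.15, "C": 0.1}, **thresholds))  # REJECT (R1)
print(classify_with_rejection({"A": 0.6, "B": 0.2, "C": 0.1}, **thresholds))   # A (R3.1)
print(classify_with_rejection({"A": 0.45, "B": 0.4, "C": 0.1}, **thresholds))  # UNCERTAIN (R3.2)
```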
The three rejection thresholds (T_max, T_rej, and T_diff) can be empirically chosen by analyzing the density plots of Figure 2:

- if the MAX plot shows a clear separation between target and reject samples, T_max can be placed so as to maximize the number of TP immediately detected by rule R2;
- similarly, T_rej can be placed to maximize the number of TN detected by rule R1;
- T_diff is defined by looking at the DIFF plot; a good heuristic is to take the point where the two curves intersect.
Manually setting the three thresholds is very complex and may easily lead to a high error rate. When the plots do not show a clear separation between target and reject samples, the choice of the thresholds involves a trade-off between increasing the sensitivity and lowering the specificity of the classifier. This is a complex optimization task.
All thresholds must be set up using only information extracted from the training data. To tackle the complexity of this process, and to allow automatic tuning of all rejection parameters, a threshold setup algorithm based on the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) has been developed. The full description of this algorithm is available in the Methods section.
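The search problem itself is easy to state independently of the optimizer. As a structural sketch only, in Python, a naive random search over the three thresholds (with the constraint T_max > T_rej) looks like this; the paper's tuner uses the CMA-ES with one of the objective functions of the Methods section, and the quadratic objective below is a toy stand-in:

```python
import random

def tune_thresholds(evaluate, iterations=2000, seed=0):
    """Search for (t_max, t_rej, t_diff) minimizing an objective computed
    on training-derived probability estimates. Plain random search is
    used here purely to show the structure of the search problem."""
    rng = random.Random(seed)
    best, best_score = None, float("inf")
    for _ in range(iterations):
        t_rej = rng.uniform(0.0, 1.0)
        t_max = rng.uniform(t_rej, 1.0)   # enforce t_max > t_rej
        t_diff = rng.uniform(0.0, 1.0)
        score = evaluate(t_max, t_rej, t_diff)
        if score < best_score:
            best, best_score = (t_max, t_rej, t_diff), score
    return best, best_score

# Toy objective: squared distance from a known-good threshold setting.
target = (0.7, 0.3, 0.2)
obj = lambda *t: sum((a - b) ** 2 for a, b in zip(t, target))
best, score = tune_thresholds(obj)
print(best)  # a triple close to (0.7, 0.3, 0.2)
```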
Architecture of the multiclass classifier with rejection option
The proposed decision rules can be easily integrated within the multiclass classification flow provided by R or other similar computational environments. Figure 4 shows the computational flow of the resulting system. As usual when working with classification algorithms, a training set containing known samples is used to train the prediction model, which is then used to classify a set of unknown test samples.
Compared to a traditional multiclass classification flow, the proposed system includes two additional modules required to: 1) set up the rejection thresholds, and 2) apply the decision rules.
Setting up the rejection thresholds requires collecting a statistically significant set of class probability estimates for both target and reject samples on which to compute the density plots of Figure 2. At this stage, in which the model is trained and real reject samples are not available, this information can only be collected from samples in the training set, by setting up several cross-validation experiments on different folds of these training data. Figure 5 outlines the way this module operates. Let us denote with T the full training set and with T_i a portion of it including only those samples related to a specific phenotype. If k classes are included in T, k subsets of experiments are generated by iteratively excluding one of the classes T_i from T to form a new target class T′. The removed samples are used as a set of reject samples denoted with R (Figure 5, rows 1-3).
For each subset (Figure 5, row 4), m folds are generated by removing x samples from each subclass contained in T′, and x samples from R. Each fold therefore generates a test set (Test*) of x · (k – 1) target samples and x reject samples. To avoid over-optimistic results, target samples of the test set are removed from T′, forming a new training set Train* (Figure 5, rows 5-8). Each fold is then used for an independent multiclass classification experiment to obtain the class probability estimates of each test sample in Test*. After running all k · m experiments, the CMA-ES analyzes the collected probability estimates and provides a set of optimal thresholds (refer to the Methods section for a complete description of this step).
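The split-generation loop of Figure 5 can be sketched as follows, in Python with invented sample identifiers (in the real pipeline each split feeds a multiclass classification experiment):

```python
import random

def threshold_setup_splits(training_set, m_folds, x, seed=0):
    """Generate the k * m cross-validation splits used for threshold
    setup. training_set: dict phenotype -> list of samples.
    Each subset leaves one phenotype out as the reject class R; each of
    its m folds draws x test samples per remaining subclass and x
    samples from R, removing the former from the training data."""
    rng = random.Random(seed)
    splits = []
    for held_out in training_set:                        # k subsets
        reject_pool = list(training_set[held_out])
        target = {c: list(s) for c, s in training_set.items()
                  if c != held_out}
        for _ in range(m_folds):                         # m folds each
            test_targets, train = [], {}
            for cls, samples in target.items():
                picked = rng.sample(samples, x)
                test_targets += [(cls, s) for s in picked]
                train[cls] = [s for s in samples if s not in picked]
            test_rejects = rng.sample(reject_pool, x)
            splits.append((train, test_targets, test_rejects))
    return splits

data = {c: [f"{c}{i}" for i in range(6)] for c in ["ALL", "CLL", "FL"]}
splits = threshold_setup_splits(data, m_folds=2, x=2)
print(len(splits))  # k * m = 3 * 2 = 6 splits
```

Each split's test set contains x · (k – 1) target samples and x reject samples, matching the description above.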
Validation and discussion
The proposed rejection rules have been tested on different groups of experiments based on different configurations of the considered dataset in terms of target and reject samples. The rejection accuracy has been compared with that of a set of selected one-class classifiers. Five one-class classification algorithms have been considered in this comparison:

- Parzen one-class classifier (Parzen-OC),
- kNN one-class classifier (kNN-OC),
- k-Means one-class classifier (kMeans-OC),
- Principal Component Analysis (PCA) one-class classifier (PCA-OC), and
- SVDD, an SVM-based one-class classifier (SVM-OC).

All one-class classifiers have been implemented and optimized using Matlab's DD_tools [29], a standard implementation used in several publications on microarray analysis [19–22].
Two ways of using one-class classifiers have been considered. Suppose the target class includes k different subclasses (phenotypes). The one-class classification problem can be solved either by training k different one-class classifiers (one per subclass), with a sample rejected only if rejected by all classifiers, or by training a single one-class classifier with samples of all k subclasses. The first approach will be referred to as Multi One-Class with Voting (MOCV), the second as Single One-Class with Multiple Targets (SOCMT).
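The MOCV combination rule is a one-line disjunction over the per-subclass models. A Python sketch with hypothetical centroid-based one-class models standing in for the real ones (Parzen, kNN, k-Means, PCA, SVDD):

```python
def mocv_accept(sample, classifiers):
    """Multi One-Class with Voting: one one-class model per subclass;
    the sample is rejected only if every model rejects it.
    Each classifier is any callable returning True (accept) / False."""
    return any(clf(sample) for clf in classifiers)

# Hypothetical distance-based one-class models, one per phenotype:
# accept when the sample lies within a radius of the subclass centroid.
def make_centroid_model(centroid, radius):
    return lambda x: abs(x - centroid) <= radius

models = [make_centroid_model(0.0, 1.0), make_centroid_model(5.0, 1.0)]
print(mocv_accept(0.5, models))   # True: close to the first centroid
print(mocv_accept(3.0, models))   # False: far from every known subclass
```

SOCMT, by contrast, would pool all subclasses into a single model's training set.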
Four groups of experiments, each denoted as G_k (k ∈ [3, 6]), have been generated. In group G_k the target class includes k out of the 7 available phenotypes; samples in the remaining 7 – k classes must be rejected. For each group, different random combinations of the k classes included in the target profile are considered, and for each combination several random splits of data into test and training set are generated, for a total of 40 experiments per group. In all experiments, the test set includes a balanced number of target and reject samples.
For each experiment, data are classified with MOCV, SOCMT, and the three considered multiclass classifiers paired with the decision rules presented in this paper. Rejection thresholds have been automatically computed according to the process described in Figure 5.
Table 2 summarizes the classification results. It provides, for each classifier (rows) and for each group of experiments (column groups), the average sensitivity (Sens) and specificity (Spec) with the corresponding Confidence Intervals (CI) computed with a 95% Level of Confidence (LOC). RF coupled with the decision rules is the classifier that performs best overall (its performance is highlighted in bold in Table 2). The other two multiclass classifiers (NNET and kNN) are also comparable to, and in some cases better than, one-class classifiers. This result can be better appreciated in Table 3, which reports the average accuracy improvement of the proposed approach over the two possible configurations of one-class classifiers. Accuracy is computed as the percentage of samples correctly classified (TP + TN) over the total number of classified samples. Averages are computed over the 40 experiments of the corresponding group. Looking at the performance of RF (highlighted in bold), one can notice a significant accuracy improvement over one-class classifiers in all experiments. The table also highlights that the other two multiclass classifiers perform comparably to one-class classifiers in most of the experiments.
A final confirmation of the improvement introduced by the presented approach can be appreciated in the Receiver Operating Characteristic (ROC) curves of Figure 6. For each group of experiments, the related ROC curve compares the average performance of the three multiclass classifiers coupled with the decision rules and the two best one-class classifiers (Parzen-OC and SVM-OC). For the multiclass classifiers, the ROC curve is plotted by changing the values of the three rejection thresholds in order to explore as much as possible the space of possible solutions; for the one-class classifiers, it is obtained by changing the considered rejection rate. In all experiments RF improves on the accuracy of the one-class classifiers, while kNN and NNET provide an accuracy comparable to that of one-class classifiers. This result is obtained using a very simple computational model compared to the one required to set up a one-class classification model.
All proposed results have been obtained by computing the rejection thresholds using the CMA-ES with the SS1 objective function (see the Methods section). The diagram of Figure 7 shows the average accuracy obtained during the cross-validation experiment used to set up the rejection thresholds. Results are reported for RF coupled with the rejection rules, considering the different CMA-ES objective functions. The dashed diagonal lines represent iso-accuracy lines, with accuracy decreasing from the top-left corner to the bottom-right corner. The graph shows that the three functions SS, SS1, and SS2 provide the best accuracy, with SS1 yielding the best results.
The value of the objective function associated with the thresholds computed by the CMA-ES can be used as an indicator of the reliability of the trained model. An objective function equal to 0 indicates perfect discrimination between target and reject samples; values greater than 0 indicate reduced accuracy. This is confirmed by the results of Table 4, which reports, for each classifier and for each group of experiments, the average accuracy and the average value of the SS1 objective function associated with the computed thresholds. The numbers clearly show that RF, the classifier with the best accuracy, has a lower value of the objective function compared to kNN and NNET, thus suggesting a more reliable model.
Conclusions
Life sciences are undergoing a true revolution as a result of the emergence and growing impact of a series of new disciplines and tools, including genomics, transcriptomics, proteomics and metabolomics. These new disciplines are devoted to the examination of the entire systems of genes, transcripts, proteins and metabolites present in a given cell or tissue type. New technologies now allow the collection of huge amounts of data, dramatically modifying the way scientific research is carried out. The focus is shifting from the study of "isolated realities" to the understanding of whole biological systems and the interactions among their huge numbers of individual components. From the beginning of this revolution, machine learning immediately appeared as a natural tool for sorting, analyzing, and extracting useful information from these large amounts of data. Unfortunately, some peculiar characteristics of biological data, such as the large number of variables and the low number of samples, challenge even the most robust machine learning algorithms, especially when considering their use in a real clinical setup. This paper shows how simple decision rules can be used to add to state-of-the-art multiclass classifiers a rejection option able to discard samples that do not belong to any of the trained classes. Traditionally, this operation is performed by other rejection methods, like one-class classifiers, which do not perform very well on microarray data. The proposed solution has several advantages:

- it can be easily plugged into an environment in widespread use across research groups;
- it is simple and does not require high computational resources;
- in general it performs better than other available solutions, such as those based on one-class classifiers;
- it automatically tunes all parameters of the rejection model, requiring minimum intervention from the user.
Methods
Multiclass classifier setup in R
The three considered multiclass classifiers (RF, kNN and NNET) have been trained in R using the Classification And REgression Training (CARET) package, a set of functions that attempt to streamline the process of creating predictive models in R [30]. There are many different modeling functions in R, some with different syntax for model training and/or prediction. CARET provides a uniform interface to these functions, allowing common tasks to be standardized.
Parameter tuning of each classifier in CARET is done via resampling: every candidate model is evaluated many times using cross-validation. The number of candidate models is set through the setTuneLength parameter; in our experiments we used setTuneLength=5. This value represents a good compromise between the computational time of the training phase and the accuracy of the results for our experimental setup. Resampling has been performed according to the following parameters, which can be set in CARET using the trainControl function:

- method = "LGOCV": evaluation of each candidate model is performed using leave-group-out cross-validation,
- number = 30: number of resamplings for LGOCV,
- p = 0.98: percentage of samples in the training set of each resampling.
Data have been preprocessed before classification for normalization and dimensionality reduction. Near-zero-variance (NZV) features have been removed from the dataset using the nearZeroVar() function of CARET. The resulting data have then been processed with the preProcess() function of CARET, which creates an object able to perform centering, scaling and dimensionality reduction using PCA. Normalization is based on the training data only.
At this stage the dataset is ready for classification. The classification model for each classifier is built using the train function of CARET, and the class probability estimates for each sample in the test set are computed using the extractProb() function. The way class probabilities are estimated by extractProb() is classifier-dependent:

- kNN performs classification based on the k closest training examples, with majority voting used to predict the target class. The class probability estimate for a class is the number of training neighbors belonging to that class over the k considered neighbors.
- RF is an ensemble classifier consisting of many decision trees; it predicts the class that is the mode of the classes predicted by the individual trees. Similarly to kNN, the class probability estimate for a class is the number of individual decision trees predicting that class over the total number of decision trees in the forest.
- NNET predictions are performed by evaluating the values returned by each of the output neurons (one for each available class). Each output neuron typically outputs a sigmoid function of a linear combination of hidden-layer values, representing a posterior probability. This value is used as the class probability estimate.
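The neighbor-counting estimate used for kNN (and, with trees in place of neighbors, for RF) can be sketched as follows; the sketch is in Python and mirrors the definition above rather than CARET's internal code:

```python
from collections import Counter

def knn_class_probs(neighbor_labels):
    """kNN class probability estimates: for each class, the fraction of
    the k nearest training neighbors carrying that label. The same
    scheme applies to RF with trees instead of neighbors."""
    k = len(neighbor_labels)
    counts = Counter(neighbor_labels)
    return {cls: n / k for cls, n in counts.items()}

# Labels of the 5 nearest neighbors of a hypothetical test sample:
probs = knn_class_probs(["ALL", "ALL", "ALL", "CLL", "FL"])
print(probs)  # {'ALL': 0.6, 'CLL': 0.2, 'FL': 0.2}
```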
Threshold setup modules
A short overview of the CMA-ES
The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is an optimization method first proposed by Hansen, Ostermeier, and Gawelczyk [31] in the mid '90s, and further developed in subsequent years [32, 33]. It is particularly suited for solving optimization problems where no preliminary hypothesis on the solution can be derived. It is therefore a good choice for our specific problem, in which we want to compute the optimal rejection thresholds with a completely automated approach. In our application, candidate solutions are represented by a vector x = (T_max, T_rej, T_diff) containing the three rejection thresholds.
In the CMA-ES, iteration steps are called generations due to the algorithm's biological inspiration. The value of a generic algorithm parameter y during generation g is denoted by y^{(g)}. The mean vector m^{(g)} ∈ ℝ^n represents the favorite, most-promising solution so far. The step size σ^{(g)} ∈ ℝ_+ controls the step length, and the covariance matrix C^{(g)} ∈ ℝ^{n×n} determines the shape of the distribution ellipsoid in the search space. Its goal is, loosely speaking, to fit the search distribution to the contour lines of the objective function f to be minimized. Initially, C^{(0)} = I.
In each generation g, λ new solutions are generated by sampling a multivariate normal distribution (equation 1):

$x_i^{(g+1)} \sim m^{(g)} + \sigma^{(g)} \mathcal{N}\left(0, C^{(g)}\right), \quad i = 1, \ldots, \lambda \qquad (1)$

where the symbol ~ denotes that the left and right sides follow the same distribution.
After the sampling phase, new solutions are evaluated and ranked. x_{i:λ} denotes the i-th ranked solution point, such that f(x_{1:λ}) ≤ … ≤ f(x_{λ:λ}). The µ best of the λ solutions are selected and used for directing the next generation g + 1. First, the distribution mean is updated (equation 2):

$m^{(g+1)} = \sum_{i=1}^{\mu} w_i x_{i:\lambda}, \qquad \sum_{i=1}^{\mu} w_i = 1, \; w_1 \geq \ldots \geq w_\mu > 0 \qquad (2)$

where the w_i are positive recombination weights.
In order to optimize its internal parameters, the CMA-ES tracks the so-called evolution paths, sequences of successive normalized steps over a number of generations. $p_\sigma \in \mathbb{R}^n$ is the conjugate evolution path. $E\|\mathcal{N}(0, I)\|$ is the expectation of the Euclidean norm of an $\mathcal{N}(0, I)$ distributed random vector, used to normalize paths. $\mu_{\mathrm{eff}} = \left(\sum_{i=1}^{\mu} w_i^2\right)^{-1}$ is usually denoted as the variance effective selection mass. Let $c_\sigma < 1$ be the learning rate for cumulation for the step size control, and $d_\sigma \approx 1$ the damping parameter for the step size update. Paths are updated according to equations 3 and 4:

$p_\sigma^{(g+1)} = (1 - c_\sigma)\, p_\sigma^{(g)} + \sqrt{c_\sigma (2 - c_\sigma)\, \mu_{\mathrm{eff}}}\; C^{(g)^{-1/2}} \frac{m^{(g+1)} - m^{(g)}}{\sigma^{(g)}} \qquad (3)$

$\sigma^{(g+1)} = \sigma^{(g)} \exp\!\left( \frac{c_\sigma}{d_\sigma} \left( \frac{\|p_\sigma^{(g+1)}\|}{E\|\mathcal{N}(0, I)\|} - 1 \right) \right) \qquad (4)$
$p_c \in \mathbb{R}^n$ is the evolution path for the covariance matrix, with $p_c^{(0)} = 0$. Let $c_c < 1$ be the learning rate for cumulation for the rank-one update of the covariance matrix, $\mu_{\mathrm{cov}}$ the parameter weighting between the rank-one and rank-µ updates, and $c_{\mathrm{cov}} \le 1$ the learning rate for the covariance matrix update. The covariance matrix C is updated according to equations 5 and 6:

$p_c^{(g+1)} = (1 - c_c)\, p_c^{(g)} + \sqrt{c_c (2 - c_c)\, \mu_{\mathrm{eff}}}\; \frac{m^{(g+1)} - m^{(g)}}{\sigma^{(g)}} \qquad (5)$

$C^{(g+1)} = (1 - c_{\mathrm{cov}})\, C^{(g)} + \frac{c_{\mathrm{cov}}}{\mu_{\mathrm{cov}}}\, \mathrm{OP}\!\left(p_c^{(g+1)}\right) + c_{\mathrm{cov}} \left(1 - \frac{1}{\mu_{\mathrm{cov}}}\right) \sum_{i=1}^{\mu} w_i\, \mathrm{OP}\!\left( \frac{x_{i:\lambda}^{(g+1)} - m^{(g)}}{\sigma^{(g)}} \right) \qquad (6)$

where $\mathrm{OP}(X) = XX^T = \mathrm{OP}(-X)$.
Most noticeably, the CMA-ES requires almost no parameter tuning for its application. The choice of the internal strategy parameters is not left to the user, and even λ and µ default to acceptable values. Notably, the default population size λ is comparatively small, to allow for fast convergence. Restarts with increasing population size have been shown [34] to improve the global search performance, and are nowadays an option included in the standard algorithm.
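A drastically reduced sketch of the sample-and-recombine core (equations 1 and 2) may make the loop structure concrete. This is illustrative Python, not a faithful CMA-ES: the covariance and step-size adaptation of equations 3-6 are replaced by a fixed step-size decay, and the toy objective is ours:

```python
import random

def simple_es(f, x0, sigma=0.3, lam=12, mu=4, generations=60, seed=1):
    """Heavily simplified evolution strategy: sample lambda candidates
    around the mean (cf. equation 1), keep the mu best, and recombine
    them into the next mean (cf. equation 2, with equal weights).
    Covariance/step-size adaptation is omitted; sigma just decays."""
    rng = random.Random(seed)
    mean = list(x0)
    for _ in range(generations):
        pop = []
        for _ in range(lam):
            cand = [m + sigma * rng.gauss(0, 1) for m in mean]
            pop.append((f(cand), cand))
        pop.sort(key=lambda t: t[0])          # rank by objective value
        elite = [c for _, c in pop[:mu]]      # mu best of lambda
        mean = [sum(col) / mu for col in zip(*elite)]
        sigma *= 0.95                         # crude stand-in for eq. 4
    return mean, f(mean)

# Minimize a toy objective over three "thresholds".
obj = lambda v: sum((vi - ti) ** 2 for vi, ti in zip(v, (0.7, 0.3, 0.2)))
best, score = simple_es(obj, x0=[0.5, 0.5, 0.5])
print(best)  # the mean converges near (0.7, 0.3, 0.2)
```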
Objective functions
Four objective functions have been evaluated in the optimization process, all aiming to optimize the outcome of the classification process. The optimization stops when one of three possible conditions is reached:

1. f reaches a predefined lower bound. This is the best condition, corresponding to the identification of the optimum solution;

2. the value of f for the current population does not change by more than a given value δ. The CMA-ES has reached a local optimum that cannot be further improved with the current population;

3. the value of f for the last p populations does not change by more than a given value α < δ. Again the CMA-ES has reached a local optimum; in this case, although the solution can still be slightly improved, the overall increment is not significant and it is therefore not worth continuing the optimization.
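The three stopping conditions above can be sketched as a small helper. This is an illustration of one plausible interpretation, not the paper's code; the function name and parameters (`f_lower_bound`, `delta`, `alpha`, `p`) are hypothetical.

```python
def should_stop(f_history, f_lower_bound, delta, alpha, p):
    """Return the stopping reason, or None to continue.

    f_history: best objective value observed at each generation so far.
    Thresholds follow the text: alpha < delta, p = number of populations.
    """
    current = f_history[-1]
    # Condition 1: predefined lower bound reached (optimum identified).
    if current <= f_lower_bound:
        return "lower bound reached"
    # Condition 2: f changed by no more than delta since the last population.
    if len(f_history) >= 2 and abs(f_history[-2] - current) <= delta:
        return "stalled (local optimum)"
    # Condition 3: over the last p populations, f moved by no more than alpha.
    if len(f_history) >= p and max(f_history[-p:]) - min(f_history[-p:]) <= alpha:
        return "no significant improvement"
    return None
```

In a real loop, the history would be appended to after each generation and the loop exited as soon as a non-None reason is returned.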
Sensitivity and specificity are common indicators of the performance of a binary classification test that can be exploited in the definition of the objective function. They are computed here taking into account that the classification rule may also produce uncertain results:
Based on these two indicators, we tested three objective functions, defined as follows:

Since the CMA-ES is designed to minimize the objective function, greater values of sensitivity and specificity must decrease the value of f. The first function considers the contributions of sensitivity and specificity separately, thus allowing solutions where mostly only one of the two indicators is maximized. The second and third functions try to overcome this problem by forcing the optimization towards results where sensitivity and specificity are equally maximized. In particular, SS 1 appears to be the objective function best able to take into account the relationship between sensitivity and specificity.
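The paper's exact SS formulas are not reproduced in this extract, so the sketch below shows hypothetical objective functions with the properties just described (minimized by the CMA-ES, with the first treating the two indicators independently and the others rewarding balance); the function names and forms are illustrative assumptions, not the published definitions.

```python
def f_separate(sens, spec):
    # Independent contributions: one indicator can dominate the other,
    # so a solution with sens=1.0, spec=0.2 may still score well.
    return (1 - sens) + (1 - spec)

def f_balanced_product(sens, spec):
    # Hypothetical balanced variant: the product is high only when
    # BOTH indicators are high.
    return 1 - sens * spec

def f_balanced_gap(sens, spec):
    # Hypothetical variant that explicitly penalizes the gap between
    # sensitivity and specificity, pushing toward equal maximization.
    return (1 - sens) + (1 - spec) + abs(sens - spec)
```

For example, an unbalanced classifier (sens = 1.0, spec = 0.2) beats a balanced one (sens = spec = 0.6) under `f_separate`, but loses under `f_balanced_gap`, which is the behavior the second and third functions are meant to enforce.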
Similarly to sensitivity and specificity, the F-Score_{β} is another statistical indicator of the outcome of a binary test, combining the precision p and the recall r of the test. Again, when computing p and r, uncertain results should be considered as follows:

The F-Score_{β} has been exploited as objective function of the CMA-ES as follows:

FS 1 and FS 0.5 derive from FS_{β} with β set to 1 and 0.5 respectively. Experimental results showed that this family of functions is rather inefficient, since it tends to privilege increments of true positives while penalizing true negatives.
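A sketch of an F-Score_{β}-based objective follows. The standard F_{β} formula from precision and recall is used; how uncertain (rejected) samples enter p and r follows the paper's equations, which are not reproduced in this extract, so counting them against recall here is one plausible assumption, and `fs_objective` is a hypothetical minimization wrapper.

```python
def f_beta_score(tp, fp, fn, unc, beta):
    """F_beta from precision and recall.

    unc counts uncertain (rejected) positives; folding them into the
    recall denominator is an assumption, not the paper's exact rule.
    """
    p = tp / (tp + fp) if (tp + fp) else 0.0          # precision
    r = tp / (tp + fn + unc) if (tp + fn + unc) else 0.0  # recall
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)

def fs_objective(tp, fp, fn, unc, beta):
    # The CMA-ES minimizes, so a higher F-Score must lower the objective:
    # FS 1 uses beta=1, FS 0.5 uses beta=0.5.
    return 1 - f_beta_score(tp, fp, fn, unc, beta)
```

With β = 0.5 precision is weighted more than recall, which is consistent with the observed tendency of this objective to privilege true positives at the expense of true negatives (false positives are only partially penalized).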
Abbreviations
ALL: acute lymphoblastic leukemia
AML2: acute myeloid leukemia 2
CARET: classification and regression training
CBF-AML: core binding factor acute myeloid leukemia
cDNA: complementary DNA
CI: confidence intervals
CMA-ES: covariance matrix adaptation evolutionary strategy
CLL-ww: B-cell chronic lymphocytic leukemia wait&watch
CLL: B-cell chronic lymphocytic leukemia
DNA: deoxyribonucleic acid
DLBCL: diffuse large B-cell lymphoma
FL: follicular lymphoma
FP: false positives
FN: false negatives
kMeans-OC: kMeans one-class classifier
kNN: k-nearest neighbors
kNN-OC: kNN one-class classifier
LOC: level of confidence
LGOCV: leave group out cross validation
MOCV: multi one-class with voting
NNET: neural network
NZV: near zero variance
Parzen-OC: Parzen one-class classifier
PCA: principal component analysis
PCA-OC: PCA one-class classifier
ROC: receiver operating characteristic
RF: random forests
Sens: sensitivity
SOCMT: single one-class with multiple targets
Spec: specificity
SVDD: support vector data description
SVM: support vector machine
SVM-OC: SVM one-class classifier
TP: true positives
TN: true negatives
References
 1.
Ko D, Windle B: Enriching for correct prediction of biological processes using a combination of diverse classifiers. BMC Bioinformatics 2011, 12: 189. 10.1186/1471-2105-12-189
 2.
Selvaraj S, Natarajan J: Microarray data analysis and mining tools. Bioinformation 2011, 6(3):95–9. 10.6026/97320630006095
 3.
Toloşi L, Lengauer T: Classification with correlated features: unreliability of feature ranking and solutions. Bioinformatics 2011.
 4.
Dalton LA, Dougherty ER: Application of the Bayesian MMSE estimator for classification error to geneexpression microarray data. Bioinformatics 2011.
 5.
Vanneschi L, Farinaccio A, Mauri G, Antoniotti M, Provero P, Giacobini M: A comparison of machine learning techniques for survival prediction in breast cancer. BioData Min 2011, 4: 12. 10.1186/1756-0381-4-12
 6.
Gibson G: Microarray analysis. PLoS Biology 2003, 1: 28–29.
 7.
Larranaga P, Calvo B, Santana R, Bielza C, Galdiano J, Inza I, Lozano JA, Armananzas R, Santafe G, Perez A, Robles V: Machine learning in bioinformatics. Briefings in Bioinformatics 2006, 7: 86–112. 10.1093/bib/bbk007
 8.
Yue H, Eastman P, Wang B, Minor J, Doctolero M, Nuttall R, Stack R, Becker J, Montgomery J, Vainer M, Johnston R: An evaluation of the performance of cDNA microarrays for detecting changes in global mRNA expression. Nucl Acids Res 2001, 29(8):e41.
 9.
Statnikov A, Wang L, Aliferis C: A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification. BMC Bioinformatics 2008, 9: 319. 10.1186/1471-2105-9-319
 10.
Benso A, Di Carlo S, Politano G: A cDNA microarray gene expression data classifier for clinical diagnostics based on graph theory. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011, 8(3):577–591.
 11.
R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria; 2010. http://www.R-project.org. ISBN 3-900051-07-0
 12.
Chow C: On optimum recognition error and reject tradeoff. IEEE Transactions on Information Theory 1970, 16: 41–46. 10.1109/TIT.1970.1054406
 13.
Ripley BD: Pattern Recognition and Neural Networks. Cambridge University Press; 1996.
 14.
Freund Y, Mansour Y, Schapire RE: Generalization bounds for averaged classifiers. The annals of statistics 2004, 32: 1698–1722. 10.1214/009053604000000058
 15.
Bartlett PL, Wegkamp MH: Classification with a reject option using a hinge loss. J. Mach. Learn. Res 2008, 9: 1823–1840.
 16.
Tax D, Duin R: Growing a multiclass classifier with a reject option. Pattern Recognition Letters 2008, 29(10):1565–1570. 10.1016/j.patrec.2008.03.010
 17.
Duda RO, Hart PE, Stork DG: Pattern Classification. 2nd edition. Wiley-Interscience; 2000.
 18.
Fumera G, Roli F, Giacinto G: Reject option with multiple thresholds. Pattern Recognition 2000, 33: 2099–2101. 10.1016/S0031-3203(00)00059-5
 19.
Spinosa E, de Carvalho A: Combining one-class classifiers for robust novelty detection in gene expression data. Advances in Bioinformatics and Computational Biology 2005, 3594/2005: 54–64.
 20.
Yun X, Brereton RG: Diagnostic pattern recognition on gene-expression profile data by using one-class classification. Journal of Chemical Information and Modeling 2005, 45(5):1392–1401. 10.1021/ci049726v
 21.
Juszczak P, Tax DMJ, Pekalska E, Duin RPW: Minimum spanning tree based one-class classifier. Neurocomputing 2009, 72(7–9):1859–1869. 10.1016/j.neucom.2008.05.003
 22.
Di Gesù V, Lo Bosco G, Pinello L: A one class classifier for signal identification: a biological case study. In KES ’08: Proceedings of the 12th International Conference on Knowledge-Based Intelligent Information and Engineering Systems Part III. Berlin, Heidelberg: Springer-Verlag; 2008:747–754.
 23.
Markou M, Singh S: Novelty detection: a review  part 1: statistical approaches. Signal Processing 2003, 83: 2481–2497. 10.1016/j.sigpro.2003.07.018
 24.
cDNA Stanford’s Microarray database[http://smd.stanford.edu/]
 25.
Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson JJ, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, Staudt LM: Distinct types of diffuse large Bcell lymphoma identified by gene expression profiling. Nature 2000, 403(6769):503–511. 10.1038/35000501
 26.
Palmer C, Diehn M, Alizadeh A, Brown PO: Cell-type specific gene expression profiles of leukocytes in human peripheral blood. BMC Genomics 2006, 7: 115.
 27.
Bohen SP, Troyanskaya OG, Alter O, Warnke R, Botstein D, Brown PO, Levy R: Variation in gene expression patterns in follicular lymphoma and the response to rituximab. Proc Natl Acad Sci U S A 2003, 100(4):1926–1930. 10.1073/pnas.0437875100
 28.
Bullinger L, Rucker FG, Kurz S, Du J, Scholl C, Sander S, Corbacioglu A, Lottaz C, Krauter J, Frohling S, Ganser A, Schlenk RF, Dohner K, Pollack JR, Dohner H: Gene-expression profiling identifies distinct subclasses of core binding factor acute myeloid leukemia. Blood 2007, 110(4):1291–1300. 10.1182/blood-2006-10-049783
 29.
Tax DMJ: DDTools, the data description toolbox for matlab.[http://prlab.tudelft.nl/davidtax/dd_tools.html]
 30.
Kuhn M: Building predictive models in R using the caret package. Journal of Statistical Software 2008, 28(5):1–26.
 31.
Hansen N, Ostermeier A, Gawelczyk A: On the adaptation of arbitrary normal mutation distributions in evolution strategies: the generating set adaptation. In Proceedings 6th International Conference on Genetic Algorithms. Morgan Kaufmann; 1995:312–317.
 32.
Hansen N, Ostermeier A: Completely derandomized selfadaptation in evolution strategies. Evolutionary Computation 2001, 9: 159–195. 10.1162/106365601750190398
 33.
Hansen N, Müller SD, Koumoutsakos P: Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary Computation 2003, 11: 1–18. 10.1162/106365603321828970
 34.
Auger A, Hansen N: A restart CMA evolution strategy with increasing population size. Proc. IEEE Congress Evolutionary Computation 2005, 2: 1769–1776.
Acknowledgements
This article has been published as part of BMC Bioinformatics Volume 12 Supplement 13, 2011: Tenth International Conference on Bioinformatics – First ISCB Asia Joint Conference 2011 (InCoB/ISCB-Asia 2011): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S13.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
AB coordinated the overall design and implementation of the method, and performed the experimental results analysis. SD participated in the design of the decision rules, in the definition of the evolutionary method for the threshold computation and defined the experimental validation. GP conceived and implemented the proposed decision rules and carried out the classification experiments. AS defined and implemented the evolutionary method for the computation of the rejection thresholds, and performed the related experiments for its application on the selected case study. HH performed the statistical analysis for the validation of the results. All authors contributed to draft the manuscript. All authors read and approved the final manuscript.
About this article
Cite this article
Benso, A., Di Carlo, S., Politano, G. et al. Building gene expression profile classifiers with a simple and efficient rejection option in R. BMC Bioinformatics 12, S3 (2011). https://doi.org/10.1186/1471-2105-12-S13-S3
Keywords
 Random Forest
 Covariance Matrix Adaptation Evolution Strategy
 Rejection Threshold
 Rejection Rule
 Class Probability Estimate