 Methodology article
 Open Access
Error margin analysis for feature gene extraction
 Chi Kin Chow^{1},
 Hai Long Zhu^{1},
 Jessica Lacy^{2} and
 Winston P Kuo^{2}
https://doi.org/10.1186/1471-2105-11-241
© Chow et al; licensee BioMed Central Ltd. 2010
 Received: 12 June 2009
 Accepted: 11 May 2010
 Published: 11 May 2010
Abstract
Background
Feature gene extraction is a fundamental issue in microarray-based biomarker discovery. It is normally treated as an optimization problem of finding the best predictive feature genes that can effectively and stably discriminate distinct types of disease conditions, e.g. tumors and normals. Since gene microarray data normally involve thousands of genes and tens or hundreds of samples, the gene extraction process may fall into local optima if the gene set is optimized according to the maximization of classification accuracy of the classifier built from it.
Results
In this paper, we propose a novel gene extraction method, error margin analysis, to optimize the feature genes. The proposed algorithm has been tested on one synthetic dataset and two real microarray datasets, and compared with five existing gene extraction algorithms on each dataset. On the synthetic dataset, the results show that the feature set extracted by our algorithm is the closest to the actual gene set. For the two real datasets, our algorithm is superior in balancing the size and the validation accuracy of the resultant gene set when compared to the other algorithms.
Conclusion
Because of its distinct features, the error margin analysis method can stably extract the relevant feature genes from microarray data for high-performance classification.
Keywords
 Support Vector Machine
 Synthetic Dataset
 Error Margin
 Test Algorithm
 Recursive Feature Elimination
Background
Gene expression data commonly involve thousands of genes and tens or hundreds of samples. In order to reduce the computational cost and complexity of classification, feature extraction on gene expression patterns is necessary. The objective of the feature gene extraction process is to select the gene set that can be used to effectively and stably discriminate distinct types of disease statuses, e.g. tumors and normals.
According to the terminology proposed in [1], one of the major approaches to feature selection is the filter model. It uses statistical techniques over the training patterns to "filter out" irrelevant features. The "filtering" process can be further divided into forward selection and backward elimination. In forward selection [2], variables are progressively incorporated into larger and larger subsets, whereas in backward elimination, one starts with the set of all variables and progressively eliminates the least relevant ones. In the field of bioinformatics, there is a belief that the class of a gene expression pattern, either normal or cancerous, correlates with the amount of change in the expression levels of feature genes. Thus, inversely, the gene-level difference between normal-class patterns and cancer-class patterns is a promising guide for identifying feature genes. The p-value of a t-test between normal-class and cancer-class patterns is a more reliable guide as it considers not only the level difference but also the significance of the difference. In [3], a gene is regarded as a feature if the corresponding p-value is higher than a predetermined cutoff value. Cao et al. [4] defined the relevance of a gene as the sensitivity of the output to the inputs in terms of the partial derivative. Guyon et al. [5] defined the relevance of a gene in terms of its contribution to the cost function of a Support Vector Machine (SVM). The corresponding gene ranking method is named Recursive Feature Elimination (RFE). Several modifications of RFE, such as SQRT-RFE and Entropy-based RFE [6], were proposed to speed up the rank list construction process. Since the importance of a variable is not assessed in the context of variables not yet included, forward selection may find weaker subsets.
Backward elimination may outperform forward selection by eliminating the least promising variables while retaining the variables that together perform the best classification (dependent variables).
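The t-test filter described above can be sketched in a few lines. The paper's experiments were implemented in MATLAB; the Python snippet below is only an illustrative re-expression, with made-up data and a hypothetical cutoff value:

```python
import numpy as np
from scipy.stats import ttest_ind

def ttest_filter(X_normal, X_cancer, p_cutoff=0.005):
    """Rank genes by two-sample t-test p-value and keep the significant ones.

    X_normal, X_cancer: (samples x genes) expression matrices.
    Returns indices of selected genes, most significant first.
    """
    _, pvals = ttest_ind(X_normal, X_cancer, axis=0)
    order = np.argsort(pvals)  # smallest p-value = most relevant
    return [int(i) for i in order if pvals[i] < p_cutoff]

rng = np.random.default_rng(0)
# 20 samples x 5 genes; only gene 0 differs strongly between the classes
X_n = rng.normal(0.3, 0.05, size=(20, 5))
X_c = X_n.copy()
X_c[:, 0] += 1.0
selected = ttest_filter(X_n, X_c)
print(selected)  # index of the differentially expressed gene
```

Note that this selects genes with *small* p-values, the usual statistical convention; the cutoff of 0.005 matches the SVM-ttt setting reported later in this paper.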
The wrapper is another approach to feature gene selection. In this approach, a feature gene set is found by optimizing certain measured quantities; examples include cross-validation [7] and bootstrap [8]. Shevade and Keerthi [9] extracted feature genes by optimizing an SVM-like energy function. Zhu et al. [10] presented a Markov blanket-embedded genetic algorithm (MBEGA) for the gene selection problem. They used memetic operators to add or delete features (or genes) from a Genetic Algorithm (GA) solution in order to speed up the GA convergence. Hong and Cho [11] enhanced the population divergence of a GA-based wrapper model by explicit fitness sharing. They also modified the chromosome representation in the GA to suit large-scale feature selection. Li et al. [12] presented a statistical approach to feature gene selection: many subsets of genes that classify the training samples well are identified using a GA, and the genes appearing most frequently in these subsets are then presumed to be feature genes. Raymer et al. [13] reported a feature extraction algorithm in which feature selection, feature extraction, and classifier training are performed simultaneously, using a GA with an objective function involving training accuracy and the number of feature genes. Huerta et al. [14] suggested combining a GA with an SVM for the classification of microarray data: the GA evolves gene subsets, whereas the SVM evaluates the fitness of the gene subsets in terms of classification accuracy. Shen et al. [15] reported a similar feature gene selection algorithm, combining a discrete Particle Swarm Optimization (PSO) for search with an SVM for fitness evaluation.
Gilad-Bachrach et al. [16] introduced a margin-based feature selection criterion and applied it to measure the quality of a gene subset. A gene subset is said to be optimal if the corresponding classifier has maximum error margin.
Most of the proposed feature selection algorithms [9–15] presume that the performance of a feature gene set is associated with the training accuracy of the classifier built from it. However, since the number of training patterns is small relative to the pattern dimension, training accuracy is not a representative performance measure. Validation accuracy is a more objective and reliable alternative. Though validation accuracy is never known during the training process, one can divide a training set of n samples into m non-overlapping subsets of roughly equal size; m − 1 of these subsets are combined as the new training set and the remaining subset serves as the validation set. The corresponding error is the so-called cross-validated (CV) error. As noted by Ambroise and McLachlan [17], CV error may introduce a bias into the feature gene selection process; they proposed to tackle it (i.e. obtain an almost unbiased estimate) with a two-layered cross-validation approach. On the other hand, validation accuracy relates to the generalization of a classifier, and the generalization of a classifier is commonly measured by its error margin. It is therefore reasonable to hypothesize that validation accuracy is proportional to the width of the error margin, and worthwhile to represent the performance of a feature gene set by its error margin.
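The m-fold cross-validation scheme just described can be sketched as follows. The classifier choice (a linear SVM via scikit-learn) and all parameter values are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def cv_error(X, y, m=5):
    """m-fold CV error: split into m subsets, train on m-1, validate on the rest."""
    errs = []
    for tr, va in StratifiedKFold(n_splits=m, shuffle=True, random_state=0).split(X, y):
        clf = SVC(kernel="linear").fit(X[tr], y[tr])
        errs.append(1.0 - clf.score(X[va], y[va]))  # misclassification rate on held-out fold
    return float(np.mean(errs))

# Demo on two well-separated classes
X = np.vstack([np.zeros((20, 2)), np.full((20, 2), 5.0)])
y = np.r_[np.zeros(20), np.ones(20)]
err = cv_error(X, y, m=5)
print(err)  # well-separated classes give zero CV error
```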
In this paper, we propose a novel feature gene extraction scheme, namely Error-Margin Analysis (EMA). EMA, as the name suggests, equates the performance of a feature gene set with its error margin instead of its classification accuracy. EMA starts by building an error margin curve representing error margin versus the number of most relevant genes. Afterwards, an analysis of the curve is performed to identify the optimal feature gene set. The proposed approach differs from [5] in the sense that the selection criterion is margin-based and parameterless. It also contrasts with [16], in which the feature genes are selected solely to maximize the error margin. Though [18] considers error margin in measuring the performance of a feature gene set, proper selection of the penalty coefficient and the size value is critical. In summary, EMA has an advantage over [7–15] in measuring the performance of a feature gene set. Additionally, it is superior to [3–5] in the sense that the number of feature genes extracted by EMA is parameter-independent, whereas the others depend on parameter settings.
EMA is based on two assumptions: 1) genes are independently expressed; 2) the distributions of gene expression are Gaussian.
The rest of this paper is organized as follows: we first present an analysis of the relation between error margin and the number of feature genes. Afterwards, we propose a novel feature gene extraction algorithm based on the error margin analysis. The experimental results are then reported and conclusions are drawn.
Results
Datasets
In this section, the performance of EMA is evaluated on three datasets:
i. Synthetic dataset
The distribution of gene expressions in the synthetic dataset.
              | y = −1                                                                       | y = +1
i ∈ [1, 20]   | p_{i}(x) = G(x − μ_{i}, σ_{i}) where μ_{i} = 0.3 − i·0.05/20, σ_{i} = 0.15 − i·0.05/20 | p_{i}(x) = G(x − α_{i}, β_{i}) where α_{i} = 0.7 − i·0.05/20, β_{i} = 0.15 − i·0.05/20
i ∈ [21, 500] |                                                                              |
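Under the stated assumptions (independent genes, Gaussian expression), a dataset matching the table can be generated as below. The minus signs in the mean/variance formulas and the distribution shared by both classes for the 480 irrelevant genes are assumptions, since those details of the table did not survive cleanly:

```python
import numpy as np

def make_synthetic(n_per_class=250, n_genes=500, n_relevant=20, seed=0):
    """Sketch of the synthetic dataset: 20 class-dependent genes; the remaining
    genes are assumed identically distributed in both classes (irrelevant)."""
    rng = np.random.default_rng(seed)
    i = np.arange(1, n_relevant + 1)
    mu_neg = 0.3 - i * 0.05 / 20            # class y = -1 means (sign assumed)
    mu_pos = 0.7 - i * 0.05 / 20            # class y = +1 means (sign assumed)
    sigma = 0.15 - i * 0.05 / 20            # common standard deviations
    # Assumed background distribution for the irrelevant genes
    X = rng.normal(0.5, 0.15, size=(2 * n_per_class, n_genes))
    y = np.r_[-np.ones(n_per_class), np.ones(n_per_class)]
    X[:n_per_class, :n_relevant] = rng.normal(mu_neg, sigma, size=(n_per_class, n_relevant))
    X[n_per_class:, :n_relevant] = rng.normal(mu_pos, sigma, size=(n_per_class, n_relevant))
    return X, y

X, y = make_synthetic()
print(X.shape, y.shape)
```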
ii. Gastric cancer dataset [19]
This dataset contains the expression levels of 123 samples (Osaka University Medical School Hospital). A hundred and twelve of them are normal-class patterns and the remaining twelve patterns are cancerous-class. It is available at: http://lifesciencedb.jp/cged/
iii. Oral cancer multiple datasets
We have four microarray datasets available. The first was measured with the HG-U133 Plus 2 chip and has 11 normal and 50 cancerous samples; the second is from an HG-U133A chip and has 22 normal and 22 cancerous samples; the third comes from an HG-Focus chip and has only 22 cancerous samples; and the fourth has 12 normal and 26 cancerous samples, also measured with the HG-U133 Plus 2 chip. All chips are manufactured by Affymetrix (Santa Clara, CA).
Algorithms for Comparison
To evaluate the impact of EMA, we compare its performance with five algorithms. The designs and settings of EMA and the algorithms for comparison are summarized below.
Test algorithm 1 – SVM with Feature Gene Extraction by Error Margin Analysis (SVM-ema)
SVM-ema estimates the number of feature genes f_{0} through an analysis of the error margin. Given the gene relevance list, SVM-ema constructs the corresponding error margin curve, and f_{0} is estimated as the critical point of the curve.
Test algorithm 2 – SVM with t-test based feature gene extraction (SVM-ttt)
In SVM-ttt [3], the relevance of a gene is measured by its p-value in a t-test. A gene is selected as a feature if its relevance is higher than a given cutoff p-value.
Test algorithm 3 – SVM with Recursive Feature Elimination (SVM-rfe)
The gene relevance list is computed according to recursive feature elimination (RFE) [5]. At each iteration, RFE identifies and removes the gene that contributes least from the set of considered genes. The iteration is repeated until all genes are removed from the set. The relevance of a gene is represented by the iteration index at which it is removed. The curve representing the cross-validation error versus the number of most relevant features f is fitted by an exponential function g(f). The optimal number of feature genes is obtained as the value at which the change in g(f) first falls below a threshold.
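A minimal SVM-RFE ranking can be obtained with scikit-learn's RFE, used here as a stand-in for the procedure of [5]; the exponential-fit stopping rule is omitted, and the data and parameters are illustrative:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

# Rank genes with SVM-RFE: repeatedly drop the gene with the smallest SVM weight
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 10))
y = np.r_[np.zeros(20), np.ones(20)]
X[y == 1, 0] += 3.0                        # gene 0 is the informative one
ranker = RFE(SVC(kernel="linear"), n_features_to_select=1).fit(X, y)
print(ranker.ranking_)                     # rank 1 = most relevant, removed last
```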
Test algorithm 4 – SVM with Margin-based Selection Criterion (SVM-mcs)
SVM-mcs [16] performs selection by searching for the feature gene set that maximizes a margin-based criterion.
Test algorithm 5 – Bayesian Logistic Regression (BLogReg)
BLogReg [20] is a gene selection algorithm based on sparse logistic regression (SLogReg). The regularization parameter arising in SLogReg is eliminated, via Bayesian marginalization, without a significant effect on predictive performance. The source code of BLogReg is taken from [21].
Test algorithm 6 – STW feature selection using generalized logistic loss (STW)
STW [22] was implemented exactly as SVM-RFE except that the hinge loss in SVM-RFE is replaced with the generalized logistic loss.
For SVM-ema, the parametric model G(.) for the estimation of the LOOErM curve is chosen as a second-order polynomial. The cutoff p-value of SVM-ttt is set to 0.005. For SVM-rfe, as suggested in [6], the threshold for obtaining the optimal number of features is 0.0001 and the error is based on a 3-fold experimental structure. The results of BLogReg and STW are obtained under the default parameters assigned in the corresponding source codes.
Experiment Settings
For the synthetic dataset, five hundred patterns are generated in each run. Twenty-five of them form the training pattern set and the remaining four hundred and seventy-five patterns form the validation pattern set for performance measurement. In each pattern set, half of the patterns belong to the negative class and the other half to the positive class.
For the Gastric cancer dataset, let n_{−} be the number of normal-class patterns and n_{+} the number of cancer-class patterns in T, and let r be the sampling rate. We randomly pick r·n_{+} positive-class patterns and r·n_{−} negative-class patterns from T to form the training set. The remaining (1 − r)·n_{+} positive-class patterns and (1 − r)·n_{−} negative-class patterns in T form the validation set. The simulation is repeated with the sampling rate rising from 0.3 to 0.6.
For the Oral cancer multiple datasets, the first three datasets form a superset O. Let n_{−} be the number of normal-class patterns and n_{+} the number of cancer-class patterns in O, and let r be the sampling rate. We randomly pick r·n_{+} positive-class patterns and r·n_{−} negative-class patterns from O to form the training set, while the fourth dataset is used as the validation set. The corresponding accuracy represents the generalization ability of a test algorithm on the oral cancer classification problem. The simulation is repeated with the sampling rate rising from 0.1 to 0.7.
To provide a fair and repeatable comparison amongst the test algorithms, the performance of each test algorithm on a particular simulation is evaluated from statistics obtained over 100 independent runs. For the synthetic dataset, the patterns in both the training set and the validation set are randomly generated for each run. For the Gastric cancer dataset, the substituted random number is regenerated for each invalid expression in each pattern. For the Oral cancer multiple datasets, the patterns in the training set are randomly re-picked for each run. All test algorithms are implemented in MATLAB.
Simulation Results
Synthetic dataset
The statistics of the numbers of feature genes extracted by the test algorithms: synthetic dataset.

Algorithm | Mean           | Std.  | Median | Min.   | Max.
SVM-ema   | 17.90 (16.63)  | 1.40  | 18.00  | 15.00  | 23.00
SVM-ttt   | 43.27 (19.00)  | 4.72  | 43.00  | 31.00  | 56.00
SVM-rfe   | 65.38 (19.00)  | 4.62  | 66.00  | 50.00  | 75.00
SVM-mcs   | 448.01 (18.93) | 36.92 | 455.50 | 356.00 | 500.00
BLogReg   | 2.45 (1.00)    | 0.56  | 2.00   | 2.00   | 4.00
STW       | 45.42 (8.51)   | 6.24  | 47.00  | 16.00  | 48.00
The statistics of the validation accuracies of the test algorithms: synthetic dataset.

Algorithm | Mean    | Std.  | Median  | Min.    | Max.
SVM-ema   | 100.00% | 0.00% | 100.00% | 100.00% | 100.00%
SVM-ttt   | 100.00% | 0.00% | 100.00% | 100.00% | 100.00%
SVM-rfe   | 100.00% | 0.00% | 100.00% | 100.00% | 100.00%
SVM-mcs   | 100.00% | 0.00% | 100.00% | 100.00% | 100.00%
BLogReg   | 50.00%  | 0.02% | 50.00%  | 50.00%  | 50.22%
STW       | 93.79%  | 7.71% | 97.56%  | 64.44%  | 100.00%
Gastric cancer dataset
Oral cancer multiple datasets
Discussion
The average hitting rate r_{h} and the average redundancy rate r_{r} of the test algorithms: synthetic dataset.

        | SVM-ema | SVM-ttt | SVM-rfe | SVM-mcs | BLogReg | STW
r_{h}   | 83.15%  | 95.00%  | 95.00%  | 94.65%  | 5.00%   | 42.55%
r_{r}   | 7.09%   | 56.09%  | 70.94%  | 95.77%  | 59.19%  | 81.27%
For the two real datasets, Figure 1 and Figure 3 show the numbers of feature genes extracted by the different algorithms. We found that SVM-ema, SVM-rfe and SVM-mcs are insensitive to the sampling rate: the numbers of feature genes increase only slightly with the sampling rate r. Though SVM-ema and SVM-mcs both employ the error margin in their gene selection criteria, SVM-ema consistently yields far fewer feature genes. As indicated in previous sections, irrelevant genes may also contribute to the error margin, so the maximization approach of SVM-mcs tends to extract as many genes as possible. Thus, SVM-mcs over-extracts feature genes in order to achieve a larger error margin. As seen from Figure 1 and Figure 3, the numbers of feature genes extracted by SVM-mcs are unusually large: for the Gastric cancer dataset, the minimal number is 2002, so nearly 99% of genes are regarded as features; for the Oral cancer data, the number is more than 5000, with nearly 87% of genes considered features. The number of feature genes extracted by SVM-mcs is around 35 times and 149 times that of SVM-ema for the Gastric cancer dataset and the Oral cancer datasets respectively. The reason for this difference is that EMA is able to decompose the contributions of the feature genes from those of the background genes. This also indicates that purely maximizing the error margin is not a practical selection criterion.
When comparing the validation accuracies amongst the test algorithms, SVM-ttt and SVM-mcs should be discounted, as their high accuracies are achieved by over-extracting feature genes. As seen from the results in Figure 2 and Figure 4, the performance of SVM-ema is better than that of SVM-rfe in terms of not only the validation accuracy but also the number of feature genes. SVM-ema is also superior to BLogReg and STW. This superiority suggests that 1) a margin-based criterion is more suitable for representing the performance of a feature gene set; and 2) this criterion is more robust than those of BLogReg and STW, in the sense that BLogReg and STW may underestimate the number of feature genes.
Conclusions
This paper proposes a feature extraction algorithm, error margin analysis, that uses a margin-based criterion to measure the quality of a feature set. The error margin is a better indicator than training accuracy in representing the generalization ability of a classifier. However, maximizing the error margin may lead to over-extraction of features. Therefore, we propose to make a trade-off between the performance and the number of features by analyzing the error margin curve. Under the assumptions of gene independence and Gaussian gene distributions, the analysis shows that the error margin involving only the relevant genes grows faster than that involving random genes. Based on this observation, we model the extraction process as the estimation of the critical point in the curve of error margin versus the number of most relevant genes. Compared with existing algorithms that use either a margin-based selection criterion or a "filtering" approach, our algorithm has distinct advantages, supported by the theoretical framework:
 1)
Error margin is a more representative measure to the generalization ability of a classifier than training accuracy;
 2)
Solely maximizing error margin may lead to overextraction of features;
 3)
SVM-ema strikes the right balance between the performance and the size of the resultant feature gene set.
Possible future work includes 1) an analysis of the error margin curve when the gene distribution is non-Gaussian, 2) deriving a more accurate parametric model for the margin curve segments w_{I} and w_{R}, and 3) an extension of the error margin analysis to nonlinear classifiers.
Methods
Error-Margin as an Indicator of Feature Genes
The gene expression level difference and its p-value are promising relevance measures for a gene, and the gene rank list sorted according to these measures provides guidance for feature gene selection. On the other hand, margins play a crucial role in modern machine learning research: the margin represents the generalization ability of a classifier, or the confidence of the decisions it makes. It is therefore valuable to investigate using the error margin as a criterion to decide how many genes should be selected from the list. In this section, an analysis of the relation between error margin and the number of most relevant genes is presented.
where {h_{ i }} are constants, {x_{j, i}} and hence {v_{ j }} are random variables.
In the rest of this paper, the analysis considers the minimal error margin amongst the cancer-class patterns {v_{j}} for j ∈ C_{+}.
Figure 6 shows the original 2-dimensional feature space X. The white ellipse represents the region of normal-class patterns whilst the grey-filled ellipse represents the region of cancer-class patterns. The center of the normal-class patterns is , whereas the center of the cancer-class patterns is . The dotted line represents the decision hyperplane obtained by SVM. Figure 7 shows the translated feature space Z. The centers of the normal-class patterns and of the cancer-class patterns are translated to [0, 0] and respectively.
where
where and .
The analysis of the relation between error margin and the number of most relevant genes can be divided into three cases:
Case 1: Linearly separable training set with zero gene variance
where η(.) is monotonically increasing and depends on n_{+}. The details of eq. (12) can be found in Appendix II.
In this paper, the term error margin curve W(i) refers to the curve representing the error margin versus the number of most relevant genes, i.e. W(i) = .
Case 2: Linearly separable training set with nonzero gene variance
Case 3: Linearly nonseparable training set
In the case of a linearly non-separable training set, the soft-margin idea chooses a decision hyperplane such that the classification accuracy is as high as possible, while still maximizing the error margin of the correctly-classified pattern set V' ∈ . Thus, the error margin in this case is measured from V'. Since the patterns excluded from V' are those with minimal (and negative) error margin v_{i}, it is expected that 1) the mean of V' is larger than that of V and 2) the variance of V' is smaller than that of V. Under the practical assumption that the gene distributions in V' are also Gaussian, the soft-margin idea brings the error margin analysis of a linearly non-separable training set back to the case of a linearly separable pattern set.
In summary, when a training set is linearly separable and σ_{i} = 0 for all i, the critical point of the error margin curve is exactly the boundary point between the relevant and irrelevant gene sets. However, if 1) σ_{i} > 0 for at least one gene and/or 2) the training set is linearly non-separable, oscillation is introduced into the curve and blunts the critical point. In such cases, feature gene extraction is modeled as the estimation of the critical point of the error margin curve.
Feature Gene Extraction by Error-Margin Analysis
In this section, we report a novel feature gene extraction algorithm, namely Feature Gene Extraction by Error-Margin Analysis (EMA). Based on the error margin analysis presented in the previous section, feature gene extraction can be modeled as the search for the critical point of the error margin curve.
In order to moderate the dependency of the error margin on the pattern set, the Leave-One-Out Error Margin (LOOErM) is used. LOOErM, as the name suggests, leaves out a single pattern from the training set and computes the error margin of the decision hyperplane defined by the remaining patterns. This is repeated so that each pattern in the training set is left out once. For a training set S consisting of n patterns, n error margins {g_{j}}_{j∈[1, n]} are obtained. The LOOErM of S is defined as the average of {g_{j}}. Algorithm A1 summarizes the procedure.
Algorithm A1: Leave-One-Out Error Margin
Input: 1) pattern set S = {[x_{j}, y_{j}]}_{j ∈ [1, n]}; 2) the index set of the considered genes F
1. For j : = 1 to n
1.1 Define the pattern subset Z = {[x_{ k }(i)_{i∈F} y_{ k }]}_{k≠j}
1.2 Train an SVM on Z; denote the corresponding decision hyperplane by H_{j}(z): ⟨h·z⟩ + b, where ⟨a·b⟩ is the dot product of the vectors a and b.
1.3 Compute the error margin g_{ j }of H_{ j }:
2. Next j
3.
Output: the leave-one-out error margin
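Algorithm A1 can be sketched in Python as follows. Since the exact margin formula in step 1.3 did not survive extraction, the margin g_j is assumed here to be the geometric margin 1/||h|| of the leave-one-out linear SVM, and the large C value approximating a hard margin is an illustrative choice:

```python
import numpy as np
from sklearn.svm import SVC

def loo_error_margin(X, y, F):
    """Leave-One-Out Error Margin (Algorithm A1 sketch): for each left-out pattern,
    train a linear SVM on the remaining patterns restricted to genes F and record
    the geometric margin 1/||h|| (assumed margin measure); return the average."""
    X = X[:, list(F)]
    margins = []
    for j in range(len(y)):
        keep = np.arange(len(y)) != j          # leave pattern j out
        clf = SVC(kernel="linear", C=1e3).fit(X[keep], y[keep])
        h = clf.coef_.ravel()
        margins.append(1.0 / np.linalg.norm(h))
    return float(np.mean(margins))

# Demo: two separable classes in a 3-gene space, considering genes {0, 1}
X = np.vstack([np.zeros((5, 3)), np.ones((5, 3))])
y = np.r_[-np.ones(5), np.ones(5)]
m = loo_error_margin(X, y, F=[0, 1])
print(m)  # positive margin for a separable set
```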
As seen from eq. (17), ε naturally depends on c, α_{R} and α_{I}. In other words, the performance of an arbitrary critical point c = f can be represented by the error . Given that G(.) is sufficient to model W_{R} and W_{I}, the optimal critical point f_{0} of the error margin curve is defined as the critical point where the estimation error of (i) is minimum, i.e. .
Given a training set S = {[x_{j} = [x_{j,1}, x_{j,2},..., x_{j,d}] ∈ ℜ^{d}, y_{j} ∈ {−1, +1}]}_{j∈[1, n]}, we first rank the genes according to their relevance. We denote by L = {ϕ_{k}}_{k = 1,2,..., d} the gene relevance list in which the relevance of the ϕ_{a}^{th} gene is larger than or equal to that of the ϕ_{b}^{th} gene for all a < b. The list L is then used to rearrange S as {[x_{j}(L), y_{j}]}_{j∈[1, n]}. Afterwards, we compute the error margin curve W(i) = where is the LOOErM computed from Algorithm A1 with F = {1, 2,..., i}.
In this paper, G(.) is chosen to be a polynomial function. The corresponding estimation error ε_{f} for an arbitrary critical point c = f can be obtained by the least squares method; the details can be found in Appendix III. Benefiting from the prior knowledge that the number of feature genes is commonly lower than a predetermined value f_{max}, say f_{max} = 100, we only need to examine the estimation errors up to the first f_{max} most relevant genes, i.e. {ε_{f}} for f ∈ [1, f_{max}]. The optimal critical point f_{0} is estimated as the one with minimum estimation error, i.e. , and the index set of the feature genes F_{0} is . Algorithm A2 summarizes the procedure of Feature Gene Extraction by Error-Margin Analysis.
Algorithm A2: Feature Gene Extraction by Error-Margin Analysis
Input: 1) pattern set S = {[x_{j} = [x_{j,1}, x_{j,2},..., x_{j,d}] ∈ ℜ^{d}, y_{j} ∈ {−1, +1}]}_{j∈[1, n]}; 2) maximum number of considered genes f_{max}; 3) parametric error margin model G(.)
/* Construct the gene relevance list L : BEGIN */
where Ω(A, B) is the p-value of the two point sets A and B, C_{−} contains the indices of all normal-class patterns in S and C_{+} contains the indices of all cancer-class patterns in S.
2. Define the gene relevance list L = {ϕ_{j}}_{j = 1,2,..., d} where the relevance of the ϕ_{a}^{th} gene is larger than or equal to that of the ϕ_{b}^{th} gene for all a < b.
3. Rearrange the gene order of S according to L: S ← {[x_{j}(L), y_{j}]}_{j∈[1, n]}
/* Construct the gene relevance list L : END */
/* Construct the LOOErM curve { }: BEGIN */
4. For i : = 1 to f_{ max }
4.1 Compute by Algorithm A1 where the set F used in the algorithm is defined as {1, 2,..., i}.
5. Next i
/* Construct the LOOErM curve { }: END */
/* Search for the critical point of the LOOErM curve: BEGIN */
6. For f : = 1 to f_{ max }
6.1 Compute the estimation error . If G(.) is a polynomial function, the optimal α_{ R }and α_{ I }can be found by the method listed in Appendix III.
7. Next f
8. Compute the optimal critical point f_{ 0 }as arg
/* Search for the critical point of the LOOErM curve: END */
Output: The index set of the feature genes
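The critical-point search of steps 6–8 can be sketched as a two-segment least-squares fit: for each candidate split f, fit separate polynomials G(.) to the two segments of the curve and keep the split with the smallest total residual. The polynomial order and the use of numpy's polyfit residuals are illustrative choices, not the exact formulation of Appendix III:

```python
import numpy as np

def critical_point(W, order=2):
    """For each candidate split f, fit separate order-`order` polynomials to the
    two curve segments W[1..f] and W[f+1..n] and pick the split with the
    smallest total least-squares residual (illustrative reading of eq. (17))."""
    n = len(W)
    i = np.arange(1, n + 1)
    best_f, best_err = 1, np.inf
    for f in range(order + 2, n - order - 1):    # keep enough points per segment
        rL = np.polyfit(i[:f], W[:f], order, full=True)[1]  # residual, left fit
        rR = np.polyfit(i[f:], W[f:], order, full=True)[1]  # residual, right fit
        err = (rL[0] if rL.size else 0.0) + (rR[0] if rR.size else 0.0)
        if err < best_err:
            best_f, best_err = f, err
    return best_f

# Demo: a curve that rises over the first 20 genes, then jumps and stays flat
W = np.r_[np.arange(1.0, 21.0), np.full(80, 25.0)]
print(critical_point(W))  # the split at the kink gives a near-zero residual
```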
Appendix I
Linearity of Gene Patterns
Eq. (20) implies that S can be linearly separated by the hyperplane Th_{I}. In conclusion, S is linearly separable if the transformation matrix T exists.
Existence of the transformation matrix
According to eq. (18), T is defined as the right inverse of P, which can be written as P^{T}(PP^{T})^{−1}. Thus, T exists if P has rank m.
When the number of genes d is much larger than the number of training patterns n, i.e. d >> n, the probability that T exists is high. Recalling that gene pattern analysis deals with small sample sizes and high sample dimensions, the existence of T is easily achieved. Thus, gene patterns can reasonably be assumed to be linearly separable. Additionally, since the support vector machine guarantees that the decision hyperplane has maximum error margin, the linear SVM model is ideal for gene pattern classification.
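The existence argument can be checked numerically: for a random n × d matrix P with d >> n, the rows are linearly independent with probability one, so the right inverse T = P^{T}(PP^{T})^{−1} exists and PT recovers the identity. The matrix sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(10, 1000))            # 10 patterns, 1000 genes (d >> n)
T = P.T @ np.linalg.inv(P @ P.T)           # right inverse: P T = I
print(np.allclose(P @ T, np.eye(10)))      # True
```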
Appendix II
Appendix III
Suppose G(.) is a γ-order polynomial; the estimations of w_{R} and w_{I} are G(x − α_{R} = [A, B]) = and respectively.
The optimal parameter vector Ψ = [A B C]^{T} is computed as Ψ = M^{−1}Y, and the optimal value of D can be found from eq. (25).
Declarations
Acknowledgements
The methodology part of this paper is supported by the niche-area fund 1-BB56 of the Hong Kong Polytechnic University. The experiment part of this paper (for Oral cancer) is supported by the Harvard Catalyst/The Harvard Clinical and Translational Science Center (with NIH Award #UL1 RR 025758 and financial contributions from Harvard University and its affiliated academic health care centers).
Authors’ Affiliations
References
 John GH, Kohavi R, Pfleger K: Irrelevant features and the subset selection problem. Proceedings of the 11th Int Conf on Machine Learning 1994, 121–129.
 Xiong M, Li W, Zhao J, Jin L, Boerwinkle E: Feature (Gene) Selection in Gene Expression-Based Tumor Classification. Molecular Genetics and Metabolism 2001, 73: 239–247. 10.1006/mgme.2001.3193
 Man TK, Chintagumpala M, Visvanathan J, Shen JK, Perlaky L, Hicks J, Johnson M, Davino N, Murray J, Helman L, Meyer W, Triche T, Wong KK, Lau CC: Expression Profiles of Osteosarcoma That Can Predict Response to Chemotherapy. Cancer Research 2005, 65(18):8142–8150. 10.1158/0008-5472.CAN-05-0985
 Cao L, Seng CK, Gu Q, Lee HP: Saliency Analysis of Support Vector Machines for Gene Selection in Tissue Classification. Neural Computing & Applications 2003, 11: 244–249.
 Guyon I, Weston J, Barnhill S, Vapnik V: Gene selection for cancer classification using support vector machines. Machine Learning 2002, 46: 389–422. 10.1023/A:1012487302797
 Furlanello C, Serafini M, Merler S, Jurman G: Entropy-based gene ranking without selection bias for the predictive classification of microarray data. BMC Bioinformatics 2003, 4: 54. 10.1186/1471-2105-4-54
 Kohavi R: A study of cross-validation and bootstrap for accuracy estimation and model selection. Proceedings of the 15th Int Joint Conf on Artif Intell 1995, 1137–1143.
 Efron B, Tibshirani R: An Introduction to the Bootstrap. Chapman & Hall, New York; 1993.
 Shevade SK, Keerthi SS: A simple and efficient algorithm for gene selection using sparse logistic regression. Bioinformatics 2003, 19(17):2246–2263. 10.1093/bioinformatics/btg308
 Zhu Z, Ong YS, Dash M: Markov blanket-embedded genetic algorithm for gene selection. Pattern Recognition 2007, 40: 3236–3248. 10.1016/j.patcog.2007.02.007
 Hong JH, Cho SB: Efficient huge-scale feature selection with speciated genetic algorithm. Pattern Recognition Letters 2006, 27: 143–150. 10.1016/j.patrec.2005.07.009
 Li L, Weinberg CR, Darden TA, Pedersen LG: Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method. Bioinformatics 2001, 17(12):1131–1142. 10.1093/bioinformatics/17.12.1131
 Raymer ML, Punch WF, Goodman ED, Kuhn LA, Jain AK: Dimensionality Reduction Using Genetic Algorithms. IEEE Trans on Evolutionary Computation 2000, 4(2):164–171. 10.1109/4235.850656
 Huerta EB, Duval B, Hao JK: A Hybrid GA/SVM Approach for Gene Selection and Classification of Microarray Data. EvoWorkshops, LNCS 2006, 3907: 34–44.
 Shen Q, Shi WM, Kong W, Ye BX: A combination of modified particle swarm optimization algorithm and support vector machine for gene selection and tumor classification. Talanta 2007, 71: 1679–1683. 10.1016/j.talanta.2006.07.047
 Gilad-Bachrach R, Navot A, Tishby N: Margin Based Feature Selection – Theory and Algorithms. Proc of the 21st Int Conf on Machine Learning 2004, 43–50.
 Ambroise C, McLachlan GJ: Selection bias in gene extraction on the basis of microarray gene-expression data. Proceedings of the National Academy of Sciences USA 2002, 99(10):6562–6566. 10.1073/pnas.102102699
 Oh IS, Lee JS, Moon BR: Hybrid Genetic Algorithms for Feature Selection. IEEE Trans on Pattern Analysis and Machine Intelligence 2004, 26(11):1424–1437. 10.1109/TPAMI.2004.105
 Oba S, Kato K, Ishii S: Multi-scale clustering for gene expression data. Proc of the 5th IEEE Symposium on Bioinformatics and Bioengineering 2005, 210–217.
 Cawley GC, Talbot NLC: Gene selection in cancer classification using sparse logistic regression with Bayesian regularization. Bioinformatics 2006, 22(19). 10.1093/bioinformatics/btl386
 Link to the source code of BLogReg [http://theoval.cmp.uea.ac.uk/~gcc/cbl/blogreg/]
 Park C, Koo JY, Kin PT, Lee JW: STW feature selection using generalized logistic loss. Computational Statistics and Data Analysis 2008, 53: 3709–3718. 10.1016/j.csda.2007.12.011
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.