- Open Access
Fuzzy support vector machine: an efficient rule-based classification technique for microarrays
© Hajiloo et al; licensee BioMed Central Ltd. 2013
- Published: 1 October 2013
The abundance of gene expression microarray data has led to the development of machine learning algorithms applicable for tackling disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces fuzzy support vector machine which is a learning algorithm based on combination of fuzzy classifiers and kernel machines for microarray classification.
Experimental results on public leukemia, prostate, and colon cancer datasets show that fuzzy support vector machine applied in combination with filter or wrapper feature selection methods develops a robust model with higher accuracy than the conventional microarray classification models such as support vector machine, artificial neural network, decision trees, k nearest neighbors, and diagonal linear discriminant analysis. Furthermore, the interpretable rule-base inferred from fuzzy support vector machine helps extracting biological knowledge from microarray data.
Fuzzy support vector machine as a new classification model with high generalization power, robustness, and good interpretability seems to be a promising tool for gene expression microarray classification.
- Support Vector Machine
- Feature Selection Method
- Fuzzy Support Vector Machine
- Mercer Kernel
- Diagonal Linear Discriminant Analysis
Gene expression microarrays
Almost every cell of the body contains a full set of chromosomes and the same set of genes. But each cell finds its unique properties at a specific time by the fraction of genes which are turned on (expressed) in it. In each cell, transcription of the genetic information contained within the DNA into messenger RNA (mRNA) molecules is called gene expression. These mRNAs are later translated into the proteins that perform most of the critical functions of cells. Gene expression is a complicated process that allows a cell to respond both to environmental needs and to its own changing needs . Upon completion of the Human Genome Project, human genome was sequenced and genes within the genome were identified . Moreover, tools such as microarrays enabled scientists to study changes in expression level of a large set of genes simultaneously. These studies can lead to easier diagnosis or prognosis of complex diseases such as cancer and discovery of better drugs.
As it is almost impossible to study gene expression of a large number of genes provided by microarrays with traditional techniques, scientists use novel techniques such as machine learning algorithms to study microarrays. Microarray classification is a challenging task because: 1) there are a large number of genes and a small number of samples in microarray experiments and this high dimensionality leads to overfitted models, 2) various uncertainties and noises are associated with microarray experiments, and 3) biomedical community usually prefers biologically relevant models like rule-based models that can reveal interconnection of genes and black-box models with high generalization power are not necessarily the favorite options in this context . Many machine learning algorithms are designed and applied for microarray classification problem which some of them provided fruitful results but due to the challenges of dealing with microarrays researchers are still working on designing more accurate, robust, and interpretable models for microarray classification.
Since microarrays have a large set of features but a small set of samples, feature selection methods are applied on them to eliminate noisy and irrelevant features. Furthermore, feature selection decreases the chance of overfitting and increases the chance of producing more understandable results. Two broad categories of feature selection techniques are filter and wrapper methods . Filter methods are also categorized into univariate and multivariate approaches. The univariate filter methods first assign a score calculated by a score metric to each gene and then sort genes according to their score. A set of top-ranked genes are selected as the output of these feature selection techniques. On the other hand, the multivariate filter methods like multiple hypothesis testing models [5, 6] consider the interconnection between features and are slower than univariate approaches. Wrapper methods pick subsets of genes from the initial set of genes and test the quality of these subsets by building a classifier on each subset. The subset of genes which brings the highest accuracy is selected as the output of wrapper feature selection techniques. It is important to note that univariate filter methods are usually faster than the multivariate filter methods as they ignore interconnection of genes and the multivariate filter methods are usually faster than wrapper methods as they ignore interactions with the classifier, but wrapper methods and multivariate filter methods consider interconnection of genes and have the chance of finding the best subset that work together in determination of the class label of the given problem.
Supervised learning is a process in which the class labels of a set of instances are given and by applying a learning method, we build a classifier which can be used later in determining the class label of new instances. There are different ways of evaluating the performance of a classifier like using a separate test data, cross-validation, bootstrap sampling, and sub-sampling as surveyed and described in [7, 8]. Lu and Han , Pirooznia, Yang, Yang, and Deng , and Dudoit, Fridlyand, and Speed  have surveyed supervised learning methods applied on microarrays. These studies emphasize that in microarray's domain simple robust rule-based models with high generalization ability and less sensitivity to noise are preferred. Various classification models have been applied on microarrays like support vector machine (SVM) [11, 12], artificial neural network (ANN) , decision trees [10, 14], k nearest neighbors (KNN) , diagonal linear discriminant analysis (DLDA) . Here, we review these methods in order to better understand the requirements of a good classifier in the microarray's context.
SVM in a binary classification problem finds maximum margin classifier among many different classifiers that can separate instances of two classes. The points that lie closest to this max-margin hyperplane are called the support vectors. The hyperplane can be defined using these points alone and the classifier only makes use of these support vectors to classify test samples. In the case that data points are not linearly separable in the input space, one solution is to map the input space into a higher dimensional feature space called kernel space . SVM was used for microarray classification by Mukherjee et al.  and Furey et al. . According to these studies, SVM produces accurate classifiers and is one the best classifiers used in microarray's domain. The reasons of its good performance are its ability in dealing with high dimensional problems, ability in dealing with noisy features, scalability, and its high generalization power.
ANN is an interconnected group of nodes that uses a computational model for information processing. It changes its structure based on external or internal information that flows through the network. ANN can be used to model a complex relationship between inputs and outputs and find patterns in data. This tool has been applied for microarray classification by Khan, Wei, and Ringnér  and showed good results in comparison with the other methods; however as it performs classification in a black box manner, it does not provide any insight on how the genes are correlated or which set of genes is more effective for classification.
A decision tree like classification and regression trees (CART)  consists of a set of internal nodes and leaf nodes. The internal nodes are associated with a splitting criterion which consists of a splitting attribute and one or more splitting predicates defined on this attribute. The leaf nodes are labelled with a single class label. The construction of the decision tree is usually a two-phase process. In the first phase, the growing phase, an overgrown decision tree is built from the training data. The splitting criterion at each internal node is chosen to split the data sets into subsets that have better class separability, thus minimizing the misclassification error. In the second phase, the pruning phase, the tree is pruned using some heuristics to avoid overfitting. Decision trees have been widely used in microarray studies. They have many advantages that make them a suitable choice for microarray classification like scalability, fast setup, independence of input parameters, and generating rule-based models. Disadvantage of decision trees that appears in high dimensional problems like microarray classification is that they become overfitted easily [10, 14]. One way to overcome this limitation of decision trees is to build an ensemble model over them like the approach used in Li et al. .
In KNN classifier, class label of an unseen instance is determined by a majority voting on the class label of its k nearest neighbors and correlation or Euclidean distance is usually used to calculate distance of two samples. KNN was used for the purpose of microarray classification by Dudoit, Fridlyand, and Speed . They have shown that nearest neighbor as a simple similarity based method has low error rate in microarray classification as it is not so sensitive to noisy samples but still has the drawback of working in a black-box manner.
LDA has been one of the most popular methods used in classification problems. The basic idea of LDA is to project high-dimensional data onto a low-dimensional space such that the data points are reshaped to maximize the class separability. The optimal projection of classical LDA is obtained by maximizing the between-class distance while minimizing the within-class distance . In high dimensional problems with few samples like the case of microarrays a great challenge to classical LDA is the singularity and instability of the within-group sum of squares matrix or the sample covariance matrix. Therefore, the covariance matrix will be singular if the total number of observations in the training data (N) is less than the number of features (P) or might be very unstable and noisy if N is not significantly larger than P. It has been shown that the performance of LDA in high-dimensional situations is far from optimal. Variants of LDA are used for pattern classification like diagonal LDA (DLDA). DLDA is the same as LDA except that the covariance matrices are assumed to be diagonal. The DLDA algorithm has shown a good performance on gene expression microarray data . Unlike classical LDA, DLDA does not explicitly require the number of features to be less than the size of the training set in theory.
But, as more features are added to a given model, the complexity of the model is increased and as a consequence the overfitting chance and the misclassification rate of new samples increases .
Our goal is to develop an accurate and robust microarray classifier which outputs a useful rule-base suitable for information extraction as well. To do so we apply an additive fuzzy system to kernel machines, and demonstrate that, under a general assumption on membership functions, an additive fuzzy rule-based classification system can be constructed directly from the given training samples using the support vector learning approach.
Here, we explain the feature selection methods, the supervised learning method, and the model evaluation strategy used in this study.
The first step in microarray classification is to apply a feature selection method on microarray data to get rid of irrelevant genes and reduce the dimension of feature space. Here, we use a common filter feature selection technique (signal to noise ratio (SNR) ) and a common wrapper feature selection technique (support vector machine-recursive feature elimination (SVM-RFE)) [20, 21].
SVM-RFE is a backward feature ranking wrapper feature selection method which considers interconnection of genes. SVM-RFE builds a support vector machine model on the whole set of genes and then eliminates one gene which its squared weight computed by optimizer is the smallest. Elimination of genes continues in this way until we get a group of genes with the highest classification power.
The second step in microarray classification is taken in order to build a reliable model on training samples which can correctly predict class label of test samples. Here, we apply fuzzy support vector machine algorithm for supervised learning [22, 23]. This method has combined the good generalization performance and ability to work in high dimensional spaces of support vector machine algorithm with high interpretability of fuzzy rule based models. A model with these properties is expected to overcome the challenges of dealing with microarrays very well.
μ(x) = max(1 - d|x|,0), d > 0
μ(x) = e-d|x|, d > 0, d > 0
Given a supervised learning problem in which training input is described by
In order to assess the performance of the proposed model and compare the accuracy of this classifier with the accuracy of the state-of-the-art classifiers, 10-fold cross validation strategy is used.
The proposed methods have been evaluated by three publicly available microarray datasets, which are leukemia dataset , prostate cancer dataset , and colon cancer dataset . The following is a brief introduction about these datasets, while more detailed information can be found from, respectively, the indicated data resources.
Leukemia dataset: this dataset was originally used by Golub et al. . It contains the expression levels of 6817 gene probes of 72 patients, among which, 47 patients suffer from the acute lymphoblastic leukemia (ALL) and 25 patients suffer from the acute myeloid leukemia (AML).
Prostate cancer dataset: this microarray dataset is originally provided by Singh et al. . The dataset A is used in the experiments in this paper. The dataset contains 102 samples, 52 are prostate tumor samples and 50 normal samples, and each array has 12,600 gene probes.
Colon cancer dataset: this microarray dataset was first used by Alon et al. . The dataset contains 62 samples, 40 colon cancer cases and 22 healthy controls, and each array has 2000 gene probes.
Hypotheses and experimental design
In this section we review the four hypotheses that we defined to explore power of fuzzy support vector machine model in microarray studies.
Hypothesis 1: how is the performance of fuzzy support vector machine model in comparison with the performance of common classification models used on microarrays like SVM, ANN, decision trees like CART, KNN, and DLDA?
Experiment 1: we train all these classifiers on the original microarray datasets without doing any kind of feature selection and compare the accuracy of these models.
Experiment 2: we apply SNR method for gene selection and then we train all these classifiers and compare test accuracy of these models.
Hypothesis 2: does fuzzy support vector machine model benefit from feature selection?
Experiment 3: we first set up fuzzy support vector machine classifier without taking the feature selection step and then we set up fuzzy support vector machine classifier with taking the feature selection step by use of both SNR and SVM-RFE methods and then compare the accuracy of these models.
Hypothesis 3: is the output rule-base generated by fuzzy support vector machine model useful for extracting meaningful information for biomedical researchers?
Experiment 4: as an instance of the rule-base generated by fuzzy support vector machine, we first apply SNR feature selection method on leukemia dataset and select three genes and then we set up fuzzy support vector machine classifier on those genes. We explore some characteristics of the model by taking a deeper look at the rule-base.
Hypothesis 4: is fuzzy support vector machine model a robust one for microarray classification?
We have two parameters in fuzzy support vector machine classification model: (1) C which is the misclassification cost of support vector machine classifier and (2) the number of selected features. In order to explore the sensitivity of the model to these parameters we have designed two experiments.
Experiment 5: we use SNR gene selection method to select 3 genes and after that we set up fuzzy support vector machine models on these genes but by changing the value of C parameter in fuzzy support vector machine and we capture accuracy of the models. We have changed the value of C between 1, 10, 100, 1000, and 10000.
Experiment 6: we select features by applying both SNR and SVM-RFE feature selection methods and after that we set up fuzzy support vector machine models on selected features. We repeat this experiment by selecting different number of features. The number of selected features is 2, 3, 5, 7, 10, 15, 30, 50, 100, and 150.
Accuracy of fuzzy support vector machine model versus accuracy of common classification models used on microarrays when no feature selection step is taken.
Fuzzy Support Vector Machine (FSVM)
Support Vector Machine (SVM)
Artificial Neural Network (ANN)
Decision Tree (CART)
K Nearest Neighbor (KNN, K = 3)
Diagonal Linear Discriminant Analysis (DLDA)
Accuracy of fuzzy support vector machine model versus accuracy of common classification models used on microarrays when SNR is used for feature selection
Fuzzy Support Vector Machine (FSVM)
Support Vector Machine (SVM)
Artificial Neural Network (ANN)
Decision Tree (CART)
K Nearest Neighbor (KNN, K = 3)
Diagonal Linear Discriminant Analysis (DLDA)
Comparison of performance of fuzzy support vector machine model with and without taking feature selection step
Fuzzy Support Vector Machine without Feature Selection
Fuzzy Support Vector Machine with SNR
Fuzzy Support Vector Machine with SVM-RFE
Rule-base of fuzzy support vector machine on leukemia dataset
Specification of genes of rule-base of fuzzy support vector machine on leukemia dataset
Min Expression Level
Max Expression Level
• This rule-base contains seven rules. Therefore, only seven samples have a Lagrange multiplier other than zero when support vector machine optimizer is applied.
• This rule-base has three parts as antecedent. That is because the model is trained over three genes.
• The consequent of each rule shows the multiplication of the class label of the sample in the value of the Lagrange multiplier of the sample as mentioned in method section. This number can be used as the power of each rule. Rules with larger consequents are of greater importance and we expect to have greater coverage.
As a result, we can see that out of huge sized microarray with 6817 genes and 38 samples, we have created a model that decides class label of a test sample by just taking into account 3 of those genes and 7 of those samples with high accuracy. It is important to emphasize that fuzzy support vector machine is not the only interpretable model suggested for microarray classification and there are other interpretable models like the iGEC proposed by Ho, Hsieh, Chen, and Huang  for microarray classification.
In this paper, we have analyzed the prediction power of fuzzy support vector machine classifier on three different microarray classification problems. The experimental results show that fuzzy support vector machine has several advantages for microarray classification. First, fuzzy support vector machine combined with a feature selection method has higher generalization ability than common classification models for microarrays like support vector machine, artificial neural network, decision trees, k nearest neighbor, and diagonal linear discriminant analysis. Second, this model is robust in terms of the value of the parameter C and is not sensitive to the changes of the number of selected features. Third, the output rule-base of this model is a useful tool for extracting biological knowledge from microarray data. The authors believe that an accurate and interpretable model like fuzzy support vector machine finds high interest among biologist, physicians, and bioinformaticians who work on microarray classification problem.
This publication is prepared based on the following master's dissertation: M. Hajiloo: Using Statistical Pattern Recognition on Gene Expression Data for Cancer Prediction, M.Sc. Dissertation, Department of Computer Engineering, Sharif University of Technology, August 2009.
The publication costs for this article were shared by all authors.
This article has been published as part of BMC Bioinformatics Volume 14 Supplement 13, 2013: Selected articles from the 9th Annual Biotechnology and Bioinformatics Symposium (BIOT 2012). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/14/S13
- Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999, 286: 531-537. 10.1126/science.286.5439.531.View ArticlePubMedGoogle Scholar
- Collins SF, Morgan M, Patrinos A: The Human Genome Project: lessons from large-scale biology. Science. 2003, 300: 286-90. 10.1126/science.1084564.View ArticlePubMedGoogle Scholar
- Lu Y, Han J: Cancer classification using gene expression data. Information Systems. 2003, 28 (4): 243-268. 10.1016/S0306-4379(02)00072-8.View ArticleGoogle Scholar
- Saeys Y, Inza I, Larranaga P: A review of feature selection techniques in bioinformatics. Bioinformatics. 2007, 23 (19): 2507-2517. 10.1093/bioinformatics/btm344.View ArticlePubMedGoogle Scholar
- Friguet C, Kloareg M, Causeur D: A factor model approach to multiple testing under dependence. Journal of the American Statistical Association. 2009, 104 (488): 1406-1415. 10.1198/jasa.2009.tm08332.View ArticleGoogle Scholar
- Leek JT, Storey JD: A general framework for multiple testing dependence. Proceedings of the National Academy of Sciences. 2008, 105: 18718-18723. 10.1073/pnas.0808709105.View ArticleGoogle Scholar
- Boulesteix AL, Strobl C, Augustin T, Daumer M: Evaluating microarray- based classifiers: an overview. Cancer Informatics. 2008, 6: 77-97.PubMed CentralPubMedGoogle Scholar
- Ambroise C, McLachlan GJ: Selection bias in gene extraction in tumour classification on basis of microarray gene expression data. Proceedings of the National Academy of Sciences. 2002, 99 (10): 6562-6566. 10.1073/pnas.102102699.View ArticleGoogle Scholar
- Pirooznia M, Yang JY, Yang MQ, Deng Y: A comparative study of different machine learning methods on microarray gene expression data. BMC Genomics. 2008, 9 (Suppl 1): S13-10.1186/1471-2164-9-S1-S13.PubMed CentralView ArticlePubMedGoogle Scholar
- Dudoit S, Fridlyand J, Speed TP: Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association. 2002, 97 (457): 77-87. 10.1198/016214502753479248.View ArticleGoogle Scholar
- Mukherjee S, Tamayo P, Slonim D, Verri A, Golub T, Messirov JP: Support vector machine classification of microarray data. AI memo. CBCL paper 182. 2000, MIT Press, Cambridge, MAGoogle Scholar
- Furey TS, Cristiani N, Duffy N, Bednarski DW, Schummer M, Haussler D: Support vector machine classification of cancer tissue samples using microarray expression data. Bioinformatics. 2000, 16: 906-914. 10.1093/bioinformatics/16.10.906.View ArticlePubMedGoogle Scholar
- Khan J, Wei JS, Ringnér M: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine. 2001, 7: 673-679. 10.1038/89044.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang H, Yu C, Singer B, Xiong M: Recursive partitioning for tumor classification with gene expression microarray data. Proceedings of the National Academy of Sciences. 2001, 98: 6730-6735. 10.1073/pnas.111153698.View ArticleGoogle Scholar
- Vapnik V: The Nature of Statistical Learning Theory. 1995, New York: Springer- VerlagView ArticleGoogle Scholar
- Breiman L, Friedman JH, Olshen RA, Stone CJ: Classification and regression trees. Monterey, California: Wadsworth and Brooks/Cole Advanced Books and Software. 1984Google Scholar
- Li J, Liu H, Ng SK, Wong L: Discovery of significant rules for classifying cancer diagnosis data. Bioinformatics. 2003, 19 (2): 93-102.Google Scholar
- Hastie T, Tibshirani R, Friedman JH: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2001, SpringerView ArticleGoogle Scholar
- Xu P, Brock GN, Parrish RS: Modified linear discriminant analysis approaches for classification of high-dimensional microarray data. Comput Stat Data Anal. 2009, 53: 1674-1687. 10.1016/j.csda.2008.02.005.View ArticleGoogle Scholar
- Tang Y, Zhang YQ, Huang Z: FCM-SVM-RFE gene feature selection algorithm for leukemia classification from microarray gene expression data. Proceedings of The 14th IEEE International Conference on Fuzzy Systems (FUZZ'05). 2005, 97-101.Google Scholar
- Guyon I, Weston J, Barnhill S, Vapnik V: Gene selection for cancer classification using support vector machines. Machine Learning. 2002, 46: 389-422. 10.1023/A:1012487302797.View ArticleGoogle Scholar
- Chen Y, Wang J: Support vector learning for fuzzy rule-based classification systems. IEEE Transactions on Fuzzy Systems. 2003, 11 (6): 716-728. 10.1109/TFUZZ.2003.819843.View ArticleGoogle Scholar
- Lin CF, Wang SD: Fuzzy support vector machine. IEEE Transactions on Neural Networks. 2002, 13 (2): 464-471. 10.1109/72.991432.View ArticlePubMedGoogle Scholar
- Takagi T, Sugeno M: Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man, and Cybernetics. 1985, 15: 116-132.View ArticleGoogle Scholar
- Dubois D, Prade H: Operations on fuzzy numbers. International Journal of Systems Science. 1978, 9 (6): 613-626. 10.1080/00207727808941724.View ArticleGoogle Scholar
- Cristianini N, Shawe-Taylor J: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. 2000, Cambridge, U.K.: Cambridge University PressView ArticleGoogle Scholar
- Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, Tamayo P, Renshaw AA, D'Amico AV, Richie JP: Gene expression correlates of clinical prostate cancer behaviour. Cancer Cell. 2002, 1: 203-209. 10.1016/S1535-6108(02)00030-2.View ArticlePubMedGoogle Scholar
- Alon U, Barkai N, Notterman D, Gish K, Ybarra S, Mack D, Levine A: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of National Academy of Science USA. 1999, 96: 6745-6750. 10.1073/pnas.96.12.6745.View ArticleGoogle Scholar
- Ho SY, Hsieh CH, Chen HM, Huang HL: Interpretable gene expression classifier with an accurate and compact fuzzy rule base for microarray data analysis. Biosystems. 2006, 85 (3): 165-176. 10.1016/j.biosystems.2006.01.002.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.