Combining Pareto-optimal clusters using supervised learning for identifying co-expressed genes

Background The landscape of biological and biomedical research is changing rapidly with the invention of microarrays, which enable a simultaneous view of the transcription levels of a huge number of genes across different experimental conditions or time points. Using microarray data sets, clustering algorithms have been actively utilized to identify groups of co-expressed genes. This article poses the problem of fuzzy clustering in microarray data as a multiobjective optimization problem which simultaneously optimizes two internal fuzzy cluster validity indices to yield a set of Pareto-optimal clustering solutions. Each of these clustering solutions possesses some amount of information regarding the clustering structure of the input data. Motivated by this fact, a novel fuzzy majority voting approach is proposed to combine the clustering information from all the solutions in the resultant Pareto-optimal set. This approach first identifies the genes which are assigned to some particular cluster with high membership degree by most of the Pareto-optimal solutions. Using this set of genes as the training set, the remaining genes are classified by a supervised learning algorithm. In this work, we have used a Support Vector Machine (SVM) classifier for this purpose.
Results The performance of the proposed clustering technique has been demonstrated on five publicly available benchmark microarray data sets, viz., Yeast Sporulation, Yeast Cell Cycle, Arabidopsis Thaliana, Human Fibroblasts Serum and Rat Central Nervous System. Comparative studies of the use of different SVM kernels and several widely used microarray clustering techniques are reported. Moreover, statistical significance tests have been carried out to establish the statistical superiority of the proposed clustering approach. Finally, biological significance tests have been carried out using a web based gene annotation tool to show that the proposed method is able to produce biologically relevant clusters of co-expressed genes.
Conclusion The proposed clustering method has been shown to perform better than other well-known clustering algorithms in finding clusters of co-expressed genes efficiently. The clusters of genes produced by the proposed technique are also found to be biologically significant, i.e., they consist of genes which belong to the same functional groups. This indicates that the proposed clustering method can be used efficiently to identify co-expressed genes in microarray gene expression data.
Supplementary Website The pre-processed and normalized data sets, the MATLAB code and other related materials are available at .


Background
The progress in the field of microarray technology has made it possible to simultaneously study the expression levels of a large number of genes across different experimental conditions. Microarray technology has applications in the areas of medical diagnosis, bio-medicine, gene expression profiling, etc. [1][2][3][4]. Usually, the gene expression values during a biological experiment are measured at different time points. A microarray gene expression data set, consisting of $g$ genes and $h$ time points, is typically organized in a 2D matrix $E = [e_{ij}]$ of size $g \times h$.
Each element $e_{ij}$ gives the expression level of the $i$th gene at the $j$th time point. Clustering [5], an important microarray analysis tool, is used to identify the sets of genes with similar expression profiles. Clustering methods partition a set of $n$ objects into $K$ groups based on some similarity/dissimilarity metric, where the value of $K$ may or may not be known a priori. Unlike hard clustering, a fuzzy clustering algorithm produces a $K \times n$ membership matrix $U(X) = [u_{kj}]$, $k = 1, \ldots, K$ and $j = 1, \ldots, n$, where $u_{kj}$ denotes the probability of assigning pattern $x_j$ to cluster $C_k$. For probabilistic non-degenerate clustering, $0 < u_{kj} < 1$ and $\sum_{k=1}^{K} u_{kj} = 1$, $1 \le j \le n$ [6].
Genetic algorithms [7] have been effectively used to develop efficient clustering techniques [8,9]. These techniques use a single cluster validity measure as the fitness function to reflect the goodness of an encoded clustering. However, a single cluster validity measure is seldom equally applicable to different kinds of data sets. This article poses the problem of fuzzy partitioning as one of multiobjective optimization (MOO) [10][11][12][13]. Unlike single objective optimization, in MOO, search is performed over a number of, often conflicting, objective functions. The final solution set contains a number of Pareto-optimal solutions, none of which can be further improved on any one objective without degrading it in another. A Nondominated Sorting GA-II (NSGA-II) [13] based multiobjective fuzzy clustering algorithm has been adopted that optimizes the Xie-Beni (XB) index [14] and the fuzzy C-means (FCM) [6] measure ($J_m$) simultaneously [11]. A characteristic of any MOO approach is that it often produces a large number of Pareto-optimal solutions, from which selecting a particular solution is difficult. The existing methods use the characteristics of the Pareto-optimal surface or some external measure for this purpose. However, these approaches almost always pick one solution from the Pareto-optimal set as the final solution, although evidently all the solutions in this set carry some information that is inherently good for the problem at hand. Motivated by this observation, this article describes a novel method to obtain the final solution while considering all the Pareto-optimal solutions by utilizing the input data as a guiding factor. The approach is to integrate the multiobjective clustering technique with a support vector machine (SVM) [15] based classifier to obtain the final solution from the Pareto-optimal set.
The procedure involves utilizing the points which are given a high membership degree to a particular class by a majority of the non-dominated solutions. These points are taken as the training points to train the SVM classifier. The remaining points are then classified by the trained SVM classifier to yield the class labels for these points.
Many approaches that solve clustering problems with machine learning algorithms, such as Artificial Neural Networks, Genetic Algorithms, Simulated Annealing, etc., can be found in the literature. In [16], an unsupervised self organizing neural network based hierarchical clustering algorithm for gene expression data has been developed. The unsupervised neural network grows by adopting the topology of a binary tree. The algorithm combines the advantages of both hierarchical clustering and the Self Organizing Map (SOM). In [17], an unsupervised clustering technique based on a self-optimizing neural network has been presented. The algorithm is able to find the most differentiating features of the training data and recursively divides the data into subgroups. The division is performed recursively until the differences among the subgroups become imperceptible. In [18], a multiple-level hybrid classifier, which combines supervised decision tree classifiers and unsupervised Bayesian clustering to detect intrusions, has been proposed. Clustering using Genetic Algorithms (GA) [8][9][10][11][12] and Simulated Annealing (SA) [19][20][21][22][23] has been widely studied in the literature. The clustering method proposed in this article differs from those mentioned above in that it boosts the clustering performance of multiobjective genetic fuzzy clustering by integrating it with a supervised learning approach. In this regard, a fuzzy majority voting technique followed by SVM classification is applied to the resultant set of non-dominated solutions in order to obtain the final solution.
The performance of the Multiobjective GA (MOGA) based fuzzy clustering followed by SVM classification (MOGA-SVM) has been demonstrated on five real-life gene expression data sets, viz., Yeast Sporulation, Yeast Cell Cycle, Arabidopsis Thaliana, Human Fibroblasts Serum and Rat CNS data. The superiority of the proposed technique, as compared to MOGA clustering [11], a crisp version of MOGA-SVM, termed as MOGA crisp -SVM, FCM algorithm [6], single objective GA (SGA) [9], hierarchical average linkage clustering, Self Organizing Map (SOM) clustering [24] and Chinese Restaurant Clustering (CRC) [25], is demonstrated both quantitatively and visually. The use of different SVM kernels has been explored. The superiority of the MOGA-SVM clustering technique has been proved to be statistically significant through statistical tests. Finally a biological significance test has been conducted to establish that the proposed technique produces functionally enriched clusters.

Results and Discussion
The performance of the proposed MOGA-SVM clustering has been evaluated on five publicly available real-life gene expression data sets, viz., Yeast Sporulation, Yeast Cell Cycle, Arabidopsis Thaliana, Human Fibroblasts Serum and Rat CNS data. First, the effect of the parameter β (majority voting threshold) on the performance of MOGA-SVM clustering has been examined. Thereafter, we examined the use of different kernel functions and compared their performances. The performance of the proposed technique has also been compared with those of fuzzy MOGA clustering (without SVM) [10,11], FCM [6], a single objective genetic clustering scheme which minimizes the XB validity measure (SGA) [9], the average linkage method [26], SOM [24] and CRC [25]. Moreover, a crisp version of MOGA-SVM clustering (MOGA crisp -SVM) is considered for comparison in order to establish the utility of incorporating fuzziness. Unlike fuzzy MOGA-SVM, which uses the FCM based chromosome update, in MOGA crisp -SVM, chromosomes are updated using the K-means-like center update process and the crisp versions of the J m and XB indices are optimized simultaneously. To obtain the final clustering solution from the set of non-dominated solutions, a procedure similar to that of fuzzy MOGA-SVM is followed. Note that in the case of MOGA crisp -SVM, the membership degrees are either 0 or 1, so the membership threshold parameter α is not required.
The statistical and biological significance of the clustering results have also been evaluated.

Effect of Majority Voting Threshold β
In this section we analyze how the parameter β (majority voting threshold) affects the performance of the proposed MOGA-SVM clustering technique. The algorithm has been executed for a range of β values from 0.1 to 0.9 with a step size of 0.05 for all the data sets. The results reported in this section are for the Radial Basis Function (RBF) kernel [15,27]. Experiments with other kernel functions are found to show similar behavior. For each value of β, the average value of the silhouette index (s(C)) scores over 20 runs has been considered. The parameter α (membership threshold) has been kept constant at 0.5. The variation of the average s(C) scores for different values of β is shown in Fig. 1 for the five data sets.
It is evident from Fig. 1 that for all the data sets, MOGA-SVM behaves similarly in terms of the variation of the average s(C) over the range of β values. The general trend is that the average s(C) scores first improve with increasing β, then remain almost constant in the range of around 0.4 to 0.6, and then deteriorate with a further increase in β. This behavior is expected: for a small value of β, the training set contains a lot of low-confidence points, which causes the class boundaries to be defined incorrectly for the SVM. On the other hand, when β is very high, the training set is small and contains only a few high-confidence points, so the hyperplanes between the classes cannot be properly defined. In some range of β (around 0.4 to 0.6), a tradeoff is obtained between the size of the training set and its confidence level. Hence in this range, MOGA-SVM provides the best s(C) index scores. With this observation, in all the experiments hereafter, the β value has been kept constant at 0.5.

Performance of MOGA-SVM for Different Kernels
Four kernel functions, viz., linear, polynomial, sigmoidal and RBF, are considered in this article. In this section, a study has been made of how the different kernel functions perform for the five data sets. Table 1 reports the s(C) scores (averaged over 20 runs) produced by MOGA-SVM with the four different kernel functions for the five data sets. The average s(C) score provided by MOGA (without SVM) over 20 runs is also reported for each data set. Moreover, the number of clusters K (corresponding to the solution providing the best silhouette index score) found for the different data sets is shown.
As is evident from the table, irrespective of the kernel function considered, the use of SVM provides a better s(C) score than MOGA (without SVM). This is expected since the MOGA-SVM techniques give equal importance to all the non-dominated solutions, rather than to a single one. Thus through fuzzy voting, the core group of genes for each cluster is identified and the class labels of the remaining genes are predicted by the SVM. It can also be noticed from the table that the silhouette index produced by the RBF kernel is greater than those produced by the other kernels. This is because RBF kernels are known to perform well in the case of spherical shaped clusters, which are very common in gene expression data sets. Henceforth, MOGA-SVM will indicate MOGA-SVM with RBF kernel only.

Comparative Results
In the comparison reported in Table 2, MOGA crisp -SVM also provides reasonably good s(C) index scores, but is outperformed by MOGA-SVM for all the data sets. This indicates the utility of incorporating fuzziness in MOGA clustering. Interestingly, while incorporation of SVM based training improves the performance of MOGA clustering, the latter also provides, in most cases, better s(C) values than SGA and the other non-genetic approaches. Only for the Yeast Sporulation and Arabidopsis Thaliana data sets are the results for MOGA (without SVM) slightly inferior to those of SOM and CRC, respectively. However, the performance of the proposed MOGA-SVM is the best for all the data sets.
MOGA has determined 6, 5, 4, 6 and 6 clusters for the Sporulation, Cell Cycle, Arabidopsis, Serum and Rat CNS data sets, respectively. This conforms to the findings in the literature [28][29][30][31]. Hence it is evident from the table that while MOGA (without SVM) and MOGA crisp -SVM (RBF) are generally superior to the other methods, MOGA-SVM is the best among all the competing methods for all the data sets considered here.
To demonstrate visually the result of MOGA-SVM clustering, Figs. 2, 3, 4, 5, 6 show the Eisen plot and cluster profile plots provided by MOGA-SVM on the five data sets, respectively. For example, the 6 clusters of the Yeast Sporulation data are very prominent as shown in the Eisen plot ( Fig. 2(a)). It is evident from the figure that the expression profiles of the genes of a cluster are similar to each other and they produce similar color patterns. The cluster profile plots ( Fig. 2(b)) also demonstrate how the expression profiles for the different groups of genes differ from each other, while the profiles within a group are reasonably similar. Similar results are obtained for the other data sets also.
The proposed technique performs better compared to the other clustering methods mainly because of the following reasons: first of all, this is a multiobjective clustering method. Simultaneous optimization of multiple cluster validity measures helps to cope with different characteristics of the partitioning and leads to higher quality solutions and an improved robustness towards the different data properties. Secondly, the strength of supervised learning has been integrated with the multiobjective clustering efficiently. As each of the solutions in the final nondominated set contains some information about the clustering structure of the data set, combining them with the help of majority voting followed by supervised classification yields a high quality clustering solution. Finally, incorporation of fuzziness makes the proposed technique better equipped in handling overlapping clusters.

Statistical Significance Test
To establish that MOGA-SVM is significantly superior to the other algorithms, a non-parametric statistical significance test has been conducted. One group is created for each algorithm, consisting of the s(C) index scores produced over 20 runs of the corresponding algorithm. The median values of each group for all the data sets are reported in Table 3.
As is evident from Table 3, the median values of the s(C) scores for MOGA-SVM are better than those for the other algorithms. To establish that this difference is statistically significant, Table 4 reports the p-values produced by Wilcoxon's rank sum test for the comparison of two groups (the group corresponding to MOGA-SVM and a group corresponding to some other algorithm) at a time. The null hypothesis is that there is no significant difference between the median values of the two groups, whereas the alternative hypothesis is that there is a significant difference in the median values of the two groups. All the p-values reported in the table are less than 0.05 (5% significance level). This is strong evidence against the null hypothesis, indicating that the better median values of the performance metric produced by MOGA-SVM are statistically significant and have not occurred by chance.
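The pairwise comparison described above can be sketched in a few lines of Python. This is a minimal illustration that uses the large-sample normal approximation of the rank sum statistic without a tie or continuity correction, which is an assumption and not necessarily the exact procedure behind Table 4. The function name `rank_sum_test` and the two score lists are hypothetical and do not reproduce any values from the article.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, assumed here)."""
    # Pool both groups and remember where each value came from.
    pooled = sorted((v, idx) for idx, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[pooled[t][1]] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])                       # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2           # expected rank sum under the null
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided p-value
    return z, p

# Hypothetical s(C) scores over 20 runs of two algorithms (illustrative only).
scores_a = [0.58 + 0.001 * i for i in range(20)]
scores_b = [0.52 + 0.001 * i for i in range(20)]
z, p = rank_sum_test(scores_a, scores_b)
print(f"z = {z:.3f}, p = {p:.3g}")
```

A p-value below 0.05 from such a comparison would lead to rejecting the null hypothesis at the 5% significance level used in the text.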

Biological Significance
Figure 2 Yeast Sporulation data clustered using MOGA-SVM clustering method.
The biological relevance of a cluster can be verified based on the statistically significant Gene Ontology (GO) annotations obtained from the GO annotation database http://db.yeastgenome.org/cgi-bin/GO/goTermFinder. This is used to test the functional enrichment of a group of genes in terms of three structured, controlled vocabularies (ontologies), viz., associated biological processes, molecular functions and cellular components. The degree of functional enrichment (p-value) is computed using a cumulative hypergeometric distribution. This measures the probability of finding the number of genes involved in a given GO term (i.e., function, process, component) within a cluster. From a given GO category, the probability $p$ of getting $k$ or more genes within a cluster of size $n$ can be defined as [33]:
$$p = 1 - \sum_{i=0}^{k-1} \frac{\binom{f}{i}\binom{g-f}{n-i}}{\binom{g}{n}},$$
where $f$ and $g$ denote the total number of genes within a category and within the genome, respectively. Statistical significance is evaluated for the genes in a cluster by computing the p-value for each GO category. This signifies how well the genes in the cluster match the different GO categories. If the majority of genes in a cluster have the same biological function, then it is unlikely that this takes place by chance and the p-value of the category will be close to 0.
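The enrichment p-value of a GO category can be computed directly from the cumulative hypergeometric distribution, as in the minimal sketch below. The symbols `k`, `n`, `f` and `g` follow the text; the function name and the numbers in the example are hypothetical. The sketch sums the upper tail (i = k, ..., min(n, f)), which is algebraically the same as one minus the lower tail but avoids loss of precision for very small p-values.

```python
from math import comb

def go_enrichment_p_value(k, n, f, g):
    """Probability of seeing k or more genes of a GO category in a cluster.

    k: category genes found in the cluster, n: cluster size,
    f: genes in the category, g: genes in the genome.
    """
    total = comb(g, n)  # number of ways to draw a cluster of size n
    upper_tail = sum(comb(f, i) * comb(g - f, n - i)
                     for i in range(k, min(n, f) + 1))
    return upper_tail / total

# Hypothetical example: 8 of the 10 genes in a cluster share a category
# that contains 40 of the 6000 genes in the genome.
p = go_enrichment_p_value(k=8, n=10, f=40, g=6000)
print(f"p = {p:.3e}")
```

A category would be called significantly enriched at the 1% level when the returned value is below 0.01.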
The biological significance test for Yeast Sporulation data has been conducted at the 1% significance level. The different algorithms are compared on the number of clusters for which the most significant GO terms have a p-value less than 0.01. As an illustration, Table 5 reports the three most significant GO terms (along with the corresponding p-values) shared by the genes of each of the 6 clusters identified by the MOGA-SVM technique (Fig. 2). As is evident from the table, all the clusters produced by the MOGA-SVM clustering scheme are significantly enriched with some GO categories, since all the p-values are less than 0.01 (1% significance level). This establishes that the proposed MOGA-SVM clustering scheme is able to produce biologically relevant and functionally enriched clusters.
Figure 3 Yeast Cell Cycle data clustered using MOGA-SVM clustering method.

Conclusion
This article proposes a novel method for obtaining a final solution from the set of non-dominated solutions produced by the multiobjective fuzzy clustering algorithm.
As a scope of further research, the performance of other multiobjective optimization techniques, such as AMOSA [23], is to be tested. The combination of MOGA clustering with popular supervised classification tools other than SVM can also be studied.
Figure 4 Arabidopsis Thaliana data clustered using MOGA-SVM clustering method.

Multiobjective Optimization
The multiobjective optimization problem can be stated formally as the search for a vector of decision variables that satisfies the given constraints and optimizes a vector of objective functions. There are a number of multiobjective optimization techniques available. Among them, the GA based techniques such as NSGA-II [13], SPEA and SPEA2 [35] are very popular. The multiobjective fuzzy clustering scheme [11] considered here uses NSGA-II as an underlying multiobjective framework for developing the proposed fuzzy clustering algorithm.
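The notion of a non-dominated (Pareto-optimal) set can be made concrete with a small sketch. When all objectives are minimized, a solution a dominates b if a is no worse than b on every objective and strictly better on at least one. The code below is a brute-force filter written for illustration only; it is not NSGA-II, and the objective pairs are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def non_dominated(solutions):
    """Keep the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions)]

# Hypothetical (objective 1, objective 2) pairs of five candidate solutions.
candidates = [(0.20, 9.0), (0.25, 7.5), (0.30, 8.0), (0.40, 6.0), (0.22, 9.5)]
front = non_dominated(candidates)
print(front)  # [(0.2, 9.0), (0.25, 7.5), (0.4, 6.0)]
```

In this toy set, (0.30, 8.0) is dominated by (0.25, 7.5) and (0.22, 9.5) is dominated by (0.20, 9.0); the three remaining solutions trade one objective against the other.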

Multiobjective Fuzzy Clustering
This section briefly describes the NSGA-II based multiobjective fuzzy clustering scheme (MOGA) [11]. The algorithm MOGA uses real valued chromosomes that denote the co-ordinates of the cluster centers and each has length $K \times d$, where $K$ is the number of clusters and $d$ is the dimension of the data. Each chromosome in the initial population consists of the co-ordinates of $K$ random points from the data set. Two cluster validity indices, Xie-Beni (XB) [14] and the fuzzy C-means (FCM) measure ($J_m$) [6], are simultaneously optimized. For computing the objective functions, first the centers $V = \{v_1, v_2, \ldots, v_K\}$ encoded in a given chromosome are extracted. The fuzzy membership values $u_{ik}$, $i = 1, 2, \ldots, K$, $k = 1, 2, \ldots, n$ are computed using the following equation [6]:
$$u_{ik} = \frac{1}{\sum_{j=1}^{K} \left( \frac{D(v_i, x_k)}{D(v_j, x_k)} \right)^{\frac{2}{m-1}}}, \quad 1 \le i \le K,\ 1 \le k \le n, \qquad (2)$$
where $D(v_i, x_k)$ denotes the distance between the $i$th cluster center and the $k$th data point and $m \in (1, \infty)$ is the fuzzy exponent. In this article, the correlation based distance measure is used. Subsequently each cluster center $v_i$, $i = 1, 2, \ldots, K$, is updated using the following equation [6]:
$$v_i = \frac{\sum_{k=1}^{n} u_{ik}^{m} x_k}{\sum_{k=1}^{n} u_{ik}^{m}}, \quad 1 \le i \le K.$$
The membership values are then recomputed using Eq. (2).
The XB index is defined as a function of the ratio of the total variation $\sigma$ to the minimum separation $sep$ of the clusters. Here $\sigma$ and $sep$ can be written as:
$$\sigma(U, V; X) = \sum_{i=1}^{K} \sum_{k=1}^{n} u_{ik}^{2} D^{2}(v_i, x_k)$$
and
$$sep(V) = \min_{i \ne j} \{ D^{2}(v_i, v_j) \}.$$
The XB index is then written as [14]:
$$XB(U, V; X) = \frac{\sigma(U, V; X)}{n \times sep(V)}.$$
Note that when the partitioning is compact and the clusters are well separated, the value of $\sigma$ should be low while $sep$ should be high, thereby yielding lower values of the XB index. The objective is therefore to minimize it.
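The membership computation and the center update can be sketched as follows, using the standard fuzzy C-means forms. Several points are assumptions made for illustration: the distance is taken as one minus the Pearson correlation (one common correlation based distance), a point that coincides with a center receives full membership in it, and the function names and toy expression profiles are hypothetical.

```python
import math

def correlation_distance(a, b):
    """1 - Pearson correlation; assumes neither profile is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    norm = math.sqrt(sum((p - ma) ** 2 for p in a) *
                     sum((q - mb) ** 2 for q in b))
    return 1.0 - cov / norm

def memberships(centers, points, m, dist=correlation_distance):
    """u[i][k] is the membership of point k in cluster i."""
    K, n = len(centers), len(points)
    u = [[0.0] * n for _ in range(K)]
    for k, x in enumerate(points):
        d = [dist(v, x) for v in centers]
        zero = [i for i in range(K) if d[i] <= 1e-12]
        if zero:  # the point coincides with one or more centers
            for i in zero:
                u[i][k] = 1.0 / len(zero)
            continue
        for i in range(K):
            u[i][k] = 1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0))
                                for j in range(K))
    return u

def update_centers(u, points, m):
    """Each center becomes the u^m weighted mean of all points."""
    dim = len(points[0])
    centers = []
    for row in u:
        w = [val ** m for val in row]
        total = sum(w)
        centers.append([sum(w[k] * points[k][c] for k in range(len(points)))
                        / total for c in range(dim)])
    return centers

# Toy expression profiles over 4 time points (hypothetical values).
profiles = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 9.0], [4.0, 3.0, 2.0, 1.0],
            [9.0, 6.0, 5.0, 1.0], [1.0, 3.0, 2.0, 4.0]]
centers = [profiles[0], profiles[2]]   # centers picked from the data points
u = memberships(centers, profiles, m=1.34)
new_centers = update_centers(u, profiles, m=1.34)
```

Each column of `u` sums to 1, which is the probabilistic constraint on fuzzy memberships; the value m = 1.34 is the fuzzy exponent reported in the article for the Sporulation data.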
The other objective is the $J_m$ measure optimized by the FCM algorithm. This computes the global fuzzy variance of the clusters and is expressed by the following equation [6]:
$$J_m = \sum_{i=1}^{K} \sum_{k=1}^{n} u_{ik}^{m} D^{2}(v_i, x_k).$$
$J_m$ is to be minimized to get compact clusters. The XB and $J_m$ indices are to an extent contradictory in nature. The XB index is responsible for both the compactness and the separation of the clusters, whereas $J_m$ only represents the global compactness of the clusters. For the purpose of illustration, Fig. 8 shows the Pareto front obtained by the multiobjective fuzzy clustering for the Yeast Sporulation data set. The Pareto front indicates that the two objective functions are in conflict with each other.
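The two objectives can be evaluated for a given membership matrix and set of centers as in the sketch below. It assumes the usual definitions: XB is the sum of u² D² divided by n times the smallest squared distance between two centers, and J_m is the sum of u^m D². The toy example uses one-dimensional points with Euclidean distance purely so that the numbers can be checked by hand, whereas the article uses a correlation based distance; all names and values are hypothetical.

```python
def euclidean(a, b):
    """Euclidean distance, used here only to keep the toy example checkable."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

def jm_measure(u, centers, points, m, dist=euclidean):
    """Global fuzzy variance: sum of u^m * D^2 over clusters and points."""
    return sum(u[i][k] ** m * dist(centers[i], points[k]) ** 2
               for i in range(len(centers)) for k in range(len(points)))

def xb_index(u, centers, points, dist=euclidean):
    """Xie-Beni index: total variation over n times minimum separation."""
    sigma = sum(u[i][k] ** 2 * dist(centers[i], points[k]) ** 2
                for i in range(len(centers)) for k in range(len(points)))
    sep = min(dist(centers[i], centers[j]) ** 2
              for i in range(len(centers))
              for j in range(len(centers)) if i != j)
    return sigma / (len(points) * sep)

# Two tight, well separated groups with crisp memberships.
points = [[0.0], [1.0], [10.0], [11.0]]
centers = [[0.5], [10.5]]
u = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]
print(xb_index(u, centers, points))         # 1.0 / (4 * 100) = 0.0025
print(jm_measure(u, centers, points, m=2))  # 4 * 0.25 = 1.0
```

Both values are to be minimized; in MOGA they form the two-dimensional objective vector of a chromosome.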
Crowded binary tournament selection [13] followed by conventional crossover and mutation operators is used here. NSGA-II uses the elitist model where the non-dominated solutions of the parent and child populations are propagated to the next generation in order to keep track of the best solutions obtained so far. The algorithm has been executed for a fixed number of generations. It produces a set of non-dominated solutions in the last generation.

Support Vector Machine
Support vector machine (SVM) classifiers are inspired by statistical learning theory and they perform structural risk minimization on a nested set structure of separating hyperplanes [15,27]. Fundamentally the SVM classifier is designed for two-class problems. Viewing the input data as two sets of vectors in a p-dimensional space, an SVM constructs a separating hyperplane in that space, the one which maximizes the margin between the two classes of points. To compute the margin, two parallel hyperplanes are constructed, one on each side of the separating one, which are "pushed up against" the two classes of points. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the neighboring data points of both the classes. The larger the margin or distance between these parallel hyperplanes, the lower the expected generalization error of the classifier. The classifier can be extended to handle multi-class problems by designing a number of one-against-all or one-against-one two-class SVMs.
Figure 7 Boxplots of the p-values of the most significant GO terms of all the clusters having at least one significant GO term as obtained by different algorithms for Yeast Sporulation data.
Kernel functions are used for mapping the input space to a higher dimensional feature space so that the classes become linearly separable. The use of four popular kernel functions has been studied in this article. These are the linear, polynomial, sigmoidal and RBF kernels. The extended version of the two-class SVM that deals with the multi-class classification problem by designing a number of one-against-all two-class SVMs [27,36] is used here. For example, a K-class problem is handled with K two-class SVMs, each of which is used to separate a class of points from all the remaining points.
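For reference, the four kernels can be written in their common textbook forms as in the sketch below. The parameter names (`gamma`, `r`, `d`, `kappa`, `theta`) and their default values are assumptions chosen for illustration; they are not the settings used in the experiments.

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, gamma=1.0, r=1.0, d=3):
    return (gamma * dot(x, y) + r) ** d

def sigmoid_kernel(x, y, kappa=1.0, theta=0.0):
    return math.tanh(kappa * dot(x, y) + theta)

def rbf_kernel(x, y, gamma=0.5):
    # Depends only on the squared Euclidean distance between x and y.
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(x, y)))

x, y = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, y))            # 11.0
print(polynomial_kernel(x, y, d=2))   # (11 + 1)^2 = 144.0
print(rbf_kernel(x, x))               # 1.0 for identical vectors
```

The RBF kernel decays with the distance between the two vectors, which is one way to see why it suits compact, roughly spherical clusters.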

Proposed MOGA-SVM Clustering
This section describes the proposed scheme for integrating the multiobjective fuzzy clustering algorithm (MOGA) with the SVM classifier. The combined approach is called MOGA-SVM. The basic observation motivating MOGA-SVM is that if a subset of points is almost always clustered together by most of the non-dominated solutions, then these points may safely be considered to be clustered properly. Hence these points may be used for training a classifier, which can thereafter be used for grouping the remaining low confidence points. In MOGA-SVM, all the final non-dominated solutions are given equal importance and a fuzzy majority voting technique is applied to identify the training set. Since SVM is considered one of the best state-of-the-art classifiers, it is used here for classification. The steps of MOGA-SVM are as follows:
1. Apply MOGA clustering on the given data set to obtain a set $S = \{s_1, s_2, \ldots, s_N\}$, $N \le P$ ($P$ is the population size), of non-dominated solution strings consisting of cluster centers.
2. Using Eq. (2), compute the fuzzy membership matrix $U^{(i)}$ for each of the non-dominated solutions $s_i$, $1 \le i \le N$.
4. Mark the points whose maximum membership degree (to cluster $j$, $j \in \{1, 2, \ldots, K\}$) is greater than a membership threshold α (0 ≤ α ≤ 1), for at least βN solutions, as training points. Here β (0 ≤ β ≤ 1) is the threshold of the fuzzy majority voting. These points are labeled with class $j$.
5. Train the multi-class SVM classifier (i.e., K one-against-all two-class SVM classifiers, K being the number of clusters) using the selected training points.
6. Predict the class labels for the remaining points (test points) using the trained SVM classifier.
7. Combine the label vectors corresponding to training and testing points to obtain the final clustering for the complete data set.
The sizes of the training and testing sets depend on the two threshold parameters α and β. Here α is the membership threshold, i.e., the value that the maximum membership degree of a point must exceed for the point to be considered as a training point. Hence if α is increased, the size of the training set will decrease, but the confidence in the training points will increase. On the other hand, if α is decreased, the size of the training set will increase but the confidence in the training points will decrease. The parameter β determines the minimum fraction of non-dominated solutions that must agree with each other in the fuzzy voting context. If β is increased, the size of the training set will decrease, but a larger number of non-dominated solutions agree on each training point. On the contrary, if β is decreased, the size of the training set increases, but a smaller number of non-dominated solutions agree on each training point. Hence both parameters α and β need to be tuned so that a tradeoff is achieved between the size and the confidence of the training set of the SVM. To achieve this, after several experiments, we have set both parameters to a value of 0.5.
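The fuzzy majority voting that selects the training points can be sketched as follows. The sketch assumes that the N membership matrices have already been made consistent with each other, so that cluster j means the same group in every solution; it uses 0-based cluster labels, and the function name and the toy matrices are hypothetical. Training and applying the SVM on the selected points is left to any SVM implementation and is not shown.

```python
def select_training_points(membership_matrices, alpha=0.5, beta=0.5):
    """Return {point index: class label} for the high-confidence points.

    membership_matrices: list of N matrices U, each K x n, where U[i][k]
    is the membership of point k in cluster i (labels assumed aligned).
    """
    N = len(membership_matrices)
    K = len(membership_matrices[0])
    n = len(membership_matrices[0][0])
    training = {}
    for k in range(n):
        votes = [0] * K
        for U in membership_matrices:
            j = max(range(K), key=lambda i: U[i][k])  # most likely cluster
            if U[j][k] > alpha:                       # confident enough
                votes[j] += 1
        best = max(range(K), key=lambda i: votes[i])  # ties go to lowest index
        if votes[best] > 0 and votes[best] >= beta * N:
            training[k] = best
    return training

# Three hypothetical non-dominated solutions, K = 2 clusters, n = 3 points.
U1 = [[0.90, 0.60, 0.20], [0.10, 0.40, 0.80]]
U2 = [[0.80, 0.45, 0.10], [0.20, 0.55, 0.90]]
U3 = [[0.95, 0.50, 0.30], [0.05, 0.50, 0.70]]
training = select_training_points([U1, U2, U3], alpha=0.5, beta=0.5)
test_points = [k for k in range(3) if k not in training]
print(training)     # {0: 0, 2: 1}
print(test_points)  # [1]
```

Point 1 receives one vote for each cluster and one abstention, so it does not reach the required 0.5 × 3 = 1.5 votes and is left for the classifier to label.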

Data Sets and Preprocessing
Yeast Sporulation
This data set [29] consists of 6118 genes measured across 7 time points (0, 0.5, 2, 5, 7, 9 and 11.5 hours) during the sporulation process of budding yeast. The data set is then log-transformed. The Sporulation data set is publicly available at the website http://cmgm.stanford.edu/pbrown/sporulation. Among the 6118 genes, the genes whose expression levels did not change significantly during the harvesting have been excluded from further analysis. This is determined with a threshold level of 1.6 for the root mean squares of the log2-transformed ratios. The resulting set consists of 474 genes.
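The filtering step can be illustrated with a short sketch. It assumes that a gene is kept when the root mean square of its log2-transformed ratios is at least the threshold; whether the original comparison is strict is not stated, and the gene names and values below are hypothetical.

```python
import math

def rms(values):
    """Root mean square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def filter_genes(log_ratios, threshold=1.6):
    """Keep genes whose log2-ratio profile has RMS >= threshold (assumed rule)."""
    return {gene: profile for gene, profile in log_ratios.items()
            if rms(profile) >= threshold}

# Hypothetical log2-transformed ratios over the 7 time points.
log_ratios = {
    "gene_a": [2.0, -2.0, 1.5, -1.8, 2.2, 1.9, -2.1],   # strongly changing
    "gene_b": [0.1, -0.2, 0.3, 0.0, -0.1, 0.2, 0.1],    # nearly flat
}
kept = filter_genes(log_ratios)
print(sorted(kept))  # ['gene_a']
```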

Yeast Cell Cycle
The Yeast Cell Cycle data set was extracted from a data set that shows the fluctuation of expression levels of approximately 6000 genes over two cell cycles (17 time points). Out of these 6000 genes, 384 genes have been selected as cell-cycle regulated [37]. This data set is publicly available at the following website: http://faculty.washington.edu/kayee/cluster.
Figure 8 The final non-dominated Pareto-optimal front obtained by MOGA clustering for Yeast Sporulation data set.

Rat CNS
The Rat CNS data set has been obtained by reverse transcription-coupled PCR to examine the expression levels of a set of 112 genes during rat central nervous system development over 9 time points [30]. This data set is available at http://faculty.washington.edu/kayee/cluster.
All the data sets are normalized so that each row has mean 0 and variance 1.
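The row-wise normalization can be sketched as below. The sketch uses the population variance (division by the number of time points), which is an assumption since the text does not say which variance estimator is used, and it assumes that no gene has a constant profile; the function name and the toy matrix are hypothetical.

```python
import math

def normalize_rows(matrix):
    """Scale each row (gene) to mean 0 and variance 1."""
    normalized = []
    for row in matrix:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)  # population variance
        sd = math.sqrt(var)
        normalized.append([(v - mean) / sd for v in row])
    return normalized

# Two hypothetical genes measured at four time points.
expression = [[1.0, 2.0, 3.0, 4.0],
              [10.0, 0.0, 5.0, 5.0]]
z = normalize_rows(expression)
```

After this step, genes with the same shape of expression profile but different magnitudes become directly comparable.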

Performance Metrics
For evaluating the performance of the clustering algorithms, the silhouette index [40] is used. Moreover, two cluster visualization tools, namely, the Eisen plot and the cluster profile plot, have been utilized.

Silhouette Index
The silhouette index [40] is a cluster validity index that is used to judge the quality of any clustering solution $C$. Suppose $a$ represents the average distance of a point from the other points of the cluster to which the point is assigned, and $b$ represents the minimum of the average distances of the point from the points of the other clusters. Now the silhouette width $s$ of the point is defined as:
$$s = \frac{b - a}{\max\{a, b\}}.$$
The silhouette index $s(C)$ is the average silhouette width of all the data points (genes) and it reflects the compactness and separation of clusters. The value of the silhouette index varies from -1 to 1 and a higher value indicates a better clustering result.
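The silhouette index can be computed directly from its definition, as in the minimal sketch below. Two choices are assumptions made for illustration: the toy example uses Euclidean distance on one-dimensional points so that it can be checked by hand, and a point that is alone in its cluster is given a silhouette width of 0, a common convention. The function names and data are hypothetical.

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

def silhouette_index(points, labels, dist=euclidean):
    """Average silhouette width over all points (needs at least 2 clusters)."""
    clusters = sorted(set(labels))
    widths = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:                 # singleton cluster: width 0 by convention
            widths.append(0.0)
            continue
        a = sum(own) / len(own)     # mean distance within the own cluster
        b = min(                    # smallest mean distance to another cluster
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i])
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)

points = [[0.0], [1.0], [10.0], [11.0]]
good = silhouette_index(points, [0, 0, 1, 1])  # natural grouping
bad = silhouette_index(points, [0, 1, 0, 1])   # groups mixed up
print(round(good, 3), round(bad, 3))
```

The natural grouping scores close to 1, while the mixed-up labeling scores below 0, matching the interpretation that higher values indicate better clustering.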

Eisen Plot
In Eisen plot [2] (see Fig. 2(a) for an example), the expression value of a gene at a specific time point is represented by coloring the corresponding cell of the data matrix with a color similar to the original color of its spot on the microarray. The shades of red represent higher expression levels, the shades of green represent lower expression levels and the colors towards black represent absence of differential expression. In our representation, the genes are ordered before plotting so that the genes that belong to the same cluster are placed one after another. The cluster boundaries are identified by white colored blank rows.

Cluster Profile Plot
The cluster profile plot (see Fig. 2(b) for an example) shows for each cluster the normalized gene expression values (light green) of the genes of that cluster with respect to the time points. Also, the average expression values of the genes of a cluster over different time points are plotted as a black line together with the standard deviation within the cluster at each time point.

Input Parameters
The values of the different parameters of MOGA and single objective GA are as follows: number of generations = 100, population size = 50, crossover probability = 0.8 and mutation probability = 0.01. Both α and β are set to 0.5.
The parameter values have been set after several experiments. The fuzzy exponent m is chosen as in [41,42], and the values of m for the data sets Sporulation, Cell Cycle, Arabidopsis, Serum and Rat CNS are obtained as 1.34, 1.14, 1.18, 1.25 and 1.21, respectively. The fuzzy C-means algorithm has been run for 200 iterations unless it converges before that. Each algorithm has been executed for different numbers of clusters and the solution giving the best silhouette index score is considered.