Volume 12 Supplement 1
Selected articles from the Ninth Asia Pacific Bioinformatics Conference (APBC 2011)
Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery
- Henry Han^{1, 2} and
- Xiao-Li Li^{3}
DOI: 10.1186/1471-2105-12-S1-S7
© Han and Li; licensee BioMed Central Ltd. 2011
Published: 15 February 2011
Abstract
Background
Although high-throughput microarray-based molecular diagnostic technologies show great promise in cancer diagnosis, they remain far from clinical application because of their low and unstable sensitivities and specificities in cancer molecular pattern recognition. In fact, high-dimensional and heterogeneous tumor profiles challenge current machine learning methodologies because they combine a small number of samples with a large or even huge number of variables (genes). This naturally calls for effective feature selection in microarray data classification.
Methods
We propose a novel feature selection method, multi-resolution independent component analysis (MICA), for large-scale gene expression data. This method overcomes the weak points of the widely used transform-based feature selection methods such as principal component analysis (PCA), independent component analysis (ICA), and nonnegative matrix factorization (NMF) by avoiding their global feature-selection mechanism. In addition to demonstrating the effectiveness of MICA in meaningful biomarker discovery, we present MICA-based support vector machines (MICA-SVM) and MICA-based linear discriminant analysis (MICA-LDA) to attain high-performance classification in low-dimensional spaces.
Results
We demonstrate the superiority and stability of our algorithms through comprehensive experimental comparisons with nine state-of-the-art algorithms on six high-dimensional heterogeneous profiles under cross validation. Our classification algorithms, especially MICA-SVM, not only achieve clinical or near-clinical level sensitivities and specificities, but also show stronger performance stability than their peers. Software that implements the major algorithm, together with the data sets on which this paper focuses, is freely available at https://sites.google.com/site/heyaumapbc2011/.
Conclusions
This work suggests a new direction for bringing microarray technologies into clinical routine: building a high-performance classifier that attains clinical-level sensitivities and specificities by treating an input profile as a ‘profile-biomarker’. The multi-resolution approach of suppressing redundant global features while effectively extracting local features should also have a positive impact on large-scale ‘omics’ data mining.
Background
With the rapid developments in genomics, high-throughput microarray pattern analysis shows great potential in cancer diagnosis for its efficiency and cost-effectiveness [1]. However, this promising technology remains a research topic rather than an applicable clinical routine. Aside from intrinsic factors of microarray profiling technologies, a key issue preventing it from becoming a clinical paradigm is that the relatively low or even poor sensitivities and specificities obtained from current pattern recognition methodologies are inadequate to provide robust clinical support. Moreover, some pattern classification methods perform reasonably well on some data sets but fail badly on others. Although clinical cancer research urgently needs high-performance pattern recognition methods for gene expression analysis, attaining high-accuracy classification remains a challenge in machine learning because of the special characteristics of gene expression profiles.
A gene expression profile can be represented by a p×n matrix after preprocessing. Each column holds the expression values of a single gene across all biological samples, and each row holds the expression values of a single biological sample across the genome. The total number of genes is on the order of 10^{3} to 10^{4}, while the total number of biological samples is on the order of tens or hundreds. Since the number of variables (genes) is much greater than the number of samples (observations), some traditional pattern recognition methods (e.g., Fisher discriminant analysis) may have unstable solutions and lead to poor classification performance. Moreover, although a profile contains a large number of genes, only a small portion of them contribute meaningfully to data variation. In addition, the high-dimensional data are not noise-free, because preprocessing algorithms may not completely remove the systematic noise contained in the raw data. The data redundancy and noise therefore tend to degrade the discriminative power of classification algorithms applied to microarray data.
Feature selection clearly plays a critical role in gene expression analysis: it decreases dimensionality, removes noise, and extracts meaningful features before classification. Feature selection algorithms can usually be categorized into three types: statistical test-based (e.g., two-sample t-tests), wrapper-based (e.g., SVM-based wrappers) [2], and transform-based. Transform-based feature selection methods are probably the most widely used data reduction techniques because of their popularity and efficiency. They include principal component analysis (PCA) [3], independent component analysis (ICA) [4], nonnegative matrix factorization (NMF) [5, 6], and their extensions [7, 8].
However, these transform-based feature selection algorithms are generally good at selecting global features rather than local features. Global and local features contribute to the global and local characteristics of data and interpret its global and local behavior, respectively. Statistically, the global features consist of high-frequency signals and the local features consist of low-frequency signals. Unlike global features, local features are difficult for most feature-selection algorithms to extract, because low-frequency signals are less likely than high-frequency signals to be involved in inferring the ‘new’ low-dimensional data, which are generally linear combinations of all input variables. As a result, the low-dimensional data obtained from traditional feature selection methods may miss some local data characteristics described by the local features. For example, PCA is by nature a global feature selection algorithm: each principal component captures some level of the global characteristics of the data and receives contributions from all input variables in the linear combination. In addition, a change in one variable inevitably affects all loading vectors globally. However, local features may be key to attaining high-performance gene expression pattern classification because they capture subtle data behavior. For example, some benign tumor samples may display global characteristics very similar to those of malignant tumor samples while differing in local characteristics. To attain high-performance diagnosis, it is essential to capture local data characteristics that distinguish samples with similar global characteristics.
The main reason for the global feature-selection mechanism of these algorithms is that they are all single-resolution feature selection methods, in which all features are displayed indistinguishably at a single resolution regardless of their frequencies. This inevitably makes global features more likely to be selected than local features and prevents effective capture of local data characteristics. Mathematically, all variables of the input data are involved in the linear combinations that compute the principal components in PCA, the independent components in ICA, and the basis vectors in NMF. Such a global feature selection mechanism hinders high-accuracy genomic pattern recognition in the subsequent classification, because only the features interpreting global characteristics are involved in training a learning machine (e.g., an SVM). The redundant global features may decrease the generalization ability of the learning machine and increase the risk of misclassification or over-fitting. Consequently, learning machines integrated with global feature-selection algorithms tend to display instability in classification.
To avoid the global feature selection mechanism, it is desirable to distinguish (e.g., sort) features according to their frequencies rather than treat them uniformly, since uniform treatment lets the high-frequency signals dominate the feature selection and the low-frequency signals lose opportunities. A discrete wavelet transform (DWT) [9] can hierarchically organize data in a multi-resolution way by low-pass and high-pass filters. A low-pass (high-pass) filter passes only low-frequency (high-frequency) signals and attenuates signals with frequencies higher (lower) than a cutoff frequency. As a result, the DWT coefficients at the coarse levels capture global features of the input signals and the coefficients at the fine levels capture local features, i.e., the low-frequency and high-frequency signals are represented by coefficients at the coarse and fine resolutions respectively. After such a ‘multi-resolution feature separation’, the global feature selection mechanism becomes relatively easy to overcome by selectively extracting local features and filtering redundant global features.
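As a minimal illustration of the low-pass/high-pass split performed by a DWT, the following sketch applies one level of the Haar wavelet to a toy signal. The Haar wavelet (the simplest orthogonal wavelet) and the function names `haar_step` and `haar_inverse_step` are assumptions chosen for brevity; they are not the wavelet or the code used in this study.

```python
import numpy as np

def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    x = np.asarray(signal, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass filter: scaled local averages
    detail = (even - odd) / np.sqrt(2.0)  # high-pass filter: scaled local differences
    return approx, detail

def haar_inverse_step(approx, detail):
    """Invert haar_step exactly (the transform is orthogonal)."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

# Toy signal: flat everywhere except one localized bump at position 5.
signal = np.ones(8)
signal[5] += 4.0

approx, detail = haar_step(signal)
restored = haar_inverse_step(approx, detail)
```

In this toy case only the detail coefficient of the pair containing the bump is nonzero, while the approximation coefficients carry the smooth overall level of the signal. Applying `haar_step` again to `approx` would produce the next level of the multi-resolution hierarchy.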
In this study, we propose a novel multi-resolution independent component analysis (MICA) to conduct effective feature selection for high-dimensional heterogeneous gene expression data. We then propose MICA-based support vector machines (MICA-SVM) to achieve high-performance gene expression pattern prediction. We demonstrate its superiority and stability by comparing it with existing state-of-the-art peers on six heterogeneous microarray profiles, and we extend MICA to linear discriminant analysis (MICA-LDA). We also develop a MICA-based filter-wrapper biomarker discovery algorithm to further demonstrate the effectiveness of the feature selection algorithm in capturing biomarkers. Finally, we discuss potential extensions of MICA in microarray-based molecular diagnosis and conclude the paper.
Methods
Multi-resolution independent component analysis is based on the discrete wavelet transform (DWT) and independent component analysis (ICA). A discrete wavelet transform decomposes input data in a multi-resolution form by using a wavelet and a scaling function. The coefficients at the coarse and fine levels describe the global and local behavior of the data respectively. Mathematically, the DWT is equivalent to multiplying the input data by a set of orthogonal matrices block-wise. ICA, on the other hand, seeks to represent input data as a linear combination of a set of statistically independent components by minimizing their mutual information. Theoretically, it is equivalent to inverting the central limit theorem (CLT) by searching for maximally non-normal projections of the original data distribution. More information about the DWT and ICA methods can be found in [4, 9].
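To make the idea of searching for maximally non-normal projections concrete, here is a bare-bones sketch of a symmetric fixed-point ICA iteration in the spirit of [4]. It is an illustrative assumption, not the implementation used in this study: the function names, the tanh nonlinearity, the iteration limits, and the toy sources are all hypothetical choices.

```python
import numpy as np

def _sym_decorrelate(W):
    """Return (W W^T)^{-1/2} W so that the rows of W become orthonormal."""
    eigvals, eigvecs = np.linalg.eigh(W @ W.T)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T @ W

def fast_ica_sketch(X, n_iter=200, tol=1e-8, seed=0):
    """Decompose X (p mixtures x n observations) as X = A @ Z + row means."""
    p, n = X.shape
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean
    # Whitening via SVD: rows of Xw are uncorrelated with unit variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Xw = np.sqrt(n) * Vt
    rng = np.random.default_rng(seed)
    W = _sym_decorrelate(rng.standard_normal((p, p)))
    for _ in range(n_iter):
        G = np.tanh(W @ Xw)                    # nonlinearity g(w^T x)
        g_prime = (1.0 - G ** 2).mean(axis=1)  # E[g'(w^T x)] for each component
        W_new = _sym_decorrelate(G @ Xw.T / n - g_prime[:, None] * W)
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    Z = W @ Xw        # estimated independent components, one per row
    A = Xc @ Z.T / n  # mixing matrix by least squares (Z Z^T / n = I)
    return A, Z, mean

# Toy data: three non-Gaussian sources mixed linearly (hypothetical values).
rng = np.random.default_rng(1)
n = 2000
sources = np.vstack([
    rng.uniform(-1.0, 1.0, n),
    rng.laplace(size=n),
    np.sign(rng.standard_normal(n)),
])
mixing = np.array([[1.0, 0.5, 0.2], [0.3, 1.0, 0.4], [0.2, 0.6, 1.0]])
X = mixing @ sources
A, Z, mean = fast_ica_sketch(X)
```

Each row of `A` holds the coordinates of one mixture in the basis formed by the rows of `Z`. The recovered components match the true sources only up to permutation, sign, and scale, which is an inherent ambiguity of ICA.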
Multi-resolution independent component analysis
The goal of multi-resolution independent component analysis is to seek statistically independent genomic patterns from a meta-profile, which is computed by suppressing the coarse-level coefficients (global features) and maintaining the fine-level coefficients (local features) in the DWT of an input profile. As an approximation of the high-dimensional input profile, the derived meta-profile captures almost all local features and keeps the most important global features. Unlike the independent components in classic ICA, which are mainly retrieved from the global features because of their high frequencies, the independent components calculated by MICA are statistically independent signals that contain contributions from almost all local features and the most important global features. As such, the MICA components are more representative of the latent data structure than the classic ICA components. Moreover, suppressing redundant global features gives MICA an automatic de-noising mechanism: since the coarse-level coefficients (e.g., the first-level coefficients) in the DWT generally contain “contributions” from noise, suppressing them not only filters unnecessary global features but also removes noise. The MICA algorithm consists of the following steps.
1). Wavelet transforms
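Given a gene expression profile $X = (x_1, x_2, \ldots, x_p)^T \in \mathbb{R}^{p \times n}$ with p samples across n genes, MICA conducts an L-level discrete wavelet transform for each sample to obtain a sequence of detail coefficient matrices $D_1, D_2, \ldots, D_L$ and an approximation coefficient matrix $A_L$, i.e., $X \rightarrow [D_1, D_2, \ldots, D_L, A_L]$, where $D_j \in \mathbb{R}^{p \times n_j}$, $A_L \in \mathbb{R}^{p \times n_L}$, and $n_j$ is the number of coefficients per sample at level j.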
2). Feature selection
A level threshold $\tau \ge 1$ is selected to suppress redundant global features and maintain local features as follows. If $1 \le j \le \tau$: 1) conduct principal component analysis for $D_j$ to obtain its PC matrix of loading vectors $U = [u_1, u_2, \ldots, u_p]$ and the corresponding score matrix $S = [s_1, s_2, \ldots, s_p]$; 2) reconstruct the original $D_j$ by using the first loading vector $u_1$ in the PC matrix as $D_j \leftarrow \frac{1}{p}\mathbf{1}_p\mathbf{1}_p^T D_j + s_1 u_1^T$, where $\mathbf{1}_p \in \mathbb{R}^{p}$ is a vector containing all ‘1’s. If $j > \tau$, reconstruct and update each detail coefficient matrix $D_j$ by using the loading vectors with the 100% explained variance percentage and their corresponding vectors in the score matrix: $D_j \leftarrow \frac{1}{p}\mathbf{1}_p\mathbf{1}_p^T D_j + \sum_{i=1}^{p} s_i u_i^T$. The explained variance percentage is the ratio between the accumulative variance from the selected data and the total data variance. For example, the explained variance percentage $\rho_r$ from the first r loading vectors is defined as $\rho_r = \sum_{i=1}^{r} \lambda_i \big/ \sum_{i=1}^{p} \lambda_i$, where $\lambda_i$ is the data variance from the i^{th} loading vector. In the implementation, the second case can be ‘lazily’ simplified by keeping those detail coefficient matrices intact, which saves computing resources.
3). Inverse discrete wavelet transforms
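Conduct the corresponding inverse discrete wavelet transform using the updated coefficient matrices to get the meta-profile of X, denoted $X^*$, i.e., $[D_1, D_2, \ldots, D_L, A_L] \rightarrow X^*$, where $D_1, \ldots, D_L$ are the updated detail coefficient matrices and $A_L$ is the approximation coefficient matrix.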
4). Independent component analysis
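Conduct the classic independent component analysis for $X^*$ to obtain the independent components and the mixing matrix: $X^* = AZ$, where $A \in \mathbb{R}^{p \times p}$ is the mixing matrix and $Z \in \mathbb{R}^{p \times n}$ is the independent component matrix.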
5). Subspace decomposition
The meta-profile $X^*$ approximates the original profile X by removing the redundant global features and retaining almost all local features, that is, by selecting features according to their frequencies. It is easy to decompose each sample in the subspace spanned by all independent components $z_1, z_2, \ldots, z_p$, each of which is a basis vector of the subspace, i.e., $X^* = AZ$, where the mixing matrix is $A = [a_1, a_2, \ldots, a_p]^T$ and the independent component matrix is $Z = [z_1, z_2, \ldots, z_p]^T$. In other words, each sample can be represented as $x_i \approx \sum_{j=1}^{p} a_{ij} z_j$, where the meta-sample $a_i$ is the i^{th} row of the mixing matrix, recording the coordinate values of the sample $x_i$ in the subspace. As a low-dimensional vector, the meta-sample $a_i$ retains almost all local features and the most outstanding global features of the original high-dimensional sample $x_i$. Thus it can be called a data-locality preserved prototype of $x_i$.
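Steps 1 to 3 above (wavelet transform, feature selection, inverse transform) can be sketched as follows. This is a hedged illustration under several assumptions that are not taken from the original implementation: it substitutes the Haar wavelet for ‘db8’, requires the number of genes to be divisible by 2^L, numbers the levels in the order the transform produces them, and suppresses the levels with j ≤ τ. All function names are hypothetical. The resulting meta-profile would then be passed to any ICA routine for steps 4 and 5.

```python
import numpy as np

def haar_wavedec(X, levels):
    """Row-wise multi-level Haar DWT. Returns ([D_1, ..., D_L], A_L)."""
    details, approx = [], np.asarray(X, dtype=float)
    for _ in range(levels):
        even, odd = approx[:, 0::2], approx[:, 1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients D_j
        approx = (even + odd) / np.sqrt(2.0)         # input to the next level
    return details, approx

def haar_waverec(details, approx):
    """Inverse of haar_wavedec."""
    for D in reversed(details):
        X = np.empty((approx.shape[0], 2 * approx.shape[1]))
        X[:, 0::2] = (approx + D) / np.sqrt(2.0)
        X[:, 1::2] = (approx - D) / np.sqrt(2.0)
        approx = X
    return approx

def suppress(D):
    """Keep only the column mean of D plus its first principal component term."""
    mean = D.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(D - mean, full_matrices=False)
    score_1, loading_1 = s[0] * U[:, 0], Vt[0]  # s_1 and u_1 in the text
    return mean + np.outer(score_1, loading_1)

def meta_profile(X, levels, tau):
    """Suppress detail matrices at levels j <= tau, keep the others intact."""
    details, approx = haar_wavedec(X, levels)
    updated = [suppress(D) if j <= tau else D
               for j, D in enumerate(details, start=1)]
    return haar_waverec(updated, approx)

# Toy profile: 6 samples across 64 genes (random values, for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 64))
X_star = meta_profile(X, levels=3, tau=2)
```

With `tau=0` no level is suppressed and the sketch returns the input profile unchanged, which is a quick sanity check that the forward and inverse transforms match.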
Table 1. Six gene-expression microarray profiles
Dataset | #Genes | #Samples | Technology |
---|---|---|---|
Stroma | 18995 | 13 inflammatory breast cancers (‘ibc’) + 34 non-inflammatory breast cancers (‘non-ibc’) | Oligonucleotide |
Breast_1 | 2000 | 53 controls + 163 cancers | Oligonucleotide |
Prostate | 12600 | 59 controls + 77 cancers | Oligonucleotide |
Glioma | 12625 | 28 glioblastomas + 22 anaplastic oligodendrogliomas | Oligonucleotide |
HCC | 7129 | 20 early intrahepatic recurrence + 40 non-early intrahepatic recurrence | Oligonucleotide |
Breast_2 | 24188 | 46 samples with distant metastasis within 5 years + 51 samples remaining disease-free within 5 years | cDNA |
Multi-resolution Independent component analysis based support vector machines (MICA-SVM)
The normal of the maximum-margin hyperplane is calculated as $w = \sum_{i=1}^{p} \alpha_i y_i a_i$. The decision rule $f(a') = \mathrm{sign}\left(\sum_{i=1}^{p} \alpha_i y_i \, k(a_i, a') + b\right)$ is used to determine the class type of a testing sample x′, where $a_i$ and $a'$ are the corresponding meta-samples of the samples $x_i$ and x′ computed from MICA, $y_i \in \{-1, +1\}$ is the class label of $x_i$, $\alpha_i$ are the Lagrange multipliers, and b is the bias. The function $k(\cdot, \cdot)$ is an SVM kernel function that maps these meta-samples into a same-dimensional or higher-dimensional feature space. In this work, we focus only on the linear kernel for its simplicity and efficiency in microarray pattern classification. We will point out in the discussion section that most SVM-based learning machines encounter overfitting under the standard Gaussian kernel (‘rbf’: radial basis function kernel).
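The decision rule can be evaluated directly once the dual coefficients are known. The sketch below assumes the standard SVM dual form with a linear kernel; the names `alphas`, `labels`, `bias`, and the two-point toy problem (whose hard-margin solution was derived by hand) are hypothetical and serve only to illustrate the formula.

```python
import numpy as np

def linear_kernel(a, b):
    """Linear kernel: the plain inner product of two meta-samples."""
    return float(np.dot(a, b))

def hyperplane_normal(meta_train, labels, alphas):
    """Normal vector w = sum_i alpha_i * y_i * a_i (meaningful for the linear kernel)."""
    return sum(al * y * np.asarray(a, dtype=float)
               for al, y, a in zip(alphas, labels, meta_train))

def decide(meta_train, labels, alphas, bias, meta_test, kernel=linear_kernel):
    """Return +1 or -1 according to sign(sum_i alpha_i y_i k(a_i, a') + b)."""
    score = sum(al * y * kernel(a, meta_test)
                for al, y, a in zip(alphas, labels, meta_train)) + bias
    return 1 if score >= 0 else -1

# Two training meta-samples on opposite sides of the origin (toy values).
meta_train = [np.array([2.0, 0.0]), np.array([-2.0, 0.0])]
labels = [1, -1]
alphas = [0.125, 0.125]  # hard-margin dual solution for this toy problem
bias = 0.0

w = hyperplane_normal(meta_train, labels, alphas)
pred_pos = decide(meta_train, labels, alphas, bias, np.array([3.0, 1.0]))
pred_neg = decide(meta_train, labels, alphas, bias, np.array([-0.5, 4.0]))
```

In practice the coefficients would come from a trained SVM solver rather than from hand-derived values; only the support vectors have nonzero multipliers.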
Results
We performed extensive experiments on six publicly available gene expression microarray profiles, consisting of five oligonucleotide profiles [11–15] and one cDNA profile [16]. Table 1 gives their details. These profiles are heterogeneous: they were generated under different experimental conditions, with different profiling technologies, or even processed by different preprocessing algorithms. For example, the stroma, prostate, glioma, and HCC data only go through basic log2 transforms, whereas the breast_1 data were obtained by applying two-sample t-tests to an original dataset that had gone through delicate normalizations [12].
Cross validations
To assess our algorithm’s superiority and reproducibility, we compare it with six comparison algorithms in terms of average classification rates, sensitivities, and specificities under k-fold (k = 10) cross validation and 100 trials of 50% holdout cross validation. The classification accuracy in the i^{th} classification is the ratio of the correctly classified testing samples to the total testing samples, $r_i = \frac{tp + tn}{tp + tn + fp + fn}$, and the sensitivity and specificity are defined as the ratios $\frac{tp}{tp + fn}$ and $\frac{tn}{tn + fp}$ respectively, where tp (tn) is the number of positive (negative) targets correctly classified, and fp (fn) is the number of negative (positive) targets incorrectly classified. In the 100 trials of 50% holdout cross validation (HOCV), all samples in the data set are pooled together and randomly divided in half to obtain training and testing data. Such a partition is repeated 100 times to obtain 100 pairs of training and testing sets. In k-fold cross validation, an input dataset is partitioned into k disjoint portions of equal or approximately equal size. In each of the k rounds of classification, one portion is used for testing and the other k−1 portions are used for training. Compared with pre-specified training or testing data, cross validation can decrease potential biases in algorithm performance evaluation.
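The evaluation protocol can be sketched in plain Python as follows. The function names, the fixed random seeds, and the label encoding (1 for positive, 0 for negative) are assumptions for illustration; the sketch also assumes that each test set contains at least one positive and one negative sample, since the ratios are otherwise undefined.

```python
import random

def confusion_rates(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for one classification round."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

def holdout_splits(n_samples, n_trials=100, seed=0):
    """Yield (train, test) index lists for repeated random 50% holdout."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        half = n_samples // 2
        yield idx[:half], idx[half:]

def kfold_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal portions
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

# Toy check: 3 positives and 2 negatives with two mistakes.
rates = confusion_rates([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

Averaging the three rates over all rounds, together with their standard deviations, gives summary figures of the kind reported in the tables below.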
Six comparison algorithms
The six comparison algorithms fall into two types. The first type consists of the standard support vector machine (SVM) [10] and linear discriminant analysis (LDA) [17], both of which are state-of-the-art classification algorithms; SVM in particular is widely employed in gene expression pattern recognition. The second type consists of four methods that embed transform-based feature selection in SVM and LDA: support vector machines with principal component analysis, independent component analysis, or nonnegative matrix factorization, and linear discriminant analysis with principal component analysis. For convenience, we refer to them as PCA-SVM, ICA-SVM, NMF-SVM, and PCA-LDA; their implementation details can be found in Additional file 1.
We employ the wavelet ‘db8’ to conduct a 12-level discrete wavelet transform for each data set, and select a level threshold τ = 3 in MICA for all profiles. Although it is not the optimal level threshold for every data set, it guarantees automatic de-noising and ‘fair’ algorithm comparisons. Moreover, we have found that the meta-samples obtained from MICA at τ = 3 can clearly distinguish the two types of samples. Other level thresholds are possible, but a threshold that is too ‘coarse’ (e.g., τ = 1) or too ‘fine’ (e.g., τ ≥ 9) may miss some important global or local features and affect the subsequent classification.
Algorithm average performance comparisons (100 trials of 50% HOCV)
Dataset | Avg. classification rate ±std (%) | Avg. sensitivity ± std (%) | Avg. specificity ± std (%) |
---|---|---|---|
Stroma | |||
mica-svm | 98.26±02.25 | 100.0±00.00 | 93.89±08.11 |
svm | 73.83±07.02 | 92.87±06.58 | 25.45±15.92 |
pca-svm | 71.83±06.78 | 90.20±08.66 | 25.62±16.48 |
ica-svm | 71.48±06.78 | 90.04±09.05 | 25.06±17.87 |
nmf-svm | 68.39±08.67 | 86.30±11.93 | 23.69±11.93 |
pca-lda | 71.35±06.97 | 89.12±09.15 | 26.69±17.05 |
Breast_1 | |||
mica-svm | 99.04±00.99 | 99.49±01.18 | 97.73±02.95 |
svm | 86.40±02.87 | 92.43±02.76 | 68.78±11.53 |
pca-svm | 86.19±02.97 | 92.79±02.68 | 66.85±11.77 |
ica-svm | 86.27±02.99 | 92.80±02.82 | 67.11±12.37 |
nmf-svm | 85.44±02.42 | 93.52±02.91 | 61.29±09.10 |
pca-lda | 86.25±02.89 | 92.43±02.83 | 68.15±11.95 |
Prostate | |||
mica-svm | 99.69±00.67 | 99.88±00.64 | 99.44±01.38 |
svm | 91.16±02.58 | 89.53±04.53 | 93.42±04.57 |
pca-svm | 90.76±02.65 | 89.18±04.60 | 92.94±04.76 |
ica-svm | 61.43±08.54 | 78.75±23.15 | 41.09±28.88 |
nmf-svm | 71.03±07.27 | 88.48±07.17 | 49.84±19.33 |
pca-lda | 90.47±03.46 | 89.46±04.81 | 91.87±05.76 |
Glioma | |||
mica-svm | 98.76±02.03 | 98.89±02.90 | 98.82±02.89 |
svm | 74.00±07.51 | 68.19±12.71 | 79.45±11.30 |
pca-svm | 72.60±06.81 | 69.05±14.38 | 76.25±11.69 |
ica-svm | 47.20±08.79 | 25.24±29.21 | 69.61±29.55 |
nmf-svm | 74.40±08.04 | 74.53±11.10 | 74.19±13.53 |
pca-lda | 73.96±07.02 | 68.38±12.41 | 79.18±12.39 |
HCC | |||
mica-svm | 98.30±02.30 | 99.23±02.02 | 96.97±06.05 |
svm | 61.53±07.75 | 75.04±12.40 | 37.15±18.27 |
pca-svm | 60.93±07.90 | 72.82±14.19 | 39.53±17.70 |
ica-svm | 58.73±07.29 | 72.37±12.51 | 29.72±15.93 |
nmf-svm | 61.30±08.91 | 71.17±13.47 | 43.47±16.67 |
pca-lda | 61.07±07.58 | 74.15±12.40 | 37.15±17.08 |
Breast_2 | |||
mica-svm | 97.23±03.20 | 97.79±03.90 | 96.93±05.17 |
svm | 63.04±05.48 | 65.81±11.20 | 61.59±13.17 |
pca-svm | 62.29±05.54 | 66.86±12.09 | 59.00±13.72 |
ica-svm | 62.27±05.59 | 67.39±11.51 | 58.28±13.81 |
nmf-svm | 62.77±06.60 | 66.92±10.68 | 59.57±13.69 |
pca-lda | 62.54±05.48 | 66.94±11.90 | 59.39±13.25 |
Algorithm average performance comparisons (10-fold CV)
Dataset | Avg. classification rate ± std (%) | Avg. sensitivity ± std (%) | Avg. specificity ± std (%) |
---|---|---|---|
Stroma | |||
mica-svm | 98.00 ± 06.32 | 100.0 ± 00.00 | 95.00 ± 15.81 |
pca-lda | 71.83 ± 11.15 | 94.17 ± 12.45 | 15.00 ± 33.75 |
svm | 74.83 ± 19.76 | 90.83 ± 14.93 | 35.00 ± 47.43 |
pca-svm | 71.00 ± 16.47 | 91.67 ± 18.00 | 15.00 ± 33.75 |
Breast_1 | |||
mica-svm | 99.52 ± 01.51 | 100.0 ± 00.00 | 98.00 ± 06.32 |
pca-lda | 88.51 ± 06.10 | 90.88 ± 07.60 | 81.00 ± 15.56 |
svm | 87.49 ± 06.85 | 91.91 ± 08.37 | 75.00 ± 22.62 |
pca-svm | 88.00 ± 04.99 | 91.47 ± 05.81 | 77.00 ± 15.27 |
Prostate | |||
mica-svm | 100.0 ± 00.00 | 100.0 ± 00.00 | 100.0 ± 00.00 |
pca-lda | 93.29 ± 05.50 | 90.71 ± 08.74 | 96.67 ± 07.03 |
svm | 94.12 ± 05.84 | 92.32 ± 06.63 | 96.33 ± 07.77 |
pca-svm | 93.35 ± 05.48 | 92.32 ± 08.87 | 95.00 ± 08.05 |
Glioma | |||
mica-svm | 100.0 ± 00.00 | 100.0 ± 00.00 | 100.0 ± 00.00 |
pca-lda | 76.33 ± 18.93 | 68.33 ± 36.39 | 81.67 ± 19.95 |
svm | 75.67 ± 19.82 | 66.67 ± 33.33 | 81.67 ± 19.95 |
pca-svm | 78.00 ± 17.98 | 68.33 ± 27.72 | 86.67 ± 17.21 |
HCC | |||
mica-svm | 100.0 ± 00.00 | 100.0 ± 00.00 | 100.0 ± 00.00 |
pca-lda | 68.33 ± 14.59 | 80.00 ± 15.81 | 45.00 ± 43.78 |
svm | 71.67 ± 15.81 | 82.50 ± 16.87 | 50.00 ± 33.33 |
pca-svm | 63.33 ± 17.21 | 77.50 ± 14.19 | 35.00 ± 33.75 |
Breast_2 | |||
mica-svm | 99.00 ± 03.16 | 100.0 ± 00.00 | 98.00 ± 06.32 |
pca-lda | 62.77 ± 20.39 | 59.33 ± 19.74 | 66.50 ± 28.87 |
svm | 67.61 ± 17.41 | 66.33 ± 25.26 | 69.00 ± 29.89 |
pca-svm | 62.94 ± 13.09 | 60.67 ± 23.19 | 65.00 ± 22.36 |
Multi-resolution independent component analysis based linear discriminant analysis
Optimal level threshold selections
A remaining question is how to determine the optimal level threshold in MICA so that the subsequent SVM classifier achieves its best performance. We employ the condition number of the independent component matrix Z in MICA to resolve it: $\delta = S_{\max} / S_{\min}$, where $S_{\max}$ and $S_{\min}$ are the maximum and minimum singular values of the matrix Z calculated from MICA. A smaller condition number indicates a more stable matrix, which suggests better capture of global and local features. The level threshold is counted as ‘optimal’ if its condition number δ is the smallest. If the condition numbers of two level thresholds are numerically the same, the lower level threshold (which is required to be > 1) is counted as the optimal one. For example, on the HCC data the smallest δ value is achieved at τ = 6 and also at τ = 7, 8, 9, 10, 11. We therefore choose τ = 6 as the optimal threshold, which corresponds to the best average classification rate, 98.77% (±2.26%), with an average sensitivity of 99.44% (±2.11%) and an average specificity of 97.59% (±4.97%).
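The selection rule can be sketched with NumPy as follows. The function names and the example values in `delta_by_tau` are hypothetical; only the rule itself (smallest condition number, ties broken in favor of the lower threshold greater than 1) follows the description above.

```python
import numpy as np

def condition_number(Z):
    """Ratio of the largest to the smallest singular value of Z."""
    singular_values = np.linalg.svd(Z, compute_uv=False)  # sorted in descending order
    return singular_values[0] / singular_values[-1]

def pick_threshold(delta_by_tau, rtol=1e-9):
    """Pick the level threshold with the smallest delta; ties go to the lowest tau > 1."""
    candidates = {tau: d for tau, d in delta_by_tau.items() if tau > 1}
    best = min(candidates.values())
    tied = [tau for tau, d in candidates.items()
            if np.isclose(d, best, rtol=rtol, atol=0.0)]
    return min(tied)

# Toy matrix with known singular values 4, 2, 1.
delta = condition_number(np.diag([4.0, 2.0, 1.0]))

# Made-up condition numbers for a few thresholds (illustration only).
delta_by_tau = {1: 1.0, 3: 2.5, 6: 1.2, 7: 1.2, 8: 1.2}
tau_opt = pick_threshold(delta_by_tau)
```

`np.linalg.cond(Z)` computes the same 2-norm condition number in one call; the explicit SVD is used here only to mirror the definition in the text.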
Although only the wavelet ‘db8’ is employed in our experiments, MICA-SVM imposes no specific requirement on the wavelet except that it should be orthogonal. To compare the effects of different wavelets on algorithm performance, we select wavelets from four families, ‘db8’, ‘sym8’, ‘coif4’, and ‘bior4.4’, for classification on the six profiles at the level threshold τ = 3. No wavelet shows an obvious classification advantage over the others under the 10-fold CV, probably because the robust prior knowledge and the smaller number of trials have a larger impact on algorithm performance than the wavelet selection. However, we have found that the wavelet ‘db8’ shows some advantages over the others under the 100 trials of 50% HOCV. In addition, it is interesting that the wavelets ‘coif4’ and ‘sym8’ perform at almost the same level, whereas the wavelet ‘bior4.4’ has relatively low performance on the six profiles.
We further demonstrate the superiority of MICA-SVM by comparing it with three state-of-the-art partial least squares (PLS) based regression methods; the results can be found in Additional file 3. Moreover, we present a novel algorithm stability analysis for the seven classifications and show the advantages of the MICA-SVM and MICA-LDA algorithms over the others (see Additional file 4 for details).
MICA-based biomarker discovery
Three biomarkers discovered for the stroma data
Gene | Description | Bayes Factors | SVM-rates | MICA-coefficients |
---|---|---|---|---|
USP46 | It belongs to a large family of cysteine proteases that function as deubiquitinating enzymes. | 0.0093 | 0.8936 | 63.1453 |
FOSL2 | It encodes leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. | 0.0418 | 0.8085 | 79.8313 |
RPL5 | It encodes a ribosomal protein that catalyzes protein synthesis. It can lower MDM2, prevent p53 ubiquitination, and increase its transcriptional activity. | 0.5056 | 0.5957 | 81.8651 |
Discussion
It is worth noting that independent component analysis is a necessary step for achieving good classification performance. A similar multi-resolution principal component analysis based SVM algorithm cannot reach performance comparable to our algorithm because statistical independence is lost in the feature selection. Also, like the SVM, PCA-SVM, and ICA-SVM classifiers, MICA-SVM encounters overfitting under the standard Gaussian kernel (‘rbf’), where each learning machine can only recognize the majority-type samples of the training data regardless of the type of the testing sample. Moreover, in addition to the previous nine comparison algorithms, we have tried kernel ICA [23] based support vector machines (KICA-SVM) in our experiments. However, the KICA-SVM classifier generally performs worse than the standard SVM classifier. Furthermore, KICA-SVM not only shows strong instability in classification but also inevitably encounters overfitting under the standard Gaussian kernel, like the other learning machines. This seems to suggest that kernel-based data reduction may not be a desirable approach to effective feature selection for high-dimensional heterogeneous gene profiles. Similar results are found for kernel PCA [24] based support vector machine (KPCA-SVM) classification: a KPCA-SVM classifier is essentially the PCA-SVM classifier when both of its kernels are ‘linear’; otherwise, it encounters overfitting under the standard Gaussian kernel. In our ongoing project, in addition to further polishing our algorithms by comparing them with other state-of-the-art methods (e.g., SVM-RFE [2]), we are interested in theoretically validating the advantages of MICA-SVM over the classic SVM classifier from the viewpoint of Vapnik–Chervonenkis (VC) dimension theory [10].
Conclusions
In this study, we present a novel multi-resolution feature selection algorithm, multi-resolution independent component analysis, for effective feature selection in high-dimensional heterogeneous gene expression profiles; propose a high-performance MICA-SVM classification algorithm; and demonstrate its superiority and stability by comparing it with nine state-of-the-art algorithms. Our algorithm not only consistently achieves high-accuracy or clinical-level cancer diagnosis by treating an input profile as a whole biomarker, but also proves effective in meaningful biomarker discovery. It shows great potential to bring high-throughput microarray technology into clinical routine, especially given that current classification methods have relatively low or even poor performance on gene expression data. In addition, the multi-resolution approach of suppressing redundant global features while effectively extracting local features should have a positive impact on large-scale ‘omics’ data mining. In future work, we plan to further explore MICA-SVM’s potential in classifying gene expression data from other platforms, SNP data, and protein expression data.
Declarations
Acknowledgements
The authors want to thank three anonymous reviewers for their valuable comments in improving this manuscript.
This article has been published as part of BMC Bioinformatics Volume 12 Supplement 1, 2011: Selected articles from the Ninth Asia Pacific Bioinformatics Conference (APBC 2011). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S1.
References
1. Wang Y, Klijn J, Zhang, Atkins, Foeken J: Gene expression profiles and prognostic markers for primary breast cancer. Methods Mol Biol 2007, 377: 131–138.
2. Zhou X, Tuck D: MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data. Bioinformatics 2007, 23(9):1106–1114. 10.1093/bioinformatics/btm036
3. Jolliffe I: Principal component analysis. Springer Series in Statistics, 2nd ed., Springer, New York; 2002.
4. Hyvärinen A: Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks 1999, 10(3):626–634. 10.1109/72.761722
5. Lee D, Seung H: Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401: 788–791. 10.1038/44565
6. Brunet J, Tamayo P, Golub T, Mesirov J: Molecular pattern discovery using matrix factorization. Proc Natl Acad Sci U S A 2004, 101(12):4164–4169. 10.1073/pnas.0308531101
7. Gao Y, Church G: Improving molecular cancer class discovery through sparse nonnegative matrix factorization. Bioinformatics 2005, 21(21):3970–3975. 10.1093/bioinformatics/bti653
8. Han X: Nonnegative Principal component Analysis for Cancer Molecular Pattern Discovery. IEEE/ACM Trans Comput Biol Bioinform 2010, 7(3):537–549. 10.1109/TCBB.2009.36
9. Mallat S: A wavelet tour of signal processing. Academic Press, San Diego; 1999.
10. Vapnik V: Statistical Learning Theory. John Wiley & Sons, Inc., New York; 1998.
11. Boersma BJ, Reimers M, Yi M, Ludwig J, et al.: A stromal gene signature associated with inflammatory breast cancer. Int J Cancer 2008, 15(122(6)):1324–1332.
12. Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, et al.: Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 2005, 25(365(9460)):671–679.
13. Singh D, Febbo P, Ross K, Jackson D, Manola J, Ladd C, et al.: Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 2002, 1(2):203–209. 10.1016/S1535-6108(02)00030-2
14. Nutt CL, Mani D, Betensky R, Tamayo P, Cairncross J, et al.: Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Research 2003, 63(7):1602–1607.
15. Iizuka N, Oka M, Yamada-Okabe H, Nishida M, Maeda Y, et al.: Oligonucleotide microarray for prediction of early intrahepatic recurrence of hepatocellular carcinoma after curative resection. Lancet 2003, 361: 923–929. 10.1016/S0140-6736(03)12775-4
16. van’t Veer L, Dai H, Van De Vijver M, He Y, et al.: Gene Expression Profiling Predicts Clinical Outcome of Breast Cancer. Nature 2002, 415: 530–536. 10.1038/415530a
17. Martinez A, Kak A: PCA versus LDA. IEEE Transactions on Pattern Analysis and Machine Intelligence 2001, 23(2):228–233. 10.1109/34.908974
18. Holtkamp N, Ziegenhagen N, Malzer E, Hartman C, Giese A, et al.: Characterization of the amplicon on chromosomal segment 4q12 in glioblastoma multiforme. Neuro Oncol 2007, 9(3):291–297. 10.1215/15228517-2007-009
19. Milde-Langosch K, Janke S, Wagner I, Schroder C, Streichert T, et al.: Role of Fra-2 in breast cancer: influence on tumor cell invasion and motility. Breast Cancer Res Treat 2008, 107(3):337–347. 10.1007/s10549-007-9559-y
20. Langer S, Singer CF, Hudelist G, Dampier B, Kaserer K, et al.: Jun and Fos family protein expression in human breast cancer: correlation of protein expression and clinicopathological parameters. Eur J Gynaecol Oncol 2006, 27(4):345–352.
21. Yu K, Lee C, Tan PH, Tan P: Conservation of Breast Cancer Molecular Subtypes and Transcriptional Patterns of Tumor Progression Across Distinct Ethnic Populations. Clinical Cancer Research 2004, 10: 5508–5517. 10.1158/1078-0432.CCR-04-0085
22. Lacroix M, Toillon R, Leclercq G: p53 and breast cancer, an update. Endocrine-Related Cancer 2006, 13(2):293–325. 10.1677/erc.1.01172
23. Bach F, Jordan M: Kernel independent component analysis. Journal of Machine Learning Research 2002, 3: 1–48. 10.1162/153244303768966085
24. Schölkopf B, Smola A, Müller K: Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation 1998, 10: 1299–1319. 10.1162/089976698300017467
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.