Using the information embedded in the testing sample to break the limits caused by the small sample size in microarray-based classification
- Manli Zhu^{1} and
- Aleix M Martinez^{1, 2}
https://doi.org/10.1186/1471-2105-9-280
© Zhu and Martinez; licensee BioMed Central Ltd. 2008
Received: 19 December 2007
Accepted: 14 June 2008
Published: 14 June 2008
Abstract
Background
Microarray-based tumor classification is characterized by a very large number of features (genes) and small number of samples. In such cases, statistical techniques cannot determine which genes are correlated to each tumor type. A popular solution is the use of a subset of pre-specified genes. However, molecular variations are generally correlated to a large number of genes. A gene that is not correlated to some disease may, by combination with other genes, express itself.
Results
In this paper, we propose a new classification strategy that can reduce the effect of over-fitting without the need to pre-select a small subset of genes. Our solution works by taking advantage of the information embedded in the testing samples. We note that a well-defined classification algorithm works best when the data is properly labeled. Hence, our classification algorithm will discriminate all samples best when the testing sample is assumed to belong to the correct class. We compare our solution with several well-known alternatives for tumor classification on a variety of publicly available data-sets. Our approach consistently leads to better classification results.
Conclusion
Studies indicate that thousands of samples may be required to extract useful statistical information from microarray data. Herein, it is shown that this problem can be circumvented by using the information embedded in the testing samples.
Background
The emergence of modern experimental technologies, such as DNA microarrays, facilitates research in cancer classification. DNA microarrays offer scientists the ability to monitor the expression patterns of thousands of genes simultaneously, allowing them to study how these genes function and how they act under different conditions. This can lead to a more complete understanding of molecular variations, in addition to morphologic variations among tumors. A large number of studies have used microarrays to analyze the gene expression for breast cancer, leukemia, colon tissue, and others, demonstrating the potential power of microarrays in tumor classification [1–7].
An important open problem in the analysis of gene expression data is the design of statistical tools that can cope with a large number of gene expression values per experiment (usually thousands or tens of thousands) and a relatively small number of samples (a few dozen). This imbalance between the number of genes and the number of samples generally results in over-fitting [8], i.e., the problem where one can easily find a decision boundary which separates the training samples perfectly while performing poorly on independent testing feature vectors [9]. This problem has been cited as a major deterrent for the successful use of microarray technology in prognosis and diagnosis in cancer research [8, 10, 11].
Over-fitting
Over-fitting can be solved by collecting more samples, but recent results predict that hundreds, if not thousands, of samples would be necessary to resolve this issue [15, 16]. Unfortunately, in many studies, such a large number of samples is prohibitive, be it due to cost (time, economic) or limited access to patients in rarely occurring cancers.
The most common strategy to overcome these difficulties and avoid over-fitting is to reduce the dimensionality of the original space by choosing a subset of genes that can (theoretically) discriminate tumor tissue from normal; i.e., pre-selection of genes. These pre-selected genes may have explicit biological meaning or implications in the molecular mechanism of tumorigenesis [17, 18]. The objective of pre-selection is to increase the classification accuracy, decrease the computational cost of the classifier and clarify the biological interpretation of cancers. A variety of gene selection algorithms have been proposed for this purpose [1, 14, 17, 18].
Unfortunately, a method for pre-selecting genes that works well on one data-set will not generally work as expected on another [19]. Further, the results are often unstable due to the limited amount of data used in pre-determining such a pool of genes [11]. Hence, the results can be biased toward the characteristics of the available data or, even, toward the way this data was collected [20]. This is one of the reasons why biomarkers (genetic markers) and other selection mechanisms do not always generalize to novel experiments [8]. To determine the (complex, underlying) biological process involved in the likelihood of developing a certain cancer, it is necessary to study the relation of each individual gene as well as their combinations, because when combined with others a gene can express itself.
Several methods, such as maximum likelihood [21] (ML), weighted voting [1] (WV), k-nearest neighbor [21] (k NN), Fisher's Linear Discriminant Analysis [13] (LDA) and Support Vector Machines [22, 23] (SVM) are, in principle, capable of dealing with a large number of genes (features), and many are known to generalize to new samples when the training set is very large [24]. However, when the number of features is very large and the number of samples small, these methods cannot avoid the over-fitting problem [9]. It remains a key open problem to define classification strategies that can be applied to a large number of genes while aiming to relieve the influence of over-fitting.
Current methods
Discriminant algorithms for tumor classification using microarray data were cited above. These correspond to the following.
k Nearest Neighbor (k NN) [21]
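In many instances, it is reasonable to assume that observations which are close to each other in the feature space (under some appropriate metric) belong to the same class. The nearest neighbor (NN) rule is the simplest non-parametric decision procedure to adopt this form. In its standard statement, given the pre-labeled samples $x_1, \dots, x_n$, a test vector $x$ receives the label of its closest sample: $C(x) = C(x_j)$, with $j = \arg\min_i d(x, x_i)$.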
C(x) denotes the class label of the feature vector x, and d(·,·) is a distance measurement. Generally, the Euclidean distance is used and (hence) was the one considered in this paper. Notice that this NN-rule only uses the nearest neighbor for classification, while ignoring the remaining pre-labeled data points. If the number of pre-classified points is large, it makes sense to use the majority vote of the nearest k neighbors. This method is referred to as the k NN rule, and is attractive because it is known to generalize well [24].
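The k NN rule described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the experiments; the function name `knn_classify` and the toy data are hypothetical, and ties in the vote are broken arbitrarily.

```python
import numpy as np
from collections import Counter

def knn_classify(x, train_X, train_y, k=1):
    """Label x by majority vote among its k nearest training samples."""
    # Euclidean distance from x to every training sample (one row per sample).
    dists = np.linalg.norm(train_X - x, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    # Majority vote; with k=1 this reduces to the plain NN rule.
    votes = Counter(int(train_y[i]) for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two well-separated classes in two dimensions.
train_X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array([0, 0, 1, 1])
label = knn_classify(np.array([0.8, 0.9]), train_X, train_y, k=3)
```

With `k=1` the function ignores all but the single closest pre-labeled point, which is the behavior the text attributes to the NN rule.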
Weighted Voting (WV) [1]
Fisher's Linear Discriminant Analysis (LDA) [13]
Support Vector Machines (SVM) [23]
The weight vector $a = \sum_{i=1}^{n} \alpha_i y_i x_i$ is a linear combination of the training patterns.
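As a small numerical illustration of this linear combination, the sketch below forms the weight vector from given coefficients. The coefficient values are hypothetical and are not the output of an actual SVM training run; the labels are assumed to be coded as +1 and -1.

```python
import numpy as np

def svm_weight_vector(alpha, y, X):
    """Return a = sum_i alpha_i * y_i * x_i.

    alpha: (n,) coefficients, y: (n,) labels in {-1, +1}, X: (n, p) samples.
    """
    return (alpha * y) @ X

X = np.array([[1.0, 2.0], [3.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0, 1.0])
# Hypothetical coefficients; a zero entry means that sample does not contribute.
alpha = np.array([0.5, 0.5, 0.0])
a = svm_weight_vector(alpha, y, X)
```

Only samples with a non-zero coefficient affect `a`, which is why the third sample plays no role in this toy case.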
Maximum Likelihood (ML)
Results
We first derive the details of the proposed approach and present each of the algorithm items. Extensive experimental validation is then presented in the testing section.
Algorithm
The key idea used in this paper is to take advantage of the discriminant information embedded in the testing sample. Rather than looking for its closest match amongst all the training samples, we can use the information of the testing sample to improve the classification process, e.g., to find a better discriminant space in LDA.
Classifiers built on training data generally work poorly on testing data because the distribution of the training samples does not generally represent that of the testing samples [9]. In such cases, independent testing samples are treated as passive objects; i.e., it is assumed that the (discriminant) information encoded in the testing sample cannot be used because its class is unknown. Here, we show that it is actually possible to take advantage of the information embedded in the testing sample, changing the role of the testing sample from passive to active. We will accomplish this by assigning the test sample to each of the possible classes and then determining which of these "assignments" is the correct one. As mentioned above, this is possible because a discriminant approach will generally work best when the test sample is assumed to belong to the correct class. Earlier, we used intuitive argumentation to show this. We will now prove this result formally within the LDA framework, which will be used throughout this paper as an illustrative solution (although our solution can be extended to work with other classifiers).
Discriminant power
DP measures how well the classes are separated in the subspace spanned by LDA's solution, v. Therefore, the larger the value of DP is, the better.
To better understand the role of the DP score, let us look back at feature extraction. A classical approach used by researchers to perform dimensionality reduction is the well-known Principal Components Analysis (PCA) algorithm. PCA is concerned with the selection of that linear combination of features (from the original feature representation) which carries most of the data (co)variance. This is readily accomplished by finding the eigenvectors of the covariance matrix Σ_{X}, i.e., Σ_{X}V = ΛV, where the columns in V are the eigenvectors and Λ is the diagonal matrix of corresponding eigenvalues. Σ_{X} is, in effect, the metric we have decided to maximize.
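The eigen-decomposition behind PCA can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `pca` and the toy data are hypothetical, and the covariance matrix is formed explicitly, which is only practical when the number of features is small.

```python
import numpy as np

def pca(X, n_components):
    """Return the leading eigenvalues and eigenvectors of the sample covariance.

    X: (n, p) data matrix, one sample per row.
    """
    Xc = X - X.mean(axis=0)                  # center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]

# Toy data (hypothetical): most of the variance lies along the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])
eigvals, V = pca(X, 2)
```

The columns of `V` satisfy the eigenvalue equation for the covariance matrix, and the first column is close to the first coordinate axis because that is where the toy data varies most.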
Linear Discriminant Analysis (LDA) is in fact an extension of PCA. In LDA, one has two metrics, A and B. The first metric calculates within-class variances, the second is concerned with between-class variations. Thus, in LDA, the goal is to minimize the metric given by A while maximizing that given by B, e.g., A = Σ_{X} and B = S_{B}. This is then equivalent to the following eigenvalue problem A^{-1}BV = ΛV.
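The eigenvalue problem above can be illustrated in a low-dimensional setting where A is invertible. The sketch below is an assumption-laden example, not the authors' code: the function name `lda_directions` is hypothetical, and the between-class scatter is assumed to weight each class by its share of the samples.

```python
import numpy as np

def lda_directions(X, y):
    """Solve A^{-1} B V = Lambda V with A the data covariance and B = S_B."""
    mean = X.mean(axis=0)
    A = np.cov(X, rowvar=False)
    B = np.zeros_like(A)
    for c in np.unique(y):
        Xc = X[y == c]
        d = (Xc.mean(axis=0) - mean)[:, None]
        # Assumed weighting: fraction of samples in the class.
        B += (len(Xc) / len(X)) * (d @ d.T)
    # solve(A, B) computes A^{-1} B without forming the inverse explicitly.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(A, B))
    order = np.argsort(eigvals.real)[::-1]
    return eigvals.real[order], eigvecs.real[:, order]

# Toy data (hypothetical): two classes separated along the first feature.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(100, 2)),
               rng.normal(size=(100, 2)) + np.array([4.0, 0.0])])
y = np.array([0] * 100 + [1] * 100)
eigvals, V = lda_directions(X, y)
```

This direct solve requires more samples than features. In the microarray setting A is singular, so the code would fail there.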
Unfortunately, this method does not work well when the two metrics disagree [25], that is, when the solution favored by the first metric A, does not agree with that of the second metric B. In this case, we say that the two metrics are in conflict. Under such circumstances, knowing which of the two metrics is right turns into a guessing game. Taking an average would even be worse, because generally one of the two metrics is correct [26].
Hence, our next goal is to determine which of the classes to which our test sample can be assigned provides the smallest conflict, that is, the largest discriminant score DP. We show how to do this efficiently next.
Class fitting
In our framework, we first assign the test feature vector x to class i and then use LDA to obtain the discriminant subspaces v_{i}, i = 1,...,C. The discriminant power indices DP_{i} can be computed using (1).
This will indicate how well the data is separated when the test feature vector x is assumed to belong to class i. When x is assumed to belong to an incorrect class, LDA will find it difficult to discern that from the other samples, and DP_{i} will be small. When the test sample is however assigned to the correct class, LDA will find it easier to discriminate between classes and the discriminant value (1) will increase. This means that our approach should reduce to assigning the test sample x to that class providing the maximum discriminant power when x is assigned to it. Unfortunately, this is not possible, because when the number of genes (features) p is much larger than the number of samples n, the value of DP_{i} is always 1 regardless of the value of the parameter i. This is formally stated in the following.
Theorem 1. Let the number of features (e.g., genes) be $p\ge \frac{n-C}{C-1}$, where n is the number of samples, and C the number of classes. Then, the discriminant power DP for LDA's solution is always equal to one, DP= 1.
Final classification
The result above had a purpose beyond that of showing that the discriminant power defined in (1) is inappropriate when n ≤ p + 1. It is illustrative of the reasons why. First, note that all DP_{i} are equal to one when n ≤ p + 1, because, in such cases, the projection of each individual class covariance matrix onto the one-dimensional solution found by LDA is always zero. In fact, this is possible because there is always a one-dimensional subspace where all the samples of the same class collapse onto a single point. This subspace is the intersection of the null spaces of every class covariance matrix, and was illustrated in Figs. 1 and 2(d)–(e).
Nonetheless, since the projected class covariance matrices are zero, the between-class variance itself provides the appropriate measure of separability. We thus denote the distance between classes as that defined by the projected between-class scatter, v^{T}S_{B}v.
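The collapse described above can be checked numerically. The following sketch uses hypothetical toy data with many more features than samples, builds a direction orthogonal to every within-class deviation, and projects the samples onto it. The variable names and the tolerance used to decide the rank are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_class, p = 5, 50                      # far more features than samples
X0 = rng.normal(size=(n_per_class, p))
X1 = rng.normal(size=(n_per_class, p)) + 0.5

# Stack the within-class deviations; their row space has rank at most n - C.
W = np.vstack([X0 - X0.mean(axis=0), X1 - X1.mean(axis=0)])
_, s, Vt = np.linalg.svd(W, full_matrices=True)
rank = int((s > 1e-10 * s[0]).sum())
null_basis = Vt[rank:].T                    # orthonormal basis of the null space

# Project the difference of class means onto that null space and normalize.
d = X1.mean(axis=0) - X0.mean(axis=0)
v = null_basis @ (null_basis.T @ d)
v /= np.linalg.norm(v)

proj0, proj1 = X0 @ v, X1 @ v               # one-dimensional projections
spread0 = proj0.max() - proj0.min()
spread1 = proj1.max() - proj1.min()
gap = abs(proj0.mean() - proj1.mean())
```

In this toy case `spread0` and `spread1` are zero up to rounding while `gap` is not, so each class maps to a single point and the two points differ.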
where ${S}_{{B}_{k}}$ is the between-class scatter matrix obtained with all the training samples plus the testing sample x.
The lower-performance problem in between-class classification [7] is herein solved by taking advantage of the information embedded in the testing feature vector.
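The overall decision rule can be sketched as follows: assign the test sample to each class in turn, recompute the between-class scatter, and keep the assignment with the largest projected value. This is only a sketch under explicit assumptions, not the authors' implementation. It assumes that v is a unit vector restricted to the null space of the pooled within-class scatter, and that each class is weighted by its share of the samples in the between-class scatter. All function names are hypothetical.

```python
import numpy as np

def scatter_parts(X, y):
    """Return within-class deviations W and weighted mean offsets M (S_B = M^T M)."""
    mean = X.mean(axis=0)
    W, M = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        W.append(Xc - mc)
        M.append(np.sqrt(len(Xc) / len(X)) * (mc - mean))  # assumed weighting
    return np.vstack(W), np.vstack(M)

def null_space_between_scatter(X, y):
    """Largest v^T S_B v over unit v with zero projected within-class scatter."""
    W, M = scatter_parts(X, y)
    _, s, Vt = np.linalg.svd(W, full_matrices=False)
    Q = Vt[s > 1e-10 * s[0]]            # orthonormal basis of the row space of W
    M_perp = M - (M @ Q.T) @ Q          # remove the within-class directions
    # Top eigenvalue of M_perp^T M_perp is the squared top singular value.
    return np.linalg.svd(M_perp, compute_uv=False)[0] ** 2

def classify(x, train_X, train_y):
    """Try every class label for x and keep the one with the largest score."""
    classes = np.unique(train_y)
    scores = [null_space_between_scatter(np.vstack([train_X, x]),
                                         np.append(train_y, k))
              for k in classes]
    return classes[int(np.argmax(scores))], scores

# Toy data (hypothetical): 10 training samples, 200 features.
rng = np.random.default_rng(0)
p = 200
train_X = np.vstack([rng.normal(size=(5, p)), rng.normal(size=(5, p)) + 3.0])
train_y = np.array([0] * 5 + [1] * 5)
x = rng.normal(size=p) + 3.0            # drawn from class 1
label, scores = classify(x, train_X, train_y)
```

Working with the singular value decomposition of the sample matrices avoids forming any p-by-p scatter matrix, which matters when p is in the thousands.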
Testing
We have used a variety of databases to validate the algorithm and our claims. This will also serve to prove the superior performance of the proposed approach when compared to the state of the art.
Description of the data-sets
Breast cancer (BRCA1 and 2)
[4] present a database of human breast cancer with samples generated from 22 primary human breast tumors (7 BRCA1-mutation-positive, 8 BRCA2-mutation-positive and 7 samples from patients with neither of the two gene mutations). The interest of the experiment is in determining whether hereditary breast cancers could be classified based solely on their gene-expression profile. The 22 samples are grouped in two ways. The first grouping labels the 22 tumor samples according to BRCA1 mutation status (positive or negative), and the second grouping labels the samples according to BRCA2 mutation status (positive or negative). There is a total of 3226 genes in this data-set.
PROS
This data-set was developed to investigate whether differences in gene expression help to distinguish prostate cancers with common clinical and pathological features [27]. A total of 102 samples (50 normal and 52 prostate tumor) are included and each sample consists of expression values for 12600 genes. We have normalized the expression levels to a maximum value of 16,000 and a minimum of 10 to eliminate outliers. The variation filter is then used to exclude genes showing small variation across samples. A 5-fold change variation (Max/Min) and absolute variation of 50 (Max-Min) is applied.
PROS-OUT
This data-set is used to analyze whether the gene expression data alone can accurately predict patient outcome after prostatectomy [27]. Samples from 21 patients are evaluated with regard to recurrence following surgery. Eight patients had relapsed and thirteen patients did not for a period of 4 years after the surgical procedure. The same processing steps as for PROS are used.
Lymphoma
Diffuse large B-cell lymphoma (DLBCL) is the most common lymphoid malignancy in adults, curable in less than 50% of patients. This data-set is constructed from a related germinal center B-cell, follicular lymphoma (FL) [5]. In DLBCL-FL, the microarray contains gene expression profiles for 77 patients (58 with DLBCL and 19 with FL) for a total of 6817 genes. Accepting the suggestion of [5], we use the value of 16,000 as a ceiling and 20 as the lower threshold for the expression levels. The variation filter is used to exclude genes showing small variation across samples. Two types of variations are used here: fold-change and absolute variation, which correspond to max/min and max - min, respectively, where max and min refer to the maximum and minimum value of expression level for each particular gene across all samples. In particular, we used max/min < 3 and max - min < 100.
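The thresholding and variation filter just described can be sketched as below. The function name `variation_filter` and the toy matrix are hypothetical. The sketch also makes one interpretive assumption: a gene is kept only if it passes both the fold-change and the absolute-variation thresholds. Replacing `&` with `|` gives the other reading of the text.

```python
import numpy as np

def variation_filter(expr, floor=20.0, ceiling=16000.0, min_fold=3.0, min_abs=100.0):
    """Clip expression levels and drop genes with small variation across samples.

    expr: (n_samples, n_genes) matrix of expression levels.
    Returns the clipped matrix restricted to the kept genes, and the boolean mask.
    """
    clipped = np.clip(expr, floor, ceiling)   # apply lower threshold and ceiling
    gmax = clipped.max(axis=0)                # per-gene maximum across samples
    gmin = clipped.min(axis=0)                # per-gene minimum across samples
    # Assumed rule: keep a gene only if both variation measures are large enough.
    keep = (gmax / gmin >= min_fold) & (gmax - gmin >= min_abs)
    return clipped[:, keep], keep

# Toy matrix (hypothetical): 3 samples, 3 genes.
expr = np.array([[10.0, 50.0, 100.0],
                 [30000.0, 60.0, 500.0],
                 [200.0, 55.0, 90.0]])
filtered, keep = variation_filter(expr)
```

The second gene is dropped because its levels barely change across samples, while the first gene survives after its extreme values are clipped to the floor and ceiling.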
Leukemia
[1] define a data-set for the study of two types of acute leukemia – acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML). The microarrays contain 6817 genes. The data used in this paper consists of 38 bone marrow samples (27 ALL and 11 AML). The leave-one-out cross-validation test was used on this set. The same filtering procedure defined above was employed.
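The leave-one-out test mentioned above can be sketched generically: each sample is held out once, the classifier is built on the rest, and the held-out sample is classified. The nearest-mean classifier below is only a stand-in chosen to keep the example self-contained; it is not one of the methods compared in this paper, and all names are hypothetical.

```python
import numpy as np

def nearest_mean(x, train_X, train_y):
    """Stand-in classifier: assign x to the class with the closest mean."""
    classes = np.unique(train_y)
    dists = [np.linalg.norm(x - train_X[train_y == c].mean(axis=0)) for c in classes]
    return classes[int(np.argmin(dists))]

def leave_one_out(X, y, classify):
    """Return (number of correct predictions, number of samples)."""
    correct = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i       # every sample except the i-th
        if classify(X[i], X[mask], y[mask]) == y[i]:
            correct += 1
    return correct, len(X)

# Toy data (hypothetical): two well-separated classes.
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
              [2.0, 2.1], [2.2, 1.9], [1.9, 2.0]])
y = np.array([0, 0, 0, 1, 1, 1])
correct, total = leave_one_out(X, y, nearest_mean)
```

Any function with the same `(x, train_X, train_y)` signature can be passed in place of `nearest_mean`, which yields counts in the same correct/total form as the accuracies reported in the tables.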
I2000
This data-set contains a total of 2000 gene expressions of 40 tumor and 22 normal colon tissue samples [3]. Following the suggestion of [4], we employed the following preprocessing: 1) compute the median of each array (an array corresponds to a specimen); 2) determine the median of the medians computed in step 1, which is labelled M; 3) for a given array, add or subtract an appropriate constant to each expression value to re-center the median of the array to be that given by M; 4) log-transform the entire data-set to make the data more Gaussian distributed.
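The four preprocessing steps listed above can be sketched as follows. The function name `median_center_and_log` is hypothetical, the natural logarithm is an assumption (the base is not stated), and the sketch assumes all values remain positive after re-centering.

```python
import numpy as np

def median_center_and_log(expr):
    """expr: (n_arrays, n_genes) matrix, one array (specimen) per row."""
    array_medians = np.median(expr, axis=1)          # step 1: median of each array
    M = np.median(array_medians)                     # step 2: median of the medians
    centered = expr + (M - array_medians)[:, None]   # step 3: shift each array to median M
    return np.log(centered)                          # step 4: log-transform (natural log assumed)

# Toy matrix (hypothetical): 3 arrays, 3 genes.
expr = np.array([[100.0, 200.0, 300.0],
                 [400.0, 500.0, 600.0],
                 [50.0, 150.0, 1000.0]])
out = median_center_and_log(expr)
```

After step 3 every array has the same median, so differences in overall intensity between arrays no longer dominate the comparison.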
Experimental results
Table 1. Comparison of the results obtained with different classifiers in a variety of data-sets.
Data-set | Genes | Samples | DP | k NN | WV | LDA | SVM | ML-s | ML-d |
---|---|---|---|---|---|---|---|---|---|
BRCA1 | 3226 | 7 BRCA1-positive, 15 BRCA1-negative | 21/22 | 18/22 (1) | 18/22 | 18/22 | 18/22 | 19/22 | 16/22 |
BRCA2 | 3226 | 8 BRCA2-positive, 14 BRCA2-negative | 21/22 | 21/22 (1) | 17/22 | 19/22 | 18/22 | 17/22 | 17/22 |
PROS | 12600 | 52 tumor tissue, 50 normal tissue | 93/102 | 90/102 (5) | 61/102 | 92/102 | 93/102 | 64/102 | 50/102 |
PROS-OUT | 12625 | 8 non-recurrence, 13 recurrence | 15/21 | 12/21 (1) | 12/21 | 13/21 | 14/21 | 13/21 | 13/21 |
DLBCL-FL | 6817 | 52 DLBCL, 25 FL | 74/77 | 71/77 (7) | 63/77 | 74/77 | 74/77 | 65/77 | 58/77 |
ALL-AML | 6817 | 27 AML, 11 ALL | 38/38 | 37/38 (3) | 38/38 | 38/38 | 38/38 | 30/38 | 27/38 |
I-2000 | 2000 | 40 tumor colon tissue, 22 normal colon tissue | 61/62 | 59/62 (3) | 58/62 | 61/62 | 61/62 | 59/62 | 58/62 |
Table 2. Classification accuracy of the proposed algorithm and alternatives on two subsets of the data in the leave-one-out test.
Data-set | More correlated: DP | More correlated: Others | Less correlated: DP | Less correlated: Others |
---|---|---|---|---|
BRCA1 | 18/18 | 16.67/18 | 3/4 | 1.67/4 |
BRCA2 | 17/17 | 16.5/17 | 4/5 | 1.67/5 |
PROS | 59/60 | 53.3/60 | 34/41 | 21.67/41 |
PROS-OUT | 12/15 | 11.83/15 | 3/6 | 0.9/6 |
DLBCL-FL | 62/62 | 58.33/62 | 12/15 | 9.17/15 |
ALL-AML | 38/38 | 34.67/38 | 0/0 | 0/0 |
I2000 | 58/58 | 57.83/58 | 3/4 | 1.5/4 |
The experimental results reported thus far used real datasets to compare the classification capabilities of the proposed algorithm with those reported in the literature. The differences shown in Tables 1 and 2 are significant, because our method is able to provide the top classification accuracies in all cases. Yet, one may wonder how our method would perform if more samples were available. To further demonstrate the superiority of the proposed algorithm over those given in the literature, we now show an experimental comparison using synthetic datasets.
The above result is however quite simplistic, because the samples in each class were distributed according to a single Gaussian distribution. A more realistic scenario in bioinformatics is that where the samples in each class are generated by a mixture of Gaussians. To test this other case, we randomly generated 50 samples corresponding to two different classes. Each class was now defined by a mixture of four Gaussians, with their means and covariances randomly selected as above. The average results over a total of 100 runs are shown in Fig. 4(b). Again, we see that the proposed DP algorithm outperforms the others. Most importantly though, it is clear from Fig. 4(a–b) that the proposed algorithm is not very sensitive to an increase in the number of dimensions. This is a very important feature in studies of bioinformatics and further demonstrates the superiority of the DP algorithm over the state of the art.
Discussion
Analyzing data from small sample size sets is a recurring problem in biology. This is generally due to the limited amount of data available or to the difficulty or costs associated to obtaining additional data. Studies indicate that hundreds or thousands of samples would be required to extract useful statistical information from our data sets [15, 16]. Hence, innovative statistical methods like the one presented in this paper are of great relevance in many areas of biology.
This paper has shown derivations of an approach to deal with the small sample size problem within a linear discriminant analysis setting. Our framework can be readily extended to work within other classification approaches. It could also be combined with shrinkage [28], a mechanism to share information between genes, to improve on the analysis of our data. A key point is to realize that (in our framework) it is not necessary to learn the true, underlying distribution of each class. It suffices to find that (part of the) solution necessary to correctly classify the test sample. Part of this information is of course embedded in the test sample, and our approach takes advantage of this. While our results are most applicable to data-sets where the data in each class can be approximated by an underlying distribution, data-driven approaches may be preferred elsewhere. Our framework should then be extended into other algorithms such as non-parametric methods or SVM [23]. Extensions to deal with missing components [29, 30] can also be adapted to our framework. Also, some genome sequences are spherical. In these cases, our approach can be extended to work with spherical classifiers [31].
The approach proposed here can also be applied to many other problems in biology and medicine. One example is the classification of nuclear magnetic resonance spectra, which are typically used to carry out metabolomics experiments. In this example, classification approaches like the ones described in this paper are generally used [32]. Another application is in the use of cytotoxic chemotherapeutic drugs that target proliferating signature genes. This approach is generally used to stop further cell division and bring tumors under control. However, these drugs can also damage the DNA of normal tissue. Developing solutions that only target those necessary genes is fundamental to the success of such therapies. This will involve the identification of biomarkers of proliferation associated to each of the cancers [33]. These analyses are also characterized by a disproportionate feature to sample ratio, resulting in over-fitting. This is especially true when proliferation is studied over a large number of cancers [34, 35]. In such studies it is almost always necessary to use all the data available to prevent missing useful biomarkers.
Declarations
Acknowledgements
The authors are partially supported by a grant from the National Institutes of Health, R01 DC 005241, and a grant from the National Science Foundation, IIS 0713055. This work was conducted while MZ was at The Ohio State University.
References
- Golub T, Slonim D, Tamayo P, Huard C, Gaasenbeek M, Mesirov J, Coller H, Loh M, Downing J, Caligiuri M, Bloomfield C, Lander E: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999, 286: 531–537. 10.1126/science.286.5439.531
- Pomeroy S, Tamayo P, Gaasenbeek M, Sturla L, Angelo M, McLaughlin M, Kim J, Goumnerova L, Black P, Lau C, Allen J, Zagzag D, Olson J, Curran T, Wetmore C, Biegel J, Poggio T, Mukherjee S, Rifkin R, Califano A, Stolovitzky G, Louis D, Mesirov J, Lander E, Golub T: Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 2002, 415: 436–442. 10.1038/415436a
- Alon U, Barkai N, Notterman D, Gish K, Ybarra S, Mack D, Levine A: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 1999, 96: 6745–6750. 10.1073/pnas.96.12.6745
- Radmacher M, Mcshane L, Simon R: A paradigm for class prediction using gene expression profiles. J Comput Biol 2002, 9: 505–511. 10.1089/106652702760138592
- Shipp M, Ross K, Tamayo P, Weng A, Kutok J, Aguiar R, Gaasenbeek M, Angelo M, Reich M, Pinkus G, Ray T, Koval M, Last K, Norton A, Lister T, Mesirov J, Neuberg D, Lander E, Aster J, Golub T: Diffuse large B-cell lymphoma outcome prediction by gene expression profiles and supervised machine learning. Nature Medicine 2002, 8: 68–74. 10.1038/nm0102-68
- van't Veer L, Dai H, Vijver M, He Y, Hart A, Mao M, Peterse H, Kooy K, Marton M, Witteveen A, Schreiber G, Kerkhoven R, Roberts C, Linsley P, Bernards R, Friend S: Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002, 415: 530–536. 10.1038/415530a
- Truntzer C, Mercier C, Esteve J, Gautier C, Roy P: Importance of data structure in comparing two dimension reduction methods for classification of microarray gene expression data. BMC Bioinformatics 2007, 8: 90. 10.1186/1471-2105-8-90
- Ransohoff D: Opinion – rules of evidence for cancer molecular-marker discovery and validation. Nat Rev Cancer 2004, 4: 309–314. 10.1038/nrc1322
- Martinez A, Kak A: PCA versus LDA. IEEE Trans Pattern Anal Mach Intell 2001, 23(2):228–233. 10.1109/34.908974
- Abdullah-Sayani A, Bueno-de Mesquita J, Vijver M: Microarray data analysis: from disarray to consolidation and consensus. Nature Clinical Practice Oncology 2006, 3(9):501–516. 10.1038/ncponc0587
- Michiels S, Koscielny S, Hill C: Prediction of cancer outcome with microarrays: A multiple random validation strategy. Lancet 2005, 365: 488–492. 10.1016/S0140-6736(05)17866-0
- Efron B: The jackknife, the bootstrap and other resampling plans. Vermont: Soc. for Industrial & Applied Math; 1982.
- Fisher R: The statistical utilization of multiple measurements. Annals of Eugenics 1938, 8: 376–386.
- Dudoit S, Fridlyand J, Speed T: Comparison of discriminant methods for the classification of tumor using gene expression data. J Am Stat Assoc 2002, 97: 77–87. 10.1198/016214502753479248
- Ein-Dor L, Zuk O, Domany E: Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc Natl Acad Sci USA 2006, 103: 5923–5928. 10.1073/pnas.0601231103
- Hua J, Xiong Z, Lowey J, Suh E, Dougherty E: Optimal number of features as a function of sample size for various classification rules. Bioinformatics 2005, 21: 1509–1515. 10.1093/bioinformatics/bti171
- Guyon I, Weston J, Barnhill S: Gene selection for cancer classification using support vector machines. Mach Learn 2002, 46: 389–422. 10.1023/A:1012487302797
- Xiong M, Li W, Zhao J, Jin L, Boerwinkle E: Feature (gene) selection in gene expression-based tumor classification. Mol Genet Metab 2001, 73: 239–247. 10.1006/mgme.2001.3193
- Ntzani E, Ioannidis J: Predictive ability of DNA microarray for cancer outcome and correlation: an empirical assessment. Lancet 2003, 362: 1439–1444. 10.1016/S0140-6736(03)14686-7
- Miron M, Nadon R: Inferential literacy for experimental high-throughput biology. Trends Genet 2006, 22: 84–89. 10.1016/j.tig.2005.12.001
- Devroye L, Gyorfi L, Lugosi G: A Probabilistic Theory of Pattern Recognition. New York: Springer; 1996.
- Boser B, Guyon I, Vapnik V: A training algorithm for optimal margin classifiers. Fifth Annual Workshop on Comp Learn Theory 1992.
- Vapnik V: Statistical Learning Theory. New York: Wiley Interscience; 1998.
- Poggio T, Rifkin R, Mukherjee S, Niyogi P: General conditions for predictivity in learning theory. Nature 2004, 428: 419–422. 10.1038/nature02341
- Martinez A, Zhu M: Where are linear feature extraction methods applicable? IEEE Trans Pattern Anal Mach Intell 2005, 27(12):1934–1944. 10.1109/TPAMI.2005.250
- Zhu M, Martinez A: Subclass Discriminant Analysis. IEEE Trans Pattern Anal Mach Intell 2006, 28(8):1274–1286. 10.1109/TPAMI.2006.172
- Singh D, Febbo P, Ross K, Jackson D, Manola J, Ladd C, Tamayo P, Renshaw A, D'Amico A, Richie J, Lander E, Loda M, Kantoff T, Golub R, Sellers W: Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 2002, 1: 203–209. 10.1016/S1535-6108(02)00030-2
- Allison D, Cui X, Page G, Sabripour M: Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet 2006, 5: 55–65. 10.1038/nrg1749
- Chechik G, Heitz G, Elidan G, Abbeel P, Koller D: Max-margin Classification of Data with Absent Features. J Mach Learn Res 2008, 9: 1–21.
- Zhang M, Zhang D, Wells M: Variable selection for large p small n regression models with incomplete data: mapping QTL with epistases. BMC Bioinformatics 2008, 9: 25.
- Hamsici O, Martinez A: Spherical-Homoscedastic Distributions: The equivalency of spherical and Normal distributions in classification. J Mach Learn Res 2007, 8: 1583–1623.
- Parsons H, Ludwig C, Gunther U, Viant M: Improved classification accuracy in 1-and 2-dimensional NMR metabolomics data using the variance stabilising generalised logarithm transformation. BMC Bioinformatics 2007, 8: 234. 10.1186/1471-2105-8-234
- Whitfield M, George L, Grant G, Perou C: Common markers of proliferation. Nat Rev Cancer 2006, 6: 99–106. 10.1038/nrc1802
- Rhodes D, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, Barrette T, Pandey A, Chinnaiyan A: Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc Natl Acad Sci USA 2004, 101: 9309–9314. 10.1073/pnas.0401994101
- Villanueva J, Shaffer D, Philip J, Chaparro C, Erdjument-Bromage H, Olshen A, Fleisher M, Lilja H, Brogi E, Boyd J, Sanchez-Carbayo M, Holland E, Cordon-Cardo C, Scher H, Tempst P: Differential exoprotease activities confer tumor-specific serum peptidome patterns. J Clin Invest 2006, 116: 271–284. 10.1172/JCI26022
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.