- Methodology article
- Open Access
L_{2}-norm multiple kernel learning and its application to biomedical data fusion
- Shi Yu^{1}Email author,
- Tillmann Falck^{2},
- Anneleen Daemen^{1},
- Leon-Charles Tranchevent^{1},
- Johan AK Suykens^{2},
- Bart De Moor^{1} and
- Yves Moreau^{1}
https://doi.org/10.1186/1471-2105-11-309
© Yu et al; licensee BioMed Central Ltd. 2010
- Received: 14 January 2010
- Accepted: 8 June 2010
- Published: 8 June 2010
Abstract
Background
This paper introduces the notion of optimizing different norms in the dual problem of support vector machines with multiple kernels. The selection of norms yields different extensions of multiple kernel learning (MKL) such as L_{∞}, L_{1}, and L_{2} MKL. In particular, L_{2} MKL is a novel method that leads to non-sparse optimal kernel coefficients, in contrast to the sparse kernel coefficients optimized by the existing L_{∞} MKL method. In real biomedical applications, L_{2} MKL may have advantages over sparse integration methods because it combines the complementary information in heterogeneous data sources more thoroughly.
Results
We provide a theoretical analysis of the relationship between the L_{2} optimization of kernels in the dual problem and the L_{2} coefficient regularization in the primal problem. Understanding the dual L_{2} problem grants a unified view on MKL and enables us to extend the L_{2} method to a wide range of machine learning problems. We implement L_{2} MKL for ranking and classification problems and compare its performance with the sparse L_{∞} and the averaging L_{1} MKL methods. The experiments are carried out on six real biomedical data sets and two large scale UCI data sets. L_{2} MKL yields better performance on most of the benchmark data sets. In particular, we propose a novel L_{2} MKL least squares support vector machine (LSSVM) algorithm, which is shown to be an efficient and promising classifier for processing large scale data sets.
Conclusions
This paper extends the statistical framework of genomic data fusion based on MKL. Allowing non-sparse weights on the data sources is an attractive option in settings where we believe most data sources to be relevant to the problem at hand and want to avoid the "winner-takes-all" effect seen in L_{∞} MKL, which can be detrimental to the performance in prospective studies. The notion of optimizing L_{2} kernels can be straightforwardly extended to ranking, classification, regression, and clustering algorithms. To tackle the computational burden of MKL, this paper proposes several novel LSSVM based MKL algorithms. Systematic comparison on real data sets shows that LSSVM MKL has performance comparable to that of the conventional SVM MKL algorithms. Moreover, large scale numerical experiments indicate that when cast as semi-infinite programming, LSSVM MKL can be solved more efficiently than SVM MKL.
Availability
The MATLAB code of algorithms implemented in this paper is downloadable from http://homes.esat.kuleuven.be/~sistawww/bioi/syu/l2lssvm.html.
Keywords
- Least Squares Support Vector Machine
- Multiple Kernel
- Multiple Kernel Learning
- Second Order Cone Programming
- Gene Prioritization
Background
In the era of information overflow, data mining and machine learning are indispensable tools to retrieve information and knowledge from data. Incorporating several data sources in an analysis may be beneficial by reducing noise, improving statistical significance, and leveraging the interactions and correlations between data sources to obtain more refined and higher-level information [1]; this is known as data fusion. In bioinformatics, considerable effort has been devoted to genomic data fusion, an emerging topic relevant to many applications. At present, terabytes of data are generated by high-throughput techniques at an increasing rate. In data fusion, these terabytes are further multiplied by the number of data sources or the number of species. Building a statistical model that describes these data is therefore not an easy matter. To tackle this challenge, it is rather effective to consider the data as being generated by a complex and unknown black box, with the goal of finding a function or an algorithm that operates on an input to predict the output. About 15 years ago, Vapnik [2] introduced the support vector method, which makes use of kernel functions. This method has offered plenty of opportunities to solve complicated problems but has also brought many interdisciplinary challenges in statistics, optimization theory, and the applications therein [3].
Multiple kernel learning (MKL) was pioneered by Lanckriet et al. [4] and Bach et al. [5] as an additive extension of the single kernel SVM to incorporate multiple kernels in classification. It has also been applied as a statistical learning framework for genomic data fusion [6] and many other applications [7]. The essence of MKL, which is the additive extension of the dual problem, relies only on the kernel representation (kernel trick), while the heterogeneities of data sources are resolved by transforming different data structures (e.g., vectors, strings, trees, graphs) into kernel matrices. In the dual problem, these kernels are combined into a single kernel; moreover, the coefficients of the kernels are leveraged adaptively to optimize the algorithmic objective, which is known as kernel fusion. The notion of kernel fusion was originally proposed to solve classification problems in computational biology, but recent efforts have led to analogous solutions for one class [7] and unsupervised learning problems (Yu et al.: Optimized data fusion for kernel K-means clustering, submitted). Currently, most of the existing MKL methods are based on the formulation proposed by Lanckriet et al. [4], which is clarified in our paper as the optimization of the infinity norm (L_{∞}) of kernel fusion. Optimizing L_{∞} MKL in the dual problem corresponds to posing L_{1} regularization on the kernel coefficients in the primal problem. As is well known, L_{1} regularization is characterized by the sparseness of the kernel coefficients [8]. Thus, the solution obtained by L_{∞} MKL is also sparse, which assigns dominant coefficients to only one or two kernels. The sparseness is useful to distinguish relevant sources from a large number of irrelevant data sources. However, in biomedical applications, there are usually a small number of sources and most of these data sources are carefully selected and preprocessed. They are thus often directly relevant to the problem.
In these cases, a sparse solution may be too selective to thoroughly combine the complementary information in the data sources. While the performance on benchmark data may be good, the selected sources may not be as strong on truly novel problems where the quality of the information is much lower. We may thus expect the performance of such solutions to degrade significantly on actual real-world applications. To address this problem, we propose a new kernel fusion scheme by optimizing the L_{2}-norm of multiple kernels. The L_{2} MKL yields a non-sparse solution, which smoothly distributes the coefficients on multiple kernels and, at the same time, leverages the effects of kernels in the objective optimization. Empirical results show that the L_{2}-norm kernel fusion can lead to a better performance in biomedical data fusion.
Methods
Acronyms
Symbol | Domain | Description |
---|---|---|
| | ℝ^{N} | the dual variable of SVM |
Q | ℝ^{N × N} | a semi-positive definite matrix |
C | ℝ^{N} | a convex set |
Ω | ℝ^{N × N} | a combination of multiple semi-positive definite matrices |
j | ℕ | the index of kernel matrices |
p | ℕ | the number of kernel matrices |
θ | [0, 1] | coefficients of kernel matrices |
t | [0, +∞) | dummy variable in optimization problem |
| | ℝ^{p} | |
| | ℝ^{p} | |
| | ℝ^{D} or ℝ^{Φ} | the norm vector of the separating hyperplane |
ϕ(·) | ℝ^{D} → ℝ^{Φ} | the feature map |
i | ℕ | the index of training samples |
| | ℝ^{D} | the vector of the i-th training sample |
ρ | ℝ | bias term in 1-SVM |
ν | ℝ^{+} | regularization term of 1-SVM |
ξ_{i} | ℝ | slack variable for the i-th training sample |
K | ℝ^{N × N} | kernel matrix |
| | ℝ^{D} × ℝ^{D} → ℝ | |
| | ℝ^{D} | the vector of a test data sample |
y_{i} | -1 or +1 | the class label of the i-th training sample |
Y | ℝ^{N × N} | the diagonal matrix of class labels Y = diag(y_{1}, ..., y_{N}) |
C | ℝ^{+} | the box constraint on dual variables of SVM |
b | ℝ^{+} | the bias term in SVM and LSSVM |
| | ℝ^{p} | |
k | ℕ | the number of classes |
| | ℝ^{p} | |
| | ℝ^{p} | variable vector in SIP problem |
u | ℝ | dummy variable in SIP problem |
q | ℕ | the index of class number in classification problem, q = 1, ..., k |
A | ℝ^{N × N} | |
λ | ℝ^{+} | the regularization parameter in LSSVM |
e_{i} | ℝ | the error term of the i-th sample in LSSVM |
| | ℝ^{N} | |
ϵ | ℝ^{+} | precision value as the stopping criterion of SIP iteration |
τ | ℕ | index parameter of SIP iterations |
| | ℝ^{p} | |
Formal definition of the problem
where . The proof that (8) is the solution of (7) is given in the following theorem.
Theorem 0.1 The QCLP problem in (8) equivalently solves the problem in (7).
Therefore, given , the additive term is bounded by the L_{2}-norm || ||_{2}.
Moreover, it is easy to prove that when , the parametric combination reaches the upperbound and the equality holds. Optimizing this L_{2}-norm results in a non-sparse solution in θ_{ j }. In order to distinguish this from the solution obtained by (3) and (4), we denote it as the L_{2}-norm approach. It can also easily be seen (not shown here) that the L_{1}-norm approach is simply averaging the quadratic terms with uniform coefficients.
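The difference between the three norms can be made concrete with a small numerical sketch. It assumes that each kernel contributes one non-negative quadratic term at a fixed dual variable; the list `s` and the function name `combine_quadratic_terms` are hypothetical and chosen only for this illustration, not taken from the published MATLAB code.

```python
import math

def combine_quadratic_terms(s, norm):
    """Return (theta, value) for the additive term sum_j theta_j * s_j.

    s    : non-negative quadratic terms, one per kernel (assumed inputs)
    norm : "Linf" puts all weight on the largest term (sparse),
           "L1"   averages the terms with uniform coefficients,
           "L2"   uses theta_j = s_j / ||s||_2 (non-sparse, ||theta||_2 = 1)
    """
    p = len(s)
    if norm == "Linf":
        j_max = max(range(p), key=lambda j: s[j])
        theta = [1.0 if j == j_max else 0.0 for j in range(p)]
    elif norm == "L1":
        theta = [1.0 / p] * p
    elif norm == "L2":
        l2 = math.sqrt(sum(v * v for v in s))
        theta = [v / l2 for v in s]
    else:
        raise ValueError("norm must be 'Linf', 'L1' or 'L2'")
    value = sum(t * v for t, v in zip(theta, s))
    return theta, value

# Two kernels with quadratic terms 3 and 4.
for name in ("Linf", "L1", "L2"):
    theta, value = combine_quadratic_terms([3.0, 4.0], name)
    print(name, theta, value)
```

With the L_{2} choice the combined value equals the Euclidean norm of `s`, which is the upper bound given by the Cauchy-Schwarz inequality, and every kernel keeps a non-zero coefficient.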
where and the constraint is in L_{ n }-norm, moreover, . The problem in (14) is convex and can be solved by cvx toolbox [10, 11].
Theorem 0.2 If the coefficient vector is regularized by a L_{ m }-norm in (13), the problem can be solved as a convex programming problem in (14) with L_{ n }-norm constraint. Moreover, .
Due to the condition that , so , we prove that with the L_{ m }-norm constraint posed on , the additive multiple kernel term is bounded by the L_{ n }-norm of the vector . Moreover, we have .
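The argument can be written out with Hölder's inequality. The notation below is an assumption made for this sketch, since the original symbols were lost: θ_j ≥ 0 are the kernel coefficients, s_j ≥ 0 is the quadratic term contributed by the j-th kernel, and the relation between m and n is taken to be the usual conjugacy condition.

```latex
% Assumed notation: \theta_j \ge 0 (kernel coefficients), s_j \ge 0 (quadratic terms).
\[
\sum_{j=1}^{p} \theta_j s_j \;\le\; \|\theta\|_m \, \|s\|_n ,
\qquad \frac{1}{m} + \frac{1}{n} = 1 .
\]
% For 1 < m, n < \infty and \|\theta\|_m = 1 the bound is attained by
\[
\theta_j = \frac{s_j^{\,n-1}}{\|s\|_n^{\,n-1}}
\quad\Longrightarrow\quad
\sum_{j=1}^{p} \theta_j s_j
  = \frac{\sum_{j} s_j^{\,n}}{\|s\|_n^{\,n-1}}
  = \|s\|_n ,
\]
% since (n-1)\,m = n implies \|\theta\|_m^m = \sum_j s_j^{\,n} / \|s\|_n^{\,n} = 1 .
```

Setting m = n = 2 recovers the Cauchy-Schwarz bound of the L_{2}-norm approach.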
Next, we will investigate several concrete kernel fusion algorithms and will propose the corresponding L_{2} solutions.
One class SVM kernel fusion for ranking
where Ω_{ N }is the combined kernel of training data , i = 1, ..., N, is the test data point to be ranked, is the kernel function applied on test data and training data, is the dual variable solved as (20). De Bie et al. applied the method in the application of disease gene prioritization, where multiple genomic data sources are combined to rank a large set of test genes using the 1-SVM model trained from a small set of training genes which are known to be relevant for certain diseases. The L_{∞} formulation in their approach yields a sparse solution when integrating genomic data sources (see Figure 2 of [7]). To avoid this disadvantage, they proposed a regularization method by restricting the minimal boundary on the kernel coefficients, notated as θ_{ min }, to ensure the minimal contribution of each genomic data source to be θ_{ min }/p. According to their experiments, the regularized solution performed best, being significantly better than the sparse integration and the average combination of kernels.
where . The problem above is a QCLP problem and can be solved by conic optimization solvers such as Sedumi [14]. In (23), the first constraint represents a Lorentz cone and the second constraint corresponds to p number of rotated Lorentz cones (R cones). The optimal kernel coefficients θ_{ j }correspond to the dual variables of the R cones with ||θ||_{2} = 1. In this L_{2}-norm approach, the integrated kernel Ω is combined by different and the same scoring function as in (22) is applied on the different solutions of and Ω.
Support vector machine MKL for classification
SIP formulation for SVM MKL on larger scale data
The SIP problem above is solved by a bi-level algorithm whose pseudo code is presented in Algorithm 1 in the Appendix. In each loop τ, Step 1 optimizes and u^{(τ)} for a restricted subset of constraints as a linear program. Step 3 is an SVM problem with a single kernel and generates a new . If is not satisfied by the current and u^{(τ)}, it will be added successively to step 1 until all constraints are satisfied. The starting points are randomly initialized and SIP always converges to an identical result.
where and is a given value. Moreover, the PSD property of kernel matrices ensures that A_{ j }≥ 0, thus the optimal solution always satisfies .
In the SIP formulation, the SVM MKL is solved iteratively as two components. The first component is a single kernel SVM, which is solved more efficiently when the data scale is larger than thousands of data points (and smaller than ten thousand) and requires much less memory than the QP formulation. The second component is a small scale problem, which is a linear problem in the L_{∞} case and a QCLP problem in the L_{2} approach. As shown, the complexity of the SIP based SVM MKL is mainly determined by the burden of a single kernel SVM multiplied by the number of iterations. This has inspired us to adopt more efficient single SVM learning algorithms to further improve the efficiency. The least squares support vector machine (LSSVM) [22] is known for its simple differentiable cost function, the equality constraints in the separating hyperplane, and its solution based on linear equations, which is preferable for large scale problems. Next, we will investigate MKL solutions using LSSVM formulations.
Least squares SVM MKL for classification
In (38), we add the additional constraint Y^{-2} = I, so that the coefficient becomes a static value in the multi-class case. In 1vsA coding, (37) requires solving k linear problems, whereas in (38) the coefficient matrix is factorized only once, so that the solution of w.r.t. the multi-class label vectors is very efficient to obtain. The constraint Y^{-2} = I is simply satisfied by assuming the class labels to be -1 and +1. Thus, from now on, we assume Y^{-2} = I in the following discussion.
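The benefit of a label-independent coefficient matrix can be sketched as follows. Because equation (38) is not reproduced here, the sketch assumes the standard LSSVM linear system [[0, 1^T], [1, K + I/λ]] [b; β] = [0; y] and solves it for all one-versus-all label vectors in a single Gaussian elimination. The function names and the dual variable name `beta` are hypothetical.

```python
def solve_multi_rhs(A, B):
    """Solve A X = B for several right-hand sides with one elimination pass."""
    n, k = len(A), len(B[0])
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + k):
                M[r][c] -= f * M[col][c]
    # Back substitution, once per right-hand side.
    X = [[0.0] * k for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(k):
            acc = M[i][n + j] - sum(M[i][c] * X[c][j] for c in range(i + 1, n))
            X[i][j] = acc / M[i][i]
    return X

def lssvm_one_vs_all(K, labels, classes, lam):
    """Assumed standard LSSVM system, shared by all one-versus-all problems."""
    n = len(K)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [K[i][j] + (1.0 / lam if i == j else 0.0) for j in range(n)])
    # One +1/-1 label column per class; the first row encodes sum(beta) = 0.
    B = [[0.0] * len(classes)]
    for i in range(n):
        B.append([1.0 if labels[i] == c else -1.0 for c in classes])
    X = solve_multi_rhs(A, B)
    return X[0], X[1:]  # bias per class, beta (n rows, one column per class)

K = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
bias, beta = lssvm_one_vs_all(K, ["a", "b", "c"], ["a", "b", "c"], lam=1.0)
```

Only the right-hand side changes between classes, so the elimination of the coefficient matrix is shared by all k problems.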
where . The λ parameter regularizes the squared error term in the primal objective in (36) and the quadratic term in the dual problem. Usually, the optimal λ needs to be selected empirically by cross-validation. In the kernel fusion of LSSVM, we can alternatively transform the effect of regularization as an identity kernel matrix in , where θ_{p + 1}= 1/λ. Then the MKL problem of combining p kernels is equivalent to combining p + 1 kernels where the last kernel is an identity matrix with the optimal coefficient corresponding to the λ value. This method has been mentioned by Lanckriet et al. to tackle the estimation of the regularization parameter in the soft margin SVM [4]. It has also been used by Ye et al. to jointly estimate the optimal kernel for discriminant analysis [17]. Saving the effort of validating λ may significantly reduce the model selection cost in complicated learning problems. By this transformation, the objective of LSSVM MKL becomes similar to that of SVM MKL with the main difference that the dual variables are unconstrained. Though (39) and (40) can in principle both be solved as QP problems by a conic solver or a QP solver, the efficiency of a linear solution of the LSSVM is lost. Fortunately, in a SIP formulation, the LSSVM MKL can be decomposed into iterations of the master problem of single kernel LSSVM learning, which is an unconstrained QP problem, and a coefficient optimization problem with very small scale.
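The identity-kernel transformation described above can be illustrated with a minimal sketch. It assumes kernels are stored as nested lists and that the combined kernel is the weighted sum of the individual kernels; the helper names are hypothetical.

```python
def combine_kernels(kernels, theta):
    """Weighted sum of kernel matrices: Omega = sum_j theta_j * K_j."""
    n = len(kernels[0])
    return [[sum(t * K[r][c] for t, K in zip(theta, kernels)) for c in range(n)]
            for r in range(n)]

def identity(n):
    """n x n identity matrix, used as the extra (p + 1)-th kernel."""
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

def combine_with_identity_kernel(kernels, theta, lam):
    """Treat the regularization as one more kernel with coefficient 1 / lam."""
    n = len(kernels[0])
    return combine_kernels(list(kernels) + [identity(n)], list(theta) + [1.0 / lam])

K1 = [[2.0, 1.0], [1.0, 2.0]]
K2 = [[1.0, 0.0], [0.0, 3.0]]
omega = combine_with_identity_kernel([K1, K2], [0.5, 0.5], lam=4.0)
print(omega)
```

The result equals the combination of the p original kernels with 1/λ added on the diagonal, which is why optimizing the coefficient of the identity kernel amounts to estimating λ.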
SIP formulation for LSSVM MKL on larger scale data
Summary of algorithms
Summary of algorithms implemented in the paper
Algorithm Nr. | Formulation Nr. | Name | References | Formulation | Equations |
---|---|---|---|---|---|
1 | 1-A | 1-SVM L_{∞} MKL | [7] | SOCP | (20) |
1 | 1-B | 1-SVM L_{∞} MKL | [7] | QCQP | (20) |
2 | 2-A | 1-SVM L_{∞} (0.5) MKL | [7] | SOCP | (20) |
2 | 2-B | 1-SVM L_{∞} (0.5) MKL | [7] | QCQP | (20) |
3 | 3-A | 1-SVM L_{1} MKL | | SOCP | (19) |
3 | 3-B | 1-SVM L_{1} MKL | | QCQP | (19) |
4 | 4-A | 1-SVM L_{2} MKL | novel | SOCP | (23) |
5 | 5-B | SVM L_{∞} MKL | | QCQP | (26) |
5 | 5-C | SVM L_{∞} MKL | [18] | SIP | (33) |
6 | 6-B | SVM L_{∞} (0.5) MKL | novel | QCQP | (26) |
7 | 7-A | SVM L_{1} MKL | [2] | SOCP | (25) |
7 | 7-B | SVM L_{1} MKL | [4] | QCQP | (25) |
8 | 8-A | SVM L_{2} MKL | novel | SOCP | (27) |
8 | 8-C | SVM L_{2} MKL | [40] | SIP | (34) |
9 | 9-B | Weighted SVM L_{∞} MKL | novel | QCQP | Suppl. (3) |
10 | 10-B | Weighted SVM L_{∞} (0.5) MKL | novel | QCQP | Suppl. (3) |
11 | 11-B | Weighted SVM L_{1} MKL | [25] | QCQP | Suppl. (2) |
12 | 12-A | Weighted SVM L_{2} MKL | novel | SOCP | Suppl. (4) |
13 | 13-B | LSSVM L_{∞} MKL | [17] | QCQP | (39) |
13 | 13-C | LSSVM L_{∞} MKL | [17] | SIP | (41) |
14 | 14-B | LSSVM L_{∞} (0.5) MKL | novel | QCQP | (39) |
15 | 15-D | LSSVM L_{1} MKL | [22] | linear | (38) |
16 | 16-B | LSSVM L_{2} MKL | novel | SOCP | (40) |
16 | 16-C | LSSVM L_{2} MKL | novel | SIP | (42) |
17 | 17-B | Weighted LSSVM L_{∞} MKL | novel | QCQP | Suppl. (8) |
18 | 18-B | Weighted LSSVM L_{∞} (0.5) MKL | novel | QCQP | Suppl. (8) |
19 | 19-D | Weighted LSSVM L_{1} MKL | [25] | linear | Suppl. (6) |
20 | 20-A | Weighted LSSVM L_{2} MKL | novel | SOCP | Suppl. (9) |
Experimental setup and data sets
Summary of data sets and algorithms used in five experiments
Nr. | Data Set | Problem | Samples | Classes | Algorithms | Evaluation |
---|---|---|---|---|---|---|
1 | disease relevant genes | ranking | 620 | 1 | 1-4 | LOO AUC |
2 | prostate cancer genes | ranking | 9 | 1 | 1-4 | AUC |
3 | rectal cancer patients | classification | 36 | 2 | 5-8,13-16 | LOO AUC |
4 | endometrial disease | classification | 339 | 2 | 5-8,13-16 | 3-fold AUC |
| | miscarriage | classification | 2356 | 2 | 5-8,13-16 | 3-fold AUC |
| | pregnancy | classification | 856 | 2 | 9-12,17-20 | 3-fold AUC |
5 | UCI pen digit and optical digit | classification | 1000-3000 | 10 | 1A,1B,5B,5C,13B,13C | CPU time |
Experiment 1
In the first experiment, we demonstrated a disease gene prioritization application to compare the performance of optimizing different norms in MKL. The computational definition of gene prioritization is given in our earlier work [7, 27, 28]. In this paper, we applied four 1-SVM MKL algorithms to combine kernels derived from 9 heterogeneous genomic sources (shown in section 1 of Additional file 1) to prioritize 620 genes that are annotated to be relevant for 29 diseases in OMIM. The performance was evaluated by leave-one-out (LOO) validation: for each disease which contains K relevant genes, one gene, termed the "defector" gene, was removed from the set of training genes and added to 99 randomly selected test genes (test set). We used the remaining K - 1 genes (training set) to build our prioritization model. Then, we prioritized the test set of 100 genes with the trained model and determined the rank of the defector gene in the test data. The prioritization function in (22) scored the relevant genes higher and the others lower. Thus, by labeling the "defector" gene as class "+1" and the random candidate genes as class "-1", we plotted the Receiver Operating Characteristic (ROC) curves to compare different models using the error of AUC (one minus the area under the ROC curve).
The kernels of the data sources were all constructed using linear functions, except for the sequence data, which was transformed into a kernel using a 2-mer string kernel function [29] (details in section 1 of Additional file 1). In total, 9 kernels were combined in this experiment. The regularization parameter ν in 1-SVM was set to 0.5 for all compared algorithms. Since no hyper-parameter needed to be tuned in LOO validation, we reported the LOO results as the generalization performance. For each disease relevant gene, the 99 test genes were randomly selected in each LOO validation run from the whole human protein-coding genome. We repeated the experiment 20 times and the mean value and standard deviation were used for comparison.
Experiment 2
In the second experiment we used the same data sources and kernel matrices as in the previous experiment to prioritize 9 prostate cancer genes recently discovered by Eeles et al. [30], Thomas et al. [31] and Gudmundsson et al. [32]. A training set of 14 known prostate cancer genes was compiled from the reference database OMIM including only the discoveries prior to January 2008. This training set was then used to train the prioritization model. For each novel prostate cancer gene, the test set contained the newly discovered gene plus its 99 closest neighbors on the chromosome. Besides the error of AUC, we also compared the ranking position of the novel prostate cancer gene among its 99 closest neighboring genes. Moreover, we compared the MKL results with the ones obtained via the Endeavour application.
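The link between the ranking position and the error of AUC can be sketched in a few lines. The sketch assumes a single positive ("defector") gene among 99 negatives and that a higher score means a better rank; the function names and the synthetic scores are hypothetical.

```python
def error_of_auc(positive_scores, negative_scores):
    """One minus the AUC, computed by counting correctly ordered
    (positive, negative) pairs; ties count as half a correct pair."""
    pairs = len(positive_scores) * len(negative_scores)
    correct = 0.0
    for sp in positive_scores:
        for sn in negative_scores:
            if sp > sn:
                correct += 1.0
            elif sp == sn:
                correct += 0.5
    return 1.0 - correct / pairs

def rank_of_positive(positive_score, negative_scores):
    """1-based rank of the positive gene when sorting by decreasing score."""
    return 1 + sum(1 for s in negative_scores if s > positive_score)

# Synthetic scores: 99 negatives, of which 30 score above the positive gene.
negatives = [float(v) for v in range(99)]
positive = 68.5
rank = rank_of_positive(positive, negatives)
err = error_of_auc([positive], negatives)
print(rank, round(err, 4))
```

With one positive among 100 test genes and no ties, the error of AUC reduces to (rank - 1)/99, which is consistent with the pairing of rank 31/100 and error 0.3030 in the results of experiment 2.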
Experiment 3
The third experiment is taken from the work of Daemen et al. about the kernel-based integration of genome-wide data for clinical decision support in cancer diagnosis [33]. Thirty-six patients with rectal cancer were treated by combination of cetuximab, capecitabine and external beam radiotherapy and their tissue and plasma samples were gathered at three time points: before treatment (T_{0}); at the early therapy treatment (T_{1}) and at the moment of surgery (T_{2}). The tissue samples were hybridized to gene chip arrays and after processing, the expression was reduced to 6,913 genes. Ninety-six proteins known to be involved in cancer were measured in the plasma samples, and the ones that had absolute values above the detection limit in less than 20% of the samples were excluded for each time point separately. This resulted in the exclusion of six proteins at T_{0} and four at T_{1}. "Responders" were distinguished from "non-responders" according to the pathologic lymph node stage at surgery (pN-STAGE). The "responder" class contains 22 patients with no lymph node found at surgery whereas the "non-responder" class contains 14 patients with at least 1 regional lymph node. Only the two array-expression data sets (MA) measured at T_{0} and T_{1} and the two proteomics data sets (PT) measured at T_{0} and T_{1} were used to predict the outcome of cancer at surgery.
Similar to the original method applied to the data [33], we used R BioConductor DEDS as the feature selection technique for microarray data and the Wilcoxon rank sum test for proteomics data. The statistical feature selection procedure was independent of the classification procedure; however, the performance varied widely with the number of selected genes and proteins. We considered the relevance of features (genes and proteins) as prior knowledge and systematically evaluated the performance using multiple numbers of genes and proteins. According to the ranking of statistical feature selection, we gradually increased the number of genes and proteins from 11 to 36, and combined the linear kernels constructed from these features. The performance was evaluated by the LOO method for two reasons: first, the number of samples was small (36 patients); second, the kernels were all constructed with a linear function. Moreover, in LSSVM classification we proposed the strategy to estimate the regularization parameter λ in kernel fusion. Therefore, no hyper-parameter needed to be tuned, so we reported the LOO validation result as the generalization performance.
Experiment 4
Our fourth experiment considered three clinical data sets. These three data sets were derived from different clinical studies and were used by Daemen and De Moor [34] as validation data for clinical kernel function development. Data set I contains clinical information on 402 patients with an endometrial disease who underwent an echographic examination and color Doppler [35]. The patients are divided into two groups according to their histology: malignant (hyperplasia, polyp, myoma, and carcinoma) versus benign (proliferative endometrium, secretory endometrium, atrophia). After excluding patients with incomplete data, the data contains 339 patients, of which 163 are malignant and 176 benign. Data set II comes from a prospective observational study of 1828 women undergoing transvaginal sonography before 12 weeks gestation, resulting in data for 2356 pregnancies, of which 1458 were normal at week 12 and 898 were miscarriages during the first trimester [36]. Data set III contains data on 1003 pregnancies of unknown location (PUL) [37]. Within the PUL group, there are four clinical outcomes: a failing PUL, an intrauterine pregnancy (IUP), an ectopic pregnancy (EP) or a persisting PUL. Because persisting PULs are rare (18 cases in the data set), they were excluded, as well as pregnancies with missing data. The final data set consists of 856 PULs, among which are 460 failing PULs, 330 IUPs, and 66 EPs. As the most important diagnostic problem is the correct classification of the EPs versus non-EPs [38], the data was divided into 790 non-EPs and 66 EPs. To simulate a problem of combining multiple sources, for each data set we created eight kernels and combined them using MKL algorithms for classification. The eight kernels included one linear kernel, three RBF kernels, three polynomial kernels and a clinical kernel.
The kernel width of the first RBF kernel was selected by an empirical rule as four times the average covariance of all the samples; the second and the third kernel widths were respectively six and eight times the average covariance. The degrees of the three polynomial kernels were set to 2, 3, and 4 respectively. The bias term of the polynomial kernels was set to 1. The clinical kernels were constructed as proposed by Daemen and De Moor [34]. All the kernel functions are explained in section 3 of Additional file 1. We noticed that the class labels of the pregnancy data were quite imbalanced (790 non-EPs and 66 EPs). In the literature, the class imbalance problem can be tackled by modifying the cost of different classes in the objective function of SVM. Therefore, we applied weighted SVM MKL and weighted LSSVM MKL on the imbalanced pregnancy data. For the other two data sets, we compared the performance of SVM MKL and LSSVM MKL with different norms.
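A minimal sketch of the linear, polynomial and RBF kernel functions is given below. The exact parametrizations are defined in Additional file 1 and are not reproduced here, so the RBF form exp(-||x - y||^2 / width) and the polynomial form (x·y + bias)^degree are assumptions, as are all function names; the clinical kernel is omitted.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, degree, bias=1.0):
    # Assumed form (x . y + bias) ** degree; degrees 2, 3 and 4 with bias 1.
    return (dot(x, y) + bias) ** degree

def rbf_kernel(x, y, width):
    # Assumed form exp(-||x - y||^2 / width); 'width' stands for the kernel
    # width, e.g. four, six or eight times the average covariance.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / width)

def gram_matrix(samples, kernel):
    """Evaluate a kernel function on all pairs of samples."""
    return [[kernel(x, y) for y in samples] for x in samples]

samples = [[1.0, 2.0], [3.0, 4.0]]
kernels = [gram_matrix(samples, linear_kernel)]
kernels += [gram_matrix(samples, lambda x, y, d=d: polynomial_kernel(x, y, d))
            for d in (2, 3, 4)]
kernels += [gram_matrix(samples, lambda x, y, w=w: rbf_kernel(x, y, w))
            for w in (4.0, 6.0, 8.0)]  # placeholder widths, not from the paper
print(len(kernels))
```

Each entry of `kernels` is one pre-constructed kernel matrix that an MKL algorithm would then weight and combine.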
The performance of classification was benchmarked using 3-fold cross validation. Each data set was randomly and equally divided into 3 parts. As introduced in the Methods section, when combining multiple pre-constructed kernels in LSSVM based algorithms, the regularization parameter λ can be jointly estimated as the coefficient of the identity matrix. In this case we do not need to optimize any hyper-parameter in the LSSVM. In the estimation approach of LSSVM and all approaches of SVM, we therefore could use both training and validation data to train the classifier, and test data to evaluate the performance. The evaluation was repeated three times, so each part was used once as test data. The average performance was reported as the evaluation of one repetition. In the standard validation approach of LSSVM, each data set was partitioned randomly into three parts for training, validation and testing. The classifier was trained on the training data and the hyper-parameter λ was tuned on the validation data. When tuning λ, its values were sampled uniformly on the log scale from 2^{-10} to 2^{10}. Then, at the optimal λ, the classifier was retrained on the combined training and validation set and the resulting model was tested on the testing set. The estimation approach is more efficient than the validation approach because the former requires only one training process, whereas the latter requires 22 (one for each of the 21 λ values plus the model retraining). The performance of these two approaches was also investigated in this experiment.
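The count of training runs in the validation approach follows directly from the λ grid, as the short sketch below shows. It assumes integer exponents from -10 to 10, which gives the 21 values mentioned above; `validation_error` is a hypothetical callback standing in for training on the training part and scoring on the validation part.

```python
def lambda_grid(low_exp=-10, high_exp=10):
    """Values 2^low_exp, ..., 2^high_exp, uniformly spaced on the log scale."""
    return [2.0 ** k for k in range(low_exp, high_exp + 1)]

def select_lambda(validation_error, grid):
    """Return the grid value with the smallest validation error and the
    number of training runs used (one per grid value plus one retraining)."""
    best = min(grid, key=validation_error)
    return best, len(grid) + 1

grid = lambda_grid()
# Toy error curve with its minimum at lambda = 4, for illustration only.
best, runs = select_lambda(lambda lam: abs(lam - 4.0), grid)
print(len(grid), best, runs)
```

Estimating λ as the coefficient of the identity kernel replaces these 22 runs with a single training process.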
Experiment 5
As introduced in the Methods section, the same MKL problem can be formulated as different optimization problems such as SOCP, QCQP, and SIP. The accuracy of the discretization method for solving SIP is mainly determined by the tolerance value ε predefined in the stopping criterion. In our implementation, ε was set to 5 × 10^{-4}. These different formulations yield the same result but differ mainly in computational efficiency. In the fifth experiment we compared the efficiency of these optimization techniques on two large scale UCI data sets. The two data sets are digit recognition data for pen based handwriting recognition and optical based digit recognition. Both data sets contain more than 6000 data samples; thus they were used as real large scale data sets to evaluate the computational efficiency. In our implementation, the optimization problems were solved by Sedumi [14], MOSEK [15] and the Matlab optimization toolbox. All the numerical experiments were carried out on a dual Opteron 250 Unix system with 16 GB of memory and the computational efficiency was evaluated by the CPU time (in seconds).
Results
Experiment 1: disease relevant gene prioritization by genomic data fusion
Results of experiment 1: prioritization of 620 disease relevant genes by genomic data fusion
| | Error of AUC (mean) | Error of AUC (std.) | p-value | corr. with L_{∞} | corr. with L_{∞}(0.5) | corr. with L_{1} | corr. with L_{2} |
---|---|---|---|---|---|---|---|
L _{ ∞ } | 0.0923 | 0.0035 | 2.98 · 10^{-17} | - | 0.94 | 0.66 | 0.82 |
L_{ ∞ }(0.5) | 0.0806 | 0.0033 | 2.66 · 10^{-06} | 0.94 | - | 0.82 | 0.92 |
L _{1} | 0.0908 | 0.0042 | 1.92 · 10^{-16} | 0.66 | 0.82 | - | 0.90 |
L _{2} | 0.0780 | 0.0034 | - | 0.82 | 0.92 | 0.90 | - |
Experiment 2: Prioritization of recently discovered prostate cancer genes by genomic data fusion
Results of experiment 2: prioritization of prostate cancer genes by genomic data fusion
Name | Ensembl id | References | L_{∞} | L_{∞}(0.5) | L_{1} | L_{2} | Endeavour |
---|---|---|---|---|---|---|---|
CPNE | ENSG00000085719 | Thomas et al. | 0.3030 | 0.2323 | 0.1010 | 0.1212 | - |
| | | | 31/100 | 24/100 | 11/100 | 13/100 | 70/100 |
CDH23 | ENSG00000107736 | Thomas et al. | 0.0606 | 0.0303 | 0.0202 | 0.0101 | - |
| | | | 7/100 | 4/100 | 3/100 | 2/100 | 78/100 |
EHBP1 | ENSG00000115504 | Gudmundsson et al. | 0.5354 | 0.5152 | 0.3434 | 0.3939 | - |
| | | | 54/100 | 52/100 | 35/100 | 40/100 | 57/100 |
MSMB | ENSG00000138294 | Eeles et al. | 0.0202 | 0.0202 | 0.0505 | 0.0303 | - |
| | | Thomas et al. | 3/100 | 3/100 | 6/100 | 4/100 | 69/100 |
KLK3 | ENSG00000142515 | Eeles et al. | 0.3434 | 0.3535 | 0.2929 | 0.2929 | - |
| | | | 35/100 | 36/100 | 30/100 | 30/100 | 28/100 |
JAZF1 | ENSG00000153814 | Thomas et al. | 0.0505 | 0.0202 | 0.0202 | 0.0202 | - |
| | | | 6/100 | 3/100 | 3/100 | 3/100 | 7/100 |
LMTK2 | ENSG00000164715 | Eeles et al. | 0.3131 | 0.4646 | 0.8081 | 0.7677 | - |
| | | | 32/100 | 47/100 | 81/100 | 77/100 | 31/100 |
IL16 | ENSG00000172349 | Thomas et al. | 0 | 0.0101 | 0.0303 | 0.0101 | - |
| | | | 1/100 | 2/100 | 4/100 | 2/100 | 72/100 |
CTBP2 | ENSG00000175029 | Thomas et al. | 0.8283 | 0.5758 | 0.6364 | 0.6869 | - |
| | | | 83/100 | 58/100 | 64/100 | 69/100 | 38/100 |
Experiment 3: Clinical decision support by integrating microarray and proteomics data
Results of experiment 3: classification of patients in rectal cancer clinical decision using microarray and proteomics data sets
| | LSSVM L_{∞} | | | | | SVM L_{∞} | | | | |
---|---|---|---|---|---|---|---|---|---|---|
| | 14 p | 15 p | 16 p | 17 p | 18 p | 14 p | 15 p | 16 p | 17 p | 18 p |
24 g | 0.0584 | 0.0519 | 0.0747 | 0.0812 | 0.0812 | 0.1331 | 0.1331 | 0.1331 | 0.1331 | 0.1364 |
25 g | 0.0390 | 0.0390 | 0.0519 | 0.0617 | 0.0649 | 0.1136 | 0.1104 | 0.1234 | 0.1201 | 0.1234 |
26 g | 0.0487 | 0.0487 | 0.0812 | 0.0844 | 0.0877 | 0.1266 | 0.1136 | 0.1234 | 0.1299 | 0.1364 |
27 g | 0.0617 | 0.0649 | 0.0812 | 0.0877 | 0.0942 | 0.1429 | 0.1364 | 0.1364 | 0.1331 | 0.1461 |
28 g | 0.0552 | 0.0487 | 0.0617 | 0.0747 | 0.0714 | 0.1429 | 0.1331 | 0.1331 | 0.1364 | 0.1396 |
LSSVM L_{∞} (0.5) | SVM L_{∞} (0.5) | |||||||||
14 p | 15 p | 16 p | 17 p | 18 p | 14 p | 15 p | 16 p | 17 p | 18 p | |
24 g | 0.0584 | 0.0519 | 0.0747 | 0.0812 | 0.0812 | 0.1266 | 0.1006 | 0.1266 | 0.1299 | 0.1331 |
25 g | 0.0390 | 0.0390 | 0.0519 | 0.0617 | 0.0649 | 0.1136 | 0.1071 | 0.1234 | 0.1201 | 0.1234 |
26 g | 0.0487 | 0.0487 | 0.0812 | 0.0844 | 0.0877 | 0.1136 | 0.1136 | 0.1201 | 0.1266 | 0.1331 |
27 g | 0.0617 | 0.0649 | 0.0812 | 0.0877 | 0.0942 | 0.1364 | 0.1364 | 0.1364 | 0.1331 | 0.1461 |
28 g | 0.0552 | 0.0487 | 0.0617 | 0.0747 | 0.0714 | 0.1299 | 0.1299 | 0.1299 | 0.1331 | 0.1364 |
LSSVM L_{1} | SVM L_{1} | |||||||||
14 p | 15 p | 16 p | 17 p | 18 p | 14 p | 15 p | 16 p | 17 p | 18 p | |
24 g | 0.0487 | 0.0487 | 0.0682 | 0.0682 | 0.0747 | 0.0747 | 0.0584 | 0.0714 | 0.0682 | 0.0747 |
25 g | 0.0357 | 0.0325 | 0.0422 | 0.0455 | 0.0455 | 0.0584 | 0.0519 | 0.0649 | 0.0714 | 0.0714 |
26 g | 0.0357 | 0.0357 | 0.0455 | 0.0455 | 0.0455 | 0.0584 | 0.0519 | 0.0682 | 0.0682 | 0.0682 |
27 g | 0.0357 | 0.0357 | 0.0455 | 0.0487 | 0.0519 | 0.0617 | 0.0584 | 0.0714 | 0.0682 | 0.0682 |
28 g | 0.0422 | 0.0325 | 0.0487 | 0.0487 | 0.0519 | 0.0584 | 0.0584 | 0.0649 | 0.0649 | 0.0682 |
LSSVM L_{2} | SVM L_{2} | |||||||||
14 p | 15 p | 16 p | 17 p | 18 p | 14 p | 15 p | 16 p | 17 p | 18 p | |
24 g | 0.0552 | 0.0487 | 0.0747 | 0.0779 | 0.0714 | 0.0909 | 0.0877 | 0.0974 | 0.0942 | 0.1006 |
25 g | 0.0390 | 0.0390 | 0.0487 | 0.0552 | 0.0552 | 0.0747 | 0.0649 | 0.0812 | 0.0844 | 0.0844 |
26 g | 0.0390 | 0.0455 | 0.0552 | 0.0649 | 0.0649 | 0.0747 | 0.0584 | 0.0812 | 0.0779 | 0.0779 |
27 g | 0.0422 | 0.0487 | 0.0552 | 0.0584 | 0.0649 | 0.0779 | 0.0812 | 0.0844 | 0.0812 | 0.0812 |
28 g | 0.0455 | 0.0325 | 0.0487 | 0.0584 | 0.0552 | 0.0812 | 0.0714 | 0.0812 | 0.0779 | 0.0812 |
Experiment 4: Clinical decision support by integrating multiple kernels
Results of experiment 4 data set I: classification of endometrial disease patients using multiple kernels derived from clinical data
Classifier | Mean error of AUC | Std. of error of AUC | p-value |
---|---|---|---|
LSSVM L _{∞} (0.5) MKL | 0.2353 | 0.0133 | - |
SVM L _{∞} (0.5) MKL | 0.2388 | 0.0178 | 0.4369 |
SVM L _{∞} MKL | 0.2417 | 0.0165 | 0.2483 |
LSSVM L_{2} MKL | 0.2456 | 0.0124 | 0.0363 |
SVM L_{2} MKL | 0.2489 | 0.0178 | 0.0130 |
SVM L_{1} MKL | 0.2513 | 0.0144 | 0.0057 |
LSSVM L_{1} MKL | 0.2574 | 0.0189 | 9.98 · 10^{-5} |
LSSVM L_{∞} MKL | 0.2678 | 0.0130 | 1.53 · 10^{-6} |
Results of experiment 4 data set II: classification of miscarriage patients using multiple kernels derived from clinical data
Classifier | Mean error of AUC | Std. of error of AUC | p-value |
---|---|---|---|
SVM L _{2} MKL | 0.1975 | 0.0037 | - |
LSSVM L _{2} MKL | 0.2002 | 0.0049 | 0.0712 |
LSSVM L_{∞} (0.5) MKL | 0.2027 | 0.0045 | 9.77 · 10^{-4} |
SVM L_{∞} MKL | 0.2109 | 0.0040 | 9.55 · 10^{-12} |
SVM L_{∞} (0.5) MKL | 0.2168 | 0.0040 | 1.79 · 10^{-12} |
LSSVM L_{1} MKL | 0.2132 | 0.0029 | 1.11 · 10^{-13} |
SVM L_{1} MKL | 0.2297 | 0.0038 | 1.10 · 10^{-15} |
LSSVM L_{∞} MKL | 0.2319 | 0.0015 | 3.42 · 10^{-21} |
Results of experiment 4 data set III: classification of PUL patients using multiple kernels derived from clinical data
Classifier | Mean error of AUC | Std. of error of AUC | p-value |
---|---|---|---|
Weighted LSSVM L _{2} MKL | 0.1165 | 0.0100 | - |
Weighted LSSVM L _{1} MKL | 0.1243 | 0.0171 | 0.0519 |
Weighted LSSVM L_{∞} (0.5) MKL | 0.1290 | 0.0206 | 0.0169 |
Weighted SVM L_{2} MKL | 0.1499 | 0.0248 | 4.79 · 10^{-5} |
Weighted SVM L_{∞} MKL | 0.1552 | 0.0210 | 1.02 · 10^{-6} |
Weighted SVM L_{∞} (0.5) MKL | 0.1551 | 0.0153 | 3.87 · 10^{-6} |
Weighted SVM L_{1} MKL | 0.1594 | 0.0162 | 2.29 · 10^{-9} |
Weighted LSSVM L_{∞} MKL | 0.1651 | 0.0174 | 4.41 · 10^{-10} |
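The three tables above summarize each classifier by the mean and standard deviation of its error of AUC over repeated runs. The following is a minimal sketch of how such a summary could be computed, assuming that "error of AUC" means 1 − AUC and that the statistics are taken over independent repetitions; the function names and the toy scores are hypothetical and are not taken from the experiments.

```python
from statistics import mean, stdev


def auc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_error_summary(runs):
    """Return (mean, sample std) of 1 - AUC over a list of (labels, scores) runs."""
    errors = [1.0 - auc(labels, scores) for labels, scores in runs]
    return mean(errors), stdev(errors)


# Toy repetitions (made-up labels and decision values, for illustration only).
runs = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    ([1, 1, 0, 0], [0.8, 0.7, 0.3, 0.2]),
    ([1, 0, 1, 0], [0.6, 0.6, 0.9, 0.2]),
]
mean_err, std_err = auc_error_summary(runs)
print(f"error of AUC: {mean_err:.4f} +/- {std_err:.4f}")
```

The pairwise loop is quadratic in the number of samples, which is acceptable for small clinical data sets; a sort-based rank computation would be preferable for large ones.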
To investigate whether the combination of multiple kernels performs as well as the best individual kernel, we evaluated the performance of all the individual kernels in section 5 of Additional file 1. As shown there, the clinical kernel proposed by Daemen and De Moor [34] has better quality than the linear, RBF and polynomial kernels on the endometrial and pregnancy data sets. For the miscarriage data set, the first RBF kernel has better quality than the other seven kernels. Despite the differences among individual kernels, the performance of MKL is comparable to that of the best individual kernel, demonstrating that MKL is also useful for combining candidate kernels derived from a single data set.
The effectiveness of MKL can also be justified by investigating the kernel coefficients optimized on all the data sets and classifiers. As shown in section 6 of Additional file 1, the kernel coefficients optimized by the L_{∞} MKL algorithms were sparse, whereas the L_{2} coefficients were distributed more evenly over the kernels. The best individual kernel of each data set usually receives the dominant coefficient, which explains why the performance of the MKL algorithms is comparable to that of the best individual kernels.
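The contrast between sparse and evenly spread coefficients can be illustrated numerically. The following is a minimal sketch, assuming a fixed dual vector α and writing g_j = α^T K_j α for the quadratic term of kernel j: the L_{∞} case is approximated by putting all weight on the largest term, the L_{1} case by the uniform average, and the L_{2} case by g divided by its L_{2}-norm. These closed forms are a simplification for illustration (the actual algorithms optimize α and the coefficients jointly), and the function name and toy matrices are hypothetical.

```python
import numpy as np


def kernel_weights(kernels, alpha, norm):
    """Illustrative kernel coefficients for a fixed dual vector alpha.

    norm = "inf": all weight on the kernel with the largest alpha^T K alpha (sparse)
    norm = "1":   uniform average of the kernels
    norm = "2":   weights proportional to alpha^T K alpha, scaled to unit L2-norm
    """
    g = np.array([alpha @ K @ alpha for K in kernels], dtype=float)
    if norm == "inf":
        theta = np.zeros_like(g)
        theta[np.argmax(g)] = 1.0
    elif norm == "1":
        theta = np.full_like(g, 1.0 / len(g))
    elif norm == "2":
        theta = g / np.linalg.norm(g)
    else:
        raise ValueError(f"unknown norm: {norm!r}")
    return theta


# Three small positive semidefinite toy kernels (made up for illustration).
kernels = [np.eye(3), np.ones((3, 3)), np.diag([1.0, 2.0, 3.0])]
alpha = np.array([1.0, -1.0, 0.5])

theta_inf = kernel_weights(kernels, alpha, "inf")
theta_one = kernel_weights(kernels, alpha, "1")
theta_two = kernel_weights(kernels, alpha, "2")

# Combined kernel under the non-sparse L2 weights.
combined = sum(t * K for t, K in zip(theta_two, kernels))
print(theta_inf, theta_one, theta_two)
```

With these toy inputs the L_{∞}-style weights select a single kernel, while the L_{2}-style weights keep every kernel in the combination, mirroring the pattern described for Additional file 1.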
Comparison of the performance obtained by joint estimation of λ and standard cross-validation in LSSVM MKL
Data Set | Norm | Validation Approach | Estimation Approach |
---|---|---|---|
endometrial disease | L_{∞} | 0.2625 ± 0.0146 | 0.2678 ± 0.0130 |
endometrial disease | L_{2} | 0.2584 ± 0.0188 | 0.2456 ± 0.0124 |
miscarriage | L_{∞} | 0.1873 ± 0.0100 | 0.2319 ± 0.0015 |
miscarriage | L_{2} | 0.1912 ± 0.0089 | 0.2002 ± 0.0049 |
pregnancy | L_{∞} | 0.1321 ± 0.0243 | 0.1651 ± 0.0173 |
pregnancy | L_{2} | 0.1299 ± 0.0172 | 0.1165 ± 0.0100 |
Experiment 5: Computational complexity and numerical experiments on large scale problems
Overview of the convexity and complexity
Convexity and complexity of all methods
Method | convexity | complexity |
---|---|---|
1-SVM SOCP L_{∞}, L_{2} | convex | O((p + n)^{2}n^{2.5}) |
1-SVM QCQP L_{∞} | convex | O(pn^{3}) |
SVM SOCP L_{∞}, L_{2} | convex | O((p + n)^{2}(k + n)^{2.5}) |
SVM QCQP L_{∞} | convex | O(pk^{2}n^{2} + k^{3}n^{3}) |
SVM SIP L_{∞} | convex | O(τ(kn^{3} + p^{3})) |
SVM SIP L_{2} | relaxation | O(τ(kn^{3} + p^{3})) |
LSSVM SOCP L_{∞}, L_{2} | convex | O((p + n)^{2}(k + n)^{2.5}) |
LSSVM QCQP L_{∞}, L_{2} | convex | O(pk^{2}n^{2} + k^{3}n^{3}) |
LSSVM SIP L_{∞} | convex | O(τ(n^{2} + p^{3})) |
LSSVM SIP L_{2} | relaxation | O(τ(n^{2} + p^{3})) |
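To get a feeling for how the expressions in the table scale, the dominant terms can be evaluated for a concrete problem size. The following is a rough sketch, assuming that n is the number of samples, p the number of kernels, k the number of classes and τ the number of SIP iterations (the symbols are not defined in this section); the sizes passed in are hypothetical. Big-O expressions drop constant factors, so the numbers only indicate relative growth, not running time.

```python
def complexity_terms(n, p, k, tau):
    """Evaluate the dominant terms of the complexity table (constants dropped)."""
    socp_multiclass = (p + n) ** 2 * (k + n) ** 2.5
    qcqp_multiclass = p * k**2 * n**2 + k**3 * n**3
    return {
        "1-SVM SOCP": (p + n) ** 2 * n ** 2.5,
        "1-SVM QCQP": p * n**3,
        "SVM SOCP": socp_multiclass,
        "SVM QCQP": qcqp_multiclass,
        "SVM SIP": tau * (k * n**3 + p**3),
        "LSSVM SOCP": socp_multiclass,
        "LSSVM QCQP": qcqp_multiclass,
        "LSSVM SIP": tau * (n**2 + p**3),
    }


# Hypothetical problem size: 1000 samples, 3 kernels, 10 classes, 50 iterations.
terms = complexity_terms(n=1000, p=3, k=10, tau=50)
for name, value in sorted(terms.items(), key=lambda item: item[1]):
    print(f"{name:12s} {value:.3e}")
```

For this hypothetical setting the LSSVM SIP term is several orders of magnitude smaller than the SVM SIP term, because each SIP iteration involves a term in n^{2} rather than kn^{3}.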
We verified the efficiency in numerical experiments on two UCI digit recognition data sets (pen-digit and optical digit), comparing the computational time of the proposed algorithms.
QP formulation is more efficient than SOCP
We investigated the efficiency of various formulations for solving the 1-SVM MKL. As mentioned, the problems presented in (15) can be solved either as QCLP or as SOCP. We applied Sedumi [14] to solve it as SOCP and MOSEK to solve it both as QCLP and as SOCP. Solving the QP with MOSEK was the most efficient (142 seconds). In contrast, the MOSEK-SOCP method took 2608 seconds and the Sedumi-SOCP method took 4500 seconds. This is probably because transforming a QP into a SOCP introduces a large number of additional variables and constraints, which makes the problem more expensive to solve.
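The overhead of the conic reformulation can be seen on a single convex quadratic constraint. The following is a standard textbook identity rather than the exact transformation used by the solvers; Q, L, x and t are generic symbols introduced here, and Q is assumed to be positive semidefinite with a factorization Q = L L^T.

```latex
% Assumption: Q is positive semidefinite and Q = L L^{\top}.
\[
x^{\top} Q x \le t
\quad\Longleftrightarrow\quad
\left\| \begin{pmatrix} 2 L^{\top} x \\ t - 1 \end{pmatrix} \right\|_{2} \le t + 1 ,
\]
% Squaring both sides of the cone constraint recovers the quadratic form:
\[
4\, x^{\top} Q x + (t - 1)^{2} \le (t + 1)^{2}
\quad\Longleftrightarrow\quad
x^{\top} Q x \le t .
\]
```

Each quadratic constraint therefore requires a factor of its matrix and, in standard conic form, an auxiliary vector of the same dimension as x. With one such constraint per kernel, the number of extra variables grows with the product of the number of kernels and the number of samples, which is consistent with the slower SOCP timings reported above.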
SIP formulation is more efficient than QCQP
Discussion
In this paper we propose a new L_{2} MKL framework as a complement to the existing L_{∞} MKL method proposed by Lanckriet et al. L_{2} MKL is characterized by the non-sparse integration of multiple kernels to optimize the objective function of machine learning problems. We systematically validated its performance through extensive analysis on four real bioinformatics and biomedical applications. The motivation for L_{2} MKL is as follows. In real biomedical applications with a small number of sources that are all believed to be truly informative, we would usually prefer a non-sparse set of coefficients, because we want to prevent the dominant source (such as text mining or Gene Ontology) from receiving a coefficient close to 1. The reason to avoid sparse coefficients is that there is a discrepancy between the experimental setup for performance evaluation and "real world" performance. The dominant source will work well on a benchmark because this is a controlled situation with known outcomes. For example, we set up a set of already known genes for a given disease and want to demonstrate that our model can capture the available information to discriminate between a gene from this set and randomly selected genes (for example, in a cross-validation setup). Given that these genes are already known to be associated with the disease, this information will be present in sources like text mining or Gene Ontology in the gene prioritization problem. These sources can then identify the known genes with high confidence and are therefore assigned a high weight. However, when trying to identify truly novel genes for the same disease, the relevance of the information available through such data sources will be much lower, and we would like to prevent any one data source from completely dominating the others. Given that setting up a benchmark requires knowledge of the association between a gene and a disease, this effect is hard to avoid.
We can therefore expect that if we have a smoother solution that performs as well as the sparse solution on benchmark data, it is likely to perform better on real discoveries.
For the specific problem of gene prioritization, an effective way to address this problem is to set up a benchmark where information is "rolled back" a number of years (e.g., two years) prior to the discovery of the association between a gene and a disease (i.e., older information is used so that the association between the gene and the disease is not yet contained in data sources like text mining or Gene Ontology). Given that the date at which the association was discovered is different for each gene, the setup of such benchmarks is notoriously difficult. In future work, we plan to address this problem by freezing available knowledge at a given date, then collecting novel discoveries and benchmarking against them in a fashion reminiscent of CASP (Critical Assessment of protein Structure Prediction) [39].
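The "freeze" variant of such a benchmark amounts to a time-based split of the known associations. The following is a minimal sketch under stated assumptions: the record layout, the gene and disease names, the dates and the function name are all hypothetical, and a real benchmark would also need snapshots of the data sources themselves (text mining, Gene Ontology) taken at the freeze date.

```python
from datetime import date

# Hypothetical records: (gene, disease, date the association was first reported).
associations = [
    ("GENE_A", "disease_X", date(2005, 3, 1)),
    ("GENE_B", "disease_X", date(2007, 9, 15)),
    ("GENE_C", "disease_X", date(2009, 1, 20)),
    ("GENE_D", "disease_X", date(2009, 11, 5)),
]


def freeze_split(records, freeze_date):
    """Split associations into those known at the freeze date and later discoveries."""
    known = [r for r in records if r[2] <= freeze_date]
    novel = [r for r in records if r[2] > freeze_date]
    return known, novel


# Knowledge is frozen at the start of 2008; later reports serve as the test set.
known, novel = freeze_split(associations, date(2008, 1, 1))
print("training genes:", [gene for gene, _, _ in known])
print("held-out discoveries:", [gene for gene, _, _ in novel])
```

Models would be trained only on the associations known at the freeze date and evaluated on how highly they rank the later discoveries.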
The technical merit of the proposed L_{2} MKL lies in the dual form of the learning problems. Although the use of different norms in MKL has recently been investigated by Kloft et al. [9, 40] and Kowalski et al. [41], their formulations are based on the primal problems. In our paper, the proposed L_{2} method is discussed in the dual space, which differs from regularizing the norm of the coefficient term in the primal space. We have theoretically proven that optimizing the L_{2} regularization of kernel coefficients in the primal problem corresponds to solving the L_{2}-norm of kernel components in the dual problem. Clarifying this dual solution enabled us to solve the L_{2} problem directly as a convex SOCP. Moreover, the dual solution can be extended to various other machine learning problems. In this paper we have shown the extensions to 1-SVM, SVM and LSSVM. The L_{2} dual solution can also be applied to kernel-based clustering and regression analysis for a wide range of applications. Another main contribution of our paper is the novel LSSVM L_{2} MKL proposed for classification problems. It is well known that, when machine learning techniques are applied to real computational biology problems, the performance may depend on the data set and the experimental settings. When the performance of various methods is comparable but one method is significantly more efficient computationally, this is a "solid" advantage of that method. In this paper, we have shown that the LSSVM MKL classifier based on the SIP formulation can be solved more efficiently than SVM MKL. Moreover, the performance of LSSVM L_{2} MKL is always comparable to the best performance. The SIP-based LSSVM L_{2} MKL classifier has two main "solid" advantages: the inherent time complexity is small, and the regularization parameter λ can be jointly estimated in the experimental setup.
Due to these merits, LSSVM L_{2} MKL is a very promising technique for problems pertaining to large scale data fusion.
Conclusions
This paper compared the effect of optimizing different norms in multiple kernel learning in a systematic framework. The obtained results extend and enrich the statistical framework of genomic data fusion proposed by Lanckriet et al. [4, 6] and Bach et al. [5]. By optimizing different norms in the dual problem of SVM, we proposed L_{∞}, L_{1}, and L_{2} MKL, which correspond respectively to the L_{1} regularization, the average combination, and the L_{2} regularization of kernel coefficients in the primal problem.
Six real biomedical data sets were investigated in this paper, on which the L_{2} MKL approach was shown to be advantageous over the L_{∞} method. We also proposed a novel and efficient LSSVM L_{2} MKL classifier to learn the optimal combination of multiple large scale data sets. All the algorithms implemented in this paper are freely accessible at http://homes.esat.kuleuven.be/~sistawww/bioi/syu/l2lssvm.html.
Declarations
Acknowledgements
The work was supported by Research Council KUL: GOA AMBioRICS, CoE EF/ 05/007 SymBioSys, PROMETA, several PhD/postdoc and Fellow Grants; FWO: PhD/postdoc Grants, Projects G.0241.04(Functional Genomics), G.0499.04(Statistics), G.0232.05(Cardiovascular), G.0318.05(subfunctionalization), G.0553.06(VitamineD), G.0302.07(SVM/Kernel), research communities(ICCoS, ANMMM, MLDM); IWT: PhD Grants, GBOU-McKnow-E(Knowledge management algorithms), GBOU-ANA(biosensors), TADBioScope-IT, Silicos; SBO-BioFrame, SBO-MoKa, TBMEndometriosis, TBM-IOTA3, O&O-Dsquare; Belgian Federal Science Policy Office: IUAP P6/25(BioMaGNet, Bioinformatics and Modeling: from Genomes to Networks, 2007-2011); EURTD: ERNSI: European Research Network on System Identification; FP6-NoE Biopattern; FP6-IP e-Tumours, FP6-MC-EST Bioptrain, FP6-STREP Strokemap.
References
- Tretyakov K: Methods of genomic data fusion: An overview. 2006. [http://ats.cs.ut.ee/u/kt/hw/fusion/fusion.pdf]
- Vapnik V: The Nature of Statistical Learning Theory. Springer-Verlag, New York; 1995.
- Shawe-Taylor J, Cristianini N: Kernel methods for pattern analysis. Cambridge: Cambridge University Press; 2004.
- Lanckriet GRG, Cristianini N, Bartlett P, Ghaoui LE, Jordan MI: Learning the Kernel Matrix with Semidefinite Programming. Journal of Machine Learning Research 2005, 5: 27–72.
- Bach FR, Lanckriet GRG, Jordan MI: Multiple kernel learning, conic duality, and the SMO algorithm. Proceedings of 21st International Conference of Machine Learning 2004.
- Lanckriet GRG, De Bie T, Cristianini N, Jordan MI, Noble WS: A statistical framework for genomic data fusion. Bioinformatics 2004, 20: 2626–2635. 10.1093/bioinformatics/bth294
- De Bie T, Tranchevent LC, Van Oeffelen L, Moreau Y: Kernel-based data fusion for gene prioritization. Bioinformatics 2007, 23: i125-i132. 10.1093/bioinformatics/btm187
- Ng AY: Feature selection, L1 vs. L2 regularization, and rotational invariance. Proceedings of 21st International Conference of Machine Learning 2004.
- Kloft M, Brefeld U, Sonnenburg S, Laskov P, Müller K, Zien A: Efficient and Accurate Lp-norm Multiple Kernel Learning. Advances in Neural Information Processing Systems 22 2009.
- Grant M, Boyd S: CVX: Matlab Software for Disciplined Convex Programming, version 1.21. 2010. [http://cvxr.com/cvx]
- Grant M, Boyd S: Graph implementations for nonsmooth convex programs. In Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences. Edited by: Blondel V, Boyd S, Kimura H. Springer-Verlag Limited; 2008, 95–110. [http://stanford.edu/~boyd/graph_dcp.html]
- Tax DMJ, Duin RPW: Support vector domain description. Pattern Recognition Letters 1999, 20: 1191–1199. 10.1016/S0167-8655(99)00087-2
- Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC: Estimating the support of a high-dimensional distribution. Neural Computation 2001, 13: 1443–1471. 10.1162/089976601750264965
- Sedumi [http://sedumi.ie.lehigh.edu/]
- Andersen ED, Andersen KD: The MOSEK interior point optimizer for linear programming: an implementation of the homogeneous algorithm. High Perf Optimization 2000, 197–232.
- Kim SJ, Magnani A, Boyd S: Optimal kernel selection in kernel fisher discriminant analysis. Proceeding of 23rd International Conference of Machine Learning 2006.
- Ye JP, Ji SH, Chen JH: Multi-class discriminant kernel learning via convex programming. Journal of Machine Learning Research 2008, 40: 719–758.
- Sonnenburg S, Rätsch G, Schäfer C, Schölkopf B: Large scale multiple kernel learning. Journal of Machine Learning Research 2006, 7: 1531–1565.
- Hettich R, Kortanek KO: Semi-infinite programming: theory, methods, and applications. SIAM Review 1993, 35(3):380–429. 10.1137/1035089
- Kaliski J, Haglin D, Roos C, Terlaky T: Logarithmic barrier decomposition methods for semi-infinite programming. International Transactions in Operations Research 4(4):
- Reemtsen R: Some other approximation methods for semi-infinite optimization problems. Journal of Computational and Applied Mathematics 1994, 53: 87–108. 10.1016/0377-0427(92)00122-P
- Suykens JAK, Van Gestel T, Brabanter J, De Moor B, Vandewalle J: Least Squares Support Vector Machines. World Scientific Publishing, Singapore; 2002.
- Veropoulos K, N C, C C: Controlling the sensitivity of support vector machines. Proc of the IJCAI 99 1999, 55–60.
- Zheng Y, Yang X, Beddoe G: Reduction of False Positives in Polyp Detection Using Weighted Support Vector Machines. Proc. of the 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2007, 4433–4436.
- Suykens JAK, De Brabanter J, Lukas L, Vandewalle J: Weighted least squares support vector machines: robustness and sparse approximation. Neurocomputing, Special issue on fundamental and information processing aspects of neurocomputing 2002, 48(1–4):85–105.
- Cawley GC: Leave-One-Out Cross-Validation Based Model Selection Criteria for Weighted LS-SVMs. Proc. of 2006 International Joint Conference on Neural Networks 2006, 1661–1668.
- Aerts S, Lambrechts D, Maity S, Van Loo P, Coessens B, De Smet F, Tranchevent LC, De Moor B, Marynen P, Hassan B, Carmeliet P, Moreau Y: Gene prioritization through genomic data fusion. Nature Biotechnology 2006, 24: 537–544. 10.1038/nbt1203
- Yu S, Van Vooren S, Tranchevent LC, De Moor B, Moreau Y: Comparison of vocabularies, representations and ranking algorithms for gene prioritization by text mining. Bioinformatics 2008, 24: i119-i125. 10.1093/bioinformatics/btn291
- Leslie C, Eskin E, Weston J, Noble WS: The spectrum kernel: a string kernel for SVM protein classification. Proc. of the Pacific Symposium on Biocomputing 2002 2002.
- Eeles RA, Kote-Jarai Z, Giles GG, Olama AAA, Guy M, Jugurnauth SK, Mulholland S, Leongamornlert DA, Edwards SM, Morrison J, et al.: Multiple newly identified loci associated with prostate cancer susceptibility. Nat Genet 2008, 40: 316–321. 10.1038/ng.90
- Thomas G, Jacobs KB, Yeager M, Kraft P, Wacholder S, Orr N, Yu K, Chatterjee N, Welch R, Hutchinson A, et al.: Multiple loci identified in a genome-wide association study of prostate cancer. Nat Genet 2008, 40: 310–315. 10.1038/ng.91
- Gudmundsson J, Sulem P, Rafnar T, Bergthorsson JT, Manolescu A, Gudbjartsson D, Agnarsson BA, Sigurdsson A, Benediktsdottir KR, Blondal T, et al.: Common sequence variants on 2p15 and Xp11.22 confer susceptibility to prostate cancer. Nat Genet 2008, 40: 281–283. 10.1038/ng.89
- Daemen A, Gevaert O, Ojeda F, Debucquoy A, Suykens JAK, Sempous C, Machiels JP, Haustermans K, De Moor B: A kernel-based integration of genome-wide data for clinical decision support. Genome Medicine 2009, 1: 39. 10.1186/gm39
- Daemen A, De Moor B: Development of a kernel function for clinical data. Proc. of the 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2009, 5913–5917.
- van den Bosch T, Daemen A, Gevaert O, Timmerman D: Mathematical decision trees versus clinician based algorithms in the diagnosis of endometrial disease. Proc. of the 17th World Congress on Ultrasound in Obstetrics and Gynecology (ISUOG) 2007, 412.
- Bottomley C, Daemen A, Mukri F, Papageorghiou AT, Kirk E, A P, De Moor B, Timmerman D, Bourne T: Functional linear discriminant analysis: a new longitudinal approach to the assessment of embryonic growth. Human Reproduction 2007, 24(2):278–283. 10.1093/humrep/den382
- Gevaert O, De Smet F, Timmerman D, Moreau Y, De Moor B: Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks. Bioinformatics 2006, 22(14):e184-e190. 10.1093/bioinformatics/btl230
- Condous G, Okaro E, Khalid A, Timmerman D, Lu C, Zhou Y, Van Huffel S, Bourne T: The use of a new logistic regression model for predicting the outcome of pregnancies of unknown location. Human Reproduction 2004, 21: 278–283.
- Moult J, Fidelis K, Kryshtafovych A, Rost B, Tramontano A: Critical assessment of methods of protein structure prediction - Round VIII. Proteins: Structure, Function, and Bioinformatics 77(S9):
- Kloft M, Brefeld U, Laskov P, Sonnenburg S: Non-sparse multiple kernel learning. NIPS 08 workshop: kernel learning automatic selection of optimal kernels 2008.
- Kowalski M, Szafranski M, Ralaivola L: Multiple indefinite kernel learning with mixed norm regularization. Proc of the 26th International Conference of Machine Learning 2009.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.