Figure 1 | BMC Bioinformatics

From: Support Vector Machine Implementations for Classification & Clustering

A sketch of the hyperplane separability heuristic for SVM binary classification. An SVM is trained to find an optimal hyperplane separating positive and negative instances, subject to structural risk minimization (SRM) criteria, which here manifest as the hyperplane having a thickness, or "margin," that is made as large as possible in the search for a separating hyperplane. A benefit of using SRM is that it is much less prone to overfitting (a common problem with neural-network discrimination approaches). Given this geometric expression, it is not surprising that a key construct in the SVM formulation (via the choice of kernel) is the notion of "nearness" between instances (or nearness to the hyperplane, which provides a measure of confidence in the classification: instances farther from the decision hyperplane are called with greater confidence). Most notions of nearness explored in this context have stayed with the geometric paradigm and are known as "distance kernels"; one example is the familiar Gaussian kernel, which is based on the Euclidean distance: $K_{\mathrm{Gaussian}}(x,y) = \exp\!\left(-D_{\mathrm{Eucl}}(x,y)^2 / 2\sigma^2\right)$, where $D_{\mathrm{Eucl}}(x,y) = \left[\sum_k (x_k - y_k)^2\right]^{1/2}$ is the usual Euclidean distance. These kernels are used in the signal pattern recognition analysis in Figure 8, along with a new class of kernels, "divergence kernels," based on a notion of nearness appropriate when comparing probability distributions (or probability feature vectors). The main example of this is the Entropic Divergence Kernel: $K_{\mathrm{Entropic}}(x,y) = \exp\!\left(-D_{\mathrm{Entropic}}(x,y)^2 / 2\sigma^2\right)$, where $D_{\mathrm{Entropic}}(x,y) = D(x\|y) + D(y\|x)$ and $D(\cdot\|\cdot)$ is the Kullback-Leibler divergence (or relative entropy) between x and y.
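
The two kernel families named in the caption can be sketched in a few lines. The snippet below is a minimal illustration, not code from the article: function names such as gaussian_kernel and entropic_kernel are assumptions, and the entropic case assumes the rows of the input are probability feature vectors (non-negative, summing to 1). It computes Gram matrices for the Gaussian distance kernel and the symmetrized-KL divergence kernel defined above; such matrices can then be supplied to any SVM implementation that accepts precomputed kernels.

```python
# Sketch (illustrative, not the authors' code) of the two kernels in the caption.
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all row pairs of X and Y."""
    # Squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def kl_divergence(p, q, eps=1e-12):
    """D(p || q) = sum_k p_k log(p_k / q_k), with a small floor to avoid log(0)."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return np.sum(p * np.log(p / q))

def entropic_kernel(X, Y, sigma=1.0):
    """K(x, y) = exp(-D_sym(x, y)^2 / (2 sigma^2)), D_sym = D(x||y) + D(y||x)."""
    K = np.empty((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            d = kl_divergence(x, y) + kl_divergence(y, x)
            K[i, j] = np.exp(-d**2 / (2.0 * sigma**2))
    return K

if __name__ == "__main__":
    # Toy probability feature vectors; Gram matrices like these can be passed to
    # an SVM that accepts precomputed kernels, e.g. sklearn.svm.SVC(kernel="precomputed").
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(5), size=4)
    print(gaussian_kernel(P, P).shape)   # (4, 4)
    print(entropic_kernel(P, P).shape)   # (4, 4)
```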
