 Research
 Open access
Improvement of variables interpretability in kernel PCA
BMC Bioinformatics volume 24, Article number: 282 (2023)
Abstract
Background
Kernel methods have been proven to be a powerful tool for the integration and analysis of data generated by high-throughput technologies. Kernels offer a nonlinear version of any linear algorithm solely based on dot products. The kernelized version of principal component analysis is a valid nonlinear alternative to tackle the nonlinearity of biological sample spaces. This paper proposes a novel methodology to obtain a data-driven feature importance based on the kernel PCA representation of the data.
Results
The proposed method, kernel PCA Interpretable Gradient (KPCA-IG), provides a data-driven feature importance that is computationally fast and based solely on linear algebra calculations. It has been compared with existing methods on three benchmark datasets. The accuracy obtained using KPCA-IG selected features is equal to or greater than the other methods' average. Also, the computational complexity required demonstrates the high efficiency of the method. An exhaustive literature search has been conducted on the genes selected from a publicly available Hepatocellular carcinoma dataset to validate the retained features from a biological point of view. The results once again underline the appropriateness of the computed ranking.
Conclusions
The black-box nature of kernel PCA needs new methods to interpret the original features. Our proposed methodology KPCA-IG proved to be a valid alternative to select influential variables in high-dimensional high-throughput datasets, potentially unravelling new biological and medical biomarkers.
Background
The recent advancement in high-throughput biotechnologies is making large multi-omics datasets easily available. Bioinformatics has recently entered the Big Data era, offering researchers new perspectives to analyse biological systems and discover new genotype-phenotype interactions.
Consequently, new ad hoc methods to optimise post-genomic data analysis are needed, considering the high complexity and heterogeneity involved. For instance, multi-omics datasets pose the additional difficulty of dealing with a multi-layered framework, making data integration extremely challenging.
In this context, kernel methods offer a natural theoretical framework for the high dimensionality and heterogeneous nature of omics data, addressing their peculiar, convoluted nature [63]. These methods facilitate the analysis and integration of various types of omics data, such as vectors, sequences, networks, phylogenetic trees, and images, through a relevant kernel function. Using kernels enables the representation of the datasets in terms of pairwise similarities between sample points, which is helpful for handling high-dimensional sample spaces more efficiently than using Euclidean distance alone. Euclidean distance can be inadequate in complex scenarios, as stated in [17], but kernels can help overcome this limitation. Moreover, kernel methods have the advantage of providing a nonlinear version of any linear algorithm that relies solely on dot products. For instance, Kernel Principal Component Analysis [62], Kernel Canonical Correlation Analysis [4], Kernel Discriminant Analysis [53] and Kernel Clustering [21] are all examples of nonlinear algorithms enabled by the so-called kernel trick.
This work will focus on the kernelized version of Principal Component Analysis, KPCA, which provides a nonlinear alternative to standard PCA to reduce the sample space dimensions.
However, KPCA and kernel methods in general pose new challenges in interpretability. The so-called pre-image problem arises since data points are only addressed through the kernel function, causing the original features to be lost during the data embedding process. The initial information contained in the original variables is summarised in the pairwise kernel similarity scores among sample points. Thus, retrieving the original input dimensions is highly challenging when it comes to identifying the most prominent features. Even if it is possible for certain specific kernels to solve the pre-image problem through a fixed-point iteration method, the provided solution is typically numerically unstable since it involves a non-convex optimisation problem [59]. Moreover, in most cases, the exact pre-image does not even exist [43].
Nevertheless, there are works that aim at solving the pre-image problem, like the pre-image based on distance constraints in the feature space in [32] or local isomorphism as in [25].
Instead, in this article we propose KPCA Interpretable Gradient (KPCA-IG), a novel methodology to assess the contribution of the original variables to the KPCA solution, based on the computation of the partial derivatives of the kernel itself. More specifically, this method aims at identifying the most influential variables for the kernel principal components that account for the majority of the variability of the data. To the best of our knowledge, KPCA-IG is the first method for KPCA that offers a computationally fast and stable data-driven feature ranking to identify the most prominent original variables, based solely on the computation of gradient norms. Consequently, unimportant descriptors can be ignored, refining the kernel PCA procedure, whose similarity measure can be influenced by irrelevant dimensions [7].
Existing approaches to facilitate feature interpretability in the unsupervised setting
The literature on unsupervised feature selection is generally less extensive than its supervised learning counterpart. One of the main reasons for this disparity is that the selection is made without a specific prediction goal, making it difficult to evaluate the quality of a particular solution. In the same way, unsupervised feature selection within the kernel framework has been explored less than kernel applications with classification purposes. As mentioned earlier, interpreting kernel PCA requires additional attention, as the kernel principal component axes themselves are only defined by the similarity scores of the sample points. However, the literature contains only limited attempts to explain how to interpret these axes after the kernel transformation. Therefore, feature selection methods based on KPCA are rare.
Among others, [52] proposed a method to visualize the original variables in the 2D kernel PC plot. For every sample point projected onto the KPCA axes, they propose to display the original variables as arrows representing the vector field of the direction of maximum growth for each input variable or combination of variables. This algorithm does not provide a variable importance ranking, requiring prior knowledge about which variables to display.
On the contrary, [40] introduced a variable importance selection method to identify the most influential variables for every principal component based on random permutation. The procedure is performed for all variables, selecting the ones that result in the largest Crone-Crosby [14] distance between kernel matrices, i.e. the variables whose permutations of the observations lead to a significant change in the kernel Gram matrix values. However, the method does not come with a variable representation and can be computationally expensive, like many other permutation-based methods. This method will be denoted as KPCA-permute in the rest of the article.
Another method that takes advantage of the kernel framework is the unsupervised method UKFS, with its extension UKFS-KPCA, in [7], where the authors proposed to select important features through a non-convex optimization problem with an \(\ell _1\) penalty for a Frobenius norm distortion measure.
As exhaustively described in the overview presented in [35], there are different approaches to assess variable importance in an unsupervised setting not based on the kernel framework. Among others, we can mention two methodologies that are based on the computation of a score, the Laplacian Score lapl in [23] and its extension Spectral Feature Selection SPEC in [78]. Other alternatives are the Multi-Cluster Feature Selection MCFS in [8], the Non-negative Discriminative Feature Selection NDFS in [37], and the Unsupervised Discriminative Feature Selection UDFS in [75]. These methods aim to select features by keeping only the ones that best represent the implicit nature of the clustered data. Then, Convex Principal Feature Selection CPFS [41] adopts a distinct approach to feature selection, focusing on selecting a subset of features that can best reconstruct the projection of the data on the initial axes of the Principal Component Analysis.
As mentioned, the present study introduces a novel contribution to the interpretability of variables in kernel PCA, assuming that the first kernel PC axes contain the most relevant information about the data. The newly proposed method follows and extends the idea proposed by [52], with the fundamental difference that it gives a data-driven feature importance ranking. Moreover, contrary to KPCA-permute in [40], it does not have a random nature while being considerably faster.
Methods
This section presents the formulation behind our proposed method KPCA-IG, starting with a description of the kernel framework.
Kernel PCA
Given a dataset of n observations \(\varvec{x}_1, \ldots , \varvec{x}_n\) with \(\varvec{x}_i \in \chi\), a function k defined as \(k: \chi \times \chi \longrightarrow {\mathbb {R}}\) is a valid kernel if it is symmetric and positive semi-definite, i.e. \(k(\varvec{x}_i, \varvec{x}_j) = k(\varvec{x}_j, \varvec{x}_i)\) and \(\varvec{c}^T\varvec{K}\varvec{c} \geqslant 0\), \(\forall \varvec{c} \in \mathbb {R}^n\), where \(\varvec{K}\) is the \(n \times n\) kernel matrix containing all the data pairwise similarities, \(\varvec{K}_{ij} = k(\varvec{x}_i,\varvec{x}_j)\). The input set \(\chi\) does not require any assumption; in this work we consider \(\chi = \mathbb {R}^p\).
Every kernel function is associated with an implicit function \(\phi : \,{\chi }\longrightarrow \mathcal {H}\) which maps the input points into a generic feature space \(\mathcal {H}\), possibly of infinite dimensionality, such that \(k(\varvec{x}_i, \varvec{x}_j) = \langle \phi (\varvec{x}_i), \phi (\varvec{x}_j)\rangle\). This relation allows computing dot products in the feature space by implicitly applying the kernel function to the input objects, without explicitly computing the mapping function \(\phi\).
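As a concrete illustration of these two defining properties, the short NumPy sketch below builds a Gaussian kernel matrix and checks symmetry and positive semi-definiteness numerically (the function name and the sigma value are ours, chosen for the example):

```python
import numpy as np

def rbf_kernel_matrix(X, sigma=0.1):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-sigma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-sigma * np.clip(sq_dists, 0.0, None))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
K = rbf_kernel_matrix(X)

# A valid kernel matrix is symmetric and positive semi-definite.
assert np.allclose(K, K.T)
assert np.min(np.linalg.eigvalsh(K)) > -1e-10
```

Any kernel function can be plugged in the same way, as long as it satisfies the two conditions above.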
Principal Component Analysis is a well-established linear algorithm to extract the data structure in an unsupervised setting [22]. However, it is commonly accepted that in specific fields, such as bioinformatics, assuming a linear sample space may not capture the data manifold adequately [52]. In other words, the relationships between the variables may be nonlinear, making linear methods unsuitable. Hence, with high-dimensional data such as genomic data, where the number of features is usually much larger than the number of samples, nonlinear methods like kernel methods can provide a valid alternative for data analysis.
A compelling approach to overcome this challenge is through kernel PCA, which was introduced in [62]. Kernel PCA applies PCA in the feature space generated by the kernel, and as PCA relies on solving an eigenvalue problem, its kernelized version operates under the same principle.
The algorithm requires the data to be centered in the feature space, and the diagonalization of the centered covariance matrix in the feature space \(\mathcal {H}\) is equivalent to the eigendecomposition of the kernel matrix \(\varvec{K}\). The data coordinates in the feature space are unknown as \(\phi\) is not explicitly computed. Consequently, the required centering of variables in the feature space cannot be done directly. However, it is possible to compute the centered Gram matrix \(\varvec{\tilde{K}}\) as \(\varvec{\tilde{K}} = \varvec{K} - \frac{1}{n}\varvec{K}\varvec{1}_n\varvec{1}_n^T - \frac{1}{n}\varvec{1}_n\varvec{1}_n^T\varvec{K} + \frac{1}{n^2}(\varvec{1}_n^T \varvec{K}\varvec{1}_n)\varvec{1}_n\varvec{1}_n^T\) with \(\varvec{1}_n\) a vector of length n with 1 for all entries. If we express the eigenvalues of \(\varvec{\tilde{K}}\) as \(\lambda _1 \ge \lambda _2 \ge \cdots \ge \lambda _n\) and the corresponding set of eigenvectors as \(\tilde{\varvec{a}}^1,\ldots ,\tilde{\varvec{a}}^n\), the principal component axes can be expressed as \(\tilde{\varvec{v}}^k = \sum _{i=1}^{n} \tilde{a}_{i}^k \phi (\varvec{x}_i)\) with \(\tilde{\varvec{v}}^k\) and \(\tilde{\varvec{a}}^k\) orthonormal in \(\mathcal {H}\), \(k = 1, \dots , q\) and q the number of retained components. Thus, solving \(n \varvec{\lambda } \tilde{\varvec{a}} = \varvec{\tilde{K}}\tilde{\varvec{a}}\), it is possible to compute the projection of the points into the subspace of the feature space spanned by the eigenvectors. The projection of a generic point \(\varvec{x}\) into this subspace becomes then \(\rho _k:= \langle \tilde{\varvec{v}}^k, \phi (\varvec{x}) \rangle = \sum _{i=1}^{n} \tilde{a}_{i}^k k(\varvec{x}, \varvec{x}_i )\).
Likewise, utilizing the concise, explicit form of the centered Gram matrix \(\varvec{\tilde{K}}\), it is possible to express the projection of an arbitrary point \(\varvec{x}\) into the subspace spanned by the eigenvectors \(\tilde{\varvec{v}}^k\). Defining \(\varvec{Z} = (k(\varvec{x}, \varvec{x}_i))_{n\times 1}\), we can express this projection with the \(1 \times q\) row vector

\(\varvec{\rho }(\varvec{x}) = \varvec{Z}^T \tilde{\varvec{v}} \qquad (1)\)

with \(\tilde{\varvec{v}}\) being the \(n \times q\) matrix with the eigenvectors \(\tilde{\varvec{v}}^1, \dots , \tilde{\varvec{v}}^q\) as columns.
As we observe, the kernel PCA algorithm can be mathematically represented using only the entries of the kernel matrix. This means that the algorithm operates entirely on the original input data without requiring the computation of the new coordinates in the feature space. This technique effectively resolves the issue of potentially high computational complexity by allowing the input points to be implicitly mapped into the feature space.
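The whole procedure can be sketched in a few lines of NumPy, working only on the kernel matrix as described above (an illustrative transcription, not the authors' implementation; function name and normalization details are ours):

```python
import numpy as np

def kernel_pca(K, q=2):
    """Kernel PCA from an uncentered n x n kernel matrix K.

    Returns the projections of the training points onto the first q
    kernel principal components and all eigenvalues of the centered K.
    """
    n = K.shape[0]
    one = np.ones((n, n)) / n
    # Double-centering of the Gram matrix, matching the formula above.
    Kc = K - one @ K - K @ one + one @ K @ one
    eigvals, eigvecs = np.linalg.eigh(Kc)            # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # Rescale eigenvectors so the components v^k are orthonormal in H:
    # <v^k, v^k> = lambda_k * ||a^k||^2 = 1  =>  a^k /= sqrt(lambda_k).
    A = eigvecs[:, :q] / np.sqrt(np.maximum(eigvals[:q], 1e-12))
    return Kc @ A, eigvals   # rho_{ik} = sum_m a_m^k * Kc[i, m]

# Example: projections of 20 random points onto the first two components.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
K = np.exp(-0.1 * np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
P, eigvals = kernel_pca(K, q=2)
assert P.shape == (20, 2)
```

Since the eigenvectors of the centered Gram matrix with nonzero eigenvalue are orthogonal to the all-ones vector, the resulting projections are automatically centered.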
However, it also introduces new challenges in terms of interpretation. Determining which input variables have the most significant impact on the kernel principal components can be highly challenging, making it difficult to interpret them in terms of the original features. In other words, since the kernel function maps the data to a higherdimensional feature space, it can be hard to understand how the original features contribute to the newly obtained kernel principal components.
In the previous section, we mentioned the few techniques available in the literature that can be used to gain insight into the original input variables that had the most influence on the KPCA solution. The following section presents our contribution: a data-driven and faster variable ranking methodology for practitioners.
Improvement of KPCA interpretability with KPCA-IG
It is known that gradient descent is one of the most common algorithms for the training phase of most neural networks [55]. The norm of the cost function gradient plays a crucial role as it contributes to the step size for each iteration, together with its direction and the learning rate.
Consequently, for explainability in computer vision classification models, gradient-based methods are a widespread approach for many networks, such as deep neural networks (DNNs) and convolutional neural networks (CNNs). Some of the most used techniques are presented in the review proposed in [46], such as Saliency Maps [64], Deconvolutional Networks [76], Guided Backpropagation [67], SmoothGrad [65], Gradient*Input [3] and Integrated Gradients [68]. In post hoc explainability, they are often preferred over perturbation-based methods since they are not only less computationally expensive but should also be prioritized when a solution robust to input perturbation is required [46]. In the Deep Learning (DL) field, the starting point behind all the gradient-based methods is to assess the so-called attribution value of every input feature of the network. Formally, with a p-dimensional input \(x = (x_1, \dots , x_p)\) that produces the output \(S(x) = (S_1(x),\dots ,S_C(x))\), with C the number of output neurons, the final goal is to compute for a specific neuron c the relevance of each input feature for the output. This contribution for the target neuron c can be written as \(R^c = (R^c_1, \dots , R^c_p)\), as described in [2]. Depending on the method considered, the attributions are found by a specific algorithm. Generally, the gradient-based algorithms involve the computation of partial derivatives of the output \(S_c(x)\) with respect to the input variables.
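The core attribution computation shared by these methods can be illustrated on a toy differentiable score function, approximating the partial derivatives with finite differences (a generic sketch of ours, not tied to any particular network library or to the cited methods):

```python
import numpy as np

def gradient_attribution(score_fn, x, eps=1e-6):
    """Attribution R^c_j = dS_c/dx_j of a scalar score at input x,
    approximated with central finite differences."""
    x = np.asarray(x, dtype=float)
    grad = np.empty_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        grad[j] = (score_fn(x + e) - score_fn(x - e)) / (2 * eps)
    return grad

# Toy "neuron": a linear score S_c(x) = w . x, whose attributions are w itself.
w = np.array([0.5, -2.0, 3.0])
attr = gradient_attribution(lambda x: w @ x, np.array([1.0, 1.0, 1.0]))
assert np.allclose(attr, w, atol=1e-4)
```

In real DL libraries these gradients are of course obtained by backpropagation rather than finite differences; the sketch only shows what quantity is being attributed.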
In the unsupervised field of KPCA, this idea cannot be applied directly, as there is no classification involved and no numeric output can be used to test the relevance of an input feature. However, as shown in [52, 58], every original variable can be represented in the input space with a function f defined on \(\mathbb {R}^p\), representing the position of every sample point in the input space based on the values of the p variables.
Thus, we propose to compute at each sample point the norm of the partial derivative of every induced feature curve projected into the eigenspace of the kernel Gram matrix.
In support of this procedure, some works in the neuroimaging and earth system sciences domain have also shown that kernel derivatives may indicate the influence carried by the original variables as in [29, 51].
Consequently, the idea is that when the norm of the partial derivative for a variable is high, it means that the variable substantially affects the position of the sample points in the kernel PC axes. Conversely, when the norm of the partial derivative for a variable is small, the variable can be deemed negligible for the kernel principal axes.
To sum up, the novel idea of KPCA-IG is to find the most relevant features for the KPCA by studying the projections of the gradients of the feature functions onto the linear subspace of the feature space induced by the kernel. This is done by computing the length of the gradient vector with respect to each variable at each sample point, as it represents how steep the direction given by the partial derivative of the induced curve is.
For completeness, we should also mention that the use of gradients in the kernel unsupervised learning framework can also be found in the context of Kernel Canonical Correlation Analysis: [69] proposed a new variant of KCCA that does not rely on the kernel matrix, where the maximization of the canonical correlation is computed through the gradients of the pre-images of the projection directions.
Analytically, we can describe our method KPCA-IG as follows. First, we can express the projection of f in the feature space through the implicit map \(\phi\) as h. More specifically, h is defined on the subspace of \(\mathcal {H}\) where the input points are mapped, i.e., on \(\phi ({\chi })\), assuming it is sufficiently smooth to support a Riemannian metric [60]. In [58], the authors demonstrated how the gradient of h can be expressed as a vector field in \(\phi ({\chi })\) under the coordinates \(\varvec{x}= (x^1,\dots , x^p)\) as

\(\textrm{grad}(h) = \sum _{j=1}^{p} \Big ( \sum _{b=1}^{p} g^{jb}\, D_b h \Big ) \frac{\partial }{\partial x^j} \qquad (2)\)

where \(j, b = 1, \dots , p\), \(D_b\) is the partial derivative with respect to the b-th variable and \(g^{jb}\) is the inverse of the Riemannian metric induced by \(\phi ({\chi })\), i.e. of the symmetric metric tensor \(g_{jb}\), which is unknown but can be written solely in terms of the kernel [61].
The idea is to look for the curves u whose tangent vectors at t are \(u'(t) = \textrm{grad}(h)\), as they give an indication of the local directions of maximum variation of h.
In the previous section, we showed how to represent the projection of every mapped generic point \(\phi (\varvec{x})\) into the subspace spanned by the eigenvectors of \(\varvec{\tilde{K}}\) in (1).
Similarly, the u(t) curves can be projected into the subspace of the kernel PCA. We define \(u(t) = k(\cdot , \varvec{x}(t))\) with \(\varvec{x}(t)\) the solution of \(\frac{dx^j}{dt} = \textrm{grad}\Big(h\Big)^j\) and \(\varvec{Z}_t = (k(\varvec{x}(t), \varvec{x}_i))_{n \times 1}\).
Now we can define the induced curve in the KPCA axes with the row vector

\(\gamma (t) = \varvec{Z}_t^T\, \tilde{\varvec{v}} \qquad (3)\)

In order to assess the influence of the original variables on the coordinates of the data points in the kernel principal axes, we can represent the gradient vector field of h, i.e. the tangent vector field of u(t), in the KPCA solution. Formally, the tangent vector at an initial condition \(t = t_0\) with \(x_0 = \phi ^{-1}\circ u(t_0)\) can be obtained as \(\frac{du}{dt}\big |_{t=t_0}\) and the projected directions of maximum variation as

\(\frac{d\gamma }{dt}\Big |_{t=t_0} = \Big (\frac{d\varvec{Z}_t}{dt}\Big |_{t=t_0}\Big )^T \tilde{\varvec{v}} \qquad (4)\)

with

\(\frac{d\varvec{Z}_t^i}{dt} = \sum _{j=1}^{p} D_j k(\varvec{x}(t), \varvec{x}_i)\, \textrm{grad}(h)^j, \quad i = 1, \dots , n. \qquad (5)\)
If we assume that \(\phi ({\chi })\) is flat (Euclidean subspace), the metric tensor \(g_{jb}\) becomes the Kronecker delta \(\delta _{jb}\) which is equal to 0 for \(j \ne b\) and to 1 when \(j = b\).
In this case, (2) becomes

\(\textrm{grad}(h) = \sum _{j=1}^{p} D_j h\, \frac{\partial }{\partial x^j} \qquad (6)\)

and (5):

\(\frac{d\varvec{Z}_t^i}{dt} = \sum _{j=1}^{p} D_j k(\varvec{x}(t), \varvec{x}_i)\, D_j h. \qquad (7)\)
If we further assume that f takes the linear form \(f(\varvec{x}) = \varvec{x} + t\varvec{e}_j\) with \(t \in \mathbb {R}\) and \(\varvec{e}_j = (0, \dots , 1, \dots , 0)\), having the value 1 only for the j-th component, we obtain the same expression as in [52]

\(\frac{d\varvec{Z}_t^i}{dt}\Big |_{t=0} = D_j k(\varvec{x}, \varvec{x}_i). \qquad (8)\)

Thus, with these hypotheses we can use (4) to compute the projected directions of maximum variation with respect to the j-th variable as

\(w^j = \Big (\frac{d\varvec{Z}_t}{dt}\Big |_{t=0}\Big )^T \tilde{\varvec{v}} \qquad (9)\)
with \(\frac{d \varvec{Z}_{t}^i}{dt}\Big |_{t=0}\) as in (8).
If we take as kernel the radial basis kernel \(k(\varvec{x}, \varvec{x}_i) = \exp (-\sigma \Vert \varvec{x} - \varvec{x}_i \Vert ^2)\), then (8) becomes, as shown in [52]:

\(\frac{d\varvec{Z}_t^i}{dt}\Big |_{t=0} = -2\sigma \,(x^{j} - x_i^{j})\, k(\varvec{x}, \varvec{x}_i) \qquad (10)\)

with \(i =1, \ldots , n\) and \(x^{j}\) the value of variable j for the generic point \(\varvec{x}\).
If we consider that the \(1 \times q\) row vector \(w^j\) can be computed for all the training points rather than only for a generic point \(\varvec{x}\), we obtain an \(n \times q\) matrix \(\textbf{W}^j\) giving the direction of maximum variation associated with the j-th variable for each input point.
Thus, the idea is to first compute the norm of this partial derivative with respect to variable j for each sample point and then compute the mean value of these n contributions. The score that we obtain indicates the relevance of the j-th variable in the KPCA solution.
Analytically, the rows of the \(n \times q\) matrix \(\textbf{W}^j\) are denoted by the \(1 \times q\) vectors \(\textbf{v}_{i}^j = (w^j_{i1}, \dots , w^j_{iq})\), where \(w^j_{ik}\) denotes the entry in the i-th row and k-th column of \(\textbf{W}^j\). The square root of the quadratic norm of \(\textbf{v}_{i}^j\) is given by

\(\Vert \textbf{v}^j_{i}\Vert = \sqrt{\sum _{k=1}^{q} \big (w^j_{ik}\big )^2}. \qquad (11)\)

We can now compute the square root of the quadratic norms for all n rows, resulting in n values \(\Vert \textbf{v}^j_{1}\Vert , \ldots , \Vert \textbf{v}^j_{n}\Vert\). The mean of these values is given by

\(r^j = \frac{1}{n}\sum _{i=1}^{n} \Vert \textbf{v}^j_{i}\Vert . \qquad (12)\)
Thus, \(r^j\), the mean of the norms of the partial derivatives of \(\varphi ^j\) over all the n sample points, gives an indication of the overall influence of the j-th variable on the points.
Finally, we can repeat the procedure for all the p variables, \(j = 1, \dots , p\). The vector \(\varvec{r} = (r^1, \dots , r^p)\) contains the mean norm values for each of the p variables and, after sorting them in descending order, it represents the ranking of the original features proposed by KPCA-IG. Every entry of \(\varvec{r}\) is a score that indicates the impact of a variable on the kernel PCA representation of the data, from the most influential to the least important.
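Under the assumptions above (Gaussian kernel, flat feature manifold, linear feature curves), the ranking procedure reduces to a few lines of linear algebra. The sketch below is our illustrative Python transcription, not the authors' R code; it uses unit-norm eigenvectors of the centered Gram matrix and, like the projection formula above, omits the centering correction for the projected derivatives:

```python
import numpy as np

def kpca_ig_scores(X, sigma, q=2):
    """Illustrative KPCA-IG scores r^1, ..., r^p (higher = more influential)."""
    n, p = X.shape
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sigma * d2)                        # Gaussian kernel matrix
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one     # centered Gram matrix
    A = np.linalg.eigh(Kc)[1][:, ::-1][:, :q]      # first q eigenvectors

    r = np.empty(p)
    for j in range(p):
        # Derivative of the kernel entries with respect to variable j:
        # dk(x_i, x_m)/dx_i^j = -2 * sigma * (x_i^j - x_m^j) * k(x_i, x_m)
        dK = -2.0 * sigma * (X[:, None, j] - X[None, :, j]) * K
        Wj = dK @ A                                # n x q matrix W^j
        r[j] = np.mean(np.linalg.norm(Wj, axis=1)) # mean row norm = r^j
    return r

# Toy check: a constant feature cannot move any sample point, so its
# partial derivatives, and hence its score, are exactly zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
X[:, 3] = 1.0                                      # constant column
scores = kpca_ig_scores(X, sigma=0.05)
assert scores[3] == 0.0 and scores[:3].min() > 0.0
```

The final ranking is then simply `np.argsort(scores)[::-1]`.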
The method is non-iterative, and it only requires linear algebra. Thus, it is not susceptible to numerical instability or local minimum problems. It is computationally very fast, and it can be applied to any kernel function that admits a first-order derivative. The described procedure has been implemented in R, and the code is available upon request to the authors.
Results
We conducted experiments on three benchmark datasets from the biological domain to assess the accuracy of the proposed unsupervised approach for feature selection. These datasets include two microarray datasets, named Carcinom and Glioma, which are available in the Python package scikit-feature [35], and the gene expression data from normal and prostate tumour tissues [10], GPL93 from GEO, a public functional genomics data repository. Glioma contains the expression of 4434 genes for 50 patients, while Carcinom contains 9182 genes for 174 individuals. Both datasets have already been used as benchmarks in numerous studies including several method comparisons, such as [7, 35]. The dataset GPL93 contains the expression of 12626 genes for 165 patients, and it has been chosen for its complexity and higher dimensionality.
The idea is to compare the proposed methodology KPCA-IG with existing unsupervised feature selection methods from diverse frameworks, as done in [7, 35]:

lapl [23], to include one method that relies on the computation of a score.

NDFS [37], to add one of the methods primarily designed for clustering. It is based on the implicit assumption that samples are structured into subgroups and demands the a priori definition of the number of clusters.

KPCA-permute [40], available in the mixKernel R package, to include another methodology from the context of kernel PCA.
To evaluate the selected features provided by the four methods, we measured the overall accuracy (ACC) and normalized mutual information (NMI) [15] based on k-means clustering performance. For each method, the k-means ACC and NMI have been obtained using several subsets with a different number d of selected features, with \(d \in \{10,20, \dots ,290, 300\}\). Thus, the relevance of the selected features has been estimated according to their ability to reconstruct the clustered nature of the data. More specifically, the three datasets Glioma, Carcinom and GPL93 are characterized by 4, 11 and 4 groups, respectively. Thus, the k-means clustering was computed using the correct number of clusters to obtain a metric for the capability of the selected features to preserve this structure.
Note that only the NDFS method is implemented to explicitly obtain an optimal solution in terms of clustering, also requiring in advance the number of groups in the data. For each method, the k-means clustering was run 20 times to obtain a mean of the overall accuracy and normalized mutual information for each of the 30 subsets of selected features. Both our novel method KPCA-IG and KPCA-permute have been employed with a Gaussian kernel with a sigma value depending on the dataset. The selected features are, in both cases, based on the first 3, 5 and 3 kernel PC axes for Glioma, Carcinom and GPL93, respectively. The CPU time in seconds required to obtain the feature ranking for all the methods has also been recorded. The experiment was conducted on a standard laptop with an Intel Core i5 and 16 GB RAM.
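For reference, NMI can be computed directly from the contingency table of two labelings. The following is a minimal pure-NumPy sketch using the geometric-mean normalization, one common convention (the function name and normalization choice are ours, not taken from [15]; library implementations such as scikit-learn's `normalized_mutual_info_score` offer several variants):

```python
import numpy as np

def nmi(labels_a, labels_b):
    """Normalized mutual information NMI = I(A;B) / sqrt(H(A) * H(B))."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    n = a.size
    ca, cb = np.unique(a), np.unique(b)
    # Contingency table of joint cluster counts.
    cont = np.array([[np.sum((a == i) & (b == j)) for j in cb] for i in ca],
                    dtype=float)
    pij = cont / n                                  # joint distribution
    pi = pij.sum(axis=1, keepdims=True)             # marginal of A
    pj = pij.sum(axis=0, keepdims=True)             # marginal of B
    nz = pij > 0
    mi = np.sum(pij[nz] * np.log(pij[nz] / (pi @ pj)[nz]))
    ha = -np.sum(pi[pi > 0] * np.log(pi[pi > 0]))   # entropy H(A)
    hb = -np.sum(pj[pj > 0] * np.log(pj[pj > 0]))   # entropy H(B)
    return mi / np.sqrt(ha * hb) if ha > 0 and hb > 0 else 0.0
```

By construction the score is 1 for two identical clusterings (up to label permutation) and 0 for independent ones, which is what makes it suitable for comparing k-means partitions against the known groups.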
Evaluation on benchmark datasets
In Table 1, we can see the results in terms of mean accuracy and NMI over 20 runs for different numbers of retained features d. For the first dataset, Glioma, lapl seems to show the best performance in terms of NMI and ACC, except when \(d=300\), where the accuracy obtained with KPCA-IG is the highest, even if all the methods behave very similarly in terms of ACC. Analyzing the results for the other two datasets, Carcinom and GPL93, which are considerably bigger and possibly more complex in terms of sample space manifold, the two methods based on the kernel framework appear to surpass the lapl and NDFS approaches, especially on the GPL93 dataset. The comparison of the different approaches in terms of NMI and ACC on these two datasets can also be observed in Figs. 1 and 2.
Moreover, as shown in [7], NDFS and the other cluster-based methods like MCFS and UDFS suffer if the user selects an incorrect a priori number of clusters. In our case, we show that the proposed methodology behaves similarly to, or even better than, a method like NDFS that is specifically optimized for this clustering setting.
The two kernel-based approaches, namely KPCA-permute and our novel method KPCA-IG, show excellent performance in this setting, once again displaying the appropriateness of the kernel framework in the context of complex biological datasets. However, KPCA-IG provides these above-average performances with a considerably lower CPU time.
Only lapl is as fast as KPCA-IG, while showing poorer results in the more complex scenario represented here by the GPL93 dataset. Other methods, such as the concrete autoencoder in [1], have proven successful in this context. However, the results obtained with the concrete autoencoder, as demonstrated in [7], were comparable or even inferior in terms of accuracy and NMI, and the computational time required to achieve them was on the order of days. As a result, we opted not to include it in our simulations.
Application on Hepatocellular carcinoma dataset
Liver cancer is a global health challenge, and it is estimated that there will be over 1 million cases by 2025. Hepatocellular carcinoma (HCC) is the most common type of liver cancer, accounting for around \(90\%\) of cases [39].
The most significant risk factors associated with HCC are, among others, chronic hepatitis B and C infections, non-alcoholic fatty liver disease, and chronic alcohol abuse [44]. To illustrate the use of the KPCA-IG method, we used the expression profiling by array of an HCC dataset from the Gene Expression Omnibus (series GSE102079). It contains the gene expression microarray profiles of 3 groups of patients: first, 152 patients with HCC who were treated with hepatic resection between 2006 and 2011 at Tokyo Medical and Dental University Hospital; then, the gene expression of normal liver tissues of 14 patients as control [12]; the third group contains the gene expression of non-tumorous liver tissue from 91 patients with liver cancer. The total expression matrix for the 257 patients contains the expression of 54613 genes, and the data has been normalised by robust multi-chip analysis (RMA) as in [19] and scaled and centered before applying KPCA.
To show the potential of KPCA-IG, we first perform kernel PCA with a radial basis kernel with \(\sigma = 0.00001\), which was set heuristically to maximize the explained variance and obtain a clear two-dimensional data representation. Even if detecting groups is not the optimization criterion of kernel PCA, it is possible to see that the algorithm captures the dataset's clustered structure in Figs. 7 and 8.
For this reason, applying a method like the proposed KPCA-IG can shed light on the kernel component axes, possibly giving an interpretation of the genes' influence on the sample points' representation.
KPCA-IG provides a feature ranking based on the KPCA solution, in this case based on the first two kernel principal components. As mentioned before, one of the main advantages of the proposed method is the fast computational time required: with this high-dimensional dataset, the CPU time was 654.1 seconds. Table 2 presents the first 25 genes and Fig. 3 the distribution of the 54613 variable scores.
To obtain an indication of the significance of the variables selected by KPCA-IG in terms of retained information, we computed the Silhouette Coefficient, a metric used to assess the goodness of a clustering solution. The score values lie in \([-1, 1]\), where values close to 1 mean that the clusters are clearly separated [6, 54]. In this case, even if KPCA is not optimized to create clusters of data, a 2D KPCA plot with clear separation among different groups may suggest a solution with more explained variability.
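The coefficient itself is straightforward to compute from pairwise distances in the 2D embedding. Below is a small pure-NumPy sketch of the standard definition (our own helper, for illustration; library routines such as scikit-learn's `silhouette_score` implement the same formula):

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient s_i = (b_i - a_i) / max(a_i, b_i),
    where a_i is the mean intra-cluster distance of point i and b_i the
    smallest mean distance to another cluster (values in [-1, 1])."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    uniq = np.unique(labels)
    s = np.empty(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            s[i] = 0.0   # singleton clusters get a zero score by convention
            continue
        a = D[i, same].sum() / (n_same - 1)   # exclude the point itself
        b = min(D[i, labels == c].mean() for c in uniq if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()
```

Applied to the 2D KPCA coordinates with the known group labels, higher values indicate a better-separated embedding, which is how the score is used in the experiments below.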
Starting from the feature ranking given by KPCA-IG, we computed the silhouette scores of the KPCA solution for 5462 subsets of the original variables, i.e. datasets with an increasing number of features: \(5, 15, 25, \dots , 54605, 54613\). The metric was first computed using the features ranked by KPCA-IG and then with 5 different random feature rankings.
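This subset evaluation is straightforward to sketch. The helper below computes the mean silhouette coefficient from scratch and scores nested feature subsets; for compactness it evaluates the silhouette on the raw feature subsets rather than on the 2D KPCA solution used in the paper, and all names are illustrative.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient, values in [-1, 1] (Rousseeuw, 1987)."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    s = []
    for i in range(len(X)):
        own = labels == labels[i]
        a = D[i, own].sum() / max(own.sum() - 1, 1)          # within-cluster
        b = min(D[i, labels == c].mean()                     # nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        s.append((b - a) / max(a, b))
    return float(np.mean(s))

def subset_scores(X, labels, ranking, sizes):
    """Silhouette of the data restricted to the top-k ranked features."""
    return [silhouette(X[:, ranking[:k]], labels) for k in sizes]
```

Comparing `subset_scores` for a data-driven ranking against random permutations of the features reproduces the kind of curves shown in Figs. 4 and 5.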
From Fig. 4, it can be seen that the scores obtained for kernel PC solutions applied to datasets composed of the features selected by KPCA-IG are consistently higher than those obtained when the ranking of the variables is chosen randomly.
To examine in more detail the behaviour of the curves for reduced datasets with a small number of features, which is more relevant for practical biological use, Fig. 5 shows the scores for 163 subsets with the following numbers of selected features: \(5,10,\dots ,100,110,\dots , 1000, 2000,\dots , 54000, 54613\). As we can see, the information retained by the KPCA-IG selected features is higher, as they lead to silhouette scores for the KPCA plots closer to 1. All the obtained coefficients are based on a two-cluster solution in which the sigma parameter of the Gaussian kernel has been adapted to the number of features considered, so as to obtain a KPCA solution with maximized explained variance.
Moreover, to check the generalization properties of KPCA-IG, we compared the explained variance captured by the selected variables on training and test data. More specifically, based on 5 different random train-test splits (192 training and 65 test points), we computed the feature ranking using KPCA-IG for each of the 5 training sets. Based on these features, we plotted the variance explained by KPCA. Again, the analysis is carried out on subsets with an increasing number of retained variables, namely \(5,10,\dots ,100,110,\dots , 1000, 2000,\dots , 54000, 54613\).
Next, the variable ranking obtained by KPCA-IG on each training set is applied to the corresponding test set. The KPCA explained variance computed on the test sets restricted to those features is then compared to that obtained with KPCA applied to the training sets. Fig. 6 shows that the variance explained by KPCA for all the subsets of selected features is consistently similar between training and test sets across all 5 randomly generated train-test splits.
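The train/test comparison can be sketched as follows: restrict both splits to the top-k features ranked on the training data and compare the share of kernel-space variance captured by the first two kernel PCs. The function names and the fixed sigma are illustrative assumptions; the paper re-tunes sigma per subset.

```python
import numpy as np

def kpca_explained(X, sigma, k=2):
    """Fraction of kernel-space variance captured by the first k kernel PCs
    (Gaussian kernel, kernlab-style sigma); illustrative sketch."""
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-sigma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    n = len(K)
    J = np.full((n, n), 1.0 / n)
    Kc = K - J @ K - K @ J + J @ K @ J
    vals = np.linalg.eigvalsh(Kc)[::-1]       # descending eigenvalues
    pos = vals[vals > 0]
    return float(pos[:k].sum() / pos.sum())

def generalization_check(X_train, X_test, ranking, sizes, sigma=1e-5):
    """Explained variance on train vs. test, restricted to the top-k
    features of a ranking computed on the training set only."""
    return [(kpca_explained(X_train[:, ranking[:k]], sigma),
             kpca_explained(X_test[:, ranking[:k]], sigma))
            for k in sizes]
```

Similar train and test values across subset sizes, as in Fig. 6, indicate that the ranking does not overfit the training samples.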
Another way to assess the relevance of the obtained ranking is to visualise the genes with the method proposed by [52] and to check whether the biomedical community has already investigated the retained genes.
For instance, Fig. 7 displays the representation of the variable TTC36, the second feature in the KPCA-IG ranking and the first gene that shows differential expression.
The direction of the arrows suggests higher expression of the gene towards the cluster of patients who do not have liver cancer or whose liver tissue is non-tumorous.
To validate the procedure, we selected relevant literature about the gene TTC36. This gene, also known as HBP21, is a protein-coding gene. Its encoded protein may function as a tumour suppressor in hepatocellular carcinoma (HCC), since it promotes apoptosis and has been shown to be downregulated in HCC cases [27].
Another gene that shows differential expression between the two groups is CDK1. In this case, Fig. 8 suggests that this gene is upregulated in the presence of cancerous tissue.
The indication found in multiple studies is that increased expression of this gene is indeed linked with a poorer prognosis or outcome, such as high tumour grade, invasion of the lymphovascular space or muscularis propria, and the presence of distant metastasis [24, 34, 36, 38, 66]. In the same way, another of the most critical genes according to KPCA-IG that appears prominent in HCC patients, reflecting the same indication in the medical literature, is ADAM10, known to be involved in the RIPing (regulated intramembrane proteolysis) and shedding of numerous substrates, leading to cancer progression and inflammatory disease [31]; it has been indicated as a target for cancer therapy [13, 45] and is upregulated in metastatic cancers [20, 33]. The Rac GTPase-activating protein 1 gene RACGAP1, selected by KPCA-IG, shares a similar behaviour with ADAM10 and CDK1. The literature concerning this gene is also broad: it has been marked as a potential prognostic and immunological biomarker in different types of cancer, such as gastric cancer [56], uterine carcinosarcoma [42], breast cancer [49] and colorectal cancer [26], among many others. CCNB1 has also been indicated as an oncogenic factor in the proliferation of HCC cells [9], showing a significant impact on patients' survival time [16, 79], and has thus been targeted for cancer treatments [18]. PRC1 shows higher expression in other cancer tissues such as, among others, invasive cervical carcinomas [57], papillary renal cell carcinoma [74] and pediatric adrenocortical tumours [72], while not yet being studied in as much depth as CCNB1, ADAM10 or RACGAP1.
ASPM was initially known as a gene involved in the control of human brain development and cerebral cortical size [5, 77], whose mutations may lead to primary autosomal recessive microcephaly [30]; more recently, its overexpression has also been linked with tumour progression, as in [71, 73].
Lastly, among the genes upregulated in HCC, RRM2 has also been linked with low overall survival [11, 28], prompting extensive cancer research that suggests targeting its inhibition for different types of tumour treatment [47, 48, 50, 70].
On the other hand, the selected genes that show downregulation in cancerous HCC tissues are LINC01093, OIT3, VIPR1, CLEC4G, CRHBP, STAB2, CLEC1B, FCN3, FCN2 and CLEC4M. The literature regarding these genes indicates that they act as suppressors in different cancerous contexts, once again endorsing the selection provided by KPCA-IG.
The few genes among the first 25 selected by KPCA-IG that do not exhibit differential expression according to the method of [52] (ARPC5, IPO11, C3orf38, SCOC) are potentially genes that explain much of the variability in the data or that share a possibly nonlinear interaction with the differentially expressed genes. Since the ultimate goal of KPCA is not to discriminate between groups, it is expected that some of the variables found by the novel method are not linked with a classification benefit. However, further follow-up on the function of these genes could be done in cooperation with an expert in the field.
Conclusion
We have seen how the unsupervised feature selection literature is narrower than its supervised counterpart. Moreover, algorithms that use kernel principal component analysis for feature selection are limited to a few works. In the present work, we have introduced a novel method to enhance the interpretability of variables in kernel PCA. Using benchmark datasets, we have shown accuracy comparable to already existing and recognized methods, while the efficiency of KPCA-IG has proven to be competitive. The application to the real-life Hepatocellular carcinoma dataset and the validation obtained by comparing the variables selected by the method with the biomedical literature have confirmed the effectiveness and strengths of the proposed methodology. In future work, further in-depth analysis will be carried out to assess the impact of the choice of the kernel function on the feature ranking obtained by KPCA-IG. Moreover, the method will be adapted to other linear algorithms that are solely based on dot products and hence support a kernelized version, such as kernel Discriminant Analysis or kernel Partial Least-Squares Discriminant Analysis.
Availability of data and materials
The Glioma and Carcinoma datasets are freely available at https://github.com/jundongl/scikit-feature/tree/master/skfeature/data, GPL93 is freely available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL93, and the HCC dataset is freely available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE102079.
Abbreviations
KPCA: Kernel principal component analysis
KPCA-IG: Kernel principal component analysis Interpretable Gradient
HCC: Hepatocellular carcinoma
SPEC: Spectral Feature Selection
MCFS: Multi-Cluster Feature Selection
NDFS: Nonnegative Discriminative Feature Selection
UDFS: Unsupervised Discriminative Feature Selection
CPFS: Convex Principal Feature Selection
lapl: Laplacian score
References
Abid A, Balin MF, Zou J. Concrete autoencoders for differentiable feature selection and reconstruction. 2019. https://doi.org/10.48550/ARXIV.1901.09346. arXiv:1901.09346
Ancona M, Ceolini E, Öztireli C, Gross M. Towards better understanding of gradientbased attribution methods for deep neural networks. 2017. https://doi.org/10.48550/ARXIV.1711.06104. arXiv:1711.06104
Ancona M, Ceolini E, Öztireli C, Gross M. Gradientbased attribution methods. In: Explainable AI: interpreting, explaining and visualizing deep learning. Springer; 2019. p. 169–191
Bach F, Jordan M. Kernel independent component analysis. J Mach Learn Res. 2003;3:1–48. https://doi.org/10.1162/153244303768966085.
Bond J, Roberts E, Mochida GH, Hampshire DJ, Scott S, Askham JM, Springell K, Mahadevan M, Crow YJ, Markham AF, Walsh CA, Woods CG. ASPM is a major determinant of cerebral cortical size. Nat Genet. 2002;32(2):316–20. https://doi.org/10.1038/ng995.
Brock G, Pihur V, Datta S, Datta S. clValid: an R package for cluster validation. J Stat Softw. 2008;25(4). https://doi.org/10.18637/jss.v025.i04.
Brouard C, Mariette J, Flamary R, Vialaneix N. Feature selection for kernel methods in systems biology. NAR Genom Bioinform. 2022;4(1). https://doi.org/10.1093/nargab/lqac014.
Cai D, Zhang C, He X. Unsupervised feature selection for multicluster data. In: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining; 2010.
Chai N, Xie HH, Yin JP, Sa KD, Guo Y, Wang M, Liu J, Zhang XF, Zhang X, Yin H, Nie YZ, Wu KC, Yang AG, Zhang R. FOXM1 promotes proliferation in human hepatocellular carcinoma cells by transcriptional activation of CCNB1. Biochem Biophys Res Commun. 2018;500(4):924–9. https://doi.org/10.1016/j.bbrc.2018.04.201.
Chandran UR, Ma C, Dhir R, Bisceglia M, LyonsWeiler M, Liang W, Michalopoulos G, Becich M, Monzon FA. Gene expression profiles of prostate cancer reveal involvement of multiple molecular pathways in the metastatic process. BMC Cancer. 2007. https://doi.org/10.1186/1471-2407-7-64.
Chen W, Yang LG, Xu LY, Cheng L, Qian Q, Sun L, Zhu YL. Bioinformatics analysis revealing prognostic significance of RRM2 gene in breast cancer. Biosci Rep. 2019;39(4). https://doi.org/10.1042/bsr20182062.
Chiyonobu N, Shimada S, Akiyama Y, Mogushi K, Itoh M, Akahoshi K, Matsumura S, Ogawa K, Ono H, Mitsunori Y, Ban D, Kudo A, Arii S, Suganami T, Yamaoka S, Ogawa Y, Tanabe M, Tanaka S. Fatty acid binding protein 4 (FABP4) overexpression in intratumoral hepatic stellate cells within hepatocellular carcinoma with metabolic risk factors. Am J Pathol. 2018;188(5):1213–24. https://doi.org/10.1016/j.ajpath.2018.01.012.
Crawford H, Dempsey P, Brown G, Adam L, Moss M. ADAM10 as a therapeutic target for cancer and inflammation. Curr Pharm Des. 2009;15(20):2288–99. https://doi.org/10.2174/138161209788682442.
Crone LJ, Crosby DS. Statistical applications of a metric on subspaces to satellite meteorology. Technometrics. 1995;37(3):324–8. https://doi.org/10.1080/00401706.1995.10484338.
Danon L, Duch J, Diaz-Guilera A, Arenas A. Comparing community structure identification; 2005. https://doi.org/10.48550/ARXIV.COND-MAT/0505245.
Ding K, Li W, Zou Z, Zou X, Wang C. CCNB1 is a prognostic biomarker for ER+ breast cancer. Med Hypotheses. 2014;83(3):359–64. https://doi.org/10.1016/j.mehy.2014.06.013.
Duda RO, Hart PE, Stork DG. Pattern classification. 2nd ed. New York: Wiley; 2000.
Fang Y, Yu H, Liang X, Xu J, Cai X. Chk1induced CCNB1 overexpression promotes cell proliferation and tumor growth in human colorectal cancer. Cancer Biol Ther. 2014;15(9):1268–79. https://doi.org/10.4161/cbt.29691.
Gautier L, Cope L, Bolstad BM, Irizarry RA. affy-analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 2004;20(3):307–15. https://doi.org/10.1093/bioinformatics/btg405.
Gavert N, Sheffer M, Raveh S, Spaderna S, Shtutman M, Brabletz T, Barany F, Paty P, Notterman D, Domany E, BenZe'ev A. Expression of L1CAM and ADAM10 in human colon cancer cells induces metastasis. Cancer Res. 2007;67(16):7703–12. https://doi.org/10.1158/0008-5472.can-07-0991.
Girolami M. Mercer kernelbased clustering in feature space. IEEE Trans Neural Netw. 2002;13(3):780–4. https://doi.org/10.1109/tnn.2002.1000150.
Hastie T, Tibshirani R, Friedman JH. The elements of statistical learning: data mining, inference, and prediction. Springer series in statistics; 2009. https://books.google.co.uk/books?id=eBSgoAEACAAJ
He X, Cai D, Niyogi P. Laplacian score for feature selection. In: Proceedings of the 18th international conference on neural information processing systems. NIPS’05. Cambridge: MIT Press; 2005. pp 507–514.
Heo J, Lee J, Nam YJ, Kim Y, Yun H, Lee S, Ju H, Ryu CM, Jeong SM, Lee J, Lim J, Cho YM, Jeong EM, Hong B, Son J, Shin DM. The CDK1/TFCP2l1/ID2 cascade offers a novel combination therapy strategy in a preclinical model of bladder cancer. Exp Mol Med. 2022;54(6):801–11. https://doi.org/10.1038/s12276-022-00786-0.
Huang D, Tian Y, De la Torre F. Local isomorphism to solve the preimage problem in kernel methods. In: CVPR 2011; 2011. p. 2761–8. https://doi.org/10.1109/CVPR.2011.5995685.
Imaoka H, Toiyama Y, Saigusa S, Kawamura M, Kawamoto A, Okugawa Y, Hiro J, Tanaka K, Inoue Y, Mohri Y, Kusunoki M. RacGAP1 expression, increasing tumor malignant potential, as a predictive biomarker for lymph node metastasis and poor prognosis in colorectal cancer. Carcinogenesis. 2015;36(3):346–54. https://doi.org/10.1093/carcin/bgu327.
Jiang L, Kwong DLW, Li Y, Liu M, Yuan YF, Li Y, Fu L, Guan XY. HBP21, a chaperone of heat shock protein 70, functions as a tumor suppressor in hepatocellular carcinoma. Carcinogenesis. 2015;36(10):1111–20. https://doi.org/10.1093/carcin/bgv116.
Jin CY, Du L, Nuerlan AH, Wang XL, Yang YW, Guo R. High expression of RRM2 as an independent predictive factor of poor prognosis in patients with lung adenocarcinoma. Aging. 2020;13(3):3518–35. https://doi.org/10.18632/aging.202292.
Johnson JE, Laparra V, PérezSuay A, Mahecha MD, CampsValls G. Kernel methods and their derivatives: concept and perspectives for the earth system sciences. PLoS ONE. 2020;15(10):e0235885. https://doi.org/10.1371/journal.pone.0235885.
Kouprina N, Pavlicek A, Collins NK, Nakano M, Noskov VN, Ohzeki JI, Mochida GH, Risinger JI, Goldsmith P, Gunsior M, Solomon G, Gersch W, Kim JH, Barrett JC, Walsh CA, Jurka J, Masumoto H, Larionov V. The microcephaly ASPM gene is expressed in proliferating tissues and encodes for a mitotic spindle protein. Hum Mol Genet. 2005;14(15):2155–65. https://doi.org/10.1093/hmg/ddi220.
Krzystanek M, Moldvay J, Szüts D, Szallasi Z, Eklund AC. A robust prognostic gene expression signature for early stage lung adenocarcinoma. Biomark Res. 2016. https://doi.org/10.1186/s40364-016-0058-3.
Kwok JTY, Tsang IWH. The preimage problem in kernel methods. IEEE Trans Neural Netw. 2004;15(6):1517–25. https://doi.org/10.1109/tnn.2004.837781.
Lee SB, Schramme A, Doberstein K, Dummer R, AbdelBakky MS, Keller S, Altevogt P, Oh ST, Reichrath J, Oxmann D, Pfeilschifter J, MihicProbst D, Gutwein P. ADAM10 is upregulated in melanoma metastasis compared with primary melanoma. J Investig Dermatol. 2010;130(3):763–73. https://doi.org/10.1038/jid.2009.335.
Li J, Wang Y, Wang X, Yang Q. CDK1 and CDC20 overexpression in patients with colorectal cancer are associated with poor prognosis: evidence from integrated bioinformatics analysis. World J Surg Oncol. 2020. https://doi.org/10.1186/s12957-020-01817-8.
Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, Liu H. Feature selection: a data perspective. ACM Comput Surv. 2017;50(6):1–45. https://doi.org/10.1145/3136625.
Li M, He F, Zhang Z, Xiang Z, Hu D. CDK1 serves as a potential prognostic biomarker and target for lung cancer. J Int Med Res. 2020;48(2):030006051989750. https://doi.org/10.1177/0300060519897508.
Li Z, Yang Y, Liu J, Zhou X, Lu H. Unsupervised feature selection using nonnegative spectral analysis. Proc AAAI Conf Artif Intell. 2021;26(1):1026–32. https://doi.org/10.1609/aaai.v26i1.8289.
Liu X, Wu H, Liu Z. An integrative human pancancer analysis of cyclindependent kinase 1 (CDK1). Cancers. 2022;14(11):2658. https://doi.org/10.3390/cancers14112658.
Llovet JM, Kelley RK, Villanueva A, Singal AG, Pikarsky E, Roayaie S, Lencioni R, Koike K, ZucmanRossi J, Finn RS. Hepatocellular carcinoma. Nat Rev Dis Primers. 2021;7:1. https://doi.org/10.1038/s41572-020-00240-3.
Mariette J, VillaVialaneix N. Unsupervised multiple kernel learning for heterogeneous data integration. Bioinformatics. 2017;34(6):1009–15. https://doi.org/10.1093/bioinformatics/btx682.
Masaeli M, Yan Y, Cui Y, Fung G, Dy JG. Convex principal feature selection. In: SDM; 2010.
Mi S, Lin M, BrouwerVisser J, Heim J, Smotkin D, Hebert T, Gunter MJ, Goldberg GL, Zheng D, Huang GS. RNAseq identification of RACGAP1 as a metastatic driver in uterine carcinosarcoma. Clin Cancer Res. 2016;22(18):4676–86. https://doi.org/10.1158/1078-0432.ccr-15-2116.
Mika S, Schölkopf B, Smola A, Müller KR, Scholz M, Rätsch G. Kernel PCA and de-noising in feature spaces. In: NIPS; 1998.
Morse MA, Sun W, Kim R, He AR, Abada PB, Mynderse M, Finn RS. The role of angiogenesis in hepatocellular carcinoma. Clin Cancer Res. 2019;25(3):912–20. https://doi.org/10.1158/1078-0432.ccr-18-1254.
Moss M, Stoeck A, Yan W, Dempsey P. ADAM10 as a target for anticancer therapy. Curr Pharm Biotechnol. 2008;9(1):2–8. https://doi.org/10.2174/138920108783497613.
Nielsen IE, Dera D, Rasool G, Ramachandran RP, Bouaynaya NC. Robust explainability: a tutorial on gradientbased attribution methods for deep neural networks. IEEE Signal Process Mag. 2022;39(4):73–84. https://doi.org/10.1109/MSP.2022.3142719.
Ohmura S, Marchetto A, Orth MF, Li J, Jabar S, Ranft A, Vinca E, Ceranski K, CarreñoGonzalez MJ, RomeroPérez L, Wehweck FS, Musa J, Bestvater F, Knott MML, Hölting TLB, Hartmann W, Dirksen U, Kirchner T, CidreAranaz F, Grünewald TGP. Translational evidence for RRM2 as a prognostic biomarker and therapeutic target in ewing sarcoma. Mol Cancer. 2021. https://doi.org/10.1186/s12943-021-01393-9.
Osako Y, Yoshino H, Sakaguchi T, Sugita S, Yonemori M, Nakagawa M, Enokida H. Potential tumorsuppressive role of microRNA99a3p in sunitinibresistant renal cell carcinoma cells through the regulation of RRM2. Int J Oncol. 2019. https://doi.org/10.3892/ijo.2019.4736.
Pliarchopoulou K, Kalogeras KT, Kronenwett R, Wirtz RM, Eleftheraki AG, Batistatou A, Bobos M, Soupos N, Polychronidou G, Gogas H, Samantas E, Christodoulou C, Makatsoris T, Pavlidis N, Pectasides D, Fountzilas G. Prognostic significance of RACGAP1 mRNA expression in highrisk early breast cancer: a study in primary tumors of breast cancer patients participating in a randomized hellenic cooperative oncology group trial. Cancer Chemother Pharmacol. 2012;71(1):245–55. https://doi.org/10.1007/s00280-012-2002-z.
Rahman MA, Amin ARMR, Wang X, Zuckerman JE, Choi CHJ, Zhou B, Wang D, Nannapaneni S, Koenig L, Chen Z, Chen ZG, Yen Y, Davis ME, Shin DM. Systemic delivery of siRNA nanoparticles targeting RRM2 suppresses head and neck tumor growth. J Control Release. 2012;159(3):384–92. https://doi.org/10.1016/j.jconrel.2012.01.045.
Rasmussen PM, Madsen KH, Lund TE, Hansen LK. Visualization of nonlinear kernel models in neuroimaging by sensitivity maps. NeuroImage. 2011;55(3):1120–31. https://doi.org/10.1016/j.neuroimage.2010.12.035.
Reverter F, Vegas E, Oller JM. Kernel-PCA data integration with enhanced interpretability. BMC Syst Biol. 2014. https://doi.org/10.1186/1752-0509-8-S2-S6.
Roth V, Steinhage V. Nonlinear discriminant analysis using kernel functions. In: NIPS, 1999; p. 568–574. http://papers.nips.cc/paper/1736nonlineardiscriminantanalysisusingkernelfunctions
Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53–65. https://doi.org/10.1016/0377-0427(87)90125-7.
Ruder S. An overview of gradient descent optimization algorithms; 2016. https://doi.org/10.48550/ARXIV.1609.04747.
Saigusa S, Tanaka K, Mohri Y, Ohi M, Shimura T, Kitajima T, Kondo S, Okugawa Y, Toiyama Y, Inoue Y, Kusunoki M. Clinical significance of RacGAP1 expression at the invasive front of gastric cancer. Gastric Cancer. 2014;18(1):84–92. https://doi.org/10.1007/s10120-014-0355-1.
Santin AD, Zhan F, Bignotti E, Siegel ER, Cané S, Bellone S, Palmieri M, Anfossi S, Thomas M, Burnett A, Kay HH, Roman JJ, O’Brien TJ, Tian E, Cannon MJ, Shaughnessy J, Pecorelli S. Gene expression profiles of primary HPV16 and HPV18infected early stage cervical cancers and normal cervical epithelium: identification of novel candidate molecular markers for cervical cancer diagnosis and therapy. Virology. 2005;331(2):269–91. https://doi.org/10.1016/j.virol.2004.09.045.
Sanz H, Valim C, Vegas E, Oller JM, Reverter F. SVM-RFE: selection and visualization of the most relevant features through non-linear kernels. BMC Bioinform. 2018. https://doi.org/10.1186/s12859-018-2451-4.
Schölkopf B, Knirsch P, Smola A, Burges C. Fast approximation of support vector kernel expansions, and an interpretation of clustering as approximation in feature spaces. In: Mustererkennung 1998. Informatik aktuell. Berlin: Springer; 1998. p. 125–132. Max-Planck-Gesellschaft.
Scholkopf B, Mika S, Burges CJC, Knirsch P, Muller KR, Ratsch G, Smola AJ. Input space versus feature space in kernelbased methods. IEEE Trans Neural Netw. 1999;10(5):1000–17. https://doi.org/10.1109/72.788641.
Schölkopf B, Smola AJ. Learning with kernels. Cambridge: The MIT Press; 2018. https://doi.org/10.7551/mitpress/4175.001.0001.
Schölkopf B, Smola A, Müller KR. Kernel principal component analysis. In: Artificial neural networks—ICANN’97. Berlin: Springer; 1997. p. 583–8.
Schölkopf B, Tsuda K, Vert JP. Kernel methods in computational biology; 2003.
Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps; 2013. https://doi.org/10.48550/ARXIV.1312.6034
Smilkov D, Thorat N, Kim B, Viégas F, Wattenberg M. SmoothGrad: removing noise by adding noise; 2017. https://doi.org/10.48550/ARXIV.1706.03825
Sofi S, Mehraj U, Qayoom H, Aisha S, Almilaibary A, Alkhanani M, Mir MA. Targeting cyclindependent kinase 1 (CDK1) in cancer: molecular docking and dynamic simulations of potential CDK1 inhibitors. Med Oncol. 2022. https://doi.org/10.1007/s12032-022-01748-2.
Springenberg JT, Dosovitskiy A, Brox T, Riedmiller M. Striving for simplicity: the all convolutional net; 2014. https://doi.org/10.48550/ARXIV.1412.6806
Sundararajan M, Taly A, Yan Q. Axiomatic attribution for deep networks; 2017. https://doi.org/10.48550/ARXIV.1703.01365
Uurtio V, Bhadra S, Rousu J. Largescale sparse kernel canonical correlation analysis. In: International conference on machine learning; 2019.
Wang N, Zhan T, Ke T, Huang X, Ke D, Wang Q, Li H. Increased expression of RRM2 by human papillomavirus e7 oncoprotein promotes angiogenesis in cervical cancer. Br J Cancer. 2014;110(4):1034–44. https://doi.org/10.1038/bjc.2013.817.
Wang W, Hsu C, Wang T, Li C, Hou Y, Chu J, Lee C, Liu M, Su JJ, Jian K, Huang S, Jiang S, Shan Y, Lin P, Shen Y, Lee MT, Chan T, Chang C, Chen C, Chang I, Lee Y, Chen L, Tsai KK. A gene expression signature of epithelial tubulogenesis and a role for ASPM in pancreatic tumor progression. Gastroenterology. 2013;145(5):1110–20. https://doi.org/10.1053/j.gastro.2013.07.040.
West AN, Neale GA, Pounds S, Figueredo BC, Galindo CR, Pianovski MAD, Filho AGO, Malkin D, Lalli E, Ribeiro R, Zambetti GP. Gene expression profiling of childhood adrenocortical tumors. Cancer Res. 2007;67(2):600–8. https://doi.org/10.1158/0008-5472.can-06-3767.
Xu Z, Zhang Q, Luh F, Jin B, Liu X. Overexpression of the ASPM gene is associated with aggressiveness and poor outcome in bladder cancer. Oncol Lett. 2018. https://doi.org/10.3892/ol.2018.9762.
Yang XJ, Tan MH, Kim HL, Ditlev JA, Betten MW, Png CE, Kort EJ, Futami K, Furge KA, Takahashi M, Kanayama HO, Tan PH, Teh BS, Luan C, Wang K, Pins M, Tretiakova M, Anema J, Kahnoski R, Nicol T, Stadler W, Vogelzang NG, Amato R, Seligson D, Figlin R, Belldegrun A, Rogers CG, Teh BT. A molecular classification of papillary renal cell carcinoma. Cancer Res. 2005;65(13):5628–37. https://doi.org/10.1158/0008-5472.can-05-0533.
Yang Y, Shen HT, Ma Z, Huang Z, Zhou X. ℓ2,1-norm regularized discriminative feature selection for unsupervised learning. In: International joint conference on artificial intelligence; 2011.
Zeiler MD, Fergus R. Visualizing and Understanding Convolutional Networks; 2013. https://doi.org/10.48550/ARXIV.1311.2901
Zhang J. Evolution of the human aspm gene, a major determinant of brain size. Genetics. 2003;165(4):2063–70. https://doi.org/10.1093/genetics/165.4.2063.
Zhao Z, Liu H. Spectral feature selection for supervised and unsupervised learning. In: Proceedings of the 24th international conference on machine learning. ICML ’07. Association for Computing Machinery, New York, NY, USA; 2007. p. 1151–57. https://doi.org/10.1145/1273496.1273641.
Zhuang L, Yang Z, Meng Z. Upregulation of BUB1b, CCNB1, CDC7, CDC20, and MCM3 in tumor tissues predicted worse overall survival and diseasefree survival in hepatocellular carcinoma patients. BioMed Res Int. 2018;2018:1–8. https://doi.org/10.1155/2018/7897346.
Acknowledgements
The authors thank Dr. Alberto GonzálezSanz for the help with the revision of the mathematical formulations and Gabriele Tazza for the valuable and enriching discussions.
Funding
This work was funded by the eMUSE MSCA-ITN-2020 European Training Network under the Marie Skłodowska-Curie Grant Agreement No. 956126. The funders had no role in the study's design, the interpretation of data, or the writing of the manuscript.
Author information
Contributions
MB wrote the main manuscript and performed the analysis under the supervision of SD and MAD. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Briscik, M., Dillies, MA. & Déjean, S. Improvement of variables interpretability in kernel PCA. BMC Bioinformatics 24, 282 (2023). https://doi.org/10.1186/s12859-023-05404-y
DOI: https://doi.org/10.1186/s12859-023-05404-y