Sparse kernel canonical correlation analysis for discovery of nonlinear interactions in high-dimensional data
 Kosuke Yoshida^{1, 2},
 Junichiro Yoshimoto^{3} and
 Kenji Doya^{2}
DOI: 10.1186/s12859-017-1543-x
© The Author(s) 2017
Received: 14 October 2016
Accepted: 8 February 2017
Published: 14 February 2017
Abstract
Background
Advances in high-throughput technologies in genomics, transcriptomics, and metabolomics have created demand for bioinformatics tools that integrate high-dimensional data from different sources. Canonical correlation analysis (CCA) is a statistical tool for finding linear associations between different types of information. Previous extensions of CCA used to capture nonlinear associations, such as kernel CCA, did not allow feature selection or extraction of multiple canonical components. Here we propose a novel method, two-stage kernel CCA (TSKCCA), to select appropriate kernels in the framework of multiple kernel learning.
Results
TSKCCA first selects relevant kernels based on the HSIC criterion in the multiple kernel learning framework. Weights are then derived by non-negative matrix decomposition with L1 regularization. Using artificial datasets and nutrigenomic datasets, we show that TSKCCA can extract multiple nonlinear associations among high-dimensional data and multiplicative interactions among variables.
Conclusions
TSKCCA can identify nonlinear associations among high-dimensional data more reliably than previous nonlinear CCA methods.
Keywords
Kernel canonical correlation analysis; Hilbert-Schmidt independence criterion; L1 regularization
Background
Canonical correlation analysis (CCA) [1] is a statistical method for finding common information in two different sources of multivariate data. The method optimizes linear projection vectors so that the two multivariate datasets are maximally correlated. With advances in high-throughput biological measurements, such as DNA sequencing, RNA microarrays, and mass spectrometry, CCA has been used extensively to discover interactions between the genome, gene transcription, protein synthesis, and metabolites [2–5]. Because the CCA solution reduces to an eigenvalue problem, multiple components of interactions and sparsity constraints are readily introduced [4, 6, 7].
Kernel CCA (KCCA) was introduced to capture nonlinear associations between two blocks of multivariate data [8–11]. Given two blocks of multivariate data x and z, KCCA finds nonlinear transformations f(x) and g(z) in a reproducing kernel Hilbert space (RKHS) so that the correlation between f(x) and g(z) is maximized. In order to avoid overfitting and to improve interpretability of results, sparse additive functional CCA (SAFCCA) [12] constrains f(x) and g(z) to be sparse additive models and optimizes them using the biconvex backfitting algorithm [13]. However, it is not straightforward to obtain multiple orthogonal transformations for extracting multiple components of associations. Another approach to finding nonlinear associations is to maximize measures of nonlinear dependence, such as the Hilbert-Schmidt Independence Criterion (HSIC) [14] and the Kernel Target Alignment (KTA) [15], between linearly projected datasets x and z [16]. While these methods can obtain multiple orthogonal projections by iteratively analyzing residuals, they cannot remove irrelevant features, which makes them prone to overfitting.
In this paper, we propose two-stage kernel CCA (TSKCCA), which enables us (1) to select sparse features in high-dimensional data and (2) to obtain multiple nonlinear associations. In the first stage, we represent the target kernels as a weighted sum of prespecified subkernels and optimize their weight coefficients based on HSIC with sparse regularization. In the second stage, we apply standard KCCA with the target kernels obtained in the first stage to find multiple nonlinear correlations.
We briefly review CCA, KCCA, and two-stage MKL, and then present the TSKCCA algorithm. We apply TSKCCA to three synthetic datasets and to nutrigenomic experimental data to show that the method discovers multiple nonlinear associations within high-dimensional data and provides interpretations that are robust to irrelevant features.
CCA, kernel CCA, and multiple kernel learning
In this section, we briefly review the bases of our proposed method, namely, linear canonical correlation analysis (CCA), kernel CCA (KCCA), and multiple kernel learning (MKL).
Canonical correlation analysis (CCA)
CCA seeks projection vectors \(\textbf{w} \in \mathbb{R}^{p}\) and \(\textbf{v} \in \mathbb{R}^{q}\) that maximize the covariance between the projected variables under unit-variance constraints:
$$ (\textbf{w}^{\ast}, \textbf{v}^{\ast}) = \mathop{\arg\max}_{\textbf{w},\, \textbf{v}}\ \text{Cov}(\textbf{w}^{T}\textbf{x},\ \textbf{v}^{T}\textbf{z}) \qquad (1a)$$
$$ \text{subject to}\quad \text{Var}(\textbf{w}^{T}\textbf{x}) = \text{Var}(\textbf{v}^{T}\textbf{z}) = 1, \qquad (1b)$$
where Var(·) and Cov(·,·) denote the empirical variance and covariance of the data, respectively. The optimal solution \((\textbf{w}^{\ast}, \textbf{v}^{\ast})\) of Eq. (1a and 1b) is obtained by solving a generalized eigenvalue problem, and successive eigenvectors represent multiple components. The projections, \(f^{\ast}(\textbf{x}) = \textbf{w}^{\ast T}\textbf{x}\) and \(g^{\ast}(\textbf{z}) = \textbf{v}^{\ast T}\textbf{z}\), are called canonical variables for \(\textbf{x} \in \mathbb{R}^{p}\) and \(\textbf{z} \in \mathbb{R}^{q}\), respectively. If we introduce sparse regularization on w and v, we obtain sparse projections [4, 6, 7].
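For illustration, the eigenvalue formulation can be solved in a few lines. The NumPy sketch below is ours, not part of the original implementation; it solves the generalized eigenvalue problem described above, with a small ridge term added for numerical stability.

```python
import numpy as np
from scipy.linalg import eigh

def linear_cca(X, Z, n_components=1, reg=1e-6):
    """Linear CCA via the generalized eigenvalue problem
    [[0, Cxz], [Czx, 0]] u = rho [[Cxx, 0], [0, Czz]] u.
    X: (N, p) array, Z: (N, q) array. Returns (W, V, canonical correlations)."""
    X = X - X.mean(axis=0)
    Z = Z - Z.mean(axis=0)
    N, p = X.shape
    q = Z.shape[1]
    Cxx = X.T @ X / N + reg * np.eye(p)     # empirical Var(x) with a small ridge
    Czz = Z.T @ Z / N + reg * np.eye(q)     # empirical Var(z) with a small ridge
    Cxz = X.T @ Z / N                       # empirical Cov(x, z)
    A = np.zeros((p + q, p + q))
    A[:p, p:] = Cxz
    A[p:, :p] = Cxz.T
    B = np.zeros((p + q, p + q))
    B[:p, :p] = Cxx
    B[p:, p:] = Czz
    vals, vecs = eigh(A, B)                 # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]
    W = vecs[:p, order]                     # projection vectors w
    V = vecs[p:, order]                     # projection vectors v
    return W, V, vals[order]

# Toy usage: a shared latent signal induces a strong first canonical correlation.
rng = np.random.default_rng(0)
t = rng.normal(size=(100, 1))
X = np.hstack([t + 0.1 * rng.normal(size=(100, 1)), rng.normal(size=(100, 3))])
Z = np.hstack([t + 0.1 * rng.normal(size=(100, 1)), rng.normal(size=(100, 2))])
W, V, rho = linear_cca(X, Z)
print(rho)  # first canonical correlation should be close to 1
```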
Kernel CCA
In Kernel CCA (KCCA), we suppose that the original data are mapped into a feature space via nonlinear functions. Then linear CCA is applied in the feature space. More specifically, nonlinear functions \(\phi _{x}: \mathbb {R}^{p} \to \mathbb {H}_{x}\) and \(\phi _{z}: \mathbb {R}^{q} \to \mathbb {H}_{z}\) transform the original data \(\{(\textbf {x}_{n}, \textbf {z}_{n})\}_{n=1}^{N}\) to feature vectors \(\{(\phi _{x}(\textbf {x}_{n}), \phi _{z}(\textbf {z}_{n}))\}_{n=1}^{N}\) in reproducing kernel Hilbert spaces (RKHS) \(\mathbb {H}_{x}\) and \(\mathbb {H}_{z}\). Inner-product kernels for \(\mathbb {H}_{x}\) and \(\mathbb {H}_{z}\) are defined as \(k_{x}(\textbf{x},\textbf{x}') = \phi_{x}(\textbf{x})^{T} \phi_{x}(\textbf{x}')\) and \(k_{z}(\textbf{z},\textbf{z}') = \phi_{z}(\textbf{z})^{T} \phi_{z}(\textbf{z}')\).
The KCCA problem is formulated in terms of coefficient vectors \(\boldsymbol{\alpha}, \boldsymbol{\beta} \in \mathbb{R}^{N}\) as a regularized maximization of the kernelized covariance:
$$ (\boldsymbol{\alpha}^{\ast}, \boldsymbol{\beta}^{\ast}) = \mathop{\arg\max}_{\boldsymbol{\alpha},\, \boldsymbol{\beta}}\ \boldsymbol{\alpha}^{T} K_{x} K_{z} \boldsymbol{\beta} \qquad (2a)$$
$$ \text{subject to}\quad \boldsymbol{\alpha}^{T} (K_{x} + \kappa I)^{2} \boldsymbol{\alpha} = 1, \qquad (2b)$$
$$ \phantom{\text{subject to}\quad} \boldsymbol{\beta}^{T} (K_{z} + \kappa I)^{2} \boldsymbol{\beta} = 1, \qquad (2c)$$
where K _{ x } and K _{ z } are N-by-N kernel matrices defined as \(\phantom {\dot {i}\!}[K_{x}]_{nn'} = k_{x}(\textbf {x}_{n},\textbf {x}_{n'})\) and \(\phantom {\dot {i}\!}[K_{z}]_{nn'} = k_{z}(\textbf {z}_{n},\textbf {z}_{n'})\) ^{1}, I is the N-by-N identity matrix, and κ (κ>0) is the regularization parameter.
The canonical variables for new inputs x and z are then given by the kernel expansions
$$ f^{\ast}(\textbf{x}) = \sum_{n=1}^{N} \alpha^{\ast}_{n}\, k_{x}(\textbf{x}_{n}, \textbf{x}), \qquad (3a)$$
$$ g^{\ast}(\textbf{z}) = \sum_{n=1}^{N} \beta^{\ast}_{n}\, k_{z}(\textbf{z}_{n}, \textbf{z}), \qquad (3b)$$
respectively. As indicated by Eq. (2a–2c), the nonlinear functions, ϕ _{ x } and ϕ _{ z }, are not used explicitly in the computation of KCCA. Instead, the kernels k _{ x } and k _{ z } specify the nonlinear functions implicitly, and the main task is to solve a constrained quadratic optimization problem in 2N-dimensional variables.
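For illustration only, the following NumPy sketch applies regularized KCCA to precomputed kernel matrices. The constraint matrices (K + κI)² follow a common regularized formulation in the spirit of [9]; the exact regularizer used in this work may differ in detail.

```python
import numpy as np
from scipy.linalg import eigh

def kcca(Kx, Kz, kappa=0.02, n_components=1):
    """Regularized KCCA on precomputed N x N kernel matrices.
    Maximizes alpha' Kx Kz beta subject to alpha' (Kx + kappa I)^2 alpha = 1
    and beta' (Kz + kappa I)^2 beta = 1 (one common formulation; an assumption
    about the exact regularizer). Returns (alpha, beta, canonical correlations)."""
    N = Kx.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N          # center kernels in feature space
    Kx, Kz = H @ Kx @ H, H @ Kz @ H
    I = np.eye(N)
    A = np.zeros((2 * N, 2 * N))
    A[:N, N:] = Kx @ Kz
    A[N:, :N] = Kz @ Kx
    B = np.zeros((2 * N, 2 * N))
    B[:N, :N] = (Kx + kappa * I) @ (Kx + kappa * I)
    B[N:, N:] = (Kz + kappa * I) @ (Kz + kappa * I)
    vals, vecs = eigh(A, B)                      # generalized symmetric eigenproblem
    order = np.argsort(vals)[::-1][:n_components]
    alpha, beta = vecs[:N, order], vecs[N:, order]
    return alpha, beta, vals[order]
```

The dominant cost is the 2N-dimensional eigenproblem, i.e. cubic in the sample size, which is why the kernel matrices rather than the feature maps are the working objects.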
Multiple kernel learning
In the MKL framework, the kernels are represented as weighted sums of prespecified subkernels,
$$ k_{x}(\textbf{x}, \textbf{x}') = \sum_{m=1}^{M_{x}} \eta_{m}\, k_{x}^{(m)}(\textbf{x}, \textbf{x}'), \qquad k_{z}(\textbf{z}, \textbf{z}') = \sum_{l=1}^{M_{z}} \mu_{l}\, k_{z}^{(l)}(\textbf{z}, \textbf{z}'),$$
where the weight coefficients of the subkernels, \(\{ \eta _{m} \}_{m=1}^{M_{x}}\) and \(\{ \mu _{l} \}_{l=1}^{M_{z}}\), are tuned to optimize an objective function.
A specific example of this framework is the two-stage MKL approach [15, 19]: in the first stage, the weight coefficients are optimized based on a similarity criterion, such as kernel target alignment; in the second stage, a standard kernel algorithm, such as a support vector machine, is applied with the resulting kernel.
Methods
In this section, we propose a novel nonlinear CCA method, two-stage kernel CCA (TSKCCA), inspired by the concepts of sparse multiple kernel learning and kernel CCA. In the following, we present the general framework of TSKCCA, followed by our solutions to practical issues in its implementation.
First stage: multiple kernel learning with HSIC and sparse regularizer
In the first stage, the kernel matrices are expressed as non-negative weighted sums of subkernel matrices,
$$ K_{x} = \sum_{m=1}^{M_{x}} \eta_{m} K_{x}^{(m)}, \qquad K_{z} = \sum_{l=1}^{M_{z}} \mu_{l} K_{z}^{(l)},$$
where \([K_{x}^{(m)}]_{nn'} = k_{x}^{(m)}(\mathbf {x}_{n},\mathbf {x}_{n'})\) and \([K_{z}^{(l)}]_{nn'} = k_{z}^{(l)}(\mathbf {z}_{n},\mathbf {z}_{n'})\). The goal of the first stage is to optimize the weight vectors \(\phantom {\dot {i}\!}\boldsymbol {\eta }=(\eta _{1},\ldots,\eta _{M_{x}})^{T}\) and \(\phantom {\dot {i}\!}\boldsymbol {\mu }=(\mu _{1},\ldots,\mu _{M_{z}})^{T}\) so that the kernel matrices K _{ x } and K _{ z } are as statistically dependent on each other as possible, while irrelevant subkernels are filtered out.
Statistical dependence is measured by the empirical HSIC [14], which is bilinear in the weights. Collecting the pairwise HSIC values between subkernels (up to a normalizing constant) into a matrix \(M \in \mathbb{R}^{M_{x} \times M_{z}}\) with entries \(M_{ml} = \text{tr}(H K_{x}^{(m)} H K_{z}^{(l)})\), where \(H = I - \frac{1}{N}\mathbf{1}\mathbf{1}^{T}\) is the centering matrix, the first stage is formulated as
$$ \max_{\boldsymbol{\eta} \geq 0,\ \boldsymbol{\mu} \geq 0}\ \boldsymbol{\eta}^{T} M \boldsymbol{\mu} \quad \text{subject to} \quad \left\| \boldsymbol{\eta} \right\|_{2} \leq 1,\ \left\| \boldsymbol{\mu} \right\|_{2} \leq 1,\ \left\| \boldsymbol{\eta} \right\|_{1} \leq c_{1},\ \left\| \boldsymbol{\mu} \right\|_{1} \leq c_{2},$$
where \(\left\| \mathbf{x}\right\|_{p} = (\sum_{i} |x_{i}|^{p})^{1/p}\) is the L^{p}-norm of the vector x, and c _{1} and c _{2} are sparsity parameters (see also the “Parameter tuning by a permutation test” section). This optimization problem is an instance of penalized matrix decomposition with non-negativity constraints [4]. Accordingly, we can obtain optimal weight coefficients by performing a singular value decomposition of the matrix M under these constraints. In this process, the ith left singular vector \(\boldsymbol {\eta }^{(i)} = (\eta _{1}^{(i)}, \dots, \eta _{M_{x}}^{(i)})^{T}\) and the corresponding right singular vector \(\boldsymbol {\mu }^{(i)} = (\mu _{1}^{(i)}, \dots, \mu _{M_{z}}^{(i)})^{T}\) are obtained iteratively by Algorithm 1.
In Algorithm 1, S denotes the element-wise soft-thresholding operator: the mth element of S(a, c) is given by sign(a _{ m })(|a _{ m }|−c)_{+}, where (x)_{+} equals x if x≥0 and 0 otherwise. In each step, Δ is chosen by binary search so that the L1 constraints ∥η∥_{1}≤c _{1} and ∥μ∥_{1}≤c _{2} are satisfied. In general, this iteration does not necessarily converge to a global optimum. In each iteration, we therefore initialize η ^{(i)} with the non-sparse left singular vector of M, following the previous study [4], to obtain reasonable solutions.
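The first stage can be sketched compactly in NumPy. The code below is our illustration, not the authors' implementation (see the project repository for that): it builds the HSIC matrix M from lists of subkernel matrices and then extracts successive sparse, non-negative singular-vector pairs in the spirit of Algorithm 1, using soft-thresholding with a binary search on the threshold. The deflation step and the initialization with the dense leading singular vector are assumptions based on the penalized matrix decomposition of Witten et al. [4].

```python
import numpy as np

def center(K):
    """Doubly center a kernel matrix: H K H with H = I - 11'/N."""
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    return H @ K @ H

def hsic_matrix(Kx_list, Kz_list):
    """M[m, l] = tr(H Kx^(m) H Kz^(l)), i.e. the empirical HSIC between
    subkernels up to the constant 1/(N-1)^2, which does not affect the optimum."""
    Cx = [center(K) for K in Kx_list]
    Cz = [center(K) for K in Kz_list]
    # For symmetric matrices, tr(A B) = sum of the element-wise product.
    return np.array([[np.sum(cx * cz) for cz in Cz] for cx in Cx])

def l1_project(a, c, tol=1e-8, max_iter=60):
    """Non-negative soft-thresholding followed by L2 normalization, with the
    threshold delta chosen by binary search so that ||v||_1 <= c."""
    def shrink(delta):
        v = np.maximum(a - delta, 0.0)           # (a - delta)_+ keeps weights >= 0
        return v / (np.linalg.norm(v) + 1e-12)
    v = shrink(0.0)
    if np.abs(v).sum() <= c:
        return v
    lo, hi = 0.0, np.max(np.abs(a))
    for _ in range(max_iter):
        delta = (lo + hi) / 2
        if np.abs(shrink(delta)).sum() > c:
            lo = delta
        else:
            hi = delta
        if hi - lo < tol:
            break
    return shrink(hi)                            # hi side satisfies the L1 constraint

def rank_one_pmd(M, c1, c2, n_iter=100):
    """One pair of sparse, non-negative singular vectors (eta, mu) of M with
    unit L2 norm and ||eta||_1 <= c1, ||mu||_1 <= c2."""
    U, _, _ = np.linalg.svd(M)
    eta = np.abs(U[:, 0])                        # dense, non-negative initialization
    for _ in range(n_iter):
        mu = l1_project(M.T @ eta, c2)
        eta_new = l1_project(M @ mu, c1)
        if np.allclose(eta_new, eta, atol=1e-7):
            eta = eta_new
            break
        eta = eta_new
    d = eta @ M @ mu                             # corresponding singular value
    return eta, mu, d

def sparse_svd(M, c1, c2, rank):
    """Successive rank-one decompositions with deflation M <- M - d eta mu'."""
    M = M.copy()
    factors = []
    for _ in range(rank):
        eta, mu, d = rank_one_pmd(M, c1, c2)
        factors.append((eta, mu, d))
        M = M - d * np.outer(eta, mu)
    return factors
```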
The second stage: kernel CCA
After learning the kernel weights via penalized matrix decomposition as above, we perform standard kernel CCA [8, 9] in the second stage to obtain the optimal coefficients α ^{∗} and β ^{∗} (Eq. 3a and 3b) with parameter κ for each pair of singular vectors \(\{ \boldsymbol {\eta }^{(i)}, \boldsymbol {\mu }^{(i)}\}_{i=1}^{rank(M)}\). Given test subkernels \(\{ K_{x,test}^{(m)} \}_{m=1}^{M_{x}}\) and \(\{ K_{z,test}^{(l)} \}_{l=1}^{M_{z}}\), the test correlation corresponding to the ith singular vectors is defined as the correlation between \(\sum _{m=1}^{M_{x}} \eta _{m} K_{x,test}^{(m)} \boldsymbol {\alpha }^{\ast }\) and \(\sum _{l=1}^{M_{z}} \mu _{l} K_{z,test}^{(l)} \boldsymbol {\beta }^{\ast }\).
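A sketch of the second stage, reusing the `kcca` function from the earlier sketch; all argument names here are illustrative rather than taken from the original code.

```python
import numpy as np

def combine(kernels, weights):
    """Weighted sum of subkernel matrices: sum_m w_m K^(m)."""
    return sum(w * K for w, K in zip(weights, kernels))

def second_stage_test_correlation(Kx_list, Kz_list, Kx_test_list, Kz_test_list,
                                  eta, mu, kappa=0.02):
    """Run standard KCCA on the combined training kernels for one singular-vector
    pair (eta, mu) and return the correlation of the projected test canonical
    variables. Training subkernels are (N x N); test subkernels are (N_test x N),
    i.e. kernels between test and training samples. Reuses the `kcca` sketch above."""
    alpha, beta, _ = kcca(combine(Kx_list, eta), combine(Kz_list, mu),
                          kappa=kappa, n_components=1)
    u = combine(Kx_test_list, eta) @ alpha       # sum_m eta_m K_x,test^(m) alpha
    v = combine(Kz_test_list, mu) @ beta         # sum_l mu_l  K_z,test^(l) beta
    return np.corrcoef(u.ravel(), v.ravel())[0, 1]
```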
Practical solutions for TSKCCA implementation
TSKCCA still leaves several options for subkernels to be designed manually. In this study, we focus on the featurewise kernel and the pairwise kernel defined in the following sections.
Featurewise kernel
As featurewise subkernels, we use Gaussian kernels defined on individual features,
$$ k_{x}^{(m)}(\textbf{x}, \textbf{x}') = \exp\left(-\gamma_{x}\,(x_{.m} - x'_{.m})^{2}\right), \qquad k_{z}^{(l)}(\textbf{z}, \textbf{z}') = \exp\left(-\gamma_{z}\,(z_{.l} - z'_{.l})^{2}\right),$$
where γ _{ x } and γ _{ z } are width parameters. By applying featurewise kernels, projection functions are restricted to additive models defined as \(f^{\ast }(\textbf {x}) = \sum _{m=1}^{p} f_{m}(x_{.m})\) and \(g^{\ast }(\textbf {z}) = \sum _{l=1}^{q} g_{l}(z_{.l})\), where \(f_{m}: \mathbb {R} \to \mathbb {R}\) (m=1,…,p) and \(g_{l}: \mathbb {R} \to \mathbb {R}\) (l=1,…,q) are nonlinear functions ^{2}. Note that the numbers of subkernels, M _{ x } and M _{ z }, equal the numbers of features, p and q, respectively.
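As an illustration, featurewise Gaussian subkernels can be precomputed as follows; the squared-difference form matches the kernel written above and is an assumption about the exact parameterization.

```python
import numpy as np

def featurewise_kernels(X, gamma):
    """One Gaussian subkernel per feature:
    K^(m)[n, n'] = exp(-gamma * (x_{n,m} - x_{n',m})^2).
    Returns a list of p matrices, each of shape (N, N)."""
    kernels = []
    for m in range(X.shape[1]):
        d = X[:, m][:, None] - X[:, m][None, :]   # pairwise differences on feature m
        kernels.append(np.exp(-gamma * d ** 2))
    return kernels
```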
Pairwise kernel
We introduce pairwise kernels as subkernels to account for interactions among all possible pairs of features. Because sparseness is induced on the subkernel weights, the pairwise kernels allow relevant cross-feature interactions to be selected. Projection functions are defined as \(f^{\ast }(\textbf {x}) = \sum _{m < m'}^{p} f_{m,m'}(x_{.m}, x_{.m'})\) and \(g^{\ast }(\textbf {z}) = \sum _{l < l'}^{q} g_{l,l'}(z_{.l}, z_{.l'})\), where \(f_{m,m'}: \mathbb {R}^{2} \to \mathbb {R}\) and \(g_{l,l'}: \mathbb {R}^{2} \to \mathbb {R}\) are nonlinear functions with two-dimensional inputs. Note that the numbers of subkernels, M _{ x } and M _{ z }, are p(p−1)/2 and q(q−1)/2, respectively.
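A corresponding sketch for pairwise subkernels, one two-dimensional Gaussian kernel per unordered feature pair (again an assumed parameterization):

```python
import numpy as np
from itertools import combinations

def pairwise_kernels(X, gamma):
    """One Gaussian subkernel per unordered feature pair (m, m'):
    K^(m,m')[n, n'] = exp(-gamma * ||x_{n,(m,m')} - x_{n',(m,m')}||^2).
    Returns a dict keyed by the pair; there are p(p-1)/2 entries."""
    kernels = {}
    for m, mp in combinations(range(X.shape[1]), 2):
        A = X[:, [m, mp]]
        sq = ((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)  # squared distances on the pair
        kernels[(m, mp)] = np.exp(-gamma * sq)
    return kernels
```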
Preprocessing for MKL
We normalize each subkernel by dividing it by its variance, \(K \rightarrow \frac{K}{\sigma^{2}}\), so that all subkernels enter the first stage on a comparable scale.
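The text does not spell out how σ² is estimated; one plausible reading, used in the sketch below, is the variance of the entries of each subkernel matrix.

```python
import numpy as np

def normalize_subkernels(kernels):
    """Divide each subkernel by its variance, K -> K / sigma^2.
    Here sigma^2 is taken as the variance of the entries of K; this is one
    plausible interpretation, not a detail confirmed by the text."""
    return [K / np.var(K) for K in kernels]
```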
Parameter tuning by a permutation test
When the kernel matrix K _{ x } (or K _{ z }) is of full rank, as is typically the case here, KCCA with a small κ (κ≪1) can always find a solution whose maximum canonical correlation is nearly one. This property makes it difficult to tune the regularization parameters c _{1} and c _{2} of the first stage. To solve this issue, we introduce a simple heuristic.
The key idea is to conduct a permutation test of the null hypothesis that the maximal canonical correlation induced by the ith singular vectors is no larger than that attained when x and z are statistically independent. Since the p-value of this test quantifies the discrepancy between the actual outcome and what is expected under the null hypothesis, we use it as a score to evaluate the significance of the ith singular vectors, with smaller p-values indicating greater significance.
Algorithm 2 summarizes our implementation of the permutation test. For the first singular vectors η ^{(1)} and μ ^{(1)} only, this procedure is applied to various pairs (c _{1}, c _{2}) satisfying the constraints \(1 \leq c_{1} \leq \sqrt {M_{x}}\) and \(1 \leq c_{2} \leq \sqrt {M_{z}}\) [4]. Among them, the pair with the lowest p-value is chosen as the optimal values of c _{1} and c _{2}.
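A generic sketch of such a permutation test; the statistic function is a placeholder for the canonical correlation induced by the ith singular-vector pair, and details of Algorithm 2 (for example, which view is permuted) are assumptions on our part.

```python
import numpy as np

def permutation_pvalue(statistic_fn, X, Z, B=100, seed=0):
    """Permute the rows of Z to break the pairing with X, recompute the
    statistic B times, and return the fraction of permuted statistics that
    reach or exceed the observed one (with an add-one correction)."""
    rng = np.random.default_rng(seed)
    observed = statistic_fn(X, Z)
    null = np.empty(B)
    for b in range(B):
        perm = rng.permutation(Z.shape[0])
        null[b] = statistic_fn(X, Z[perm])
    return (1 + np.sum(null >= observed)) / (1 + B)
```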
For simplicity, the other parameters, such as γ in the Gaussian kernel and κ in KCCA, are fixed heuristically: γ ^{−1} is set to the median Euclidean distance between data points, and κ is set to 0.02 as recommended in the previous study [9].
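The median heuristic mentioned above can be computed directly:

```python
import numpy as np
from scipy.spatial.distance import pdist

def median_heuristic_gamma(X):
    """Width parameter from the median heuristic described in the text:
    gamma^{-1} equals the median Euclidean distance between data points."""
    return 1.0 / np.median(pdist(X))

# e.g. gamma_x = median_heuristic_gamma(X); gamma_z = median_heuristic_gamma(Z)
```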
Results
In this section, we evaluate the performance of the proposed TSKCCA, SAFCCA [12], and other methods using synthetic datasets and nutrigenomic experimental data.
Dataset 1: single nonlinear association
where D was the total number of dimensions and ε was independent noise.
The optimal model for each method was trained on N training samples. Here, we assumed c _{1}=c _{2} within the range \(1 \leq c_{1}, c_{2} \leq \frac {\sqrt {D}}{2}\) and obtained optimal values using a permutation test with B=100. The test correlation was evaluated on a separate set of 100 test samples and averaged over 100 simulation runs as we varied the number of dimensions, the sample size, and the noise level.
Dataset 2: multiple nonlinear associations
First, we performed a permutation test with B=1000 for the ten singular vectors \(\{ \boldsymbol {\eta }^{(i)},\boldsymbol {\mu }^{(i)}\}_{i=1}^{10}\) corresponding to the ten largest singular values of M given by Eq. (8). P-values of the top three were significant (p<0.001) and the rest were non-significant. This result suggests that only these three singular vectors captured nonlinear associations.
Feature selection through singular vectors (SVs) in dataset 2

        1st SV (η^{(1)})   2nd SV (η^{(2)})   3rd SV (η^{(3)})
η_1     0.98 (0.002)       0.00 (0.018)       0.00 (0.001)
η_2     0.00 (0.003)       0.21 (0.033)       0.00 (0.001)
η_3     0.00 (0.001)       0.00 (0.010)       0.22 (0.029)
η_4     0.22 (0.013)       0.00 (0.017)       0.00 (0.005)
η_5     0.00 (0.000)       0.98 (0.004)       0.00 (0.005)
η_6     0.00 (0.004)       0.00 (0.002)       0.98 (0.003)

        1st SV (μ^{(1)})   2nd SV (μ^{(2)})   3rd SV (μ^{(3)})
μ_1     0.99 (0.005)       0.01 (0.022)       0.01 (0.014)
μ_2     0.01 (0.027)       0.99 (0.004)       0.01 (0.015)
μ_3     0.01 (0.024)       0.01 (0.018)       0.99 (0.003)
μ_4     0.01 (0.023)       0.01 (0.026)       0.01 (0.017)
Comparison of test correlation, precision, and recall in dataset 2

          Correlation   Precision   Recall
TSKCCA    0.9670        0.9163      1
          0.9636
          0.9732
SAFCCA    0.7585        0.6350      0.4375
Dataset 3: feature interactions
where D was the number of dimensions. For this dataset, we used featurewise kernels and pairwise kernels as subkernels in order to capture both single-feature effects and cross-feature interactions such as the term \(x_{.1} x_{.2}\). There were D + D(D−1)/2 subkernels, whose weight coefficients were optimized by our method.
Dataset 4: nutrigenomic data
We then analyzed a nutrigenomic dataset from a previous mouse study [22, 23]. In that study, the expression of 120 liver genes considered relevant in the context of nutrition and the concentrations of 21 hepatic fatty acids were measured in 20 wild-type mice and 20 PPARα-deficient mice. Mice of each genotype were fed five diets with different fat contents. In matrix notation, the gene expression data are denoted by \(X \in \mathbb {R}^{40\times 120}\) and the fatty acid concentration data by \(Z \in \mathbb {R}^{40\times 21}\). Data were standardized to zero mean and unit variance in each dimension. Several linear correlations between X and Z have previously been detected by applying a regularized version of linear CCA [5, 23].
First, to identify significant associations in these data, we performed permutation tests for sparse CCA, KCCA, SAFCCA, and TSKCCA over parameters defined on equally spaced grid points. KCCA and SAFCCA yielded no significant associations; thus, we focused on sparse CCA and TSKCCA in the subsequent analysis. We identified two significant linear associations with sparse CCA (p<0.001, permutation test) and one nonlinear association with TSKCCA (p=0.0067, permutation test) with c _{1}=2.6257 and c _{2}=1.9275.
Frequency of selection per subkernel corresponding to genes (left) and fatty acids (right) in nutrigenomic data

Genes / Pair of genes    Freq.    Fatty acids / Pair of fatty acids    Freq.
PMDCI                    643      C16.0-C18.0                          622
CAR1-PMDCI               564      C18.0                                485
PMDCI-THIOL              563      C16.0-C20.3n.6                       429
ACBP-PMDCI               473      C16.0                                340
L.FABP-PMDCI             451      C18.0-C20.3n.6                       315
CYP4A10-PMDCI            379
CYP3A11-PMDCI            370
ALDH3-PMDCI              369
Ntcp-PMDCI               354
PMDCI-SPI1.1             347
ACOTH-PMDCI              330
PMDCI-SR.BI              306
Discussion
Other researchers have employed the sparse additive model [13] to extend KCCA to high-dimensional problems, defining two equivalent formulations, sparse additive functional CCA (SAFCCA) and sparse additive kernel CCA (SAKCCA) [12]. The former is defined in a second-order Sobolev space and solved using the biconvex backfitting procedure; the latter, defined in an RKHS, is derived from the former by the representer theorem. These algorithms optimize an additive model built from functions \(f_{1} \in \mathbb {H}_{1}, f_{2} \in \mathbb {H}_{2},\ldots,f_{p} \in \mathbb {H}_{p}\), one per feature. In contrast, our formulation assumes an additive kernel, \(\sum \eta _{m} K_{m}\), associated with an RKHS \(\mathbb {H}_{add}\), and finds correlations in that space. This approach enables us to reveal multiple components of associations.
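To summarize this distinction in the notation used above (our restatement of the comparison, not an additional result):
$$ \text{SAFCCA/SAKCCA:}\quad f(\textbf{x}) = \sum_{m=1}^{p} f_{m}(x_{.m}), \qquad f_{m} \in \mathbb{H}_{m}, $$
$$ \text{TSKCCA:}\quad f \in \mathbb{H}_{add} \ \text{with kernel} \ k_{add}(\textbf{x}, \textbf{x}') = \sum_{m=1}^{M_{x}} \eta_{m}\, k_{x}^{(m)}(\textbf{x}, \textbf{x}'), \qquad \eta_{m} \geq 0. $$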
Some problems specific to KCCA remain unsolved, such as choosing the two parameters (the regularization parameter κ and the width parameter γ) and the number of components. While cross-validation can be used to set these values [24], we fixed them for simplicity in this study, following the previous study [9].
Next, we discuss the validity of the feature selection performed on the nutrigenomic data by sparse CCA and TSKCCA. In the original study, the authors focused on the role of PPARα as a major transcriptional regulator of lipid metabolism and found that PPARα regulates the expression of many genes in mouse liver under low dietary fat conditions [22]. They provided a list of genes whose expression levels differ significantly between wild-type and PPARα-deficient mice. While only a few genes selected by sparse CCA were included in that list, 13 of the 14 genes selected by the 1st singular vector of TSKCCA were included. This result indicates that TSKCCA successfully extracts meaningful nonlinear associations induced by PPARα deficiency.
Moreover, in our analysis with pairwise kernels, most of the frequently selected gene pairs contained PMDCI, which encodes an enoyl-CoA isomerase involved in β-oxidation of polyunsaturated fatty acids. This implies that interactions between PMDCI and other genes contribute to lipid metabolism in PPARα-deficient mice.
Many other kinds of subkernels, such as string kernels or graph kernels, can be employed in the same framework. In bioinformatics, Yamanishi et al. adopted integrated KCCA (IKCCA), which uses a simple sum of multiple kernels to combine diverse biological data [11]. This technique could be improved by optimizing the weight coefficient of each kernel within the TSKCCA framework. Finally, defining kernels on groups of features enables group-wise feature selection, in the same way as group sparse CCA [25–27], which would be beneficial for biomarker detection problems.
Conclusions
This paper proposes a novel extension of kernel CCA, called two-stage kernel CCA (TSKCCA), which can identify multiple canonical variables while selecting sparse sets of features. The method optimizes sparse weight coefficients of prespecified subkernels by sparse matrix decomposition before performing standard kernel CCA. This procedure achieves interpretability by removing irrelevant features in nonlinear correlational analysis.
Through three numerical experiments, we demonstrated that TSKCCA handles high-dimensional data and extracts multiple nonlinear associations more effectively than an existing method, SAFCCA. Using nutrigenomic data, our results show that TSKCCA can retrieve information about genotype and may reveal an interactive mechanism of lipid metabolism in PPARα-deficient mice.
Endnotes
^{1} In this article, \(\phantom {\dot {i}\!}[\cdot ]_{nn'}\) denotes the (n,n ^{′})th elements of the matrix enclosed by the brackets.
^{2} In this article, x _{.m } denotes the mth feature of x.
Abbreviations
CCA: Canonical correlation analysis
HSIC: Hilbert-Schmidt independence criterion
KCCA: Kernel canonical correlation analysis
MKL: Multiple kernel learning
RKHS: Reproducing kernel Hilbert space
SAFCCA: Sparse additive functional canonical correlation analysis
TSKCCA: Two-stage kernel canonical correlation analysis
Declarations
Acknowledgements
We thank Dr. Mitsuo Kawato, Dr. Noriaki Yahata, and Dr. Jun Morimoto for their valuable comments, and Dr. Steven D. Aird for editing the manuscript.
Funding
This work was supported by a GrantinAid for Scientific Research on Innovative Areas: Artificial Intelligence and Brain Science (16H06563), the Strategic Research Program for Brain Sciences from Japan Agency for Medical Research and Development, AMED, and Okinawa Institute of Science and Technology Graduate University.
Availability of data and materials
The datasets analyzed during the current study are available from https://cran.r-project.org/web/packages/CCA.
Our code for TSKCCA and the synthetic data is available at https://github.com/kosyoshida/TSKCCA.
Authors’ contributions
KY designed the model and developed the algorithm. KY, JY, and KD participated in writing of the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
1. Hotelling H. Relations between two sets of variates. Biometrika. 1936;28:321–77.
2. Yamanishi Y, Vert JP, Kanehisa M. Protein network inference from multiple genomic data: a supervised approach. Bioinformatics. 2004;20(suppl 1):363–70.
3. Waaijenborg S, Verselewel de Witt Hamer PC, Zwinderman AH. Quantifying the association between gene expressions and DNA-markers by penalized canonical correlation analysis. Stat Appl Genet Mol Biol. 2008;7(1):3.
4. Witten DM, Tibshirani R, Hastie T. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics. 2009;10(3):515–34.
5. González I, Déjean S, Martin PG, Gonçalves O, Besse P, Baccini A. Highlighting relationships between heterogeneous biological data through graphical displays based on regularized canonical correlation analysis. J Biol Syst. 2009;17(02):173–99.
6. Wilms I, Croux C. Sparse canonical correlation analysis from a predictive point of view. Biom J. 2015;57(5):834–51.
7. Parkhomenko E, Tritchler D, Beyene J, et al. Sparse canonical correlation analysis with application to genomic data integration. Stat Appl Genet Mol Biol. 2009;8(1):1–34.
8. Akaho S. A kernel method for canonical correlation analysis. In: Proceedings of the International Meeting of the Psychometric Society (IMPS2001). Osaka: Springer Japan; 2001.
9. Bach FR, Jordan MI. Kernel independent component analysis. J Mach Learn Res. 2003;3:1–48.
10. Vert JP, Kanehisa M. Graph-driven feature extraction from microarray data using diffusion kernels and kernel CCA. In: Advances in Neural Information Processing Systems. Vancouver: NIPS Foundation; 2002. p. 1425–32.
11. Yamanishi Y, Vert JP, Nakaya A, Kanehisa M. Extraction of correlated gene clusters from multiple genomic data by generalized kernel canonical correlation analysis. Bioinformatics. 2003;19(suppl 1):323–30.
12. Balakrishnan S, Puniyani K, Lafferty JD. Sparse additive functional and kernel CCA. In: Proceedings of the 29th International Conference on Machine Learning (ICML 2012), Edinburgh, Scotland, UK; 2012.
13. Ravikumar P, Lafferty J, Liu H, Wasserman L. Sparse additive models. J R Stat Soc Ser B Stat Methodol. 2009;71(5):1009–30.
14. Gretton A, Bousquet O, Smola AJ, Schölkopf B. Measuring statistical dependence with Hilbert-Schmidt norms. In: Algorithmic Learning Theory (ALT 2005), Singapore; 2005. p. 63–77.
15. Cristianini N, Shawe-Taylor J, Elisseeff A, Kandola J. On kernel-target alignment. In: Advances in Neural Information Processing Systems 14. Vancouver; 2001.
16. Chang B, Krüger U, Kustra R, Zhang J. Canonical correlation analysis based on Hilbert-Schmidt independence criterion and centered kernel target alignment. In: Proceedings of the 30th International Conference on Machine Learning (ICML 2013), Atlanta, GA, USA; 2013. p. 316–24.
17. Lanckriet GR, Cristianini N, Bartlett P, Ghaoui LE, Jordan MI. Learning the kernel matrix with semidefinite programming. J Mach Learn Res. 2004;5:27–72.
18. Bach FR, Lanckriet GR, Jordan MI. Multiple kernel learning, conic duality, and the SMO algorithm. In: Proceedings of the Twenty-first International Conference on Machine Learning. ACM; 2004. p. 6.
19. Cortes C, Mohri M, Rostamizadeh A. Two-stage learning kernel algorithms. In: Proceedings of the 27th International Conference on Machine Learning (ICML 2010), Haifa, Israel; 2010. p. 239–46.
20. Yamada M, Jitkrittum W, Sigal L, Xing EP, Sugiyama M. High-dimensional feature selection by feature-wise kernelized lasso. Neural Comput. 2014;26(1):185–207.
21. Kloft M, Brefeld U, Sonnenburg S, Zien A. Lp-norm multiple kernel learning. J Mach Learn Res. 2011;12:953–97.
22. Martin P, Guillou H, Lasserre F, Déjean S, Lan A, Pascussi J, Sancristobal M, Legrand P, Besse P, Pineau T. Novel aspects of PPARalpha-mediated regulation of lipid and xenobiotic metabolism revealed through a nutrigenomic study. Hepatology. 2007;45(3):767–77.
23. González I, Déjean S, Martin PG, Baccini A, et al. CCA: an R package to extend canonical correlation analysis. J Stat Softw. 2008;23(12):1–14.
24. Leurgans SE, Moyeed RA, Silverman BW. Canonical correlation analysis when the data are curves. J R Stat Soc Ser B Methodol. 1993:725–40.
25. Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B Methodol. 2006;68(1):49–67.
26. Bach FR. Consistency of the group lasso and multiple kernel learning. J Mach Learn Res. 2008;9:1179–225.
27. Lin D, Zhang J, Li J, Calhoun VD, Deng HW, Wang YP. Group sparse canonical correlation analysis for genomic data integration. BMC Bioinformatics. 2013;14(1):245.