 Research Article
 Open Access
A multiple kernel learning algorithm for drug-target interaction prediction
BMC Bioinformatics volume 17, Article number: 46 (2016)
Abstract
Background
Drug-target networks have received a lot of attention in recent years, given their relevance for pharmaceutical innovation and drug lead discovery. Different in silico approaches have been proposed for the identification of new drug-target interactions, many of which are based on kernel methods. Despite technical advances in recent years, these methods are not able to cope with large drug-target interaction spaces or to integrate multiple sources of biological information.
Results
We propose KronRLS-MKL, which models the drug-target interaction problem as a link prediction task on bipartite networks. This method allows the integration of multiple heterogeneous information sources for the identification of new interactions, and can also work with networks of arbitrary size. Moreover, it automatically selects the most relevant kernels by returning weights that indicate their importance in the drug-target prediction at hand. Empirical analysis on four data sets using twenty distinct kernels indicates that our method has higher than or comparable predictive performance to 18 competing methods in all prediction tasks. Moreover, the predicted weights reflect the predictive quality of each kernel in exhaustive pairwise experiments, which indicates that the method succeeds in automatically revealing relevant biological sources.
Conclusions
Our analysis shows that the proposed data integration strategy improves the quality of the predicted interactions, can speed up the identification of new drug-target interactions, and identifies relevant information for the task.
Availability
The source code and data sets are available at www.cin.ufpe.br/~acan/kronrlsmkl/.
Background
Drug-target networks have received a lot of attention in recent years, given their relevance for pharmaceutical innovation and drug repositioning [1–3]. Although the number of known interactions between drugs and target proteins has been increasing, the targets of approved drugs are still only a small proportion (<10 %) of the human proteome [1]. Recent advances in high-throughput methods provide ways to produce large data sets about molecular entities such as drugs and proteins. There is also an increase in the availability of reliable databases integrating information about interactions between these entities. Nevertheless, as the experimental verification of such interactions does not scale with the demand for innovation, the use of computational methods for large-scale prediction is mandatory. There is also a clear need for systems-based approaches to integrate these data for drug discovery and repositioning applications [1].
Recently, an increasing number of methods have been proposed for drug-target interaction (DTI) prediction. They can be categorized as ligand-based, docking-based, or network-based methods [4]. The docking approach, which can provide accurate estimates of DTIs, is computationally demanding and requires a 3D model of the target protein. Ligand-based methods, such as the quantitative structure-activity relationship (QSAR), are based on a comparison of a candidate ligand to the known ligands of a biological target [5]. However, the utility of these ligand-based methods is limited when there are few ligands for a given target [2, 4, 6]. Alternatively, network-based approaches use computational methods and known DTIs to predict new interactions [4, 5]. Even though ligand-based and docking-based methods are more precise than network-based approaches, the latter are more adequate for estimating new interactions from complete proteomes and drug catalogs [1]. Therefore, they can indicate novel candidates to be evaluated by more accurate methods.
Most network approaches are based on bipartite graphs, in which the nodes are drugs (small molecules) and biological targets (proteins) [3, 7, 8]. Edges between drugs and targets indicate a known DTI (Fig. 1). Given a known interaction network, kernel-based methods can be used to predict unknown drug-target interactions [2, 9–11]. A kernel can be seen as a similarity matrix estimated on all pairs of instances. The main assumption behind network kernel methods is that similar ligands tend to bind to similar targets and vice versa. These approaches use base kernels to measure the similarity between drugs (or targets) using distinct sources of information (e.g., structural, pharmacophore, sequence and function similarity). A pairwise kernel function, which measures the similarity between drug-target pairs, is obtained by combining a drug and a protein base kernel via the kernel product.
The majority of previous network approaches use classification methods, such as Support Vector Machines (SVM), to perform predictions over the drug-target interaction space [2, 4]. However, such techniques have major limitations. First, they can only incorporate one pair of base kernels at a time (one for drugs and one for proteins) to perform predictions. Second, computing the pairwise kernel matrix for the whole interaction space (all possible drug-target pairs) is computationally infeasible even for a moderate number of drugs and targets. Moreover, most drug-target interaction databases provide no true negative interaction examples. The common solution to these issues is to randomly sample a small proportion of unknown interactions to be used as negative examples. While this approach yields a small, computationally tractable drug-target pairwise kernel, it generates an easier but unrealistic classification task with balanced class sizes [12].
An emerging machine learning (ML) discipline, called Multiple Kernel Learning (MKL), focuses on the search for an optimal combination of kernels [13]. MKL-like methods have previously been proposed for the problem of DTI prediction [14–16] and the closely related protein-protein interaction (PPI) prediction problem [17, 18]. This is highly relevant, as it allows the use of distinct sources of biological information to define similarities between molecular entities. However, since traditional MKL methods are SVM-based [13, 19], they are subject to the memory limitations imposed by the pairwise kernel, and are not able to perform predictions over the complete drug vs. protein space. Moreover, MKL approaches used for PPI prediction [17, 18] and protein function prediction [20, 21] cannot be applied to bipartite graphs such as the one at hand. Currently, we are only aware of two recent works [19, 22] proposing MKL approaches to integrate similarity measures for drugs and targets.
Drug-target prediction fits a link prediction problem [4], which can be solved by a Kronecker regularized least squares approach (KronRLS) [10]. A single-kernel version of this method has recently been applied to the drug-target prediction problem [10, 11]. A recent survey indicated that KronRLS outperforms SVM-based methods in DTI prediction [2]. KronRLS uses algebraic properties of the Kronecker product to perform predictions on the whole drug-target space without explicitly computing the pairwise kernel. Therefore, it can cope with problems on large drug vs. protein spaces. However, KronRLS cannot be used in an MKL context.
In this work, we propose a new MKL algorithm to automatically select and combine kernels on a bipartite drug-protein prediction problem, the KronRLS-MKL algorithm (Fig. 1). For this, we extend the KronRLS method to an MKL scenario. Our method uses L2 regularization to produce a non-sparse combination of base kernels. The proposed method can cope with large drug vs. target interaction matrices, does not require subsampling of the drug-target network, and is also able to combine and select relevant kernels. We perform an empirical analysis using previously described drug-target data sets [23] and a diverse set of 10 drug kernels and 10 protein kernels.
In our experiments, we considered three different DTI prediction scenarios [2, 11, 24]: pair prediction, where every drug and target in the training set has at least one known interaction; and the ‘new drug’ and ‘new target’ settings, where some drugs or targets, respectively, are present only in the test set. A comparative analysis with top-performing single-kernel approaches [2, 8, 10, 25–27] and all competing integrative approaches [14, 15, 22] demonstrates that our method is better or competitive in the majority of evaluated scenarios. Moreover, KronRLS-MKL was able to select kernels and indicate their relevance, in the form of weights, for each problem.
Methods
In this work, we propose an extension of the KronRLS algorithm, building on recent developments of the MKL framework [28], to address the problem of link prediction on bipartite networks with multiple kernels. Before introducing our method, we describe the RLS and KronRLS algorithms (for further information, see [10, 11]).
RLS and KronRLS
Given a set of drugs \(D = \{ d_{1}, \ldots, d_{n_{d}}\}\), a set of targets \(T = \{ t_{1}, \ldots, t_{n_{t}}\}\), the training inputs \(x_{i}\) (drug-target pairs) and their labels \(y_{i} \in \mathbb{R}\) (here binary: 1 stands for a known interaction and 0 otherwise), with \(1 \le i \le n\) and \(n = |D||T| = n_{d} n_{t}\) (the number of drug-target pairs), the RLS approach minimizes the following function [29]:
\[ J(f) = \frac{1}{2}\sum_{i=1}^{n}\left(y_{i} - f(x_{i})\right)^{2} + \frac{\lambda}{2}\|f\|_{K}^{2} \tag{1} \]
where \(\|f\|_{K}\) is the norm of the prediction function f in the Hilbert space associated with the kernel K, and \(\lambda > 0\) is a regularization parameter that determines the trade-off between the prediction error and the complexity of the model. According to the representer theorem [30], a minimizer of the above objective function admits a dual representation of the following form:
\[ f(x) = \sum_{i=1}^{n} a_{i} K(x, x_{i}) \tag{2} \]
where \(K: (D \times T) \times (D \times T) \rightarrow \mathbb{R}\) is called the pairwise kernel function and a is the vector of dual variables, one for each training pair. The RLS algorithm obtains the minimizer of Eq. 1 by solving the system of linear equations \((K+\lambda I)a = y\), where a and y are both n-dimensional vectors consisting of the parameters \(a_{i}\) and the labels \(y_{i}\).
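A minimal NumPy sketch of the RLS step, assuming a precomputed (toy) pairwise kernel matrix: it solves \((K+\lambda I)a = y\) and evaluates the dual form of Eq. 2 on the training inputs. The function names `rls_fit` and `rls_predict` and the random data are illustrative, not taken from the authors' code.

```python
import numpy as np

def rls_fit(K, y, lam):
    """Solve (K + lam * I) a = y for the dual coefficients a."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)

def rls_predict(K_new_train, a):
    """Dual form: f(x) = sum_i a_i K(x, x_i), one row of K per new input."""
    return K_new_train @ a

# Toy example: a linear kernel on random features (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
K = X @ X.T
y = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 1.0])
a = rls_fit(K, y, lam=0.5)
f_train = rls_predict(K, a)
# From (K + lam I) a = y, the fitted values satisfy K a = y - lam * a.
print(np.allclose(f_train, y - 0.5 * a))  # True
```

Solving this dense system costs \(O(n^{3})\) time and \(O(n^{2})\) memory in the number of pairs n, which is the bottleneck that KronRLS removes.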
One can construct such a pairwise kernel as the product of two base kernels, namely \(K((d,t),(d',t')) = K_{D}(d,d')\,K_{T}(t,t')\), where \(K_{D}\) and \(K_{T}\) are the base kernels for drugs and targets, respectively. This is equivalent to the Kronecker product of the two base kernel matrices [4, 31]: \(K = K_{D} \otimes K_{T}\). Because this matrix has \((n_{d} n_{t})^{2}\) entries, training the model with it is computationally infeasible even for moderate numbers of drugs and targets [4].
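To make the product construction concrete, the following sketch (toy kernels, hypothetical sizes) builds \(K = K_{D} \otimes K_{T}\) with `numpy.kron`, checks that one entry equals \(K_{D}(d,d')K_{T}(t,t')\), and estimates the memory an explicit float64 pairwise kernel would need for a hypothetical 1000 drugs × 1000 targets.

```python
import numpy as np

# Hypothetical toy base kernels: 3 drugs and 4 targets.
rng = np.random.default_rng(1)
Fd = rng.normal(size=(3, 5))
Ft = rng.normal(size=(4, 5))
K_D = Fd @ Fd.T  # drug kernel, 3 x 3
K_T = Ft @ Ft.T  # target kernel, 4 x 4

# Pairwise kernel over all (drug, target) pairs: 12 x 12.
K = np.kron(K_D, K_T)

def pair_index(d, t, n_t):
    """Row of pair (d, t) in K = K_D kron K_T (the target index varies fastest)."""
    return d * n_t + t

n_t = K_T.shape[0]
i, j = pair_index(0, 2, n_t), pair_index(1, 3, n_t)
print(np.isclose(K[i, j], K_D[0, 1] * K_T[2, 3]))  # True

# Memory for an explicit float64 pairwise kernel (hypothetical sizes).
n_drugs, n_targets = 1000, 1000
print((n_drugs * n_targets) ** 2 * 8 / 1e12, "TB")  # 8.0 TB
```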
The KronRLS algorithm is a modification of RLS that takes advantage of two specific algebraic properties of the Kronecker product to speed up model training: the so-called vec trick [31] and the relation between the eigendecomposition of a Kronecker product and the eigendecompositions of its factors [11, 32].
Let \(K_{D} = Q_{D} \Lambda_{D} Q_{D}^{T}\) and \(K_{T} = Q_{T} \Lambda_{T} Q_{T}^{T}\) be the eigendecompositions of the kernel matrices \(K_{D}\) and \(K_{T}\). The solution a can be obtained from the following equation [11]:
\[ a = \mathrm{vec}\left(Q_{T} C Q_{D}^{T}\right) \tag{3} \]
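where \(\mathrm{vec}(\cdot)\) is the vectorization operator that stacks the columns of a matrix into a vector, and C is a matrix defined as:
\[ \mathrm{vec}(C) = \left(\Lambda_{D} \otimes \Lambda_{T} + \lambda I\right)^{-1} \mathrm{vec}\left(Q_{T}^{T} Y Q_{D}\right) \tag{4} \]
with \(Y = \mathrm{unvec}(y)\) the \(n_{t} \times n_{d}\) label matrix (unvec being the inverse of vec).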
The KronRLS algorithm is well suited for the large pairwise space involved in the DTI prediction problem: the cost of estimating the vector a with Eqs. 3 and 4 is dominated by the eigendecompositions of the two base kernels, roughly \(O(n_{d}^{3} + n_{t}^{3})\) operations, whereas solving the original RLS system directly costs \(O((n_{d} n_{t})^{3})\). However, KronRLS does not support the use of multiple kernels.
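The sketch below implements the KronRLS solution with NumPy and checks it against plain RLS on the explicit Kronecker kernel for a toy problem. It assumes, consistently with \(K = K_{D} \otimes K_{T}\), that the labels are arranged as an \(n_{t} \times n_{d}\) matrix Y (targets in rows, drugs in columns) and that vec stacks columns; the helper names are our own.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a vector (column-major order)."""
    return M.reshape(-1, order="F")

def unvec(v, n_rows):
    """Inverse of vec for a matrix with n_rows rows."""
    return v.reshape(n_rows, -1, order="F")

def kronrls_fit(K_D, K_T, Y, lam):
    """KronRLS dual coefficients A = unvec(a); Y is the n_t x n_d label matrix."""
    lam_D, Q_D = np.linalg.eigh(K_D)  # K_D = Q_D diag(lam_D) Q_D^T
    lam_T, Q_T = np.linalg.eigh(K_T)  # K_T = Q_T diag(lam_T) Q_T^T
    # Entry (i, j): eigenvalue lam_T[i] * lam_D[j] of K_D kron K_T, plus lam.
    C = (Q_T.T @ Y @ Q_D) / (np.outer(lam_T, lam_D) + lam)
    return Q_T @ C @ Q_D.T

def kronrls_predict(K_D, K_T, A):
    """Vec trick: (K_D kron K_T) vec(A) = vec(K_T A K_D^T), no pairwise kernel formed."""
    return K_T @ A @ K_D.T

# Compare against plain RLS on the explicit pairwise kernel (toy sizes only).
rng = np.random.default_rng(2)
n_d, n_t, lam = 4, 3, 0.1
Fd, Ft = rng.normal(size=(n_d, 6)), rng.normal(size=(n_t, 6))
K_D, K_T = Fd @ Fd.T, Ft @ Ft.T
Y = (rng.random((n_t, n_d)) < 0.3).astype(float)

A = kronrls_fit(K_D, K_T, Y, lam)
K = np.kron(K_D, K_T)
a_naive = np.linalg.solve(K + lam * np.eye(n_d * n_t), vec(Y))
print(np.allclose(A, unvec(a_naive, n_t)))                          # True
print(np.allclose(vec(kronrls_predict(K_D, K_T, A)), K @ a_naive))  # True
```

Because \(\Lambda_{D} \otimes \Lambda_{T}\) is diagonal, its inverse reduces to the element-wise division by `np.outer(lam_T, lam_D) + lam`, so the \((n_{d} n_{t}) \times (n_{d} n_{t})\) matrix is only built here for the comparison.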
KronRLS-MKL
In this work, we consider vectors of base kernels, \(\boldsymbol{k}_{D} = (K_{D}^{1}, K_{D}^{2}, \ldots, K_{D}^{P_{D}})\) and \(\boldsymbol{k}_{T} = (K_{T}^{1}, K_{T}^{2}, \ldots, K_{T}^{P_{T}})\), where \(P_{D}\) and \(P_{T}\) are the numbers of base kernels defined over the drug and target sets, respectively. In this section, we propose an extension of KronRLS to handle multiple kernels.
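The kernels can be combined by a linear function, i.e., a weighted sum of base kernels, yielding the optimal kernels \(K_{D}^{*}\) and \(K_{T}^{*}\):
\[ K_{D}^{*} = \sum_{i=1}^{P_{D}} \beta_{D}^{i} K_{D}^{i}, \qquad K_{T}^{*} = \sum_{j=1}^{P_{T}} \beta_{T}^{j} K_{T}^{j} \]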
where \(\boldsymbol{\beta}_{D} = \{\beta_{D}^{1}, \ldots, \beta_{D}^{P_{D}}\}\) and \(\boldsymbol{\beta}_{T} = \{\beta_{T}^{1}, \ldots, \beta_{T}^{P_{T}}\}\) are the weights of the drug and protein kernels, respectively.
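A short sketch of the weighted combination, with hypothetical 2 × 2 kernels and weights. With nonnegative weights, a sum of positive semidefinite base kernels is itself positive semidefinite, so the combination remains a valid kernel.

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Weighted sum K* = sum_i beta_i K_i of same-shaped base kernels."""
    if len(kernels) != len(weights):
        raise ValueError("one weight per kernel is required")
    return sum(w * K for w, K in zip(weights, kernels))

# Two hypothetical 2 x 2 drug kernels and weights.
K1 = np.array([[1.0, 0.2], [0.2, 1.0]])
K2 = np.array([[1.0, 0.8], [0.8, 1.0]])
K_star = combine_kernels([K1, K2], [0.25, 0.75])
# Off-diagonal entry: 0.25 * 0.2 + 0.75 * 0.8 = 0.65; diagonal stays 1.0.
print(K_star)
```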
In [28], the author demonstrated that MKL can be interpreted as a particular instance of a kernel machine with two layers, in which the second layer is a linear function. That work provides the theoretical basis for developing an MKL extension of the closely related KronRLS algorithm in our work.
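The classification function of Eq. 2 can be written in matrix form, \(f_{a} = K a\) [29], and, applying the well-known property of the Kronecker product \((A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(B X A^{T})\) [32], we have:
\[ f_{a} = \left(K_{D}^{*} \otimes K_{T}^{*}\right) a = \mathrm{vec}\left(K_{T}^{*} A \left(K_{D}^{*}\right)^{T}\right) \]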
This way, we can rewrite the classification function as \(K_{T}^{*} A \left(K_{D}^{*}\right)^{T}\), where \(A = \mathrm{unvec}(a)\). Following the iterative approach used in previous MKL strategies [13], we propose a two-step optimization process in which the optimization of the vector a is interleaved with the optimization of the kernel weights. Given two initial weight vectors, \(\boldsymbol{\beta}_{D}^{0}\) and \(\boldsymbol{\beta}_{T}^{0}\), an optimal value for the vector a is found using Eq. 3, and, with such an optimal a, we can proceed to find the optimal \(\boldsymbol{\beta}_{D}\) and \(\boldsymbol{\beta}_{T}\). More specifically, Eq. 1 can be redefined for a fixed a and, knowing that \(\|f\|_{F}^{2} = a^{T} K a\) [28], we have:
then,
Since the second term does not depend on K (and thus does not depend on the kernel weights) and y and a are fixed, it can be discarded from the weight optimization procedure. Note that we are not interested in a sparse selection of base kernels as in [28]; therefore, we introduce an L2 regularization term to control the sparsity [33] of the kernel weights, also known as a ball constraint. This term is parameterized by the regularization coefficient σ. Additionally, we can convert u to its matrix form by applying the unvec operator, i.e., \(U = \mathrm{unvec}(u)\), and use a more appropriate matrix norm (the Frobenius norm, since \(\|A\|_{2} \le \|A\|_{F}\) [32]). In this way, for any fixed values of a and \(\boldsymbol{\beta}_{T}\), the optimal drug weight vector \(\boldsymbol{\beta}_{D}\) is obtained by solving the optimization problem defined as:
while the optimal \(\boldsymbol{\beta}_{T}\) can be found by fixing the values of a and \(\boldsymbol{\beta}_{D}\), according to:
The optimization method used here is the interior-point algorithm [34] as implemented in MATLAB [35].
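The authors solve the two weight subproblems with an interior-point method in MATLAB. Purely to illustrate the alternating scheme, the NumPy sketch below assumes a simpler weight subproblem: a squared Frobenius data-fit term \(\frac{1}{2}\|Y - \sum_i \beta_i M_i\|_F^2\) plus \(\sigma\|\boldsymbol{\beta}\|^2\), solved in closed form and then clipped to nonnegative values. This objective, the clipping, the uniform initialization and the fixed iteration count are all our assumptions, so the weights it returns are not those of KronRLS-MKL.

```python
import numpy as np

def kronrls_fit(K_D, K_T, Y, lam):
    """KronRLS dual coefficients A = unvec(a), with Y of shape n_t x n_d."""
    lam_D, Q_D = np.linalg.eigh(K_D)
    lam_T, Q_T = np.linalg.eigh(K_T)
    C = (Q_T.T @ Y @ Q_D) / (np.outer(lam_T, lam_D) + lam)
    return Q_T @ C @ Q_D.T

def combine(kernels, beta):
    """Weighted sum of base kernels."""
    return sum(w * K for w, K in zip(beta, kernels))

def ridge_weights(Y, terms, sigma):
    """ASSUMED subproblem: min 0.5*||Y - sum_i beta_i M_i||_F^2 + sigma*||beta||^2.

    Solved in closed form, then clipped to beta >= 0 (a simplification,
    not the constrained interior-point solve used by the authors).
    """
    G = np.array([[np.sum(Mi * Mj) for Mj in terms] for Mi in terms])
    b = np.array([np.sum(Y * Mi) for Mi in terms])
    beta = np.linalg.lstsq(G + 2.0 * sigma * np.eye(len(terms)), b, rcond=None)[0]
    return np.clip(beta, 0.0, None)

def kronrls_mkl_sketch(KDs, KTs, Y, lam=0.1, sigma=0.5, n_iter=5):
    """Alternate between the dual coefficients and the two weight vectors."""
    beta_D = np.full(len(KDs), 1.0 / len(KDs))  # uniform start (assumption)
    beta_T = np.full(len(KTs), 1.0 / len(KTs))
    for _ in range(n_iter):
        # Step 1: weights fixed, solve for A with KronRLS.
        A = kronrls_fit(combine(KDs, beta_D), combine(KTs, beta_T), Y, lam)
        # Step 2a: A and beta_T fixed; predictions are linear in beta_D.
        K_T = combine(KTs, beta_T)
        beta_D = ridge_weights(Y, [K_T @ A @ K.T for K in KDs], sigma)
        # Step 2b: A and beta_D fixed; predictions are linear in beta_T.
        K_D = combine(KDs, beta_D)
        beta_T = ridge_weights(Y, [K @ A @ K_D.T for K in KTs], sigma)
    return beta_D, beta_T, A  # A comes from the last step 1

# Hypothetical toy problem: 2 drug kernels and 2 target kernels.
rng = np.random.default_rng(3)
n_d, n_t = 5, 4
KDs = [F @ F.T for F in (rng.normal(size=(n_d, 3)) for _ in range(2))]
KTs = [F @ F.T for F in (rng.normal(size=(n_t, 3)) for _ in range(2))]
Y = (rng.random((n_t, n_d)) < 0.4).astype(float)
beta_D, beta_T, A = kronrls_mkl_sketch(KDs, KTs, Y)
print(beta_D, beta_T)
```

The point of the sketch is the structure: because \(K_{T}^{*} A (K_{D}^{*})^{T}\) is linear in each weight vector when the other quantities are fixed, each weight update is a small problem with only \(P_{D}\) or \(P_{T}\) unknowns.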
Data
The datasets considered were first proposed by [23] and used by most competing methods [2, 10, 11, 15, 25]. Each dataset consists of a binary matrix containing the known interactions of a given class of drug targets, namely Enzymes (E), Ion Channels (IC), GPCRs and Nuclear Receptors (NR), based on information extracted from the KEGG BRITE [36], BRENDA [37], SuperTarget [38] and DrugBank [39] databases. All four datasets are extremely unbalanced if we consider the whole drug-target interaction space, i.e., the number of known interactions is much lower than the number of unknown interactions, as presented in Table 1.
In order to analyze each type of entity from different points of view, we extracted 20 distinct kernels (10 for targets and 10 for drugs) from chemical structures, side effects, amino acid sequences, biological function, protein-protein interactions and network topology (a summary of the base kernels is presented in Table 2).
Protein kernels
Here we use the following information sources about target proteins: amino acid sequence, functional annotation and proximity in the protein-protein interaction network. Concerning sequence information, we consider the normalized score of the Smith-Waterman alignment of the amino acid sequences (SW) [23], as well as different parametrizations of the Mismatch (MIS) [40] and Spectrum (SPEC) [41] kernels. For the Mismatch kernel, we evaluated four combinations of the k-mer length (k=3 and k=4) and the maximal number of mismatches per k-mer (m=1 and m=2), namely MIS-k3m1, MIS-k3m2, MIS-k4m1 and MIS-k4m2; for the Spectrum kernel, we varied the k-mer length (k=3 and k=4; SPEC-k3 and SPEC-k4, respectively). Both Mismatch and Spectrum kernels were calculated using the R package KeBABS [42].
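The spectrum kernel compares two sequences through their k-mer counts. A pure-Python sketch of its usual definition, with optional cosine normalization, follows; it is not the KeBABS implementation, and the toy sequences are made up. The mismatch kernel generalizes this by also counting k-mers within m mismatches, which is omitted here.

```python
from collections import Counter
from math import sqrt

def kmer_counts(seq, k):
    """Count all overlapping k-mers (substrings of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(s1, s2, k=3, normalize=True):
    """Inner product of k-mer count vectors, optionally cosine-normalized."""
    c1, c2 = kmer_counts(s1, k), kmer_counts(s2, k)
    value = sum(n * c2[kmer] for kmer, n in c1.items())  # missing k-mers count 0
    if normalize:
        norm = sqrt(sum(n * n for n in c1.values()) * sum(n * n for n in c2.values()))
        value = value / norm if norm > 0 else 0.0
    return value

# Shared 3-mers of the two toy sequences: MKV and KVL.
print(spectrum_kernel("MKVLAT", "MKVLGT", k=3, normalize=False))  # 2
print(spectrum_kernel("MKVLAT", "MKVLGT", k=3))                   # 0.5
```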
The Gene Ontology semantic similarity kernel (GO) was used to encode functional information. GO terms were extracted from the BioMart database [43], and the semantic similarity scores between GO annotation terms were calculated using the csbl.go R package [44] with the Resnik algorithm [45]. We also extracted a similarity measure from the human protein-protein interaction network (PPI), obtained from the BioGRID database [46]. The similarity between each pair of targets was calculated based on the shortest distance in the corresponding PPI network, according to:
where the parameters A and b were set as in [14] (A = 0.9, b = 1), and \(D(p,p')\) is the shortest hop distance between proteins p and p′.
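As a sketch, the code below computes the shortest hop distance \(D(p,p')\) by breadth-first search. The mapping from distance to similarity, \(A e^{-bD}\) with A = 0.9 and b = 1, is an assumed form chosen for illustration and may differ from the article's equation; giving unreachable pairs similarity 0 is also an assumption. The toy network is hypothetical.

```python
import math
from collections import deque

def hop_distances(adj, source):
    """Breadth-first search: shortest hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def ppi_similarity(adj, proteins, A=0.9, b=1.0):
    """ASSUMED form S(p, p') = A * exp(-b * D(p, p')); 0 if p' is unreachable."""
    S = {}
    for p in proteins:
        dist = hop_distances(adj, p)
        for q in proteins:
            S[p, q] = A * math.exp(-b * dist[q]) if q in dist else 0.0
    return S

# Hypothetical undirected PPI network: P1 - P2 - P3, plus an isolated P4.
adj = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P2"], "P4": []}
S = ppi_similarity(adj, ["P1", "P2", "P3", "P4"])
print(round(S["P1", "P3"], 4))  # 0.9 * exp(-2) = 0.1218
```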
Drug kernels
As drug information sources, we consider 6 distinct chemical structure kernels and 3 side-effect kernels. Chemical structure similarity between drugs was obtained by applying the SIMCOMP algorithm [47] (obtained from [23]), defined as the ratio of common substructures between two drugs based on chemical graph alignment. We also computed the Lambda-k kernel (LAMBDA) [48], the Marginalized kernel [49] (MARG), the MINMAX kernel [50], the Spectrum kernel [48] (SPEC) and the Tanimoto kernel [50] (TAN). These latter kernels were calculated with the R package Rchemcpp [48] using default parameters.
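Tanimoto and MinMax similarities have simple closed forms on feature representations: shared over total features for binary features, and the sum of element-wise minima over the sum of maxima for counts. The pure-Python sketch below uses hypothetical substructure counts; Rchemcpp derives its features from molecular graphs, which this sketch does not attempt.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two collections of binary features."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def minmax(x, y):
    """MinMax similarity of two non-negative count dicts: sum(min) / sum(max)."""
    keys = set(x) | set(y)
    num = sum(min(x.get(k, 0), y.get(k, 0)) for k in keys)
    den = sum(max(x.get(k, 0), y.get(k, 0)) for k in keys)
    return num / den if den else 0.0

# Hypothetical substructure features (e.g., labelled paths) for two drugs.
d1 = {"C-C": 3, "C-O": 1, "C=O": 1}
d2 = {"C-C": 2, "C-N": 1, "C=O": 1}
print(tanimoto(d1, d2))  # 2 shared / 4 distinct features = 0.5
print(minmax(d1, d2))    # (2 + 0 + 1 + 0) / (3 + 1 + 1 + 1) = 0.5
```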
Two distinct side-effect data sources were also considered. The first is the FDA Adverse Event Reporting System (AERS), from which side-effect keyword (adverse event keyword) similarities for drugs were first retrieved by [51]. The authors introduced two types of pharmacological profiles for drugs: one based on the frequency of side-effect keywords in adverse event reports (AERS-freq) and another based on binary information (presence or absence) of a particular side effect in adverse event reports (AERS-bit). Since not every drug in the Nuclear Receptor, Ion Channel, GPCR and Enzyme datasets is present in the AERS-based data, we extracted the similarities of the drugs in AERS and assigned zero similarity to drugs not present.
The second side-effect resource was the SIDER database^{1} [52]. This database contains information about commercial drugs and their recorded side effects or adverse drug reactions. Each drug is represented by a binary profile, in which the presence or absence of each side-effect keyword is coded as 1 or 0, respectively. Both AERS- and SIDER-based profile similarities were obtained with the weighted cosine correlation coefficient between each pair of drug profiles [51].
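A sketch of a profile-based drug kernel that also applies the zero-similarity rule above for drugs missing from a side-effect source. It uses cosine similarity with an optional per-feature weight vector `w`; the weighting scheme of [51] is not reproduced, so uniform weights are assumed, and the profiles are hypothetical. Setting the self-similarity of a missing drug to 0 as well is our reading of the text.

```python
from math import sqrt

def cosine(u, v, w=None):
    """Weighted cosine similarity of two profiles (uniform weights by default)."""
    w = w or [1.0] * len(u)
    dot = sum(wi * ui * vi for wi, ui, vi in zip(w, u, v))
    su = sum(wi * ui * ui for wi, ui in zip(w, u))
    sv = sum(wi * vi * vi for wi, vi in zip(w, v))
    return dot / sqrt(su * sv) if su > 0 and sv > 0 else 0.0

def side_effect_kernel(drugs, profiles, w=None):
    """Pairwise similarities; drugs without a profile get 0 with every drug."""
    return [[cosine(profiles[a], profiles[b], w)
             if a in profiles and b in profiles else 0.0
             for b in drugs] for a in drugs]

# Hypothetical binary side-effect profiles (1 = side effect reported).
profiles = {"D1": [1, 1, 0, 0], "D2": [1, 0, 0, 1]}
K = side_effect_kernel(["D1", "D2", "D3"], profiles)  # D3 has no profile
print(K[0][1], K[0][2])  # 0.5 0.0
```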
Network topology information
We also use the drug-target network structure, in the form of interaction profiles, as a similarity measure for both proteins and drugs. The idea is to encode the connectivity behavior of each node in the underlying network. The Gaussian Interaction Profile kernel (GIP) [10] was calculated for both drugs and targets.
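The Gaussian interaction profile kernel is commonly defined as \(K(i,j) = \exp(-\gamma \|y_{i} - y_{j}\|^{2})\), where \(y_{i}\) is the binary interaction profile (row of the interaction matrix) of entity i and the bandwidth \(\gamma\) is normalized by the average squared profile norm. The NumPy sketch below follows that definition with \(\gamma' = 1\) (an assumed setting) and a hypothetical 3 × 3 interaction matrix; transposing the matrix yields the target kernel.

```python
import numpy as np

def gip_kernel(Y, gamma_prime=1.0):
    """Gaussian interaction profile kernel over the rows of an interaction matrix Y."""
    sq_norms = np.sum(Y ** 2, axis=1)
    mean_sq = sq_norms.mean()
    gamma = gamma_prime / mean_sq if mean_sq > 0 else 0.0  # normalized bandwidth
    # Squared Euclidean distances between all pairs of interaction profiles.
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2 * Y @ Y.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

# Hypothetical interactions: rows are drugs, columns are targets.
Y_dt = np.array([[1, 0, 1],
                 [1, 0, 0],
                 [0, 1, 0]], dtype=float)
K_drug = gip_kernel(Y_dt)       # 3 x 3 drug kernel, gamma = 1 / (4/3) = 0.75
K_target = gip_kernel(Y_dt.T)   # 3 x 3 target kernel
print(np.round(K_drug, 3))
```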
Competing methods
We compare the predictive performance of the KronRLS-MKL algorithm against other MKL approaches, as well as against the single-kernel setting (one kernel for drugs and one for targets). In the latter, we evaluate the performance of each possible combination of base kernels (Table 2) with the KronRLS algorithm, recently reported as the best method for predicting drug-target pairs with single paired kernels [2]. This resulted in a total of 10 × 10 = 100 different combinations. The best-performing pairs were then used as baselines in our method evaluation, selected according to two distinct criteria: the kernel pair that achieved the largest area under the precision-recall curve (AUPR) on the training set and, as a more optimistic choice, the pair with the largest AUPR on the test set.
Besides the single-kernel pairs, two different kinds of methods were adopted to integrate multiple kernels: (1) standard non-MKL kernel methods for DTI prediction, trained on a single combined kernel for drugs and one for targets; (2) actual MKL methods specifically proposed for DTI prediction.
Non-MKL approaches
We extend state-of-the-art methods for DTI prediction [8, 10, 25–27] to a multiple kernel context. For this, we first combine multiple kernels into a single kernel (one for drugs and one for targets). Once we have these single combined kernels, we adopt a standard kernel method for DTI prediction, i.e., the base learner. In our experiments, two previous combination strategies are used: the mean of the base kernels and the kernel alignment (KA) heuristic previously proposed by [53]. We briefly describe the base learners, followed by a short overview of the two combination strategies.
The Bipartite Local Model (BLM) [26] is a machine-learning-based algorithm in which drug-target pairs are predicted by constructing so-called ‘local models’, i.e., an SVM classifier is trained for each drug in the training set, and the same is done for targets. Then, the maximum of the drug and target scores is used to predict new drug-target interactions. Since BLM demonstrated superior performance to the Kernel Regression Method (KRM) [23] in previous studies [2, 26], we did not consider KRM in our experiments.
The Network-based Random Walk with Restart on the Heterogeneous network (NRWRH) [8] algorithm predicts new interactions between drugs and targets by simulating a random walk on the network of known drug-target interactions as well as on the drug-drug and protein-protein similarity networks. LapRLS and NetLapRLS were both proposed in [25]. Both are based on the RLS learning algorithm and perform similarity normalization by applying the Laplacian operator. Predictions are made for drugs and targets separately, and the final prediction scores are obtained by averaging the predictions from the drug and target spaces.
As noted previously, most previous SVM-based methods found in the literature can be reduced to the Pairwise Kernel Method (PKM) [27], differing only in the kernels used and the adopted combination strategy. PKM starts with the construction of a pairwise kernel, computed from the drug and target similarities. Given two drug-target pairs, (d,p) and (d′,p′), and the respective drug and target similarities, \(K_{D}\) and \(K_{P}\), the pairwise kernel is given by \(K((d,p),(d',p')) = K_{D}(d,d') \times K_{P}(p,p')\). Once the pairwise matrix is computed, it is used to train an SVM classifier.
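NRWRH runs a random walk with restart on a heterogeneous network that joins the drug similarity, target similarity and drug-target subnetworks. The NumPy sketch below shows only the generic iteration \(p \leftarrow (1-r)Mp + r p_{0}\) on a given adjacency matrix with column-normalized transitions, where r plays the role of the restart probability tuned in the experiments; the construction of NRWRH's heterogeneous transition matrix is omitted, and the toy graph is hypothetical.

```python
import numpy as np

def random_walk_with_restart(W, seed, r=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) M p + r p0, with M the column-normalized adjacency matrix."""
    col_sums = W.sum(axis=0)
    M = W / np.where(col_sums > 0, col_sums, 1.0)  # column-stochastic transitions
    p0 = np.zeros(W.shape[0])
    p0[seed] = 1.0  # all restarts go back to the seed node
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - r) * M @ p + r * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical path graph 0 - 1 - 2 - 3, walk restarting at node 0.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
p = random_walk_with_restart(W, seed=0, r=0.3)
print(np.round(p, 3))  # steady-state visiting probabilities, largest near node 0
```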
The PKM [27], KronRLS, BLM, NRWRH, LapRLS and NetLapRLS algorithms cannot cope with multiple kernels. For this reason, we consider two simple kernel combination methods: the mean of the base kernels and the kernel alignment (KA) heuristic [53]. The mean drug kernel is computed as \(K_{D}^{*} = \frac{1}{P_{D}} \sum_{i=1}^{P_{D}} K_{D}^{i}\), and the target kernel is obtained analogously. KA is a heuristic for estimating kernel weights based on the notion of kernel alignment [54]. More specifically, the weight vector \(\boldsymbol{\beta}_{D}\), for instance, can be obtained by:
\[ \beta_{D}^{i} = \frac{A\left(K_{D}^{i}, yy^{T}\right)}{\sum_{h=1}^{P_{D}} A\left(K_{D}^{h}, yy^{T}\right)} \]
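where \(yy^{T}\) stands for the ideal kernel, y being the label vector. The alignment \(A(K, yy^{T})\) between a given kernel K and the ideal kernel \(yy^{T}\) is defined as:
\[ A\left(K, yy^{T}\right) = \frac{\left\langle K, yy^{T} \right\rangle_{F}}{\sqrt{\left\langle K, K \right\rangle_{F} \left\langle yy^{T}, yy^{T} \right\rangle_{F}}} \]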
where \(\left\langle K, yy^{T} \right\rangle_{F} = \sum_{i=1}^{n}\sum_{j=1}^{n} K_{ij} \left(yy^{T}\right)_{ij}\). Once such combinations are performed, the resulting drug and protein kernels are used as input to the learning algorithm. We refer to the mean and KA heuristics by appending -MEAN and -KA, respectively, to the name of each base learner.
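The two combination heuristics can be sketched in a few lines of NumPy. The label vector `y` is assumed to be indexed by the same entities as the kernels, and the toy kernels are hypothetical.

```python
import numpy as np

def alignment(K1, K2):
    """Kernel alignment <K1, K2>_F / sqrt(<K1, K1>_F <K2, K2>_F)."""
    num = np.sum(K1 * K2)
    return num / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))

def ka_weights(kernels, y):
    """KA heuristic: weights proportional to each kernel's alignment with yy^T."""
    ideal = np.outer(y, y)
    scores = np.array([alignment(K, ideal) for K in kernels])
    return scores / scores.sum()

def mean_kernel(kernels):
    """MEAN heuristic: unweighted average of the base kernels."""
    return sum(kernels) / len(kernels)

y = np.array([1.0, 1.0, 0.0])
K_a = np.outer(y, y)  # perfectly aligned with the ideal kernel
K_b = np.eye(3)       # uninformative identity kernel
print(alignment(K_a, np.outer(y, y)))  # 1.0
print(ka_weights([K_a, K_b], y))       # larger weight for K_a
print(mean_kernel([K_a, K_b]))
```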
Multiple kernel approaches
Similarity-based Inference of drug-TARgets (SITAR) [14] constructs a feature vector of similarity values, where each feature is based on one drug-drug and one gene-gene similarity measure, resulting in a total of \(P_{D} \times P_{T}\) features. Each feature is calculated by combining the drug-drug similarities between the query drug and other drugs with the gene-gene similarities between the query gene and other target genes across all true drug-target associations. The method also performs feature selection and yields the final classification scores using a logistic regression classifier.
Gönen and Kaski [22] proposed the Kernelized Bayesian Matrix Factorization with Twin Multiple Kernel Learning (KBMF2MKL) algorithm, extending a previous work [55] to handle multiple kernels. KBMF2MKL factorizes the drug-target interaction matrix by projecting the drugs and the targets into a common subspace, where the projected drug and target kernels are multiplied. Normally distributed kernel weights for each projected kernel are then estimated without any constraints. The product of the final combined matrices is then used to make predictions.
Wang et al. [15] propose using a simple heuristic to first combine the drug and target similarities, and then an SVM classifier to perform the predictions. Only the maximum similarity values of the drug and target kernel matrices are selected, resulting in two kernels, which are then used to construct a pairwise kernel. Once the pairwise matrix is computed, it is used to train an SVM classifier. This procedure is also known as the Pairwise Kernel Method (PKM) [27]; for this reason, we refer to the approach proposed by [15] as PKM-MAX.
The authors of [15] suggest a weighted sum approach as future work: learning the optimal convex combination of data sources that maximizes the correlation of the obtained kernel matrix with the topology of the drug-protein network. This objective can be achieved by solving a linear programming problem, as follows:
where \(K_{D}^{*}\) corresponds to the optimal combination of drug kernel matrices with weight vector \(\boldsymbol{\beta}_{D}\), dist is the drug-drug distance matrix in the DTI network, and corr represents the correlation coefficient. The same can be done for targets, analogously. We call this method WANG-MKL.
Experimental setup
Previous work [2, 11, 24] suggests that, in the context of paired-input problems, one should separately consider the experiments in which the training and test sets share common drugs or proteins. To obtain a clear picture of the performance of each method, all competing approaches were evaluated over 5 runs of three distinct 5-fold cross-validation (CV) procedures:

1. ‘new drug’ scenario: simulates the task of predicting targets for new drugs. In this scenario, the drugs in a dataset were divided into 5 disjoint subsets (folds). The pairs associated with 4 folds of drugs were used to train the classifier, and the remaining pairs were used for testing;
2. ‘new target’ scenario: corresponds to predicting interacting drugs for new targets. This is analogous to the scenario above, but with 5 folds of targets;
3. pair prediction: consists of predicting unknown interactions between known drugs and targets. All drug-target interactions were split into five folds, of which 4 were used for training and 1 for testing. Some of the competing methods (PKM-based, WANG-MKL and SITAR) were trained with subsampled datasets, i.e., we randomly selected from the unknown interaction set as many pairs as there are known interactions, since these methods cannot be executed on large networks [2, 4, 14, 15]. Although balanced classes are unlikely in real scenarios, we also performed experiments in context (3) using a subsampled test set, obtained by sampling as many negative examples as positive examples [14, 15] from the test fold. This experiment is relevant for comparison with previous work, since most previous studies on drug-target prediction performed undersampling to evaluate predictive performance (see Additional file 1: Table S1).^{2}
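The three splitting schemes differ only in what is held out: whole drugs, whole targets, or individual pairs. A pure-Python sketch follows; the fold assignment by shuffling and dealing, the scenario keys, and the toy identifiers are our assumptions, and the nested CV used for hyperparameters is omitted.

```python
import random

KEYS = {
    "new_drug": lambda pair: pair[0],    # hold out whole drugs
    "new_target": lambda pair: pair[1],  # hold out whole targets
    "pair": lambda pair: pair,           # hold out individual pairs
}

def make_folds(items, n_folds=5, seed=0):
    """Shuffle items and deal them into n_folds disjoint folds."""
    items = list(items)
    random.Random(seed).shuffle(items)
    return [items[i::n_folds] for i in range(n_folds)]

def split_pairs(pairs, scenario, n_folds=5, seed=0):
    """Yield (train, test) lists of (drug, target) pairs for one CV scenario."""
    key = KEYS[scenario]
    for fold in make_folds(sorted({key(p) for p in pairs}), n_folds, seed):
        held_out = set(fold)
        train = [p for p in pairs if key(p) not in held_out]
        test = [p for p in pairs if key(p) in held_out]
        yield train, test

def balanced_negatives(positives, unknown, seed=0):
    """Randomly draw as many unknown pairs as there are known interactions."""
    return random.Random(seed).sample(list(unknown), len(positives))

# Hypothetical identifiers: 10 drugs x 6 targets.
drugs = [f"D{i}" for i in range(10)]
targets = [f"T{j}" for j in range(6)]
all_pairs = [(d, t) for d in drugs for t in targets]
for train, test in split_pairs(all_pairs, "new_drug"):
    # In the 'new drug' scenario, no test drug appears in the training pairs.
    assert not {d for d, _ in test} & {d for d, _ in train}
```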
The hyperparameters of each competing method were optimized with a nested CV procedure using the following values: for the SVM-based methods (PKM, BLM and WANG-MKL), the SVM cost parameter was evaluated over \(\{2^{-1}, \ldots, 2^{3}\}\); for the KronRLS-based methods, the λ parameter was evaluated over \(\{2^{-15}, 2^{-10}, \ldots, 2^{30}\}\). The σ regularization coefficient of the KronRLS-MKL algorithm was also optimized over {0, 0.25, 0.5, 0.75, 1}. The number of components in KBMF2MKL was varied over R ∈ {5, 10, …, 40}, and for LapRLS and NetLapRLS we varied \(\beta_{d}, \beta_{t} \in \{0.25, 0.50, \ldots, 1\}\). In NetLapRLS we also considered two distinct values for \(\gamma_{d2}, \gamma_{t2} \in \{0.01, 0.1\}\). For NRWRH, the restart probability was evaluated over {0.1, 0.2, …, 0.9}. After the hyperparameters were selected for each method, the outer loop evaluated the predictive performance on the test partition with the model built using the selected hyperparameters.
The evaluation metric considered was the AUPR, as it provides a good quantitative estimate of the ability to separate positive interactions from negative ones. According to [56], this metric gives a better quality estimate for highly unbalanced data, since it punishes false positives (FP) more heavily. This is especially true for the datasets considered, which, as shown in Table 1, are all extremely unbalanced.
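One common way to compute AUPR is average precision: the mean of the precision values at the rank of each positive. The pure-Python sketch below uses that estimator; the text does not state which interpolation was used, so treat the choice as an assumption. For a random ranking the expected value is roughly the fraction of positives, which is what makes AUPR demanding on extremely unbalanced data.

```python
def average_precision(scores, labels):
    """Average precision: mean of the precision at the rank of each positive.

    Ties are broken by input order here; a careful implementation would
    treat tied scores explicitly.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda sl: -sl[0])
    tp, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            tp += 1
            total += tp / rank  # precision at this positive's rank
    return total / n_pos

print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # (1 + 2/3) / 2, about 0.8333
```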
Results and discussion
Paired kernel experiments
As a base study, we evaluate the performance of KronRLS on all pairs of kernels (10 × 10 pairs). The AUPR results of all kernel pairs for the Nuclear Receptor, GPCR, Ion Channel and Enzyme datasets are shown in more detail in the supplementary material (see Additional file 1).
The performance of KronRLS varies drastically with the kernel choice, as clearly demonstrated by the average performance of each kernel in the single-kernel experiments (Fig. 2). For Nuclear Receptors, the best kernel pair was SPEC-k4 and GIP, while GIP and SW performed best on all other datasets. It is also important to notice the impact of different parametrizations of the Mismatch sequence kernel: its performance decreases as more mismatches are allowed inside a k-mer. Overall, both versions of AERS, SIMCOMP, GIP, MINMAX and SIDER drug kernels showed better performance, while LAMBDA, MARG, SPEC and TAN performed worse. For targets, the GIP, GO, MIS-k4m1, SPEC and SW kernels performed better than the other target kernels.
Comparative analysis
In this section, we compare the competing methods in terms of AUPR on all datasets. For KronRLS, we use the best kernel pair (Best Pair), i.e., the one with the largest AUPR as described in the previous section, as a baseline to evaluate the MKL approaches. Results are presented in Table 3. In the pair prediction scenario, KRONRLS-MKL obtained the highest AUPR on all datasets; its results are even superior to those of the best kernel pair under the optimistic selection. The results of KRONRLS-MKL in pair prediction are statistically significant against all other methods (at α = 0.05), except KRONRLS-KA and KRONRLS-MEAN, according to the Wilcoxon rank sum test (Additional file 2). Concerning subsampled pair prediction, KRONRLS-MKL achieved the highest AUPR on the NR and IC datasets, while SITAR performed best on the GPCR and Enzyme data; on these two datasets, KRONRLS-MKL ranked second, just after SITAR (see Additional file 3: Table S1). The higher AUPR values obtained on the subsampled datasets compared with the unbalanced datasets clearly indicate that performing predictions on the complete data is a more difficult task. Moreover, the number of positive examples was negatively correlated with the dataset size for the complete datasets.
In the “new target” scenario, BLM-KA performed best on 3 of the 4 datasets, followed closely by BLM-MEAN and KronRLS-MKL, which suggests that the local SVM model is more effective in this scenario. BLM-KA performed significantly better (α=0.05; Additional file 2) than all evaluated methods except BLM-MEAN, KBMF2MKL, KronRLS-KA, KronRLS-MEAN and KronRLS-MKL. In the “new drug” scenario, KronRLS-MKL obtained the highest AUPR on the NR and GPCR datasets, while BLM-KA had the highest AUPR on the IC and Enzyme data. Both KronRLS-MKL and BLM-KA had significantly higher AUPR (α=0.05; Additional file 2) than all other competing methods. To give an overview of the performance of the evaluated methods, Table 4 presents the average ranking of the AUPR values obtained by all methods across the four datasets.
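The significance statements in this section rely on the Wilcoxon rank sum test (Additional file 2). A dependency-free sketch of the two-sided test under the normal approximation follows; ties receive average ranks, but the tie and continuity corrections are omitted, so for small samples an exact implementation from a statistics package is preferable. The AUPR samples are hypothetical.

```python
import math


def average_ranks(values):
    """Ascending 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns (U statistic of x, z score, p-value).
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                  # rank sum of the first sample
    u = w - n1 * (n1 + 1) / 2            # Mann-Whitney U for x
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, z, p


# Hypothetical AUPR samples of two methods.
better = [0.90, 0.80, 0.85, 0.95, 0.88]
worse = [0.60, 0.70, 0.65, 0.72, 0.68]
u, z, p = rank_sum_test(better, worse)
print(u)  # 25.0 (every value of `better` exceeds every value of `worse`)
print(p < 0.05)  # True
```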
Methods also displayed distinct computational requirements. Memory usage was stable across all methods except the SVM-based algorithms (BLM, PKM and WANG-MKL), whose memory use grew quadratically with the size of the dataset (see Additional file 3: Table S3). This is partly due to the construction of the explicit pairwise kernel, and it makes these methods unsuitable for contexts in which subsampling of pairs is undesirable.
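The quadratic growth follows from the size of the explicit pairwise kernel: with n_d drugs and n_t targets there are n_d·n_t pairs, so a dense pairwise kernel has (n_d·n_t)^2 entries, whereas KronRLS only needs the two base kernels, with n_d^2 + n_t^2 entries. The arithmetic below assumes dense matrices of 8-byte floats and uses 445 drugs and 664 targets, the dimensions of the Enzyme data set of Yamanishi et al. [23]; the helper name is illustrative.

```python
def dense_kernel_bytes(n_drugs, n_targets, bytes_per_entry=8):
    """Memory for dense kernels, assuming 8-byte floats by default.

    Returns (bytes for the explicit pairwise kernel,
             bytes for the separate drug and target kernels).
    """
    n_pairs = n_drugs * n_targets
    pairwise = n_pairs ** 2 * bytes_per_entry
    base = (n_drugs ** 2 + n_targets ** 2) * bytes_per_entry
    return pairwise, base


pairwise, base = dense_kernel_bytes(445, 664)
print(f"explicit pairwise kernel: {pairwise / 1e9:.0f} GB")  # 698 GB
print(f"drug + target kernels:    {base / 1e6:.1f} MB")      # 5.1 MB
```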
We now discuss computational time in the pair prediction scenario. Overall, the approaches based on precomputed kernel combinations (MEAN and KA) were the fastest on average, with PKM-based methods requiring the least time to train and test the models (∼1 min), followed by KronRLS-based and LapRLS-based algorithms (∼20 and ∼27 min, respectively). KBMF2MKL and BLM were the slowest, requiring more than 100 min on average for the same task. The lower computation time of the heuristic-based methods is explained by the absence of complex optimization procedures to find the kernel weights. KronRLS-MKL took less time than KBMF2MKL, with an average of 74 min over the four datasets (see Additional file 3: Table S4).
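The heuristic approaches are fast because their weights have a closed form. As a sketch, assuming MEAN gives every kernel the same weight and KA weights each kernel by its kernel-target alignment [54] with an ideal kernel, normalized to sum to one (in the spirit of Qiu and Lane [53]), the weights can be computed as below. Taking Y Yᵀ, with Y the drug-by-target interaction matrix, as the ideal drug kernel is an illustrative assumption, and the toy matrices are hypothetical.

```python
import math


def frobenius(a, b):
    """Frobenius inner product of two equally sized matrices (lists of rows)."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def alignment(k, ideal):
    """Kernel-target alignment between a kernel matrix and an ideal kernel."""
    return frobenius(k, ideal) / math.sqrt(frobenius(k, k) * frobenius(ideal, ideal))


def ideal_drug_kernel(y):
    """Y Y^T for a drugs x targets interaction matrix Y (assumed ideal kernel)."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in y] for row_i in y]


def mean_weights(kernels):
    """MEAN heuristic: the same weight for every kernel."""
    return [1.0 / len(kernels)] * len(kernels)


def ka_weights(kernels, ideal):
    """KA heuristic as sketched here: alignments normalized to sum to one."""
    scores = [alignment(k, ideal) for k in kernels]
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical toy data: 3 drugs, 2 targets.
y = [[1, 0], [1, 0], [0, 1]]
ideal = ideal_drug_kernel(y)
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
w = ka_weights([ideal, identity], ideal)
print([round(v, 3) for v in w])  # [0.564, 0.436]
```

Even a kernel that ignores all structure (the identity) keeps a sizeable weight here, which hints at why normalized alignment weights can end up close to uniform.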
Predictions on new drug-target interactions
To evaluate the quality of the final predictions in a more realistic scenario, we performed an experiment similar to those described in [10, 26]. We regard the most highly ranked drug-target pairs as the most likely true interactions and searched for them in the current releases of four major databases (DrugBank [39], MATADOR [38], KEGG [57] and ChEMBL [58]). As the training datasets were generated almost eight years ago, new interactions included in these databases serve as an external validation set. Interactions already present in the training data were excluded.
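This protocol amounts to ranking every unobserved drug-target pair, skipping pairs already in the training data, and checking the top candidates against an external database release. A minimal sketch, assuming scores are held in a dictionary keyed by (drug, target); the function names, identifiers and scores are hypothetical.

```python
def top_novel_predictions(scores, known, k):
    """Return the k highest-scoring pairs that are not in the training data.

    scores: dict mapping (drug, target) -> predicted score.
    known: set of (drug, target) pairs used for training.
    """
    candidates = [(s, pair) for pair, s in scores.items() if pair not in known]
    candidates.sort(key=lambda item: item[0], reverse=True)
    return [pair for _, pair in candidates[:k]]


def confirmed(predictions, external):
    """Flag which predicted pairs appear in an external database release."""
    return [pair in external for pair in predictions]


# Hypothetical identifiers and scores.
scores = {("d1", "t1"): 0.9, ("d1", "t2"): 0.8, ("d2", "t1"): 0.7, ("d2", "t2"): 0.2}
known = {("d1", "t1")}
external = {("d2", "t1")}
top = top_novel_predictions(scores, known, k=2)
print(top, confirmed(top, external))
# [('d1', 't2'), ('d2', 't1')] [False, True]
```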
We trained all methods with all interactions present in the original datasets. For BLM and NRWRH, one model was trained for drugs and another for targets, and the maximum of the two scores for each drug-target pair was used as the prediction. We then calculated the AUPR for each dataset separately, discarding already known interactions (see Additional file 3: Table S2). The low AUPR values of all methods reflect the difficulty of performing predictions in such a large search space. The average ranking of each method across all databases (Fig. 3) indicates that the KronRLS-based methods performed best, followed by the single kernel approaches. BLM-KA and BLM-MEAN performed poorly in this task, which suggests a limited generalization capacity of the BLM framework for the drug-target prediction problem (see Table 3).
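Table 4 and Fig. 3 summarize performance by averaging each method's rank across data sets or databases. A sketch of this computation, assuming rank 1 goes to the highest AUPR and tied methods share the mean of the ranks they span (the tie rule is an assumption); the method names and AUPR values are hypothetical.

```python
def average_rank(results):
    """results: dict dataset -> dict method -> AUPR (rank 1 = highest AUPR).

    Tied methods receive the mean of the ranks they span.
    Returns dict method -> rank averaged over all datasets.
    """
    methods = sorted(next(iter(results.values())))
    totals = {m: 0.0 for m in methods}
    for per_method in results.values():
        ordered = sorted(methods, key=lambda m: -per_method[m])
        i = 0
        while i < len(ordered):
            j = i
            while j + 1 < len(ordered) and per_method[ordered[j + 1]] == per_method[ordered[i]]:
                j += 1
            for m in ordered[i:j + 1]:
                totals[m] += (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
            i = j + 1
    return {m: totals[m] / len(results) for m in methods}


# Hypothetical AUPR values of three methods on two databases.
results = {
    "DB1": {"A": 0.30, "B": 0.20, "C": 0.20},
    "DB2": {"A": 0.10, "B": 0.40, "C": 0.05},
}
print(average_rank(results))  # {'A': 1.5, 'B': 1.75, 'C': 2.75}
```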
Next, we assess the predictive power of KronRLS-MKL more practically by examining the top 5 interactions it predicts for each dataset (Table 5). Most of these interactions (14 out of 20) have already been described in ChEMBL, DrugBank or Matador. We focus our discussion on selected novel interactions. For example, in the Nuclear Receptor dataset, the 5th-ranked prediction indicates an association of Tretinoin with the nuclear receptor RAR-related orphan receptor A (RORa). Tretinoin is currently used to treat acne [59]. Interestingly, its molecular activity is associated with the activation of nuclear receptors of the closely related RAR family.
This is also a good example of the benefit of incorporating multiple sources of data. RORa and Tretinoin do not share nodes in the training set. All targets of Tretinoin have a high GO similarity to RORa (mean value of 0.8368) despite their low sequence similarity (mean SW value of 0.1563). In addition, one of the targets of Tretinoin is NR0B1 (nuclear receptor subfamily 0, group B, member 1), a protein very close to RORa in the PPI network (similarity score of 0.90).
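The Tretinoin/RORa case illustrates why combining kernels helps: in an MKL model the target similarity is a weighted sum of base kernels, so a high GO similarity can compensate for a low sequence similarity, and in a Kronecker (pairwise) kernel the similarity between two drug-target pairs is the product of the drug and target similarities. The arithmetic below plugs in the mean similarities reported above with hypothetical equal weights of 0.5; the weights actually learned by KronRLS-MKL differ (Fig. 4), and the function names are illustrative.

```python
def combined_similarity(similarities, weights):
    """Convex combination of base-kernel similarities for one pair of entities."""
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to one")
    return sum(w * s for w, s in zip(weights, similarities))


def pair_similarity(drug_sim, target_sim):
    """Kronecker product kernel value between pairs (d, t) and (d', t')."""
    return drug_sim * target_sim


go, sw = 0.8368, 0.1563  # mean similarities reported in the text
combined = combined_similarity([go, sw], [0.5, 0.5])  # hypothetical weights
print(f"{combined:.3f}")  # 0.497, versus 0.156 for the SW kernel alone
```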
For the Ion Channel dataset, the predictions ranked 2nd and 3rd indicate interactions of Verapamil and Diazoxide with ATP-binding cassette subfamily C member 8 (ABCC8). ABCC8 encodes the sulfonylurea receptor 1 (SUR1) and is associated with calcium regulation and type 2 diabetes [60]. Interestingly, Diazoxide treatment has been reported to prevent diabetes in rats [61].
Evaluation of kernel weights
The kernel weights given by KBMF2MKL, KronRLS-MKL and WANG-MKL, as well as by the KA heuristic, can be used to analyze the ability of these methods to identify the most relevant information sources. As there is no guideline or gold standard for this, we resort to a simple approach: we compare the kernel weights (Fig. 4) with the average performance of each kernel in the single kernel experiments (Fig. 2). First, the KA weights are very close to the uniform weight (0.10), which indicates that KA performs no clear kernel selection. WANG-MKL and KronRLS-MKL give low weights to the drug kernels LAMBDA, MARG, MINIMAX, SPEC and TAN and to the protein kernel MIS-k3m2. These kernels have the worst overall AUPR in the single kernel experiments, so both selection procedures agree with the single kernel results. Although the weights assigned by KBMF2MKL are not subject to convex constraints, as reflected by the larger weights it assigns to all kernels, they also give a notion of the quality of the base kernels. KBMF2MKL shows a strong preference for the GIP kernel on all datasets, although it also assigned a high weight to the lower-quality MIS-k3m2 kernel in three of the four datasets.
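The comparison above between weights and single kernel performance is qualitative. One way to quantify it, offered here as a suggestion rather than an analysis performed in this study, is Spearman's rank correlation between each kernel's weight and its average single kernel AUPR (the Pearson correlation of the two rank vectors). The values below are hypothetical.

```python
import math


def average_ranks(values):
    """Ascending 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ssx = sum((a - mx) ** 2 for a in rx)
    ssy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(ssx * ssy)


weights = [0.30, 0.25, 0.05, 0.40]   # hypothetical kernel weights
aupr = [0.70, 0.65, 0.40, 0.80]      # hypothetical single kernel AUPR
print(round(spearman(weights, aupr), 6))  # 1.0 (identical orderings)
```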
Conclusions
We have presented a new Multiple Kernel Learning algorithm for the bipartite link prediction problem, which is able to identify and select the most relevant information sources for DTI prediction. Most previous MKL methods address the case in which all kernels are built over the same set of entities, which does not hold for bipartite link prediction problems such as drug-target networks. In drug-target prediction, sampling negative/unknown examples to cope with large data sets is a clear limitation [2]. Our method takes advantage of the KronRLS framework to efficiently perform link prediction on data of arbitrary size.
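KronRLS scales because it never forms the Kronecker product of the drug and target kernels explicitly. The key identity is (B ⊗ C) vec(X) = vec(C X Bᵀ), where vec stacks the columns of a matrix; in KronRLS it is combined with eigendecompositions of the two kernels to obtain a closed-form solution (see [12] for the full derivation). The dependency-free sketch below only checks the identity numerically on small random matrices; the sizes and helper names are illustrative.

```python
import random


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def vec(a):
    """Stack the columns of a matrix into one list (column-major order)."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]


def kron(b, c):
    """Explicit Kronecker product B (x) C; its size is the product of both sizes."""
    return [[b[i][j] * c[k][l] for j in range(len(b[0])) for l in range(len(c[0]))]
            for i in range(len(b)) for k in range(len(c))]


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def rand(rows, cols):
    return [[random.random() for _ in range(cols)] for _ in range(rows)]


random.seed(0)
c_mat, x_mat, b_mat = rand(3, 4), rand(4, 2), rand(5, 2)
lhs = matvec(kron(b_mat, c_mat), vec(x_mat))                # builds a 15 x 8 matrix
rhs = vec(matmul(matmul(c_mat, x_mat), transpose(b_mat)))   # never builds it
print(max(abs(p - q) for p, q in zip(lhs, rhs)) < 1e-12)    # True
```

The left-hand side needs a matrix whose size is the product of the two factor sizes, while the right-hand side only touches matrices of the original sizes, which is what allows training over the complete interaction space.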
In our experiments, the KronRLS-MKL algorithm showed a good balance between accuracy and computational cost compared with the other approaches. It performed best in the “pair” prediction problem and the “new drug” problem. In the “new drug” and “new target” prediction tasks, BLM-KA was also top ranked. However, this method has a high computational cost, since it requires a classifier for each drug-target pair [2]. Moreover, it performed poorly when predicting novel drug-protein interactions.
The kernel weights estimated under convex constraints correlated well with the accuracy of a brute-force search over kernel pairs. This non-sparse combination of kernels possibly improved the generalization of the model by reducing the bias toward a specific type of kernel. This usually leads to better performance, since the model can benefit from different heterogeneous information sources in a systematic way [33]. Finally, the performance of the algorithm was not sensitive to class imbalance, and it can be trained over the whole interaction space without sacrificing performance.
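Under the convex constraints, each combined kernel is a weighted average of the base kernels, with non-negative weights summing to one. The sketch below shows such a combination together with an entropy-based "effective number of kernels" (the exponential of the weight entropy) that can gauge how sparse a weight vector is; this diagnostic is an illustration, not part of KronRLS-MKL, and the kernels and weight vectors are hypothetical.

```python
import math


def combine_kernels(kernels, weights, tol=1e-9):
    """Convex combination sum_i w_i K_i of square kernel matrices."""
    if len(kernels) != len(weights):
        raise ValueError("one weight per kernel is required")
    if min(weights) < 0 or abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must be non-negative and sum to one")
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


def effective_number_of_kernels(weights):
    """exp(entropy): 1 for a single selected kernel, n for uniform weights."""
    entropy = -sum(w * math.log(w) for w in weights if w > 0)
    return math.exp(entropy)


identity = [[1.0, 0.0], [0.0, 1.0]]
ones = [[1.0, 1.0], [1.0, 1.0]]
print(combine_kernels([identity, ones], [0.5, 0.5]))  # [[1.0, 0.5], [0.5, 1.0]]
print(effective_number_of_kernels([1.0, 0.0, 0.0]))  # 1.0 (sparse selection)
```

A non-sparse solution such as the one discussed above would show an effective number well above one.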
Endnotes
^{1} http://sideeffects.embl.de/.
^{2} NRWRH cannot be applied to pair prediction [8]; therefore, it was not considered in that setting.
References
 1
Csermely P, Korcsmáros T, Kiss HJM, London G, Nussinov R. Structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review. Pharmacol Ther. 2013; 138(3):333–408. doi:10.1016/j.pharmthera.2013.01.016.
 2
Ding H, Takigawa I, Mamitsuka H, Zhu S. Similarity-based machine learning methods for predicting drug-target interactions: a brief review. Brief Bioinform. 2013. doi:10.1093/bib/bbt056.
 3
Chen X, Yan CC, Zhang X, Zhang X, Dai F. Drug-target interaction prediction: databases, web servers and computational models. Brief Bioinform. 2015:1–17. doi:10.1093/bib/bbv066.
 4
Yamanishi Y. Chemogenomic approaches to infer drug-target interaction networks. Data Min Syst Biol. 2013; 939:97–113. doi:10.1007/978-1-62703-107-3.
 5
Dudek AZ, Arodz T, Gálvez J. Computational methods in developing quantitative structure-activity relationships (QSAR): a review. Comb Chem High Throughput Screen. 2006; 9(3):213–8.
 6
Sawada R, Kotera M, Yamanishi Y. Benchmarking a wide range of chemical descriptors for drug-target interaction prediction using a chemogenomic approach. Mol Inform. 2014; 33(11–12):719–31. doi:10.1002/minf.201400066.
 7
Cheng F, Liu C, Jiang J, Lu W, Li W, Liu G, et al. Prediction of drug-target interactions and drug repositioning via network-based inference. PLoS Comput Biol. 2012; 8(5):1002503. doi:10.1371/journal.pcbi.1002503.
 8
Chen X, Liu MX, Yan GY. Drug-target interaction prediction by random walk on the heterogeneous network. Mol BioSyst. 2012; 8(7):1970–8. doi:10.1039/c2mb00002d.
 9
Yamanishi Y, Kotera M, Kanehisa M, Goto S. Drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework. Bioinformatics (Oxford, England). 2010; 26(12):246–54. doi:10.1093/bioinformatics/btq176.
 10
van Laarhoven T, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drug-target interaction. Bioinformatics (Oxford, England). 2011; 27(21):3036–43. doi:10.1093/bioinformatics/btr500.
 11
Pahikkala T, Airola A, Pietila S, Shakyawar S, Szwajda A, Tang J, et al. Toward more realistic drug-target interaction predictions. Brief Bioinform. 2014. doi:10.1093/bib/bbu010.
 12
Pahikkala T, Airola A, Stock M, Baets BD, Waegeman W. Efficient regularized least-squares algorithms for conditional ranking on relational data. Mach Learn. 2013; 93:321–356. arXiv:1209.4825v2.
 13
Gönen M, Alpaydın E. Multiple kernel learning algorithms. J Mach Learn Res. 2011; 12:2211–268.
 14
Perlman L, Gottlieb A, Atias N, Ruppin E, Sharan R. Combining drug and gene similarity measures for drug-target elucidation. J Comput Biol. 2011; 18(2):133–45. doi:10.1089/cmb.2010.0213.
 15
Wang YC, Zhang CH, Deng NY, Wang Y. Kernel-based data fusion improves the drug-protein interaction prediction. Comput Biol Chem. 2011; 35(6):353–62. doi:10.1016/j.compbiolchem.2011.10.003.
 16
Wang Y, Chen S, Deng N, Wang Y. Drug repositioning by kernel-based integration of molecular structure, molecular activity, and phenotype data. PLoS ONE. 2013; 8(11):78518. doi:10.1371/journal.pone.0078518.
 17
Ben-Hur A, Noble WS. Kernel methods for predicting protein-protein interactions. Bioinformatics (Oxford, England). 2005; 21 Suppl 1:38–46. doi:10.1093/bioinformatics/bti1016.
 18
Hue M, Riffle M, Vert JP, Noble WS. Large-scale prediction of protein-protein interactions from structures. BMC Bioinforma. 2010; 11:144.
 19
Ammad-ud-din M, Georgii E, Gönen M, Laitinen T, Kallioniemi O, Wennerberg K, et al. Integrative and Personalized QSAR Analysis in Cancer by Kernelized Bayesian Matrix Factorization. J Chem Inf Model. 2014; 1. doi:10.1021/ci500152b.
 20
Lanckriet GR, Deng M, Cristianini N, Jordan MI, Noble WS. Kernel-based data fusion and its application to protein function prediction in yeast. In: Pacific Symposium on Biocomputing. World Scientific: 2004. p. 300–11.
 21
Yu G, Zhu H, Domeniconi C, Guo M. Integrating multiple networks for protein function prediction. BMC Syst Biol. 2015; 9(Suppl 1):3. doi:10.1186/1752-0509-9-S1-S3.
 22
Gönen M, Kaski S. Kernelized Bayesian Matrix Factorization. IEEE Trans Pattern Anal Mach Intell. 2014; 36(10):2047–2060.
 23
Yamanishi Y, Araki M, Gutteridge A, Honda W, Kanehisa M. Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics (Oxford, England). 2008; 24(13):232–40. doi:10.1093/bioinformatics/btn162.
 24
Park Y, Marcotte EM. Flaws in evaluation schemes for pair-input computational predictions. Nat Methods. 2012; 9(12):1134–6. doi:10.1038/nmeth.2259.
 25
Xia Z, Wu LY, Zhou X, Wong STC. Semi-supervised drug-protein interaction prediction from heterogeneous biological spaces. BMC Syst Biol. 2010; 4(Suppl 2):6. doi:10.1186/1752-0509-4-S2-S6.
 26
Bleakley K, Yamanishi Y. Supervised prediction of drug-target interactions using bipartite local models. Bioinformatics (Oxford, England). 2009; 25(18):2397–403. doi:10.1093/bioinformatics/btp433.
 27
Jacob L, Vert JP. Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics (Oxford, England). 2008; 24(19):2149–56. doi:10.1093/bioinformatics/btn409.
 28
Dinuzzo F. Learning functions with kernel methods. 2011. PhD thesis, University of Pavia.
 29
Rifkin R, Yeo G, Poggio T. Regularized least-squares classification. NATO Science Series Sub Series III Computer and Systems Sciences. 2003; 190:131–54.
 30
Kimeldorf G, Wahba G. Some results on Tchebycheffian spline functions. J Math Anal Appl. 1971; 33(1):82–95.
 31
Kashima H, Oyama S, Yamanishi Y, Tsuda K. On pairwise kernels: an efficient alternative and generalization analysis. Adv Data Min Knowl Disc. 2009; 5476:1030–7.
 32
Laub AJ. Matrix Analysis for Scientists and Engineers. Davis, California: SIAM; 2005, pp. 139–44.
 33
Kloft M, Brefeld U, Laskov P, Sonnenburg S. Non-sparse multiple kernel learning. In: NIPS Workshop on Kernel Learning: Automatic Selection of Optimal Kernels (Vol. 4): 2008.
 34
Byrd RH, Hribar ME, Nocedal J. An interior point algorithm for large-scale nonlinear programming. SIAM J Optim. 1999; 9(4):877–900. doi:10.1137/S1052623497325107.
 35
MATLAB. version 8.1.0 (R2013a). Natick, Massachusetts: The MathWorks Inc.; 2013.
 36
Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res. 2008; 36(suppl 1):480–4.
 37
Schomburg I, Chang A, Ebeling C, Gremse M, Heldt C, Huhn G, et al. BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Res. 2004; 32(suppl 1):431–3.
 38
Günther S, Kuhn M, Dunkel M, Campillos M, Senger C, Petsalaki E, et al. SuperTarget and Matador: resources for exploring drug-target relationships. Nucleic Acids Res. 2008; 36(suppl 1):919–22.
 39
Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 2008; 36(suppl 1):901–6.
 40
Eskin E, Weston J, Noble WS, Leslie CS. Mismatch String Kernels for SVM Protein Classification. In: Advances in Neural Information Processing Systems (NIPS): 2002. p. 1417–1424.
 41
Leslie CS, Eskin E, Noble WS. The spectrum kernel: a string kernel for SVM protein classification. In: Pac Symp Biocomput vol. 7: 2002. p. 566–575.
 42
Palme J, Hochreiter S, Bodenhofer U. KeBABS: an R package for kernel-based analysis of biological sequences. Bioinformatics. 2015; 31(15):2574–2576. doi:10.1093/bioinformatics/btv176.
 43
Smedley D, Haider S, Durinck S, et al. The BioMart community portal: an innovative alternative to large, centralized data repositories. Nucleic Acids Res. 2015. doi:10.1093/nar/gkv350.
 44
Ovaska K, Laakso M, Hautaniemi S. Fast Gene Ontology based clustering for microarray experiments. BioData Min. 2008; 1(1):11.
 45
Resnik P. Semantic Similarity in a Taxonomy: An Information Based Measure and Its Application to Problems of Ambiguity in Natural Language. J Artif Intell Res. 1999; 11:95–130.
 46
Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006; 34(suppl 1):535–9.
 47
Hattori M, Okuno Y, Goto S, Kanehisa M. Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J Am Chem Soc. 2003; 125(39):11853–65.
 48
Klambauer G, Wischenbart M, Mahr M, Unterthiner T, Mayr A, Hochreiter S. Rchemcpp: a web service for structural analoging in ChEMBL, Drugbank and the Connectivity Map. Bioinformatics. 2015. Advance access doi:10.1093/bioinformatics/btv373.
 49
Kashima H, Tsuda K, Inokuchi A. Marginalized kernels between labeled graphs. In: ICML, vol. 3: 2003. p. 321–328.
 50
Ralaivola L, Swamidass SJ, Saigo H, Baldi P. Graph kernels for chemical informatics. Neural Netw. 2005; 18(8):1093–110. doi:10.1016/j.neunet.2005.07.009.
 51
Takarabe M, Kotera M, Nishimura Y, Goto S, Yamanishi Y. Drug target prediction using adverse event report systems: A pharmacogenomic approach. Bioinformatics. 2012; 28(18):611–8. doi:10.1093/bioinformatics/bts413.
 52
Kuhn M, Campillos M, Letunic I, Jensen LJ, Bork P. A side effect resource to capture phenotypic effects of drugs. Mol Syst Biol. 2010; 6(1):343.
 53
Qiu S, Lane T. A framework for multiple kernel support vector regression and its applications to siRNA efficacy prediction. IEEE/ACM Trans Comput Biol Bioinf. 2009; 6(2):190–9.
 54
Cristianini N, Kandola J, Elisseeff A, Shawe-Taylor J. On kernel-target alignment. In: Advances in Neural Information Processing Systems 14. Cambridge MA: MIT Press: 2002. p. 367–73.
 55
Gönen M. Predicting drug-target interactions from chemical and genomic kernels using Bayesian matrix factorization. Bioinformatics (Oxford, England). 2012; 28(18):2304–10. doi:10.1093/bioinformatics/bts360.
 56
Davis J, Goadrich M. The relationship between Precision-Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning (ICML ’06). New York, NY, USA: ACM: 2006. p. 233–40. doi:10.1145/1143844.1143874.
 57
Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000; 28(1):27–30.
 58
Bento AP, Gaulton A, Hersey A, Bellis LJ, Chambers J, Davies M, et al. The ChEMBL bioactivity database: an update. Nucleic Acids Res. 2014; 42(D1):1083–90. doi:10.1093/nar/gkt1031.
 59
Webster GF. Topical tretinoin in acne therapy. J Am Acad Dermatol. 1998; 39(2):38–44.
 60
Reis A, Velho G. Sulfonylurea receptor-1 (SUR1): genetic and metabolic evidences for a role in the susceptibility to type 2 diabetes mellitus. Diabetes Metab. 2002; 28(1):14–19.
 61
Huang Q, Bu S, Yu Y, Guo Z, Ghatnekar G, Bu M, et al. Diazoxide prevents diabetes through inhibiting pancreatic β-cells from apoptosis via Bcl-2/Bax rate and p38-β mitogen-activated protein kinase. Endocrinology. 2007; 148(1):81–91.
Acknowledgements
The authors thank the authors of [23] for making their data publicly available. This work was supported by the Interdisciplinary Center for Clinical Research (IZKF Aachen), RWTH Aachen University Medical School, Aachen, Germany; DAAD; and Brazilian research agencies: FACEPE, CAPES and CNPq.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
Conceived and designed the experiments: AN RP IC. Performed the experiments: AN. Analyzed the data: AN RP IC. All authors read and approved the final manuscript.
Additional files
Additional file 1
Figure. Single kernel experiments on the Nuclear Receptor dataset with the KronRLS algorithm as base learner. The heatmap shows the AUPR performance of different kernel combinations; red means higher AUPR. (PDF 460 kb)
Additional file 2
Spreadsheet. p-values of pairwise Wilcoxon rank sum tests between all competing methods in the pair, drug and target prediction tasks. (XLS 24 kb)
Additional file 3
Supplementary Tables. AUPR results of competing methods in the pair prediction setting with subsampled test sets (S1); AUPR results of predicted scores against new interactions found in the current releases of the KEGG, Matador, DrugBank and ChEMBL databases (S2); average memory usage (MB) during training and testing of competing methods (S3); average time (minutes) required to train and test the models with the competing methods (S4). (PDF 89.7 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Keywords
 Artificial intelligence
 Supervised machine learning
 Kernel methods
 Multiple kernel learning
 Drug discovery