# A multiple kernel learning algorithm for drug-target interaction prediction

- André C. A. Nascimento^{1, 2, 3}
- Ricardo B. C. Prudêncio^{1}
- Ivan G. Costa^{1, 3, 4}

**17**:46

https://doi.org/10.1186/s12859-016-0890-3

© Nascimento et al. 2016

**Received: **23 July 2015

**Accepted: **5 January 2016

**Published: **22 January 2016

## Abstract

### Background

Drug-target networks have been receiving a lot of attention in recent years, given their relevance for pharmaceutical innovation and drug lead discovery. Different in silico approaches have been proposed for the identification of new drug-target interactions, many of which are based on kernel methods. Despite technical advances in recent years, these methods are not able to cope with large drug-target interaction spaces or to integrate multiple sources of biological information.

### Results

We propose KronRLS-MKL, which models the drug-target interaction problem as a link prediction task on bipartite networks. This method allows the integration of multiple heterogeneous information sources for the identification of new interactions and can also work with networks of arbitrary size. Moreover, it automatically selects the most relevant kernels by returning weights indicating their importance in the drug-target prediction at hand. Empirical analysis on four data sets using twenty distinct kernels indicates that our method has predictive performance higher than or comparable to that of 18 competing methods in all prediction tasks. Moreover, the predicted weights reflect the predictive quality of each kernel in exhaustive pairwise experiments, which indicates that the method succeeds in automatically revealing relevant biological sources.

### Conclusions

Our analysis shows that the proposed data integration strategy is able to improve the quality of the predicted interactions, can speed up the identification of new drug-target interactions, and can identify relevant information for the task.

### Availability

The source code and data sets are available at www.cin.ufpe.br/~acan/kronrlsmkl/.

## Keywords

## Background

Drug-target networks have been receiving a lot of attention in recent years, given their relevance for pharmaceutical innovation and drug repositioning purposes [1–3]. Although the number of known interactions between drugs and target proteins has been increasing, the targets of approved drugs still make up only a small proportion (<10 %) of the human proteome [1]. Recent advances in high-throughput methods provide ways to produce large data sets about molecular entities such as drugs and proteins. There is also an increase in the availability of reliable databases integrating information about interactions between these entities. Nevertheless, as the experimental verification of such interactions does not scale with the demand for innovation, the use of computational methods for large-scale prediction is mandatory. There is also a clear need for systems-based approaches to integrate these data for drug discovery and repositioning applications [1].

Recently, an increasing number of methods have been proposed for drug-target interaction (DTI) prediction. They can be categorized into ligand-based, docking-based, or network-based methods [4]. The docking approach, which can provide accurate estimates of DTIs, is computationally demanding and requires a 3D model of the target protein. Ligand-based methods, such as quantitative structure-activity relationship (QSAR) models, are based on a comparison of a candidate ligand to the known ligands of a biological target [5]. However, the utility of these ligand-based methods is limited when there are few known ligands for a given target [2, 4, 6]. Alternatively, network-based approaches use computational methods and known DTIs to predict new interactions [4, 5]. Even though ligand-based and docking-based methods are more precise than network-based approaches, the latter are better suited for estimating new interactions from complete proteomes and drug catalogs [1]. Therefore, they can indicate novel candidates to be evaluated by more accurate methods.

A *kernel* can be seen as a similarity matrix estimated on all pairs of instances. The main assumption behind network kernel methods is that similar ligands tend to bind to similar targets and vice versa. These approaches use *base kernels* to measure the similarity between drugs (or targets) using distinct sources of information (e.g., structural, pharmacophore, sequence and function similarity). A *pairwise kernel* function, which measures the similarity between drug-target pairs, is obtained by combining a drug and a protein base kernel via kernel product.
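As a minimal illustration with toy random matrices (not the kernels used in this work), the pairwise kernel between (d, t) and (d′, t′) is the product of one drug kernel entry and one target kernel entry. Stacking all pairs with the target index varying fastest yields exactly the Kronecker product of the two base kernels. The helper `pairwise_kernel` is a hypothetical name used only for this sketch.

```python
import numpy as np

def pairwise_kernel(KD, KT, d, t, d2, t2):
    """Pairwise kernel K((d, t), (d2, t2)) = KD[d, d2] * KT[t, t2]."""
    return KD[d, d2] * KT[t, t2]

rng = np.random.default_rng(0)
XD = rng.standard_normal((3, 4))   # toy drug features
XT = rng.standard_normal((2, 4))   # toy target features
KD = XD @ XD.T                     # 3 x 3 linear drug kernel
KT = XT @ XT.T                     # 2 x 2 linear target kernel

K = np.kron(KD, KT)                # 6 x 6 pairwise kernel over all (drug, target) pairs
nT = KT.shape[0]
# Pair (d, t) sits at row d * nT + t of the Kronecker product.
for d in range(3):
    for t in range(2):
        for d2 in range(3):
            for t2 in range(2):
                assert np.isclose(K[d * nT + t, d2 * nT + t2],
                                  pairwise_kernel(KD, KT, d, t, d2, t2))
print(K.shape)  # (6, 6)
```

Even in this toy case, the pairwise matrix has (3·2)² = 36 entries while the two base kernels together have only 9 + 4 = 13.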

The majority of previous network approaches use classification methods, such as Support Vector Machines (SVM), to perform predictions over the drug-target interaction space [2, 4]. However, such techniques have major limitations. First, they can only incorporate one pair of base kernels at a time (one for drugs and one for proteins) to perform predictions. Second, the computation of the pairwise kernel matrix for the whole interaction space (all possible drug-target pairs) is computationally infeasible even for a moderate number of drugs and targets. Moreover, most drug-target interaction databases provide no true negative interaction examples. The common solution for these issues is to randomly sample a small proportion of unknown interactions to be used as negative examples. While this approach yields a small, computationally tractable drug-target pairwise kernel, it generates an easier but unrealistic classification task with balanced class sizes [12].

Multiple Kernel Learning (MKL) is an emerging machine learning (ML) discipline focused on the search for an optimal combination of kernels [13]. MKL-like methods have previously been proposed for the problem of DTI prediction [14–16] and for the closely related protein-protein interaction (PPI) prediction problem [17, 18]. This is highly relevant, as it allows the use of distinct sources of biological information to define similarities between molecular entities. However, since traditional MKL methods are SVM-based [13, 19], they are subject to the memory limitations imposed by the pairwise kernel and are not able to perform predictions in the complete drug vs. protein space. Moreover, the MKL approaches used for PPI prediction [17, 18] and protein function prediction [20, 21] cannot be applied to bipartite graphs such as the one at hand. Currently, we are only aware of two recent works [19, 22] proposing MKL approaches to integrate similarity measures for drugs and targets.

Drug-target prediction can be framed as a link prediction problem [4], which can be solved by a Kronecker regularized least squares approach (KronRLS) [10]. A single-kernel version of this method has recently been applied to the drug-target prediction problem [10, 11]. A recent survey indicated that KronRLS outperforms SVM-based methods in DTI prediction [2]. KronRLS uses algebraic properties of the Kronecker product to perform predictions on the whole drug-target space without explicitly computing the pairwise kernel. Therefore, it can cope with problems involving large drug vs. protein spaces. However, KronRLS cannot be used in an MKL context.

In this work, we propose a new MKL algorithm, KronRLS-MKL, to automatically select and combine kernels in a bipartite drug-protein prediction problem (Fig 1). For this, we extend the KronRLS method to an MKL scenario. Our method uses *L*2 regularization to produce a non-sparse combination of base kernels. The proposed method can cope with large drug vs. target interaction matrices, does not require sub-sampling of the drug-target network, and is able to combine and select relevant kernels. We perform an empirical analysis using previously described drug-target datasets [23] and a diverse set of drug kernels (10) and protein kernels (10).

In our experiments, we considered three different DTI prediction scenarios [2, 11, 24]: pair prediction, where every drug and target in the training set has at least one known interaction; and the ‘new drug’ and ‘new target’ settings, where some drugs or targets, respectively, are present only in the test set. A comparative analysis with top-performing single-kernel approaches [2, 8, 10, 25–27] and all competing integrative approaches [14, 15, 22] demonstrates that our method is better or competitive in the majority of evaluated scenarios. Moreover, KronRLS-MKL was able to select kernels and indicate their relevance, in the form of weights, for each problem.

## Methods

In this work, we propose an extension of the KronRLS algorithm based on recent developments in the MKL framework [28] to address the problem of link prediction on bipartite networks with multiple kernels. Before introducing our method, we describe the RLS and KronRLS algorithms (for further information, see [10, 11]).

### RLS and KronRLS

Let D and T be the sets of drugs and targets, and consider a training set of instances \(x_{i}\) (drug-target pairs) and their binary labels \(y_{i} \in \mathbb {R}\) (where 1 stands for a known interaction and 0 otherwise), with \(1 \le i \le n\) and \(n = |D||T|\) (the number of drug-target pairs). The RLS approach minimizes the following function [29]:

\[ J(f) = \frac{1}{2}\sum_{i=1}^{n}\left(y_{i} - f(x_{i})\right)^{2} + \frac{\lambda}{2}\lVert f\rVert_{K}^{2}, \tag{1} \]

where \(\lVert f\rVert_{K}\) is the norm of the prediction function f on the Hilbert space associated with the kernel K, and \(\lambda > 0\) is a regularization parameter which determines the compromise between the prediction error and the complexity of the model. According to the representer theorem [30], a minimizer of the above objective function admits a dual representation of the following form:

\[ f(x) = \sum_{i=1}^{n} a_{i}\, K(x, x_{i}), \tag{2} \]

where \(K: |D||T| \times |D||T| \rightarrow \mathbb {R}\) is named the pairwise kernel function and a is the vector of dual variables corresponding to each separation constraint. The RLS algorithm obtains the minimizer of Eq. 1 by solving the system of linear equations \((K+\lambda I)\boldsymbol{a}=\boldsymbol{y}\), where a and y are both n-dimensional vectors consisting of the parameters \(a_{i}\) and the labels \(y_{i}\).
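The RLS solution can be sketched directly from the linear system above. This toy example uses a random positive semi-definite kernel over 8 "pairs"; `rls_fit` and `rls_predict` are hypothetical helper names. Because the in-sample predictions are f = Ka, the solution satisfies y − f = λa, which gives a quick sanity check.

```python
import numpy as np

def rls_fit(K, y, lam):
    """Dual coefficients a solving (K + lam * I) a = y."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)

def rls_predict(K_test_train, a):
    """Dual representation f(x) = sum_i a_i K(x, x_i)."""
    return K_test_train @ a

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 3))
K = X @ X.T                                  # toy 8 x 8 PSD kernel
y = (rng.random(8) < 0.3).astype(float)      # binary labels (1 = known interaction)
lam = 0.5
a = rls_fit(K, y, lam)
f = rls_predict(K, a)                        # in-sample predictions f = K a
```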

One can construct such a pairwise kernel as the product of two base kernels, namely \(K((d,t),(d',t')) = K_{D}(d,d')\,K_{T}(t,t')\), where \(K_{D}\) and \(K_{T}\) are the base kernels for drugs and targets, respectively. This is equivalent to the Kronecker product of the two base kernels [4, 31]: \(K = K_{D} \otimes K_{T}\). Since K has \(|D||T| \times |D||T|\) entries, its size makes model training computationally infeasible even for a moderate number of drugs and targets [4].
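A back-of-the-envelope calculation makes this concrete. Using the Enzyme dataset sizes reported in Table 1 (445 drugs, 664 targets) and assuming dense double-precision storage (8 bytes per entry), the explicit pairwise kernel would need roughly 700 GB, while the two base kernels need about 5 MB.

```python
# Memory for a dense float64 pairwise kernel K = K_D ⊗ K_T (8 bytes per entry)
n_drugs, n_targets = 445, 664              # Enzyme dataset sizes (Table 1)
n_pairs = n_drugs * n_targets               # rows (and columns) of K
bytes_needed = n_pairs ** 2 * 8
base_bytes = (n_drugs ** 2 + n_targets ** 2) * 8   # K_D and K_T only

print(n_pairs)                              # 295480
print(round(bytes_needed / 1e9))            # 698 (GB)
print(round(base_bytes / 1e6, 1))           # 5.1 (MB)
```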

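The KronRLS algorithm is a modification of RLS that takes advantage of two algebraic properties of the Kronecker product to speed up model training: the so-called *vec trick* [31] and the relation between the eigendecomposition of a Kronecker product and the eigendecompositions of its factors [11, 32].

Let \(K_{D} = Q_{D}\Lambda_{D}Q_{D}^{T}\) and \(K_{T} = Q_{T}\Lambda_{T}Q_{T}^{T}\) be the eigendecompositions of \(K_{D}\) and \(K_{T}\), and let Y be the \(|T| \times |D|\) label matrix such that \(\boldsymbol{y} = \mathrm{vec}(Y)\). The solution a can be obtained from the following equation [11]:

\[ \boldsymbol{a} = \mathrm{vec}\left(Q_{T}\, C\, Q_{D}^{T}\right), \tag{3} \]

where \(\mathrm{vec}(\cdot)\) is the vectorization operator that stacks the columns of a matrix into a vector, and C is a matrix defined as:

\[ \mathrm{vec}(C) = \left(\Lambda_{D} \otimes \Lambda_{T} + \lambda I\right)^{-1} \mathrm{vec}\left(Q_{T}^{T}\, Y\, Q_{D}\right). \tag{4} \]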

The KronRLS algorithm is well suited for the large pairwise space involved in the DTI prediction problem, since estimating the vector a with Eqs. 3 and 4 only requires the eigendecompositions of the two base kernels and a few matrix products with the \(|T| \times |D|\) label matrix, instead of solving a linear system involving the full \(|D||T| \times |D||T|\) pairwise kernel as in the original RLS. However, it does not support the use of multiple kernels.
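A minimal NumPy sketch of the KronRLS solution follows, assuming the convention \(K = K_D \otimes K_T\) with column-stacking vec, so that the labels form a |T| × |D| matrix Y with y = vec(Y). It divides \(Q_T^T Y Q_D\) elementwise by the outer product of the target and drug eigenvalues plus λ, then maps back with \(Q_T C Q_D^T\). The helper names are hypothetical; on a toy problem the result is compared with a naive solve of (K + λI)a = y on the explicit pairwise kernel.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a vector (column-major order)."""
    return M.reshape(-1, order="F")

def kronrls_fit(KD, KT, Y, lam):
    """KronRLS dual coefficients as a |T| x |D| matrix A, with a = vec(A)."""
    lD, QD = np.linalg.eigh(KD)                       # KD = QD diag(lD) QD^T
    lT, QT = np.linalg.eigh(KT)                       # KT = QT diag(lT) QT^T
    C = (QT.T @ Y @ QD) / (np.outer(lT, lD) + lam)    # elementwise division
    return QT @ C @ QD.T

def kronrls_predict(KD, KT, A):
    """Predictions for all pairs: KT A KD^T, i.e. unvec((KD ⊗ KT) vec(A))."""
    return KT @ A @ KD.T

rng = np.random.default_rng(2)
XD, XT = rng.standard_normal((5, 3)), rng.standard_normal((4, 3))
KD, KT = XD @ XD.T, XT @ XT.T                        # 5 drugs, 4 targets
Y = (rng.random((4, 5)) < 0.3).astype(float)         # |T| x |D| labels
lam = 0.5

A = kronrls_fit(KD, KT, Y, lam)
# Naive RLS on the explicit 20 x 20 pairwise kernel, for comparison only.
a_naive = np.linalg.solve(np.kron(KD, KT) + lam * np.eye(20), vec(Y))
```

The naive solve scales with the cube of |D||T|, whereas the eigendecompositions scale with the cubes of |D| and |T| separately.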

### KronRLS MKL

In this work, a vector of different kernels is considered, i.e., \(\boldsymbol {k}_{D} = ({K_{D}^{1}}, {K_{D}^{2}},\ldots, K_{D}^{P_{D}})\) and \(\boldsymbol {k}_{T} = ({K_{T}^{1}}, {K_{T}^{2}}, \ldots, K_{T}^{P_{T}})\), where \(P_{D}\) and \(P_{T}\) indicate the number of base kernels defined over the drug and target sets, respectively. In this section, we propose an extension of KronRLS to handle multiple kernels.

In [28], the author demonstrated that MKL can be interpreted as a particular instance of a kernel machine with two layers, in which the second layer is a linear function. His work provides the theoretical basis for the development of a MKL extension for the closely related KronRLS algorithm in our work.

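Considering the prediction function \(f_{\boldsymbol{a}} = K\boldsymbol{a}\) [29], where the pairwise kernel is now built from weighted combinations of the base kernels, \(K_{D}^{\boldsymbol{\beta}} = \sum_{i=1}^{P_{D}}\beta_{D}^{i}K_{D}^{i}\) and \(K_{T}^{\boldsymbol{\beta}} = \sum_{j=1}^{P_{T}}\beta_{T}^{j}K_{T}^{j}\), and applying the well-known property of the Kronecker product \((A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(BXA^{T})\) [32], we have:

\[ f_{\boldsymbol{a}} = \left(K_{D}^{\boldsymbol{\beta}} \otimes K_{T}^{\boldsymbol{\beta}}\right)\boldsymbol{a} = \mathrm{vec}\left(K_{T}^{\boldsymbol{\beta}}\, A\, \big(K_{D}^{\boldsymbol{\beta}}\big)^{T}\right), \]

where \(A = \mathrm{unvec}(\boldsymbol{a})\). Using the same iterative approach considered in previous MKL strategies [13], we propose a two-step optimization process, in which the optimization of the vector a is interleaved with the optimization of the kernel weights. Given two initial weight vectors, \(\boldsymbol {\beta }_{D}^{0}\) and \(\boldsymbol {\beta }_{T}^{0}\), an optimal value for the vector a is found using Eq. 3, and with this a we can proceed to find optimal \(\boldsymbol{\beta}_{D}\) and \(\boldsymbol{\beta}_{T}\). More specifically, Eq. 1 can be redefined when a is fixed; knowing that \(\lVert f\rVert_{K}^{2}=\boldsymbol{a}^{T}K\boldsymbol{a}\) [28], we have:

\[ J(\boldsymbol{\beta}_{D},\boldsymbol{\beta}_{T}) = \frac{1}{2}\boldsymbol{y}^{T}\boldsymbol{y} - \boldsymbol{y}^{T}\boldsymbol{u} + \frac{1}{2}\boldsymbol{u}^{T}\boldsymbol{u} + \frac{\lambda}{2}\boldsymbol{a}^{T}\boldsymbol{u}, \qquad \boldsymbol{u} = K\boldsymbol{a}, \]

where the first term does not depend on K (and thus does not depend on the kernel weights) and, as y and a are fixed, it can be discarded from the weight optimization procedure. Note that we are not interested in a sparse selection of base kernels as in [28]; therefore, we introduce an *L*2 regularization term to control the sparsity [33] of the kernel weights, also known as a ball constraint. This term is parameterized by the *σ* regularization coefficient. Additionally, we can convert u to its matrix form by applying the *unvec* operator, i.e., \(U = \mathrm{unvec}(\boldsymbol{u})\), and also use a more appropriate matrix norm (Frobenius, \(\lVert M\rVert_{2} \le \lVert M\rVert_{F}\) [32]). In this way, with \(Y = \mathrm{unvec}(\boldsymbol{y})\), for any fixed values of a and \(\boldsymbol{\beta}_{T}\), the optimal combination vector \(\boldsymbol{\beta}_{D}\) is obtained by solving the optimization problem defined as:

\[ \boldsymbol{\beta}_{D}^{*} = \arg\min_{\boldsymbol{\beta}_{D}}\; \frac{1}{2}\lVert U\rVert_{F}^{2} - \langle Y, U\rangle_{F} + \frac{\lambda}{2}\langle A, U\rangle_{F} + \sigma\lVert\boldsymbol{\beta}_{D}\rVert_{2}^{2}, \qquad U = K_{T}^{\boldsymbol{\beta}}\, A\, \Big(\sum_{i=1}^{P_{D}}\beta_{D}^{i}K_{D}^{i}\Big)^{T}. \]

Analogously, the optimal \(\boldsymbol{\beta}_{T}\) can be found by fixing the values of a and \(\boldsymbol{\beta}_{D}\), according to:

\[ \boldsymbol{\beta}_{T}^{*} = \arg\min_{\boldsymbol{\beta}_{T}}\; \frac{1}{2}\lVert U\rVert_{F}^{2} - \langle Y, U\rangle_{F} + \frac{\lambda}{2}\langle A, U\rangle_{F} + \sigma\lVert\boldsymbol{\beta}_{T}\rVert_{2}^{2}, \qquad U = \Big(\sum_{j=1}^{P_{T}}\beta_{T}^{j}K_{T}^{j}\Big)\, A\, \big(K_{D}^{\boldsymbol{\beta}}\big)^{T}. \]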

The optimization method used here is the interior-point optimization algorithm [34] implemented in MATLAB [35].
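The alternating scheme can be sketched in NumPy as follows. With a and one weight vector fixed, the matrix U is linear in the other weight vector, so the weight subproblem below is a quadratic, \(\tfrac{1}{2}\lVert U\rVert_F^2 - \langle Y,U\rangle_F + \tfrac{\lambda}{2}\langle A,U\rangle_F + \sigma\lVert\beta\rVert^2\) (this form is an assumption of the sketch). Unlike the interior-point solver used in the paper, this sketch minimises that quadratic in closed form without any constraints, so weights can come out negative. All function names are hypothetical.

```python
import numpy as np

def kronrls_dual(KD, KT, Y, lam):
    """KronRLS dual coefficients A (|T| x |D|) for labels Y (|T| x |D|)."""
    lD, QD = np.linalg.eigh(KD)
    lT, QT = np.linalg.eigh(KT)
    C = (QT.T @ Y @ QD) / (np.outer(lT, lD) + lam)
    return QT @ C @ QD.T

def weight_objective(U_list, beta, Y, A, lam, sigma):
    """0.5*||U||_F^2 - <Y,U>_F + (lam/2)*<A,U>_F + sigma*||beta||^2, U = sum_i beta_i U_i."""
    U = sum(b * Ui for b, Ui in zip(beta, U_list))
    return (0.5 * np.sum(U * U) - np.sum(Y * U)
            + 0.5 * lam * np.sum(A * U) + sigma * np.dot(beta, beta))

def solve_weights(U_list, Y, A, lam, sigma):
    """Unconstrained minimiser of weight_objective: (G + 2*sigma*I) beta = b."""
    P = len(U_list)
    G = np.array([[np.sum(Ui * Uj) for Uj in U_list] for Ui in U_list])
    b = np.array([np.sum(Y * Ui) - 0.5 * lam * np.sum(A * Ui) for Ui in U_list])
    return np.linalg.lstsq(G + 2.0 * sigma * np.eye(P), b, rcond=None)[0]

def combine(weights, kernels):
    """Weighted sum of base kernels."""
    return sum(w * K for w, K in zip(weights, kernels))

def kronrls_mkl(KD_list, KT_list, Y, lam=1.0, sigma=0.5, n_iter=5):
    """Alternate the a-step (KronRLS) with the drug-weight and target-weight steps."""
    bD = np.full(len(KD_list), 1.0 / len(KD_list))   # uniform initial weights
    bT = np.full(len(KT_list), 1.0 / len(KT_list))
    for _ in range(n_iter):
        KT = combine(bT, KT_list)
        A = kronrls_dual(combine(bD, KD_list), KT, Y, lam)
        # U = KT A (sum_i bD_i KD_i)^T is linear in bD
        bD = solve_weights([KT @ A @ Ki.T for Ki in KD_list], Y, A, lam, sigma)
        KD = combine(bD, KD_list)
        # U = (sum_j bT_j KT_j) A KD^T is linear in bT
        bT = solve_weights([Kj @ A @ KD.T for Kj in KT_list], Y, A, lam, sigma)
    A = kronrls_dual(combine(bD, KD_list), combine(bT, KT_list), Y, lam)
    return bD, bT, A
```

For example, `kronrls_mkl(KD_list, KT_list, Y)` with lists of drug and target kernels and a |T| × |D| label matrix returns the two weight vectors and the final dual matrix. A constrained solver (for instance, non-negative weights) would replace `solve_weights`.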

### Data

Table 1 Number of drugs, targets and positive instances (known interactions) vs. the number of negative (or unknown) interactions in each dataset

| | Nuclear receptors | GPCR | Ion channel | Enzyme |
|---|---|---|---|---|
| **Interactions** | | | | |
| Known | 90 (6.41 %) | 635 (3 %) | 1476 (3.45 %) | 2926 (1 %) |
| Unknown | 1314 (93.59 %) | 20550 (97 %) | 41364 (96.55 %) | 292554 (99 %) |
| **Entity** | | | | |
| Drugs | 54 | 223 | 210 | 445 |
| Targets | 26 | 95 | 204 | 664 |

Table 2 Network entities and respective kernels considered for combination purposes

| Entity | Kernel | Information source |
|---|---|---|
| Drugs | AERS-bit - AERS bit | Side-effects |
| | AERS-freq - AERS freq | Side-effects |
| | GIP - Gaussian Interaction Profile | Network |
| | LAMBDA - Lambda-k Kernel | Chem. Struct. |
| | MARG - Marginalized Kernel | Chem. Struct. |
| | MINMAX - MinMax Kernel | Chem. Struct. |
| | SIMCOMP - Graph kernel | Chem. Struct. |
| | SIDER - Side-effects Similarity | Side-effects |
| | SPEC - Spectrum Kernel | Chem. Struct. |
| | TAN - Tanimoto Kernel | Chem. Struct. |
| Proteins | GIP - Gaussian Interaction Profile | Network |
| | GO - Gene Ontology Semantic Similarity | Func. Annot. |
| | MIS-k3m1 - Mismatch kernel (k=3, m=1) | Sequences |
| | MIS-k4m1 - Mismatch kernel (k=4, m=1) | Sequences |
| | MIS-k3m2 - Mismatch kernel (k=3, m=2) | Sequences |
| | MIS-k4m2 - Mismatch kernel (k=4, m=2) | Sequences |
| | PPI - Proximity in protein-protein network | Protein-protein Interactions |
| | SPEC-k3 - Spectrum kernel (k=3) | Sequences |
| | SPEC-k4 - Spectrum kernel (k=4) | Sequences |
| | SW - Smith-Waterman alignment score | Sequences |

#### Protein kernels

Here we use the following information sources about target proteins: amino acid sequence, functional annotation and proximity in the protein-protein network. Concerning sequence information, we consider the normalized Smith-Waterman alignment score of the amino acid sequences (SW) [23], as well as different parametrizations of the Mismatch (MIS) [40] and Spectrum (SPEC) [41] kernels. For the Mismatch kernel, we evaluated four combinations of the k-mer length (*k*=3 and *k*=4) and the maximum number of mismatches per k-mer (*m*=1 and *m*=2), namely MIS-k3m1, MIS-k3m2, MIS-k4m1 and MIS-k4m2; for the Spectrum kernel, we varied the k-mer length (*k*=3 and *k*=4; SPEC-k3 and SPEC-k4, respectively). Both Mismatch and Spectrum kernels were calculated using the R package KeBABS [42].
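The Spectrum kernel compares two sequences by the inner product of their k-mer count vectors. KeBABS computes it in R; the Python sketch below only illustrates the idea, and the optional cosine normalisation is an assumption (this section does not state which normalisation was used). Function names are hypothetical.

```python
from collections import Counter
import math

def kmer_counts(seq, k):
    """Count all overlapping k-mers (substrings of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(s1, s2, k=3, normalize=True):
    """Inner product of k-mer count vectors, optionally cosine-normalised."""
    c1, c2 = kmer_counts(s1, k), kmer_counts(s2, k)
    dot = sum(n * c2[m] for m, n in c1.items())   # Counter returns 0 for absent k-mers
    if not normalize:
        return dot
    norm = math.sqrt(sum(n * n for n in c1.values()) * sum(n * n for n in c2.values()))
    return dot / norm if norm > 0 else 0.0
```

For example, "MKVLAA" and "MKVLGG" share the 3-mers MKV and KVL out of four each, so the unnormalised kernel is 2 and the normalised one is 0.5. The Mismatch kernel generalises this by also counting k-mers within m mismatches.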

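The proximity between proteins in the protein-protein interaction network (PPI) was computed as \(S(p,p') = A\,e^{-b\,D(p,p')}\), where the *A* and *b* parameters were set as in [14] (*A*=0.9, *b*=1), and *D*(*p*,*p*′) is the shortest hop distance between proteins *p* and *p*′.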
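Assuming the proximity decays exponentially with hop distance, \(S(p,p') = A\,e^{-b\,D(p,p')}\) (our reading of the A and b parameters above), it can be computed with a breadth-first search from each protein. Unreachable pairs receive zero similarity, which is also an assumption of this sketch; the toy network and function names are hypothetical.

```python
from collections import deque
import math

def hop_distances(adj, source):
    """Breadth-first search: shortest hop distance from source to every reachable protein."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def ppi_proximity(adj, proteins, A=0.9, b=1.0):
    """Assumed form S(p, q) = A * exp(-b * D(p, q)); 0 when q is unreachable from p."""
    S = {}
    for p in proteins:
        dist = hop_distances(adj, p)
        for q in proteins:
            S[p, q] = A * math.exp(-b * dist[q]) if q in dist else 0.0
    return S

# Toy PPI network (hypothetical): P1 - P2 - P3, and P4 isolated
adj = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P2"], "P4": []}
S = ppi_proximity(adj, ["P1", "P2", "P3", "P4"])
```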

#### Drug kernels

As drug information sources, we consider 6 distinct chemical structure kernels and 3 side-effect kernels. Chemical structure similarity between drugs was computed with the SIMCOMP algorithm [47] (obtained from [23]), defined as the ratio of common substructures between two drugs based on chemical graph alignment. We also computed the Lambda-k kernel (LAMBDA) [48], the Marginalized kernel (MARG) [49], the MINMAX kernel [50], the Spectrum kernel (SPEC) [48] and the Tanimoto kernel (TAN) [50]. These latter kernels were calculated with the R package Rchemcpp [48] with default parameters.
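The Tanimoto and MinMax similarities can be illustrated on precomputed molecular fingerprints. Rchemcpp derives its features from the molecular graph; here we simply assume binary fingerprints given as sets of "on" bits and count fingerprints given as dictionaries, so the inputs and function names are hypothetical.

```python
def tanimoto(fp1, fp2):
    """Tanimoto similarity of two binary fingerprints given as sets of 'on' bits."""
    union = len(fp1 | fp2)
    return len(fp1 & fp2) / union if union else 0.0

def minmax(c1, c2):
    """MinMax similarity of two non-negative count vectors (dicts feature -> count)."""
    keys = set(c1) | set(c2)
    num = sum(min(c1.get(k, 0), c2.get(k, 0)) for k in keys)
    den = sum(max(c1.get(k, 0), c2.get(k, 0)) for k in keys)
    return num / den if den else 0.0
```

For example, `tanimoto({1, 2, 3}, {2, 3, 4})` is 2/4 = 0.5. On binary counts MinMax reduces to Tanimoto, while on counts it also rewards matching feature multiplicities.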

Two distinct side-effect data sources were also considered. The first is the FDA Adverse Event Reporting System (AERS), from which side-effect keyword (adverse event keyword) similarities for drugs were first retrieved by [51]. The authors introduced two types of pharmacological profiles for drugs, one based on the frequency of side-effect keywords in adverse event reports (AERS-freq) and another based on the binary information (presence or absence) of a particular side effect in adverse event reports (AERS-bit). Since not every drug in the Nuclear Receptors, Ion Channel, GPCR and Enzyme datasets is present in the AERS-based data, we extracted the similarities of the drugs present in AERS and assigned zero similarity to drugs not present.

The second side-effect resource was the SIDER database^{1} [52]. This database contains information about commercial drugs and their recorded side effects or adverse drug reactions. Each drug is represented by a binary profile, in which the presence or absence of each side effect keyword is coded 1 or 0, respectively. Both AERS and SIDER based profile similarities were obtained by the weighted cosine correlation coefficient between each pair of drug profiles [51].
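A sketch of profile-based similarity follows. The "weighted" cosine of [51] uses a weighting scheme we do not reproduce, so this version uses plain (unweighted) cosine similarity between binary side-effect profiles, and it assigns zero similarity to drugs without a profile, as described for AERS above. The toy profiles and function name are hypothetical.

```python
import numpy as np

def profile_similarity(profiles, drugs):
    """Cosine similarity between binary side-effect profiles.

    profiles: dict drug -> 0/1 numpy vector (all of the same length).
    Drugs missing from `profiles` get zero similarity to every drug.
    """
    n = len(drugs)
    S = np.zeros((n, n))
    for i, d1 in enumerate(drugs):
        for j, d2 in enumerate(drugs):
            if d1 in profiles and d2 in profiles:
                v1, v2 = profiles[d1], profiles[d2]
                denom = np.linalg.norm(v1) * np.linalg.norm(v2)
                S[i, j] = v1 @ v2 / denom if denom > 0 else 0.0
    return S

profiles = {"drugA": np.array([1, 1, 0, 0]), "drugB": np.array([1, 0, 1, 0])}
S = profile_similarity(profiles, ["drugA", "drugB", "drugC"])   # drugC has no profile
```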

#### Network topology information

We also use the drug-target network structure, in the form of network interaction profiles, as a similarity measure for both proteins and drugs. The idea is to encode the connectivity behavior of each node in the underlying network. The Gaussian Interaction Profile kernel (GIP) [10] was calculated for both drugs and targets.
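Following our understanding of [10], the GIP kernel applies a Gaussian kernel to the rows (for drugs) or columns (for targets) of the binary interaction matrix, with bandwidth γ = γ̃ divided by the mean squared norm of the interaction profiles (γ̃ = 1 by default here). This is a sketch; in cross-validation the profiles would be built from training interactions only.

```python
import numpy as np

def gip_kernel(Y, gamma_tilde=1.0):
    """Gaussian Interaction Profile kernel between the rows of Y.

    Y: binary interaction matrix with one row per entity (e.g. drugs x targets).
    Bandwidth gamma = gamma_tilde / mean squared norm of the profiles.
    """
    sq_norms = np.sum(Y ** 2, axis=1)
    mean_sq = sq_norms.mean()
    gamma = gamma_tilde / mean_sq if mean_sq > 0 else 0.0
    # Squared Euclidean distances between all pairs of interaction profiles
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2 * Y @ Y.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

Y = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)   # toy: 3 drugs x 3 targets
K_drugs = gip_kernel(Y)                  # drug kernel from rows
K_targets = gip_kernel(Y.T)              # target kernel from columns
```

In this toy case the drug profiles have squared norms 2, 1 and 1, so γ = 1/(4/3) = 0.75 and, for example, the first two drugs (profiles one hop apart) get similarity exp(−0.75).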

### Competing methods

We compare the predictive performance of the KronRLS-MKL algorithm against other MKL approaches, as well as against single-kernel approaches (one kernel for drugs and one for targets). In the latter, we evaluate the performance of each possible combination of base kernels (Table 2) with the KronRLS algorithm, recently reported as the best method for predicting drug-target pairs with single paired kernels [2]. This resulted in a total of 10×10=100 different combinations. The best-performing pairs were then used as baselines in our evaluation, selected according to two distinct criteria: the kernel pair that achieved the largest area under the precision-recall curve (AUPR) on the training set and, as a more optimistic criterion, the kernel pair with the largest AUPR on the test set.

Besides the combination of single kernels for drugs and targets, two different kinds of methods were adopted to integrate multiple kernels: (1) standard non-MKL kernel methods for DTI prediction, trained on the average of multiple kernels (respectively for drugs and targets); (2) actual MKL methods specifically proposed for DTI prediction.

#### Non-MKL approaches

We extend state-of-the-art methods [8, 10, 25–27] for DTI prediction to a multiple kernel context. For this, we first combine multiple kernels into a single kernel (respectively for drugs and targets). Once we have a single combined kernel (one for drugs and one for targets), we adopt a standard kernel method for DTI prediction, i.e., the base learner. In our experiments, two previously proposed combination strategies are used: the mean of base kernels and the kernel alignment (KA) heuristic proposed by [53]. We briefly describe the base learners, followed by a short overview of the two combination strategies considered.

The Bipartite Local Model (BLM) [26] is a machine learning algorithm in which drug-target pairs are predicted by constructing so-called ‘local models’, i.e., an SVM classifier is trained for each drug in the training set, and the same is done for targets. Then, the maximum scores for drugs and targets are used to predict new drug-target interactions. Since BLM demonstrated superior performance to the Kernel Regression Method (KRM) [23] in previous studies [2, 26], we did not consider KRM in our experiments.

The Network-based Random Walk with Restart on the Heterogeneous network (NRWRH) [8] algorithm predicts new interactions between drugs and targets by simulating a random walk on the network of known drug-target interactions as well as on the drug-drug and protein-protein similarity networks. LapRLS and NetLapRLS were both proposed in [25]. Both are based on the RLS learning algorithm and perform similarity normalization by applying the Laplacian operator. Predictions are made for drugs and targets separately, and the final prediction scores are obtained by averaging the predictions from the drug and target spaces.
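The core random walk with restart iteration can be sketched on a single toy network. The heterogeneous network construction of NRWRH [8] (drug-drug, target-target and drug-target blocks with their jump probabilities) is not reproduced here; `restart` plays the role of the restart probability tuned later in the experimental setup, and the function name is hypothetical.

```python
import numpy as np

def random_walk_with_restart(W, seed, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * M p + restart * p0 until convergence.

    W: non-negative adjacency/similarity matrix.
    seed: non-negative start vector (e.g. an indicator of the query node).
    M is W with each column normalised to sum to 1.
    """
    col_sums = W.sum(axis=0)
    M = W / np.where(col_sums > 0, col_sums, 1.0)
    p0 = seed / seed.sum()
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * M @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy path network 0 - 1 - 2 - 3, walk started at node 0
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
p = random_walk_with_restart(W, np.array([1.0, 0, 0, 0]))
```

The stationary vector p ranks nodes by proximity to the seed; in a drug-target setting, the scores on target nodes would rank candidate targets for the query drug.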

As said previously, most SVM-based methods found in the literature can be reduced to the Pairwise Kernel Method (PKM) [27], the distinction being the kernels used and the adopted combination strategy. PKM starts with the construction of a pairwise kernel, computed from the drug and target similarities. Given two drug-target pairs, (*d*,*p*) and (*d*′,*p*′), and the respective drug and target similarities, \(K_{D}\) and \(K_{P}\), the pairwise kernel is given by \(K((d,p),(d',p')) = K_{D}(d,d') \times K_{P}(p,p')\). Once the pairwise matrix is computed, it is used to train an SVM classifier.

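In the KA heuristic [53], each base kernel is weighted according to its alignment with the ideal kernel. The combined drug kernel \(K_{D}\), for instance, can be obtained by:

\[ K_{D} = \sum_{i=1}^{P_{D}} \frac{A(K_{D}^{i}, \boldsymbol{yy}^{T})}{\sum_{j=1}^{P_{D}} A(K_{D}^{j}, \boldsymbol{yy}^{T})}\, K_{D}^{i}, \]

where \(\boldsymbol{yy}^{T}\) stands for the ideal kernel, y being the label vector. The alignment \(A(K,\boldsymbol{yy}^{T})\) of a given kernel K and the ideal kernel \(\boldsymbol{yy}^{T}\) is defined as:

\[ A(K,\boldsymbol{yy}^{T}) = \frac{\left\langle K,\boldsymbol{yy}^{T} \right\rangle_{F}}{\sqrt{\left\langle K,K \right\rangle_{F}\left\langle \boldsymbol{yy}^{T},\boldsymbol{yy}^{T} \right\rangle_{F}}}, \]

where \(\left \langle K,\boldsymbol {yy}^{T} \right \rangle _{F} = \sum \limits _{i=1}^{n}\sum \limits _{j=1}^{n} (K)_{\textit {ij}} \left (\boldsymbol {yy}^{T}\right)_{\textit {ij}}\). Once such combinations are performed, the resulting drug and protein kernels are used as input to the learning algorithm. We refer to the mean and KA heuristics by appending -MEAN and -KA, respectively, to the name of each base learner.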
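A minimal sketch of alignment-based combination, assuming the weights are the alignments normalised to sum to one (our reading of the KA heuristic of [53]). Which label vector plays the role of y for a drug or target kernel depends on the setting; here a toy vector is used, and the function names are hypothetical.

```python
import numpy as np

def alignment(K, K_ideal):
    """Kernel alignment <K, K_ideal>_F / sqrt(<K, K>_F <K_ideal, K_ideal>_F)."""
    return np.sum(K * K_ideal) / np.sqrt(np.sum(K * K) * np.sum(K_ideal * K_ideal))

def ka_combine(kernels, y):
    """Weight each kernel by its alignment to the ideal kernel y y^T (weights sum to 1)."""
    K_ideal = np.outer(y, y)
    scores = np.array([alignment(K, K_ideal) for K in kernels])
    weights = scores / scores.sum()
    return sum(w * K for w, K in zip(weights, kernels)), weights

y = np.array([1.0, 1.0, 0.0])
K1 = np.outer(y, y)          # perfectly aligned with the ideal kernel (alignment 1)
K2 = np.eye(3)               # alignment 2 / sqrt(3 * 4), about 0.577
K_comb, weights = ka_combine([K1, K2], y)
```

The -MEAN variant corresponds to replacing `weights` with equal values 1/P.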

#### Multiple kernel approaches

Similarity-based Inference of drug-TARgets (SITAR) [14] constructs a feature vector with the similarity values, where each feature is based on one drug-drug and one gene-gene similarity measure, resulting in a total of \(P_{D} \times P_{T}\) features. Each one is calculated by combining the drug-drug similarities between the query drug and other drugs and the gene-gene similarities between the query gene and other target genes across all true drug-target associations. The method also performs a feature selection procedure and yields the final classification scores using a logistic regression classifier.

Gönen and Kaski [22] proposed the Kernelized Bayesian Matrix Factorization with Twin Multiple Kernel Learning (KBMF2MKL) algorithm, extending a previous work [55] to handle multiple kernels. KBMF2MKL factorizes the drug-target interaction matrix by projecting the drugs and the targets into a common subspace, where the projected drug and target kernels are multiplied. Normally distributed kernel weights for each subspace-projected kernel are then estimated without any constraints. The product of the final combined matrices is then used to make predictions.

Wang et al. [15] propose a simple heuristic to first combine the drug and target similarities and then use an SVM classifier to perform the predictions. Only the maximum similarity values of the drug and target kernel matrices are selected, resulting in two distinct kernels. These are then used to construct a pairwise kernel, computed from the drug and target similarities. Once the pairwise matrix is computed, it is used to train an SVM classifier. This procedure is also known as the Pairwise Kernel Method (PKM) [27]. For this reason, we refer to the approach proposed by [15] as PKM-MAX.

*dist* is the drug-drug distance matrix in the DTI network, and *corr* represents the correlation coefficient. Analogously, the same can be done for targets. We call this method WANG-MKL.

### Experimental setup

1. ‘new drug’ scenario: it simulates the task of predicting targets for new drugs. In this scenario, the drugs in a dataset were divided into 5 disjoint subsets (folds). The pairs associated with 4 folds of drugs were used to train the classifier and the remaining pairs were used for testing;

2. ‘new target’ scenario: it corresponds to predicting interacting drugs for new targets. This is analogous to the above scenario, but considers 5 folds of targets;

3. pair prediction: it consists of predicting unknown interactions between known drugs and targets. All drug-target interactions were split into five folds, of which 4 were used for training and 1 for testing. Some of the competing methods (PKM-based, WANG-MKL and SITAR) were trained with sub-sampled datasets, i.e., we randomly selected from the unknown interaction set as many pairs as there are known interactions, since these methods cannot be executed on large networks [2, 4, 14, 15]. Although balanced classes are unlikely in real scenarios, we also performed experiments in context (3) using a sub-sampled test set, obtained by sampling as many negative examples as positive examples [14, 15] from the test fold. This experiment is relevant for comparison with previous work, since most previous studies on drug-target prediction performed under-sampling to evaluate predictive performance (see Additional file 1: Table S1).
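The ‘new drug’ split in scenario 1 can be sketched as follows, assuming the |T| × |D| label layout used for KronRLS (targets as rows, drugs as columns). Each fold holds out every pair involving its drugs, so test drugs never appear in training; the ‘new target’ split applies the same idea to rows. The function name is hypothetical.

```python
import numpy as np

def new_drug_folds(n_drugs, n_targets, n_folds=5, seed=0):
    """Yield (train_mask, test_mask) over the |T| x |D| pair matrix, splitting by drug."""
    rng = np.random.default_rng(seed)
    fold_of_drug = rng.permutation(n_drugs) % n_folds   # disjoint, balanced drug subsets
    for f in range(n_folds):
        test_drugs = fold_of_drug == f
        test_mask = np.zeros((n_targets, n_drugs), dtype=bool)
        test_mask[:, test_drugs] = True                 # every pair involving a test drug
        yield ~test_mask, test_mask
```

Repeating this with different seeds gives repeated cross-validation runs.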


The hyperparameters of each competing method were optimized under a nested CV procedure, using the following values: for the SVM-based methods (PKM, BLM and WANG-MKL), the SVM cost parameter was evaluated in the set \(\{2^{-1},\ldots,2^{3}\}\); for the KronRLS-based methods, the *λ* parameter was evaluated in the set \(\{2^{-15},2^{-10},\ldots,2^{30}\}\). The *σ* regularization coefficient of the KronRLS-MKL algorithm was also optimized over the set \(\{0, 0.25, 0.5, 0.75, 1\}\). The number of components in KBMF2MKL was varied in the set \(R \in \{5,10,\ldots,40\}\), and for LapRLS and NetLapRLS we varied \(\beta_{d},\beta_{t} \in \{0.25, 0.50, \ldots, 1\}\). In NetLapRLS we also considered two distinct values for \(\gamma_{d2},\gamma_{t2} \in \{0.01, 0.1\}\). For NRWRH, the restart probability was evaluated in the set \(\{0.1, 0.2, \ldots, 0.9\}\). After the hyperparameters were selected for each method, the outer loop evaluated the predictive performance on the test set partition with the model built using the selected hyperparameters.

The evaluation metric considered was the AUPR, as it allows a good quantitative estimate of the ability to separate the positive interactions from the negative ones. According to [56], this metric provides a better quality estimate for highly unbalanced data, since it punishes the existence of false positives (FP) more heavily. This is especially true for the datasets considered, as shown in Table 1, in which all datasets are extremely unbalanced.
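A minimal sketch of one common AUPR estimator, average precision (the mean of the precision values at the rank of each positive example). Other estimators, such as trapezoidal interpolation of the precision-recall curve, can give slightly different values, and this section does not state which one was used. The function name is hypothetical.

```python
import numpy as np

def aupr(y_true, scores):
    """Average precision: mean precision at the rank of each positive example."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y = np.asarray(y_true)[order]
    tp = np.cumsum(y)                              # true positives among the top-k
    precision = tp / np.arange(1, len(y) + 1)
    return precision[y == 1].sum() / y.sum()
```

For example, with labels [1, 0, 1, 0] ranked in that order, the precisions at the two positives are 1 and 2/3, giving 5/6. A random ranking has expected AUPR close to the fraction of positives, which is about 0.01 for the Enzyme data in Table 1, so small absolute values can still be far above chance.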

## Results and discussion

### Paired kernel experiments

As a base study, we evaluate the performance of KronRLS on all pairs of kernels (10×10 pairs). The AUPR results of all pairs of kernels for the Nuclear Receptors, GPCR, Ion Channel and Enzyme datasets are shown in more detail in the supplementary material (see Additional file 1).

### Comparative analysis

*α*=0.05), except for KRONRLS-KA and KRONRLS-MEAN, according to the Wilcoxon rank sum test (Additional file 2). Concerning the subsampled pair prediction, KRONRLS-MKL achieved the highest AUPR in the NR and IC data sets, and SITAR performed best in the GPCR and Enzyme data, where KRONRLS-MKL ranked second, just after SITAR (see Additional file 3: Table S1). The higher AUPR values obtained in the subsampled data sets in comparison to the unbalanced data sets clearly indicate that performing predictions on the complete data is a more difficult task. Moreover, the proportion of positive examples was negatively correlated with the dataset size for the complete datasets.

AUPR results of the MKL experiments in 5 × 5 cross-validation (Pairs: pair prediction; Targets: ‘new target’; Drugs: ‘new drug’)

Dataset | Method | Pairs | (±) | Targets | (±) | Drugs | (±) |
---|---|---|---|---|---|---|---|

NR | [SPEC-k4]-[AERS-freq] | 0.4630 | (±0.0215) | 0.3851 | (±0.0254) | 0.2341 | (±0.0054) |

[SPEC-k4]-[GIP] | 0.5187 | (±0.0255) | 0.3725 | (±0.0247) | 0.0949 | (±0.0068) | |

BLM-KA | 0.0709 | (±0.0048) | 0.3441 | (±0.0264) | 0.3130 | (±0.0224) | |

BLM-MEAN | 0.0685 | (±0.0062) | 0.3453 | (±0.0264) | 0.2934 | (±0.0154) | |

KBMF2MKL | 0.2041 | (±0.0150) | 0.2059 | (±0.0388) | 0.1459 | (±0.0272) | |

KRONRLS-KA | 0.4321 | (±0.0147) | 0.3489 | (±0.0337) | 0.2850 | (±0.0126) | |

KRONRLS-MEAN | 0.4078 | (±0.0211) | 0.3482 | (±0.0341) | 0.2665 | (±0.0109) | |

KRONRLS-MKL | | (±0.0137) | | (±0.0321) | | (±0.0224) | |

LAPRLS-KA | 0.1989 | (±0.0207) | 0.2120 | (±0.0277) | 0.1841 | (±0.0044) | |

LAPRLS-MEAN | 0.1870 | (±0.0196) | 0.2008 | (±0.0251) | 0.1832 | (±0.0022) | |

NETLAPRLS-KA | 0.2310 | (±0.0277) | 0.2091 | (±0.0288) | 0.1841 | (±0.0044) | |

NETLAPRLS-MEAN | 0.2195 | (±0.0273) | 0.1989 | (±0.0263) | 0.1831 | (±0.0023) | |

NRWRH-KA | – | – | 0.1776 | (±0.0380) | 0.1911 | (±0.0116) | |

NRWRH-MEAN | – | – | 0.1755 | (±0.0364) | 0.1881 | (±0.0109) | |

PKM-KA | 0.1830 | (±0.0114) | 0.2363 | (±0.0387) | 0.1741 | (±0.0158) | |

PKM-MAX | 0.0946 | (±0.0188) | 0.0774 | (±0.0108) | 0.1174 | (±0.0080) | |

PKM-MEAN | 0.1702 | (±0.0099) | 0.2163 | (±0.0400) | 0.1672 | (±0.0152) | |

SITAR | 0.4477 | (±0.0658) | 0.1396 | (±0.0505) | 0.0694 | (±0.0189) | |

WANG-MKL | 0.3293 | (±0.0175) | 0.2238 | (±0.0300) | 0.2628 | (±0.0225) | |

GPCR | [SPEC-k4]-[MINMAX] | 0.3246 | (±0.0093) | 0.5053 | (±0.0322) | 0.0924 | (±0.0055) |

[SW]-[GIP] | 0.6188 | (±0.0075) | 0.4561 | (±0.0201) | 0.0419 | (±0.0014) | |

BLM-KA | 0.0633 | (±0.0071) | | (±0.0123) | 0.3000 | (±0.0198) | |

BLM-MEAN | 0.0519 | (±0.0032) | 0.5353 | (±0.0135) | 0.2526 | (±0.0188) | |

KBMF2MKL | 0.4960 | (±0.0124) | 0.0963 | (±0.0346) | 0.1408 | (±0.0120) | |

KRONRLS-KA | 0.6208 | (±0.0081) | 0.4727 | (±0.0101) | 0.3005 | (±0.0148) | |

KRONRLS-MEAN | 0.6213 | (±0.0085) | 0.4461 | (±0.0086) | 0.2731 | (±0.0155) | |

KRONRLS-MKL | | (±0.0052) | 0.4127 | (±0.0126) | | (±0.0112) | |

LAPRLS-KA | 0.2183 | (±0.0067) | 0.1458 | (±0.0050) | 0.1210 | (±0.0058) | |

LAPRLS-MEAN | 0.2169 | (±0.0066) | 0.1369 | (±0.0049) | 0.1215 | (±0.0061) | |

NETLAPRLS-KA | 0.3763 | (±0.0096) | 0.1451 | (±0.0041) | 0.1211 | (±0.0062) | |

NETLAPRLS-MEAN | 0.3841 | (±0.0088) | 0.1357 | (±0.0039) | 0.1221 | (±0.0061) | |

NRWRH-KA | – | – | 0.0762 | (±0.0041) | 0.1201 | (±0.0088) | |

NRWRH-MEAN | – | – | 0.0704 | (±0.0036) | 0.1176 | (±0.0099) | |

PKM-KA | 0.2625 | (±0.0133) | 0.2327 | (±0.0175) | 0.1424 | (±0.0146) | |

PKM-MAX | 0.1230 | (±0.0106) | 0.0652 | (±0.0071) | 0.0935 | (±0.0044) | |

PKM-MEAN | 0.2613 | (±0.0178) | 0.1632 | (±0.0186) | 0.1254 | (±0.0107) | |

SITAR | 0.5324 | (±0.0267) | 0.1151 | (±0.0538) | 0.0283 | (±0.0110) | |

WANG-MKL | 0.4240 | (±0.0071) | 0.3521 | (±0.0111) | 0.2686 | (±0.0274) | |

IC | [PPI]-[GIP] | 0.6789 | (±0.0078) | 0.1548 | (±0.0020) | 0.0467 | (±0.0009) |

[SW]-[GIP] | 0.8679 | (±0.0056) | 0.7301 | (±0.0140) | 0.0476 | (±0.0008) | |

BLM-KA | 0.1169 | (±0.0127) | | (±0.0047) | | (±0.0304) | |

BLM-MEAN | 0.1106 | (±0.0088) | 0.7798 | (±0.0040) | 0.2152 | (±0.0257) | |

KBMF2MKL | 0.7671 | (±0.0033) | 0.4420 | (±0.0141) | 0.0856 | (±0.0044) | |

KRONRLS-KA | 0.8553 | (±0.0017) | 0.7246 | (±0.0071) | 0.2039 | (±0.0190) | |

KRONRLS-MEAN | 0.8693 | (±0.0011) | 0.6885 | (±0.0067) | 0.1887 | (±0.0186) | |

KRONRLS-MKL | 0.8769 | (±0.0011) | 0.6894 | (±0.0056) | 0.2406 | (±0.0259) | |

LAPRLS-KA | 0.3088 | (±0.0021) | 0.2747 | (±0.0031) | 0.0942 | (±0.0022) | |

LAPRLS-MEAN | 0.3187 | (±0.0024) | 0.2760 | (±0.0032) | 0.0939 | (±0.0021) | |

NETLAPRLS-KA | 0.5359 | (±0.0065) | 0.2750 | (±0.0032) | 0.0931 | (±0.0022) | |

NETLAPRLS-MEAN | 0.5560 | (±0.0073) | 0.2766 | (±0.0034) | 0.0928 | (±0.0023) | |

NRWRH-KA | – | – | 0.2371 | (±0.0046) | 0.0720 | (±0.0026) | |

NRWRH-MEAN | – | – | 0.2363 | (±0.0042) | 0.0712 | (±0.0024) | |

PKM-KA | 0.5133 | (±0.0235) | 0.4151 | (±0.0092) | 0.1156 | (±0.0041) | |

PKM-MAX | 0.1608 | (±0.0132) | 0.1673 | (±0.0038) | 0.0660 | (±0.0031) | |

PKM-MEAN | 0.5474 | (±0.0261) | 0.3840 | (±0.0062) | 0.0998 | (±0.0019) | |

SITAR | 0.7505 | (±0.0153) | 0.1717 | (±0.0633) | 0.0174 | (±0.0046) | |

WANG-MKL | 0.7116 | (±0.0214) | 0.6009 | (±0.0158) | 0.2217 | (±0.0124) | |

E | [GO]-[GIP] | 0.6900 | (±0.0032) | 0.2371 | (±0.0025) | 0.0124 | (±0.0004) |

[SW]-[GIP] | 0.8429 | (±0.0054) | 0.7438 | (±0.0189) | 0.0159 | (±0.0003) | |

BLM-KA | 0.0471 | (±0.0045) | | (±0.0070) | | (±0.0060) | |

BLM-MEAN | 0.0374 | (±0.0032) | 0.8099 | (±0.0063) | 0.2079 | (±0.0051) | |

KBMF2MKL | 0.6722 | (±0.0051) | 0.0757 | (±0.0049) | 0.0213 | (±0.0004) | |

KRONRLS-KA | 0.8630 | (±0.0127) | 0.7274 | (±0.0071) | 0.1829 | (±0.0034) | |

KRONRLS-MEAN | 0.8667 | (±0.0098) | 0.6917 | (±0.0062) | 0.1655 | (±0.0030) | |

KRONRLS-MKL | 0.8818 | (±0.0128) | 0.7384 | (±0.0063) | 0.2168 | (±0.0050) | |

LAPRLS-KA | 0.1920 | (±0.0014) | 0.1677 | (±0.0072) | 0.0682 | (±0.0012) | |

LAPRLS-MEAN | 0.1750 | (±0.0015) | 0.1402 | (±0.0055) | 0.0646 | (±0.0013) | |

NETLAPRLS-KA | 0.2853 | (±0.0024) | 0.1669 | (±0.0042) | 0.0670 | (±0.0018) | |

NETLAPRLS-MEAN | 0.2548 | (±0.0019) | 0.1402 | (±0.0046) | 0.0636 | (±0.0016) | |

NRWRH-KA | – | – | 0.0886 | (±0.0011) | 0.0403 | (±0.0024) | |

NRWRH-MEAN | – | – | 0.0816 | (±0.0006) | 0.0383 | (±0.0018) | |

PKM-KA | 0.2383 | (±0.0069) | 0.1905 | (±0.0047) | 0.0480 | (±0.0037) | |

PKM-MAX | 0.0762 | (±0.0011) | 0.0597 | (±0.0007) | 0.0323 | (±0.0007) | |

PKM-MEAN | 0.2161 | (±0.0072) | 0.1239 | (±0.0032) | 0.0382 | (±0.0031) | |

SITAR | 0.7558 | (±0.0160) | 0.0232 | (±0.0151) | 0.0097 | (±0.0111) | |

WANG-MKL | 0.7286 | (±0.0046) | 0.6663 | (±0.0069) | 0.1648 | (±0.0042) |

*α*=0.05; Additional file 2). In the “new drug” problem, KRONRLS-MKL obtained higher AUPR in the NR and GPCR datasets, while BLM-KA had higher AUPR values in the IC and Enzyme datasets. Both KRONRLS-MKL and BLM-KA had significantly higher AUPR (at *α*=0.05; Additional file 2) than all other competing methods. To give an overview of the performance of the evaluated methods, Table 4 presents the average ranking of the AUPR values obtained by all methods across the four datasets.

Average ranking over all four datasets, per prediction task

| Method | Pair | Targets | Drugs |
|---|---|---|---|
| SINGLE | 7.0 | 7.8 | 15.0 |
| SINGLE | 3.3 | 3.3 | 17.5 |
| BLM-KA | 16.0 | 2.5 | 1.8 |
| BLM-MEAN | 17.0 | 3.0 | 4.0 |
| KBMF2MKL | 7.3 | 13.5 | 13.3 |
| KRONRLS-KA | 3.8 | 4.3 | 3.8 |
| KRONRLS-MEAN | 3.0 | 5.8 | 5.0 |
| KRONRLS-MKL | 1.0 | 4.8 | 1.5 |
| LAPRLS-KA | 12.8 | 11.5 | 9.8 |
| LAPRLS-MEAN | 13.3 | 12.8 | 10.5 |
| NETLAPRLS-KA | 9.3 | 12.0 | 10.3 |
| NETLAPRLS-MEAN | 9.0 | 13.0 | 11.3 |
| NRWRH-KA | – | 15.8 | 12.0 |
| NRWRH-MEAN | – | 16.8 | 13.0 |
| PKM-KA | 11.8 | 8.8 | 9.8 |
| PKM-MAX | 15.0 | 18.5 | 16.0 |
| PKM-MEAN | 12.0 | 11.0 | 11.5 |
| SITAR | 5.0 | 17.3 | 19.0 |
| WANG-MKL | 6.8 | 7.8 | 5.0 |
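
Table 4 summarises performance by ranking the methods within each dataset and averaging those ranks. The sketch below illustrates that aggregation under two assumptions of ours: rank 1 goes to the highest AUPR, and tied methods share the mean of their ranks. The helper names are hypothetical, and the toy input covers only three KronRLS variants, using their IC and Enzyme pair-prediction AUPR values from the results table, so its output is not a reproduction of Table 4.

```python
from collections import defaultdict


def ranks_descending(scores):
    """Map each method to its rank within one dataset (1 = highest AUPR).

    Tied scores share the mean of the positions they occupy (assumed rule).
    """
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over a run of methods with exactly the same score.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared = (i + 1 + j + 1) / 2  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared
        i = j + 1
    return ranks


def average_ranking(aupr_by_dataset):
    """Average each method's per-dataset rank over all datasets."""
    totals = defaultdict(float)
    for scores in aupr_by_dataset.values():
        for method, rank in ranks_descending(scores).items():
            totals[method] += rank
    n = len(aupr_by_dataset)
    return {method: total / n for method, total in totals.items()}


# Pair-prediction AUPR of three KronRLS variants (IC and Enzyme rows above).
pair_aupr = {
    "IC": {"KRONRLS-KA": 0.8553, "KRONRLS-MEAN": 0.8693, "KRONRLS-MKL": 0.8769},
    "E": {"KRONRLS-KA": 0.8630, "KRONRLS-MEAN": 0.8667, "KRONRLS-MKL": 0.8818},
}
print(average_ranking(pair_aupr))
```

With the full set of methods and all four datasets as input, the same procedure would yield one average rank per method and task.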

Methods also differed in their computational requirements. Memory usage was stable across all methods except the SVM-based algorithms (BLM, PKM, WANG-MKL), whose memory use grew quadratically with the size of the dataset. This is partly due to the construction of the explicit pairwise kernel (see Additional file 3: Table S3). Such methods are therefore unsuitable for contexts in which subsampling of pairs is undesirable.
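
To see why an explicit pairwise kernel is costly, note that with n_d drugs and n_t targets there are n_d·n_t pairs, and a dense kernel over pairs stores (n_d·n_t)² entries, whereas the separate drug and target kernels need only n_d² + n_t². The arithmetic below is a sketch with hypothetical dataset sizes and 8-byte floats; actual memory use also depends on the solver and on any subsampling of pairs.

```python
BYTES_PER_FLOAT64 = 8


def explicit_pairwise_kernel_bytes(n_drugs, n_targets):
    """Memory for a dense kernel matrix over all drug-target pairs."""
    n_pairs = n_drugs * n_targets
    return n_pairs * n_pairs * BYTES_PER_FLOAT64


def factor_kernels_bytes(n_drugs, n_targets):
    """Memory for the separate drug and target kernel matrices."""
    return (n_drugs ** 2 + n_targets ** 2) * BYTES_PER_FLOAT64


# Hypothetical square datasets of increasing size.
for n in (50, 100, 200):
    full = explicit_pairwise_kernel_bytes(n, n)
    factors = factor_kernels_bytes(n, n)
    print(f"{n} drugs x {n} targets: pairwise {full / 1e9:.2f} GB, "
          f"factors {factors / 1e6:.2f} MB")
```

Doubling both dimensions quadruples the number of pairs and multiplies the pairwise kernel by 16, while the factor kernels grow only fourfold.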

We now discuss computational time in the pair prediction scenario. The approaches based on precomputed kernel combinations (MEAN and KA) were the fastest on average: PKM-based methods required the least time to train and test the models (∼1 min), followed by the KronRLS-based and LapRLS-based algorithms (∼20 and ∼27 min, respectively). KBMF2MKL and BLM were the slowest, requiring more than 100 min on average for the same task. The lower computation time of these heuristic-based methods is explained by the absence of complex optimization procedures to find the kernel weights. KronRLS-MKL required less time than KBMF2MKL, averaging 74 min over the four datasets (see Additional file 3: Table S4).
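
The MEAN and KA heuristics combine precomputed kernels without solving an optimization problem for the weights, which is why they are cheap. A minimal sketch follows, assuming that MEAN assigns uniform weights and that KA weights each drug kernel by its kernel-target alignment [54] with the ideal kernel YYᵀ built from the training interaction matrix Y. The exact KA definition used in the experiments may differ, and the function names and toy data are hypothetical.

```python
import numpy as np


def kernel_alignment(k1, k2):
    """Empirical kernel alignment <K1, K2>_F / (||K1||_F * ||K2||_F)."""
    return float(np.sum(k1 * k2) / (np.linalg.norm(k1) * np.linalg.norm(k2)))


def mean_weights(kernels):
    """MEAN-style heuristic: every kernel receives the same weight."""
    return np.full(len(kernels), 1.0 / len(kernels))


def ka_weights(kernels, y):
    """KA-style heuristic (assumed form): weight each drug kernel by its
    alignment with the ideal kernel Y @ Y.T, normalised to sum to one."""
    ideal = y @ y.T
    scores = np.array([kernel_alignment(k, ideal) for k in kernels])
    return scores / scores.sum()


def combine(kernels, weights):
    """Weighted sum of precomputed kernel matrices (no optimization step)."""
    return sum(w * k for w, k in zip(weights, kernels))


# Toy interaction matrix: 5 drugs x 3 targets (hypothetical data).
y = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
k_informative = y @ y.T + 0.1 * np.eye(5)  # mirrors the interaction profiles
k_uninformative = np.eye(5)                # treats all drugs as unrelated
kernels = [k_informative, k_uninformative]

print(mean_weights(kernels))   # uniform weights
print(ka_weights(kernels, y))  # larger weight on the informative kernel
k_drug = combine(kernels, ka_weights(kernels, y))
```

Both weightings cost only a few matrix products, in contrast with methods that optimize the weights jointly with the predictor.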

### Predictions on new drug-target interactions

To evaluate the quality of final predictions in a more realistic scenario, we performed an experiment similar to that described in [10, 26]. We took the most highly ranked drug–target pairs as the most likely true interactions and searched for them in the current releases of four major databases: DrugBank [39], MATADOR [38], KEGG [57] and ChEMBL [58]. As the training datasets were generated almost eight years ago, new interactions included in these databases serve as an external validation set. We excluded interactions already present in the training data.
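
The ranking step of this validation can be sketched as follows: mask the pairs already known to interact, sort the remaining scores, and keep the top k pairs for lookup in the external databases. The function name, the toy score matrix and the index-based output are illustrative assumptions; mapping indices back to KEGG drug and target identifiers is not shown.

```python
import numpy as np


def top_novel_pairs(scores, known, k=5):
    """Return up to k (drug_index, target_index) pairs with the highest
    predicted scores, skipping pairs already labelled as interactions."""
    # Known interactions get -inf so they can never be selected.
    masked = np.where(known.astype(bool), -np.inf, scores)
    # Sort all entries of the flattened matrix, highest score first.
    order = np.argsort(masked, axis=None, kind="stable")[::-1]
    rows, cols = np.unravel_index(order, masked.shape)
    picked = []
    for r, c in zip(rows, cols):
        if len(picked) == k or not np.isfinite(masked[r, c]):
            break
        picked.append((int(r), int(c)))
    return picked


# Hypothetical scores for 2 drugs x 3 targets and the training matrix.
scores = np.array([[0.9, 0.2, 0.4],
                   [0.1, 0.8, 0.7]])
known = np.array([[1, 0, 0],
                  [0, 1, 0]])
print(top_novel_pairs(scores, known, k=2))  # [(1, 2), (0, 2)]
```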

Top five predicted interactions by KRONRLS-MKL

| Rank | Drug | Target | Database(s) |
|---|---|---|---|
| **Nuclear Receptors** | | | |
| 1 | D00951 | hsa2099 | (D,C) |
| 2 | D00585 | hsa2099 | (C) |
| 3 | D00182 | hsa2099 | (C) |
| 4 | D00105 | hsa5241 | (C) |
| 5 | D00094 | hsa6095 | |
| **GPCR** | | | |
| 1 | D02358 | hsa154 | (D,C) |
| 2 | D00283 | hsa1814 | (D,C,M) |
| 3 | D00371 | hsa135 | (K,D,C) |
| 4 | D00371 | hsa134 | (K,D,C) |
| 5 | D00095 | hsa155 | (K,D,C) |
| **Ion Channel** | | | |
| 1 | D00775 | hsa2898 | (M) |
| 2 | D02356 | hsa6833 | |
| 3 | D00294 | hsa10060 | |
| 4 | D02356 | hsa56660 | |
| 5 | D00524 | hsa1134 | |
| **Enzyme** | | | |
| 1 | D00542 | hsa1571 | (D,C,M) |
| 2 | D00437 | hsa1559 | (D,C,M) |
| 3 | D00528 | hsa1549 | (M) |
| 4 | D03670 | hsa1579 | |
| 5 | D00139 | hsa1543 | (D,M) |

This is also a good example of the benefits of incorporating multiple data sources. RORa and Tretinoin do not share any nodes in the training set. All targets of Tretinoin have a high GO similarity to RORa (mean value of 0.8368) despite their low sequence similarity (mean SW similarity of 0.1563). In addition, one of the targets of Tretinoin is NR0B1 (nuclear receptor subfamily 0, group B, member 1), a protein that is very close to RORa in the PPI network (similarity score of 0.90).

Concerning the Ion Channel models, the predictions ranked 2 and 3 indicate interactions of Verapamil and Diazoxide with ATP-binding cassette sub-family C (ABCC8). ABCC8 encodes the sulfonylurea receptor (SUR1) and is associated with calcium regulation and type 2 diabetes [60]. Interestingly, there are positive reports of Diazoxide treatment preventing diabetes in rats [61].

### Evaluation of kernel weights

## Conclusions

We have presented a new Multiple Kernel Learning algorithm for the bipartite link prediction problem, which is able to identify and select the most relevant information sources for DTI prediction. Most previous MKL methods address the case in which all kernels are built over the same set of entities, which does not hold for bipartite link prediction problems such as drug-target networks. Regarding predictions in drug-target networks, the sampling of negative/unknown examples as a way to cope with large data sets is a clear limitation [2]. Our method takes advantage of the KronRLS framework to efficiently perform link prediction on data of arbitrary size.
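
KronRLS avoids the pairwise kernel altogether: the Kronecker kernel K_t ⊗ K_d shares eigenvectors with its factors, so the regularized least-squares system can be solved from the eigendecompositions of the drug and target kernels alone, at a cost dominated by those two decompositions rather than by a solve over all n_d·n_t pairs. The sketch below shows this standard trick for one fixed pair of kernels, with a hypothetical function name, a regularization parameter `sigma`, and random toy kernels. In KronRLS-MKL the drug and target kernels would be weighted combinations with learned weights, which this sketch does not cover.

```python
import numpy as np


def kronrls_predict(k_drug, k_target, y, sigma=1.0):
    """Kronecker regularized least squares on a drug x target label matrix Y.

    Solves (K_t kron K_d + sigma * I) vec(A) = vec(Y), with column-major vec,
    through the eigendecompositions of K_d and K_t (the pairwise kernel is
    never formed) and returns the fitted scores F = K_d @ A @ K_t.
    """
    lam_d, u_d = np.linalg.eigh(k_drug)
    lam_t, u_t = np.linalg.eigh(k_target)
    # Eigenvalues of the Kronecker kernel are all products lam_d[i] * lam_t[j].
    denom = np.outer(lam_d, lam_t) + sigma
    # Rotate Y into the joint eigenbasis, rescale, and rotate back.
    a = u_d @ ((u_d.T @ y @ u_t) / denom) @ u_t.T
    return k_drug @ a @ k_target


rng = np.random.default_rng(0)
x_d = rng.normal(size=(4, 3))
x_t = rng.normal(size=(3, 2))
k_d = x_d @ x_d.T + 1e-3 * np.eye(4)  # toy PSD drug kernel (4 drugs)
k_t = x_t @ x_t.T + 1e-3 * np.eye(3)  # toy PSD target kernel (3 targets)
y = (rng.random((4, 3)) < 0.5).astype(float)  # toy interaction matrix
f = kronrls_predict(k_d, k_t, y, sigma=0.5)
print(f.round(3))
```

On a small example like this one, comparing `f` with an explicit solve using `np.kron(k_t, k_d)` is a quick way to confirm the column-major vectorization convention.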

In our experiments, the KronRLS-MKL algorithm achieved a good balance between accuracy and computational cost relative to the other approaches. It performed best in the “pair” and “new drug” prediction problems. In the “new drug” and “new target” prediction tasks, BLM-KA was also top ranked. However, BLM-KA has a high computational cost, since it requires a classifier for each drug-target pair [2]. Moreover, it obtained poor results in the evaluation scenario for predicting interactions of novel drug-target pairs.

The kernel weights estimated under the convex constraint correlated well with the accuracy obtained in a brute-force search over kernel pairs. This non-sparse combination of kernels may have improved the generalization of the model by reducing its bias towards a specific type of kernel. This usually leads to better performance, since the model can benefit from different heterogeneous information sources in a systematic way [33]. Finally, the algorithm's performance was not sensitive to class imbalance, and it can be trained over the whole interaction space without sacrificing performance.
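
The convex constraint keeps each kernel weight non-negative and makes the weights sum to one. Because a non-negative combination of positive semidefinite kernels is itself positive semidefinite, every feasible weight vector yields a valid kernel; a non-sparse solution simply keeps several weights strictly positive. The sketch below, with hypothetical helper names and random toy kernels, checks these two properties; it does not show how the weights themselves are estimated.

```python
import numpy as np


def convex_combination(kernels, weights, tol=1e-9):
    """Weighted sum of kernels with weights on the probability simplex."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < -tol) or abs(w.sum() - 1.0) > tol:
        raise ValueError("weights must be non-negative and sum to one")
    return sum(wi * k for wi, k in zip(w, kernels))


def is_psd(k, tol=1e-9):
    """True if the smallest eigenvalue of the symmetric matrix k is >= -tol."""
    return bool(np.linalg.eigvalsh(k).min() >= -tol)


def n_active(weights, tol=1e-6):
    """Number of kernels that actually contribute (weight above tol)."""
    return int(np.sum(np.asarray(weights) > tol))


rng = np.random.default_rng(1)
feats = [rng.normal(size=(6, d)) for d in (2, 3, 4)]
kernels = [x @ x.T for x in feats]  # three toy PSD kernels over 6 drugs

dense = [0.5, 0.3, 0.2]   # non-sparse: every kernel contributes
sparse = [1.0, 0.0, 0.0]  # sparse: a single kernel is selected
for w in (dense, sparse):
    k = convex_combination(kernels, w)
    print(n_active(w), is_psd(k))
```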

## Endnotes

^{1} http://sideeffects.embl.de/.

^{2} NRWRH cannot be applied to pair prediction [8]; therefore, this method was not considered in that setting.

## Declarations

### Acknowledgements

The authors thank the authors of [23] for making their data publicly available. This work was supported by the Interdisciplinary Center for Clinical Research (IZKF Aachen), RWTH Aachen University Medical School, Aachen, Germany; DAAD; and Brazilian research agencies: FACEPE, CAPES and CNPq.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.


## References

- Csermely P, Korcsmáros T, Kiss HJM, London G, Nussinov R. Structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review. Pharmacol Ther. 2013; 138(3):333–408. doi:10.1016/j.pharmthera.2013.01.016.
- Ding H, Takigawa I, Mamitsuka H, Zhu S. Similarity-based machine learning methods for predicting drug-target interactions: a brief review. Brief Bioinform. 2013. doi:10.1093/bib/bbt056.
- Chen X, Yan CC, Zhang X, Zhang X, Dai F. Drug–target interaction prediction: databases, web servers and computational models. Brief Bioinform. 2015:1–17. doi:10.1093/bib/bbv066.
- Yamanishi Y. Chemogenomic approaches to infer drug–target interaction networks. Data Min Syst Biol. 2013; 939:97–113. doi:10.1007/978-1-62703-107-3.
- Dudek AZ, Arodz T, Gálvez J. Computational methods in developing quantitative structure-activity relationships (QSAR): a review. Comb Chem High Throughput Screen. 2006; 9(3):213–8.
- Sawada R, Kotera M, Yamanishi Y. Benchmarking a wide range of chemical descriptors for drug-target interaction prediction using a chemogenomic approach. Mol Inform. 2014; 33(11-12):719–31. doi:10.1002/minf.201400066.
- Cheng F, Liu C, Jiang J, Lu W, Li W, Liu G, et al. Prediction of drug-target interactions and drug repositioning via network-based inference. PLoS Comput Biol. 2012; 8(5):1002503. doi:10.1371/journal.pcbi.1002503.
- Chen X, Liu MX, Yan GY. Drug-target interaction prediction by random walk on the heterogeneous network. Mol BioSyst. 2012; 8(7):1970–8. doi:10.1039/c2mb00002d.
- Yamanishi Y, Kotera M, Kanehisa M, Goto S. Drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework. Bioinformatics (Oxford, England). 2010; 26(12):246–54. doi:10.1093/bioinformatics/btq176.
- van Laarhoven T, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drug-target interaction. Bioinformatics (Oxford, England). 2011; 27(21):3036–43. doi:10.1093/bioinformatics/btr500.
- Pahikkala T, Airola A, Pietila S, Shakyawar S, Szwajda A, Tang J, et al. Toward more realistic drug-target interaction predictions. Brief Bioinform. 2014. doi:10.1093/bib/bbu010.
- Pahikkala T, Airola A, Stock M, Baets BD, Waegeman W. Efficient regularized least-squares algorithms for conditional ranking on relational data. Mach Learn. 2013; 93:321–356. arXiv:1209.4825v2.
- Gönen M, Alpaydın E. Multiple kernel learning algorithms. J Mach Learn Res. 2011; 12:2211–68.
- Perlman L, Gottlieb A, Atias N, Ruppin E, Sharan R. Combining drug and gene similarity measures for drug-target elucidation. J Comput Biol. 2011; 18(2):133–45. doi:10.1089/cmb.2010.0213.
- Wang YC, Zhang CH, Deng NY, Wang Y. Kernel-based data fusion improves the drug-protein interaction prediction. Comput Biol Chem. 2011; 35(6):353–62. doi:10.1016/j.compbiolchem.2011.10.003.
- Wang Y, Chen S, Deng N, Wang Y. Drug repositioning by kernel-based integration of molecular structure, molecular activity, and phenotype data. PLoS ONE. 2013; 8(11):78518. doi:10.1371/journal.pone.0078518.
- Ben-Hur A, Noble WS. Kernel methods for predicting protein-protein interactions. Bioinformatics (Oxford, England). 2005; 21 Suppl 1:38–46. doi:10.1093/bioinformatics/bti1016.
- Hue M, Riffle M, Vert J-P, Noble WS. Large-scale prediction of protein-protein interactions from structures. BMC Bioinforma. 2010; 11:144.
- Ammad-Ud-Din M, Georgii E, Gönen M, Laitinen T, Kallioniemi O, Wennerberg K, et al. Integrative and Personalized QSAR Analysis in Cancer by Kernelized Bayesian Matrix Factorization. J Chem Inf Model. 2014; 1. doi:10.1021/ci500152b.
- Lanckriet GR, Deng M, Cristianini N, Jordan MI, Noble WS. Kernel-based data fusion and its application to protein function prediction in yeast. In: Pacific Symposium on Biocomputing. World Scientific: 2004. p. 300–11.
- Yu G, Zhu H, Domeniconi C, Guo M. Integrating multiple networks for protein function prediction. BMC Syst Biol. 2015; 9(Suppl 1):3. doi:10.1186/1752-0509-9-S1-S3.
- Gönen M, Kaski S. Kernelized Bayesian Matrix Factorization. IEEE Trans Pattern Anal Mach Intell. 2014; 36(10):2047–2060.
- Yamanishi Y, Araki M, Gutteridge A, Honda W, Kanehisa M. Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics (Oxford, England). 2008; 24(13):232–40. doi:10.1093/bioinformatics/btn162.
- Park Y, Marcotte EM. Flaws in evaluation schemes for pair-input computational predictions. Nat Methods. 2012; 9(12):1134–6. doi:10.1038/nmeth.2259.
- Xia Z, Wu LY, Zhou X, Wong STC. Semi-supervised drug-protein interaction prediction from heterogeneous biological spaces. BMC Syst Biol. 2010; 4(Suppl 2):6. doi:10.1186/1752-0509-4-S2-S6.
- Bleakley K, Yamanishi Y. Supervised prediction of drug-target interactions using bipartite local models. Bioinformatics (Oxford, England). 2009; 25(18):2397–403. doi:10.1093/bioinformatics/btp433.
- Jacob L, Vert JP. Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics (Oxford, England). 2008; 24(19):2149–56. doi:10.1093/bioinformatics/btn409.
- Dinuzzo F. Learning functions with kernel methods. 2011. PhD thesis, University of Pavia.
- Rifkin R, Yeo G, Poggio T. Regularized least-squares classification. Nato Science Series Sub Series III Computer and Systems Sciences. 2003; 190:131–54.
- Kimeldorf G, Wahba G. Some results on Tchebycheffian spline functions. J Math Anal Appl. 1971; 33(1):82–95.
- Kashima H, Oyama S, Yamanishi Y, Tsuda K. On pairwise kernels: an efficient alternative and generalization analysis. Adv Data Min Knowl Disc. 2009; 5476:1030–7.
- Laub AJ. Matrix Analysis for Scientists and Engineers. Davis, California: SIAM; 2005, pp. 139–44.
- Kloft M, Brefeld U, Laskov P, Sonnenburg S. Non-sparse multiple kernel learning. In: NIPS Workshop on Kernel Learning: Automatic Selection of Optimal Kernels (Vol. 4): 2008.
- Byrd RH, Hribar ME, Nocedal J. An interior point algorithm for large-scale nonlinear programming. SIAM J Optim. 1999; 9(4):877–900. doi:10.1137/S1052623497325107.
- MATLAB. version 8.1.0 (R2013a). Natick, Massachusetts: The MathWorks Inc.; 2013.
- Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res. 2008; 36(suppl 1):480–4.
- Schomburg I, Chang A, Ebeling C, Gremse M, Heldt C, Huhn G, et al. BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Res. 2004; 32(suppl 1):431–3.
- Günther S, Kuhn M, Dunkel M, Campillos M, Senger C, Petsalaki E, et al. SuperTarget and Matador: resources for exploring drug-target relationships. Nucleic Acids Res. 2008; 36(suppl 1):919–22.
- Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 2008; 36(suppl 1):901–6.
- Eskin E, Weston J, Noble WS, Leslie CS. Mismatch String Kernels for SVM Protein Classification. In: Advances in neural information processing systems-NIPS: 2002. p. 1417–1424.
- Leslie CS, Eskin E, Noble WS. The spectrum kernel: a string kernel for SVM protein classification. In: Pac Symp Biocomput vol. 7: 2002. p. 566–575.
- Palme J, Hochreiter S, Bodenhofer U. KeBABS - an R package for kernel-based analysis of biological sequences. Bioinformatics. 2015; 31(15):2574–2576. doi:10.1093/bioinformatics/btv176.
- Smedley D, Haider S, Durinck S, et al. The BioMart community portal: an innovative alternative to large, centralized data repositories. Nucleic Acids Res. 2015. doi:10.1093/nar/gkv350.
- Ovaska K, Laakso M, Hautaniemi S. Fast Gene Ontology based clustering for microarray experiments. BioData Min. 2008; 1(1):11.
- Resnik P. Semantic Similarity in a Taxonomy: An Information Based Measure and Its Application to Problems of Ambiguity in Natural Language. J Artif Intell Res. 1999; 11:95–130.
- Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006; 34(suppl 1):535–9.
- Hattori M, Okuno Y, Goto S, Kanehisa M. Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J Am Chem Soc. 2003; 125(39):11853–65.
- Klambauer G, Wischenbart M, Mahr M, Unterthiner T, Mayr A, Hochreiter S. Rchemcpp: a web service for structural analoging in ChEMBL, Drugbank and the Connectivity Map. Bioinformatics. 2015. Advance access. doi:10.1093/bioinformatics/btv373.
- Kashima H, Tsuda K, Inokuchi A. Marginalized kernels between labeled graphs. In: ICML, vol. 3: 2003. p. 321–328.
- Ralaivola L, Swamidass SJ, Saigo H, Baldi P. Graph kernels for chemical informatics. Neural Netw. 2005; 18(8):1093–110. doi:10.1016/j.neunet.2005.07.009.
- Takarabe M, Kotera M, Nishimura Y, Goto S, Yamanishi Y. Drug target prediction using adverse event report systems: A pharmacogenomic approach. Bioinformatics. 2012; 28(18):611–8. doi:10.1093/bioinformatics/bts413.
- Kuhn M, Campillos M, Letunic I, Jensen LJ, Bork P. A side effect resource to capture phenotypic effects of drugs. Mol Syst Biol. 2010; 6(1):343.
- Qiu S, Lane T. A framework for multiple kernel support vector regression and its applications to siRNA efficacy prediction. IEEE/ACM Trans Comput Biol Bioinf. 2009; 6(2):190–9.
- Cristianini N, Kandola J, Elisseeff A, Shawe-Taylor J. On kernel-target alignment. In: Advances in Neural Information Processing Systems 14. Cambridge MA: MIT Press: 2002. p. 367–73.
- Gönen M. Predicting drug-target interactions from chemical and genomic kernels using Bayesian matrix factorization. Bioinformatics (Oxford, England). 2012; 28(18):2304–10. doi:10.1093/bioinformatics/bts360.
- Davis J, Goadrich M. The relationship between Precision-Recall and ROC curves. In: Proceedings of the 23rd international conference on Machine learning - ICML ’06. New York, NY, USA: ACM: 2006. p. 233–40. doi:10.1145/1143844.1143874.
- Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000; 28(1):27–30.
- Bento AP, Gaulton A, Hersey A, Bellis LJ, Chambers J, Davies M, et al. The ChEMBL bioactivity database: an update. Nucleic Acids Res. 2014; 42(D1):1083–90. doi:10.1093/nar/gkt1031.
- Webster GF. Topical tretinoin in acne therapy. J Am Acad Dermatol. 1998; 39(2):38–44.
- Reis A, Velho G. Sulfonylurea receptor-1 (SUR1): Genetic and metabolic evidences for a role in the susceptibility to type 2 diabetes mellitus. Diabetes Metab. 2002; 28(1):14–19.
- Huang Q, Bu S, Yu Y, Guo Z, Ghatnekar G, Bu M, et al. Diazoxide prevents diabetes through inhibiting pancreatic *β*-cells from apoptosis via bcl-2/bax rate and p38-*β* mitogen-activated protein kinase. Endocrinology. 2007; 148(1):81–91.