Predicting potential drug-drug interactions by integrating chemical, biological, phenotypic and network data

Background Drug-drug interactions (DDIs) are one of the major concerns in drug discovery. Accurate prediction of potential DDIs can help to reduce unexpected interactions in the entire lifecycle of drugs, and are important for the drug safety surveillance. Results Since many DDIs are not detected or observed in clinical trials, this work is aimed to predict unobserved or undetected DDIs. In this paper, we collect a variety of drug data that may influence drug-drug interactions, i.e., drug substructure data, drug target data, drug enzyme data, drug transporter data, drug pathway data, drug indication data, drug side effect data, drug off side effect data and known drug-drug interactions. We adopt three representative methods: the neighbor recommender method, the random walk method and the matrix perturbation method to build prediction models based on different data. Thus, we evaluate the usefulness of different information sources for the DDI prediction. Further, we present flexible frames of integrating different models with suitable ensemble rules, including weighted average ensemble rule and classifier ensemble rule, and develop ensemble models to achieve better performances. Conclusions The experiments demonstrate that different data sources provide diverse information, and the DDI network based on known DDIs is one of most important information for DDI prediction. The ensemble methods can produce better performances than individual methods, and outperform existing state-of-the-art methods. The datasets and source codes are available at https://github.com/zw9977129/drug-drug-interaction/. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1415-9) contains supplementary material, which is available to authorized users.


Background
Drugs may interact when multiple drugs are coprescribed. Drug-drug interactions (DDIs) may exert different effects, and adverse drug-drug interactions can lead to patient death or drug withdrawal [1][2][3][4]. DDI prediction can help to reduce unexpected effects as well as optimize the treatments in the drug design, clinical trials, and post-marketing surveillance.
Silico methods, in vitro methods, vivo experiments and clinical trials can identify DDIs, but they are laborintensive and time-consuming. Statistical methods [5][6][7][8][9] were developed to detect whether the combination of two drugs is associated with an increased risk of certain adverse events, by analyzing spontaneous reports, insurance claim databases and electronic medical records.
In recent years, researchers collected drug data from literatures, reports and etc., and constructed public databases [10][11][12][13][14][15][16][17] which facilitate the development of computational prediction methods. To the best of our knowledge, a great number of machine learning methods were proposed to predict DDIs. Existing methods are roughly classified into two types: similarity-based methods and classification-based methods. The similarity-based methods employed the assumption that similar drugs may interact with a same drug. Gottlieb et al. [18] built prediction models by considering seven kinds of drugdrug similarities. Vilar et al. proposed the substructure similarity-based prediction method [19] and the interaction profile fingerprint similarity-based prediction method [20]. Li et al. [21] developed a Bayesian network of combining drug molecular similarity and phenotypic similarity to predict the combination efficacy of drugs. By using drug-drug similarity indirectly, Park et al. [22] applied a random walk with restart to simulate signaling propagation from drug targets and make predictions; Zhang et al. [23] adopted the label propagation method to build prediction models based on drug chemical substructures, drug side effects and drug and off side effects. Classification-based methods formulate the drug-drug prediction as binary classification tasks. Cami et al. [24] represented drug-drug pairs as feature vectors, and use presence or absence of interactions as labels, and then built logistic regression models. Cheng et al. [25] applied five predictive models (naive Bayes, decision tree, k-nearest neighbor, logistic regression, and support vector machine) to build prediction models. Besides similarity-based methods and classification-based methods, there are several methods designed for specific purposes. Takarabe et al. [26] constructed a multi-level drug-drug interaction network, and analyzed, characterized and classified adverse drug-drug interactions. Huang et al. [27] developed a target-center system for each drug, which consists of drug targets and their neighbors in the PPI network and human tissue gene expression.
Since many DDIs are not detected or observed in clinical trials, this work is aimed to predict undetected or unobserved drug-drug interactions. Classification methods utilize two classes of data: annotated drug-drug interaction pairs and annotated non-interaction pairs to build classification models. In the binary classification, known interactions are used as positive instances, but other drug pairs may have undetected or unobserved interactions, which need to be predicted. In machine learning, similar problems are transformed as semi-supervised learning tasks. For this reason, we build DDI prediction models under the frame of semi-supervised learning.
In this paper, we collect drug substructure data, drug target data, drug enzyme data, drug transporter data, drug pathway data, drug indication data, drug side effect data, drug off side effect data and known drug-drug interactions. Multi-source data provide biological information, chemical information, phenotypic information and known interactions to characterize drug-drug interactions. To make use of diverse information, we adopt three representative methods, i.e., the neighbor recommender method [28,29], the random walk method and the matrix perturbation method [30], to build different prediction models. According to performances of prediction models, we evaluate the usefulness of different information sources for the DDI prediction. The study reveals that DDI network based on known DDIs can provide the important information for DDI prediction. Further, we present flexible frames of integrating different models with suitable ensemble rules, including the weighted average ensemble rule and the classifier ensemble rule, and develop ensemble models to achieve better performances. The experiments demonstrate that ensemble methods can combine diverse information to produce the high-accuracy performances, and outperform existing state-of-the-art methods.

Datasets
The FDA Adverse Event Reporting System (FAERS) is a database which contains adverse event reports and medication error reports submitted to FDA. Tatonetti processed adverse event reports in the AERS, and constructed a database named "TWOSIDES" [31] which contains side effects caused by the combination of drugs. There are 645 drugs and 63,473 distinct pairwise DDIs from unsafe co-prescriptions in TWOSIDES.
The biological information, chemical information and phenotypic information about drugs may be associated with drug-drug interactions. PubChem Compound database [12,15] can provide drug structures. DrugBank database [10,11,16,17] is a bioinformatics resource with drug targets, drug enzymes and drug transporters. KEGG database [13] is an information resource for protein pathways. Drug targets are mapped to KEGG to obtain drug pathways. SIDER database [14] contains 1430 drugs and 5880 side effect terms which are compiled from public documents and package inserts. Drug side effects and indications are available in SIDER. OFF-SIDES database [31] contains 1332 drugs and 10,093 "off-label" side effects.
We map drugs in TWOSIDES to SIDER, OFFSIDES, PubChem and DrugBank. As shown in Table 1, we obtain 548 drugs and 48,584 pairwise DDIs, and substructure data, target data, enzyme data, transporter data, pathway data, indication data, side effect data, off side effect data of these drugs are available. Based on the data, we conduct the comprehensive study to evaluate the usefulness of different data sources for DDI prediction, and discuss how to combine them for the highaccuracy prediction.

DDI prediction based on multi-source data
Multi-source data provide different information for the DDI prediction. Here, we describe how to build models based on different data.
Drug-drug similarities bring important clues for the DDI prediction, and different similarities can be extracted from multi-source data. Drug data are classified as four types, i.e., chemical data, biological data, phenotypic data and the drug-drug interaction network data (formed by known drug-drug interactions). On one hand, we calculate the drug-drug similarities in the biological space, chemical space and phenotypic space, by using drug substructures, drug targets, drug enzymes, drug transporters, drug pathways, drug indications, drug side effects and drug off side effects. On the other hand, we calculate the drug-drug similarities in the drug-drug interaction network. In order to utilize drug-drug similarities, we consider two representative methods [28,32]: the neighbor recommender method and random walk method, and build DDI prediction models.
We take drugs as nodes and known interactions as edges in the DDI network, and transform the DDI prediction problem as a missing link prediction task. The missing link prediction is an important topic of theoretical interest and practical significance in the complex network [33]. Recently, a novel method named "matrix perturbation method" [30] is proposed, which utilize the network to predict missing links (unobserved DDIs). The studies demonstrated that this method outperforms other missing link prediction methods. Therefore, we adopt the matrix perturbation method to predict potential DDIs based on the DDI network.
In the following context, Similarity-based DDI prediction based on multi-source data presents how to extract different drug-drug similarities from different data and how to develop similarity-based models; Matrix perturbation method for DDI prediction presents the missing link prediction method (matrix perturbation method).
Similarity-based DDI prediction based on multi-source data Drug-drug similarity based on biological data, chemical data and phenotypic data A drug can be represented as a binary feature vector, by using drug substructures, drug targets, drug enzymes, drug transporters, drug pathways, drug indications, drug side effects, or drug off side effects. Dimensions of the feature vector respond to presence or absence of components with values 1 or 0. For example, there are 881 types of drug substructures, and a drug can be transformed as an 881-dimensional vector.
Given a drug x and a drug y, their feature vectors are V x and V y , and the similarity between x and y is then calculated by Jaccard formula: where M 11 is the number of dimensions where V x and V y both have a value of 1; M 01 is the number of dimensions where V x has a value of 0 and V y has a value of 1; M 10 is the number of dimensions where V x has a value of 1 and V y has a value of 0. Therefore, we can obtain 8 drug feature-based drug-drug similarities, including substructure-based similarity, targetbased similarity, enzyme-based similarity, transporter-based similarity, pathway-based similarity, indication-based similarity, side effect-based similarity and off side effect-based similarity.

Drug-drug similarity based on known drug-drug interactions
By considering drugs as nodes and interaction as edges, known DDIs can form a DDI network. We calculate drugdrug similarities in the DDI network [33]. The adjacent matrix of the DDI network is denoted as A = (a ij ), and denotes the set of nodes linked to node. Several similarities between a drug x and a drug y can be defined.
Common neighbor similarity S CN (x, y) takes the number of common neighbors between two nodes, Adamic-Adar similarity S AA (x, y) is the counting of common neighbors by assigning the less connected neighbors more weights, Resource Allocation similarity S RA (x, y) is based on the complex network resource allocation dynamics, Katz similarity S Katz (x, y) sums over the collection of paths with exponential damping according to path lengths, where α is a parameter, and I is the identity matrix. |α| < 1/λ max is the condition for the compact form, and λ max is the largest eigenvalue of A. Average Commute Time similarity S ACT (x, y) is the average number of steps required by a random walker starting from one node to reach another, where L + is the pseudoinverse of the Laplacian matrix for the network. The random walk with restart similarity S RWR (x, y) is the probability that a random walker starting from an initial node x reaches y. The walker moves with the probability μ of returning to the initial node and the probability 1 − μ going to adjacent nodes, is the normalized transition matrix of the adjacency matrix A, and D is the degree matrix of A. Therefore, we obtain 6 DDI network-based drug-drug similarities, including common neighbor similarity, Adamic-Adar similarity, resource allocation similarity, Katz similarity, average commute time similarity and random walk with restart similarity.

Similarity-based methods for DDI prediction
Given a N × N similarity matrix S = (s ij ) for N drugs, known pairwise DDIs are denoted by an adjacent matrix A = (a ij ). The neighbor recommender method and the random walk method are briefly introduced as follows.
The neighbor recommender method [28,34] is one of most popular methods in recommender systems, which recommends items (movies, music, books, et al.) to users, or predicts the 'rating' or 'preference' that users would give to items. The neighbor recommender method takes the weighted average information of neighbors for prediction. Y ij = ∑ k = 1,k ≠ j N s ik a kj /∑ k = 1,k ≠ j N s ik is calculated for drug i and drug j which don't have known interaction, where s ik is the similarity between drug i and drug k , and a kj = 1 or 0 means interaction or noninteraction between drug k and drug j . We can calculate Y ji in this same way. The probability that drug i interacts with drug j score ji = score ij = Y ij + Y ji .
A random walk is a mathematical formalization of a path that consists of a succession of random steps. There are a great number of successful applications in the network analysis [35][36][37][38]. In random walk, a random walker starts from an initial node, and moves to neighbors with the probability μ and moves back to the initial node with the probability 1 − μ. The similarity matrix S is normalized as W = D − 1 S, where D is the degree matrix of S. The matrix form of the update is summarized as Y = μWY + (1 − μ)A, and it will converge to the solution:

Matrix perturbation method for DDI prediction
The matrix perturbation method assumes that random removal of a small proportion of links from a network will not change the network structure [30], which is reflected by eigenvectors of its adjacent matrix.
Let's introduce notations for the matrix perturbation method. Given the drug-drug interaction network G(V, E), V is the set of nodes, and E is the set of edges. The adjacent matrix is A = (a ij ), and the eigenvectors and eigenvalues of the adjacent matrix are denoted by x k and λ k , k = 1, 2, ⋯, N.
A fraction of links ΔE are randomly removed from E, and the set of remaining links E R = E − ΔE. Thus, we obtain the new network G R (V, E R ) with the adjacent matrix A R = A − ΔA, where ΔA is the adjacent matrix for removed links. Then, we calculate the eigenvectors x k R and eigenvalues λ k R of A R , k = 1, 2, ⋯, N. We denote that A = A R + ΔA, x k = x k R + Δx k and λ k = λ k R + Δλ k . In the network G(V, E), the relation of eigenvectors, eigenvalues and the adjacent matrix is written as, We estimate eigenvalues λ k = λ k R + Δλ k , and keep eigenvectors x k R unchanged. Then, we reconstruct the adjacent matrix of G(V, E) by summing eigenvalues and eigenvectors, The probability that drug i interacts with drug j score ij = score ji = Ã ij + Ã ji . More details are available in the publication [30].

Combining multi-source data for DDI prediction
Since we build different prediction models based on different data, it is natural to combine them for better performance. Ensemble learning is a useful technique that aggregates multiple machine learning models to achieve overall high prediction accuracy as well as good generalization [39]. Ensemble learning has been applied to a great number of applications in bioinformatics [29,40,41]. An ensemble learning system usually has two components: base predictors and ensemble rules. In our ensemble system, we adopt heterogeneous models {f i } i = 1 n based on multi-source data as base predictors. To integrate base predictors, we consider two popular ensemble rules: the weighted average ensemble rule and the classifier ensemble rule. Figure 1

Evaluation metrics
We adopt k-fold cross validation (k-CV) to evaluate prediction models. Known interactions are randomly split into k subsets with equal size. In each fold, one subset is used as the testing set; 80 and 20% of other interactions (k-1 subsets) are used as the training set and validation set. Base predictors are constructed on the training set, and parameters in the ensemble system are tuned by using the validation set. Then, the ensemble model makes predictions for the testing set. This procedure is repeated until each subset is ever used for testing. To avoid the bias of data split, we implement 20 independent runs of k-CV for each model, and average performances are adopted.
Here, we adopt several evaluation metrics to measure performances of prediction models, i.e., accuracy (ACC), precision, recall, F-measure (F), area under ROC curve (AUC) and the area under the precision-recall curve (AUPR). In our task, DDIs take a small proportion of all drug pairs, and thus AUPR, which takes into account both recall and precision, is used as the primary evaluation metric.

Performances of different models based on multi-source data
We extract 14 different similarities from multi-source data, and respectively adopt the neighbor recommender method and the random walk method to build 28 similarity-based prediction models. By formulating the original problem as a missing link prediction task, we adopt the matrix perturbation method to build the Fig. 1 The scheme of integrating multi-source data for DDI prediction prediction model based on known DDIs. Therefore, we construct 29 prediction models based on multi-source data. Since different models utilize different information for DDI prediction, performances of the models are indicators for the usefulness of information sources.
As shown in Table 2, these models produce different performances on the benchmark dataset in the cross validation. Among eight feature-based similarities, substructure similarity, side effect similarity, off side effect similarity and indication similarity lead to better performances than other similarities, indicating that drug substructures, drug side effects, drug off side effects and drug indications provide important information for the drug-drug interactions. Among the network topologybased similarities, RA and RWR can produce better results. The comparison shows that drug feature-based similarities as well as topological similarities can provide useful information to characterize drug-drug interactions and lead to useful models. The matrix perturbation method utilizes the DDI network as a whole to make predictions. Among all prediction models, the matrix perturbation method produces the best results, indicating that known DDIs provide one of most useful information to identify potential DDIs.
We also conduct 20 runs of 3-CV to evaluate prediction models, and results are shown in Table 3. The comparison between 3-CV results and 5-CV results demonstrates that prediction models have different performances under different experimental conditions, and a model cannot produce the best results in all cases. For example, the matrix perturbation method assumes that the topology of a network will not change if only a small proportion of links are removed. In 3-CV, more links are kept for testing, and the predictive power may be affected. Therefore, the matrix perturbation method is not the best predictor in 3-CV experiments. For this reason, we integrate different models to make robust predictions.

Performances of ensemble models
Based on multi-source data, we construct 29 prediction models including 28 similarity-based models and one perturbation matrix model. We use these models as base predictors, and respectively adopt the weighted average ensemble rule and the classifier ensemble rule to build ensemble models. We apply the genetic algorithm (GA) to determine optimal weights in the weighted average ensemble models. GA is implemented by using python package "deap".
The initial population has 100 chromosomes. In the population update, the elitist strategy is used for the selection operator, and default parameters are adopted for the mutation probability and crossover probability. The population update terminates when the change of best fitness scores is less than the default value of 1E-6 or the max generation number of 50 is reached.
To build classifier ensemble models, we train the logistic regression classifier to combine outputs of base predictors. The logistic regression is implemented by using python package "scikit-learn". Default parameters are used; L1 regularization and L2 regularization are respectively considered. In the following context, classifier ensembles models refer to logistic regression ensemble models. Table 4 shows 3-CV results and 5-CV results. In 5-CV experiments, the weighted average ensemble model, the classifier ensemble model (L1 regularization) and classifier ensemble model (L2 regularization) produce the AUPR scores of 0.795, 0.807 and 0.806; in 3-CV experiments, three models yield the AUPR scores of 0.832, 0.841 and 0.839. The comparison demonstrates that the classifier ensemble models produce better results than the weighted average ensemble model. The possible reason is that the weighted average ensemble method uses the linear function for ensemble learning and classifier ensemble method trains nonlinear function. Moreover, the classifier ensemble method with L1 regularization can produce better results than the classifier ensemble method with L2 regularization, for L1 regularization can produce the sparse model and enhance the generalization capability. Clearly, ensemble models produce better results than base predictors. In 5-CV experiments, the classifier ensemble method (L1) can improve the AUPR score of 0.782 (produced by the matrix perturbation model) to 0.806. Since we implement 20 runs of 5-CV for ensemble models and matrix perturbation models, we conduct t-test to test the difference of their performances in terms of AUPR score, and the statistical significance is observed (p-value =1.21E-39). In 3-CV experiments, the classifier ensemble method (L1) can enhance the AUPR score from 0.820 (produced by the indication-based random walk model) to 0.839, and we also observe the statistical significance of improvement between the classifier ensemble model (L1) and the indicationbased random walk model (p-value =3.12E-41).
Further, we investigate into details of the ensemble models based on 3-CV results and 5-CV results. Firstly, we analyze weights in the weighted average ensemble models determined by GA. There are 100 sets of weights for 20 runs of 5-CV; there are 60 sets of weights for 20 runs of 3-CV. We calculate the average weights for each predictor, and visualize the normalized weights in Fig. 2. Base predictors with high AURP scores may be assigned great weights. For example, the matrix perturbation model produces best 5-CV results, and thus gains the greatest weight in the ensemble models. We observe that several base predictors (such as RWR-based random  walk model) are not used in the ensemble models. The classifier ensemble method (L1) produces the sparse models, which integrate the subset of base predictors. According to 5-CV results, several base predictors (index: 1,10,15,21,22,27,28,29) are not used in the classifier ensemble model. In the view of computer science, multi-source data provide diverse information but also bring the redundant information. Combining base predictors is a combinatorial optimization problem. Therefore, the weighted average ensemble method and the classifier ensemble method (L1) use a subset of base predictors to develop ensemble models.

Comparison with existing state-of-the-art methods
Since this work is designed to predict undetected or unobserved DDIs, we adopt methods of the same type for comparison. Vilar used known interactions of most similar drugs to predict DDIs, and proposed the substructure similarity-based model [19] and interaction profile fingerprint (also known as common neighbors, CN) similarity-based model [20]. Zhang [23] adopted the label propagation algorithm to build substructure similarity-based model, side effect similarity-based model and off side effect similarity-based model. We name these models as Vilar's substructure-based model, Vilar's CN index-based model, substructure-based label propagation model, side effect-based label propagation model and off side effect-based label propagation model. These prediction models are implemented according to details in publications. All models are evaluated by 20 runs of cross validation under the same conditions.
As shown in Table 5, our ensemble methods produce better results than other state-of-the-art methods in terms of different metrics. The classifier ensemble method (L1) produces the best results in both 3-CV experiments and 5-CV experiments. Further, we adopt t-test to compare the ensemble methods with other state-of-the-art methods in terms of AUPR scores. Table 6 demonstrates that our ensemble methods produce significantly better results (p < 0.05 in terms of AUPR scores).
In one fold of 5-fold cross validation, we adopt 80% interactions (38,868) as the training set and the validations set, and use other interactions (9716) as the testing set. We build the prediction model based on the training set and the validations set, and then make predictions for non-interaction drug-drug pairs (111,010) to identify testing interactions (9716). Based on the result, we respectively count how many testing DDIs are identified in the top 10,000 predictions and top 15,000 predictions. As shown in Fig. 3, the classifier ensemble model (L1) can identify 7027 testing interactions when verifying top 10,000 predictions, and identify 7842 testing interactions when verifying top 15,000 predictions. In general, our ensemble models can identify 300~400 more interactions than other methods do.

Predicted novel interactions
In this paper, we use the benchmark dataset with 548 drugs and 48,584 pairwise drug-drug interactions from TWOSIDES database. There are 149,878 drugdrug pairs between these drugs. Besides 48,584 known pairwise DDIs, 101294 remaining drug pairs ("noninteraction pairs") may contain undetected or  Table 7 lists top 20 novel interactions predicted by our method, and a significant fraction of novel interactions (7 out of 20) are confirmed in DrugBank database. Further, we compare the ensemble model and the matrix perturbation model by testing their capability of finding out novel interactions. The top 1000 novel interactions predicted by the ensemble model and the matrix perturbation model are provided in supplementary material (see Additional file 1). For each method, we find evidences in DrugBank to confirm novel interactions. If we look up all 1000 interactions of the matrix perturbation model and the ensemble model, we can confirm 297 novel interactions and 318 novel interactions respectively (252 common interactions are shared). Further, based on the top 1000 novel interactions, we use the number of predictions as X-axis and the number of confirmed novel interactions in the predictions as Y-axis, and then visualize performances of two models (see Additional file 2). In general, the ensemble model can find out more novel interactions than the matrix perturbation model, indicating the usefulness of integrating multi-source data.

Conclusions
The prediction of drug-drug interactions is an important task in the drug discovery, which helps to reduce potential risks and understand the mechanism of  drug-drug interactions. This paper collects a wide variety of drug data, and designs the models based on multi-source data for the DDI prediction. Compared with existing DDI prediction methods, our methods produce better performances, and the statistical analysis demonstrates that the performance improvements achieved by our method are statistically significant. In conclusion, the proposed methods are promising for the DDI prediction.