CNN-DDI: a learning-based method for predicting drug–drug interactions using convolution neural networks

Background Drug–drug interactions (DDIs) are the reactions between drugs. They are compartmentalized into three types: synergistic, antagonistic and no reaction. As a rapidly developing technology, predicting DDIs-associated events is getting more and more attention and application in drug development and disease diagnosis fields. In this work, we study not only whether the two drugs interact, but also specific interaction types. And we propose a learning-based method using convolution neural networks to learn feature representations and predict DDIs. Results In this paper, we proposed a novel algorithm using a CNN architecture, named CNN-DDI, to predict drug–drug interactions. First, we extract feature interactions from drug categories, targets, pathways and enzymes as feature vectors and employ the Jaccard similarity as the measurement of drugs similarity. Then, based on the representation of features, we build a new convolution neural network as the DDIs’ predictor. Conclusion The experimental results indicate that drug categories is effective as a new feature type applied to CNN-DDI method. And using multiple features is more informative and more effective than single feature. It can be concluded that CNN-DDI has more superiority than other existing algorithms on task of predicting DDIs.

The task of predicting DDIs is vitally interrelated with similarities between drugs. The fundamental hypothesis of this task is that if drug A and drug B interact each other, causing a specific biological impact, drugs have similarity to drug A (or drug B) are possible to interact with drug B (or drug A) and causes same effect [16].
Cami et al. [17] utilized a logistic regression model to solve the DDIs' problem. On this basis, Gottlied et al. [18] exploited more different drug-drug similarities and proposed another logistic regression model. Two similarity-based models based on drug interaction profile fingerprints were proposed [16,19] and a heterogeneous network-assisted inference framework was introduced by Cheng et al. [20]. Some other algorithms were extended on the task of DDIs' prediction. For instance, TMFUF [12] is based on the triple matrix factorization, DDINMF [21] is based on the semi-nonnegative matrix factorization. Three algorithms were proposed in [22], including neighbor recommender algorithm, random walk algorithm, and the matrix perturbation algorithm. Further, they proposed a novel algorithms named 'Manifold Regularized Matrix Factorization' . In 2019, SFLLN was proposed in [23] based on linear neighborhood regularization using four types of drug features. It is a sparse feature learning ensemble method.
DeepDDI was proposed [10] to classify the DDIs' events from DrugBank [24]. Deep-DDI calculates features' similarity and reduces features' dimension by principal component analysis (PCA). Lee et al. [25] concentrated on concrete types of two drugs, not simply whether they interact or not. DDIMDL [11] is a multimodal deep neural network algorithm, which combines diverse drug features that predicting 65 types of DDI events.
Convolutional neural network (CNN) is a typical artificial neural network based on supervised learning, which has good performance on computer vision filed [26]. And it develops more network structures from CNN. They have been used extensively in bioinformatics [27,28]. Many studies apply deep learning method in the task of DDIs' prediction, and most of them choose deep neural network (DNN). But compared with deep neural network, CNN performs better in feature learning and can alleviate the degree of over-fitting effectively. Considering features selected contain noise and advantages of CNN, we decide to use CNN to solve the problem of DDIs' prediction.
In this paper, we propose a novel algorithm based on CNN, named CNN-DDI, to learn the best combination of drug features and predict DDI-associated events. CNN-DDI method contains two parts. One part is a feature selection framework. We utilize drug categories as another feature, and choose the best combination form of drug features. The other part is a CNN-based DDI's predictor. We utilize a new CNN to predict DDIassociated events based on features pairs selected from feature selection framework.

Evaluation criteria
Predicting DDI' events can be regarded as a multi-label classification problem. Therefore, the prediction results are divided into four kinds, true positive (TP), false positive (FP), true negative (TN) and false negative (FN). In addition, precision and recall criteria are common used evaluation criteria, which can evaluate the accuracy of results. Precision means in the classified positive samples, the proportion of TP samples. And recall means in all positive samples, the proportion of correct samples classified. The expressions are as follows: Based on precision and recall, Accuracy, F1-score, area under the precision-recall curve (AUPR) and area under the ROC curve (AUC) are utilized to evaluate the performance of the algorithm.
In the study, we adopt Accuracy, F1-score, micro-averaged AUPR and micro-averaged AUC as the evaluation metrics. Micro-averaged metrics means metrics are averaged after getting the results of all classes.

Performance
To analyze the effect of different similarity algorithms on performance of CNN-DDI, we utilize cosine similarity, Jaccard similarity and Gaussian similarity to calculate features' similarities. Table 1 shows the experimental results of our method on three similarity measures. It can be seen that using different similarity measures exhibits similar properties. CNN-DDI is robust to these three similarity measures, so Jaccard similarity measure is used in the experiments.
To demonstrate the superiority of drug categories and influence of different combination forms, we further test the performance of CNN-DDI model with different features' types. The experimental results are shown in Table 2. As for one feature, CNN-DDI using drug categories as the feature performs best, the AUPR score using drug categories is 0.9139, which is quite higher than the second highest score produced by drug targets (the value is 0.8470). Similarly, using drug categories achieves the highest scores of other five evaluation metrics. So the drug category is effective as a new feature type applied to CNN-DDI method. On the whole, using multiple features is informative and helps CNN-DDI perform better than single feature. The combination of four features has the highest AUPR score (the value is 0.9251) in all combinations. Thus it can be proved that every feature improves the performance of CNN-DDI to a certain extend.

Comparison experiments
We evaluate the effectiveness of our algorithm and four state-of-art algorithms. The four algorithms are random forest (RF), gradient boosting decision tree (GBDT), logistic regression (LR) and K-nearest neighbor (KNN). We measure feature similarities in the same manner. In the experiment, we set the decision tress number of RF to be 100 and the neighbor number of KNN to be 4. Table 3 shows CNN-DDI algorithm has better performance than other four methods in these 6 accuracy assessments. The score of ACC is 0.8871, it is better than the score of GBDT, RF, KNN and LR (0.8327, 0.7837, 0.7581 and 0.7558 respectively). And other evaluation metrics achieved by CNN-DDI are 0.9251, 0.9980, 0.7496, 0.8556 and 0.7220, respectively, which are significantly higher than the sores of other methods. The LR algorithm gets the worst performance, whose scores are 0.7558, 0.8087, 0.9950, 0.3894, 0.5617 and 0.3331, respectively. Compared with GBDT, which gets the second best performance, the ACC score is 0.8871, increased by 6.53%. And the score of AUPR is 0.9251, increased by 4.79%, all of other evaluation metrics have been improved in varying degrees.
And we compare our algorithm with DDIMDL. Considering DDIMDL using different features, we retrain DDIMDL model with features selected by CNN-DDI. As shown in Table 4, DDIMDL represents the original algorithm proposed by original paper [11]. DDIMDL* represents DDIMDL with features selected by CNN-DDI. It can be concluded that the drug category is effective as a new feature type, and CNN-DDI still performs better than DDIMDL in the case of using the same features.

Conclusions
In the work, we proposed a novel semi-supervised algorithm using a CNN architecture, named CNN-DDI, to predict drug-drug interactions. First, we extract feature interactions from drug categories, targets, pathways and enzymes as feature vectors. Then, based on the representation of feature, we proposed a new convolution neural network as the predictor of DDIs-associated events. The predictor consists of five convolutional layers, two full-connected layers and a softmax layer based on CNN.
To demonstrate the performance of our method, we compare it with other startof-the-art methods. The evaluation shows our method, CNN-DDI, has better performance than other existing state-of-art measures. Meanwhile, we discuss the contribution of combinational features and each single feature. Overall, CNN-DDI has more advantages on predicting DDIs' events. In consideration of consuming longer time, we will try to improve the efficiency of CNN-DDI in the future.

Methods
We propose a novel method called CNN-DDI to predict DDI-associated events. The method mainly contain two parts, combinational features selection module and CNN-based prediction module. As shown in Fig. 1, we combine four drug features and obtain a low dimensional as the CNN model inputs. Then a deep CNN model is built to calculate the probability of DDIs' types. In this section, we will thoroughly expound the structure and principle of CNN-DDI.

Data collection
DDIMDL proposed a data set that classifying DDIs' events into 65 types, not simply focusing on whether they interact or not. The data set includes 572 drugs and 74,528 DDIs-associated events collected from DrugBank. Which is a manually collected data source that provides drugs comprehensive information and unified syntax in describing DDIs.
To extend the information of DDIMDL, we extract drugs categories from Drug-Bank. 572 drugs have 1622 types of categories in DDIMDL.
In our paper, cross validation is utilized to demonstrate the effectiveness of our method. We set the fold number of cross validation is 5. In our experiments, we randomly divide the data set into five subsets, choose four subsets as the train set and another one as the test set. We test on the data set five times following the above steps, and the final result is the average of multiple results.

Drug-drug similarity
There are three common similarity measures, Jaccard similarity, cosine similarity and Gaussian similarity. To better measure the drug feature vectors' similarity, we analyze the difference of measures' results. Jaccard similarity calculates the intersection of components and the union. Gaussian similarity utilizes the Gaussian kernel function. And cosine similarity is used to calculate the cosine between two vectors in an inner product space [29]. Jaccard similarity can be calculated as follows: (3) Fig. 1 The framework of CNN-DDI algorithm.The algorithm mainly contain two parts, combinational features selection module and CNN-based prediction module. (1)Firstly, features vectors are selected from feature selection module using the four types of features. We encode features and generate binary vectors, each value of the vector represents whether the component exists. Then we calculate Jaccard similarity to measure the correlation between drugs. In this way, we get features vectors as the input of the prediction module.Secondly, features vectors are inputted into prediction module. The prediction module based on CNN consists of convolutional layers, full-connecteed layers and a softmax layer.Convolutional layers can enhance the ability of learning deep characteristics. Through the DDIs' predictor, we get the probabilities of all DDIs-associated events' types and select the event with the highest probability where x i and x j are feature vectors of two drugs, X and Y are the vector sets respectively. |X ∪ Y | represents the union of X and Y, |X ∩ Y | represents the intersection. Further, M represents the number of elements. Subscript 11 means the elements where x i and x j are 1, 01 means elements where x i is 0 and x j is 1, 10 means elements where x i is 1 and x j is 0.
Cosine similarity can be calculated as follows: where || · || represents the Euclidean norm. Gaussian similarity can be calculated as follows: where γ represents hyper parameters. And γ = 1/ n i=1 |x i |/n .

Feature selection mo2dule
Firstly, we evaluate the similarity between two drugs. The feature selection includes two steps: (1) calculating the similarity scores to evaluate correlation between drugs. (2) Generating feature vectors as the input to the prediction module. The drugs' feature can be represented as a binary vector, the value is 1 or 0. Value 1 means presence of components, value 0 means absence. For instance, the data set has 1622 types of categories. So the categories can be expressed as a 1622-dimensional bit vector, the value means that the drug belongs to the category or not. Similarly, we can extract four binary feature vectors from one drug corresponding four features. Then we calculate the similarity between two drugs' feature vectors by similarity measures. By this means, similarity matrices are generated as S = s ij , where the value of s ij is from 0 to 1. The closer the value is to 1, the higher the similar degree of drugs.

Prediction module using convolutional neural network
As shown in Fig. 1, CNN-based prediction module is the important part to predict DDIs' events. Features selected from selection module are input vectors into the prediction module. Considering features selected contain noise and advantages of CNN, we decide to use CNN in the prediction module.
CNN is widely used and performs well on computer vision, like image classification, image detection and image segmentation. And powered by advanced deep learning technology, more and more studies have explored its application in bio-informatics field [30]. Compared with the pure deep neural network, CNN has the following advantages: (1) the convolutional layer has less parameters by using connections' sparsity and parameters sharing. (2) The convolutional layer extracts information from global features and local features. On the task of DDIs' prediction, Results of classification are strongly related to not only global drug features but also part of features combination. So it can enhance the capability of features learn. Consequently, in this article, we apply CNN as the supervised model for distilling integrated features information to predict DDIs.
The structure of prediction model is shown in Fig. 2. The prediction model based on CNN includes five convolutional layers, two full-connected layers and a softmax layer. Among them, convolutional layers are mainly responsible for subspace feature extraction from the input vectors. Table 5 shows the specific configuration. The kernel size of each convolutional layer is same (3 × 1), and the filters' number is increasing layer-by-layer.
In addition, we add a residual block [31] to build one short connection between two layers. Figure 3 shows the structure of residual block. The output of residual block is expressed as follows: Fig. 2 The structure of prediction model where x is the input vectors, y is the output vectors. W 1 , W 2 are the weight vectors of two layers, b 1 , b 2 are the biases, and σ 1 is the activation function of first layer. The residual block strengthen the correlation of multi-layer features. The short connection's input vectors and output vectors must have the same dimensions, and the stacked convolutional layers' output vectors are added together. It should be noted that no additional parameters are added in the residual block.
The output of each convolutional layer is passed through an activation function that enhances positive vectors and inhibits negative vectors from previous layer. In the paper, the activation function we use is Leaky ReLU. Compare with other activations, ReLU can increase feature sparsity and decrease the possibility of vanishing gradient. The expression is as follows: where a represents hyper-parameters, a is set 0.2.
There are two full-connected layers after convolutional layers. The first full-connected layer has 267 hidden units and the second has 65 hidden units. Considering predicting DDI's events is a classification task, softmax function is used as the activation of the last full-connected layer. So the loss function of the prediction module is as follows: where K represents the number of events' types, y i represents the true value, 0 or 1.

The CNN-DDI algorithm
The algorithm mainly contain two parts, combinational features selection module and CNN-based prediction module. The pseudocode of CNN-DDI is shown in Algorithm 1.