Multi-view feature representation and fusion for drug-drug interactions prediction
BMC Bioinformatics volume 24, Article number: 93 (2023)
Abstract
Background
Drug-drug interactions (DDIs) prediction is vital for pharmacology and clinical application, since it helps avoid adverse drug reactions in patients. It is challenging because DDIs are related to multiple factors, such as genes, drug molecular structure, diseases, biological processes and side effects. Knowledge graphs are a crucial technology for representing such multi-relational information among entities. Recently, several graph-based computational models have been proposed for DDIs prediction and have achieved good performance. However, challenges remain in knowledge graph representation, in particular in extracting rich latent features from the drug knowledge graph (KG).
Results
In this work, we propose a novel multi-view feature representation and fusion (MuFRF) architecture for DDIs prediction. It consists of two views of feature representation and a multi-level latent feature fusion. For the feature representation from the graph view and the KG view, we use a graph isomorphism network to map drug molecular structures and RotatE to obtain vector representations on the biomedical knowledge graph, respectively. In the multi-level latent feature fusion, we design concatenate-level and scalar-level strategies to capture latent features from drug molecular structure information and semantic features from the biomedical KG. A multi-head attention mechanism then optimizes the fused features for binary and multi-class classification tasks. We evaluate the proposed method on two open datasets. The experiments indicate that MuFRF outperforms classic and state-of-the-art models.
Conclusions
Our proposed model can fully exploit and integrate the latent features from the drug molecular structure graph (graph view) and the rich biomedical knowledge graph (KG view). We find that a multi-view feature representation and fusion model can accurately predict DDIs. It may provide guidance for research on, and validation of, novel DDIs.
Introduction
Drug-drug interactions (DDIs) are changes in the effects of drugs that are taken simultaneously or consecutively [1, 2]. In general, DDIs mainly include pharmacokinetic interactions and pharmacodynamic interactions. Verifying interactions among drugs is an extremely complex process in pharmacology. Clinically, DDIs are a double-edged sword. On the one hand, they can be beneficial: DDIs can improve efficacy and reduce adverse effects, provide relief from drug poisoning, and prevent the development of drug resistance. For example, when caffeine is combined with ergotamine, solubility and absorption increase and efficacy improves. Paying attention to drug interactions is therefore important for improving the quality of care and for the safe and effective use of combination drugs. On the other hand, DDIs may cause adverse drug reactions (ADRs), and [3] reports ADR rates: the rate reaches 100% when at least 6 kinds of drugs are taken simultaneously. When a drug is co-administered with one or more other drugs, the resulting ADRs may increase morbidity and mortality. Thus, identifying as many potential DDIs as possible is vital for safer and improved patient prescriptions [4]. Recently, many models have been developed for DDIs prediction. The basic methods are traditional laboratory-based models, which are time-consuming and costly [5], and this limits their ability to discover potential DDIs. Computational approaches therefore provide practical strategies for predicting DDIs. They mainly include machine learning-based models and deep learning-based models.
Most machine learning-based models integrate multiple data sources to capture drug properties, including similarity features such as molecular structure [6,7,8], side effects [9, 10] and genomic similarity [11]. These methods rely on hand-crafted features and domain knowledge. To reduce this dependence, deep learning-based models have gradually prevailed, since they learn abstract features instead of relying on hand-crafted ones. However, some works [12,13,14,15] focus only on the structure information or SMILES sequences [16] of drugs and ignore the rich semantic information related to drugs. Others [17,18,19], on the contrary, use a knowledge graph (KG) to capture the rich biomedical information but ignore the molecular structural features of drugs.
Although the above models have achieved good performance, they consider either the drug structure information or the rich semantic features brought by the knowledge graph when determining the final feature representation of drugs, which limits their predictive capability. Moreover, these methods mainly explore binary DDIs prediction, whereas predicting multi-typed DDIs is more valuable but also more challenging. MUFFIN [20] was proposed to achieve both binary and multi-class DDI prediction, and it considers both drug molecular structure and rich semantic features in the KG. Inspired by MUFFIN, this work not only extracts features from drug molecular structure but also considers the topological features of the biomedical KG. However, MUFFIN cannot distinguish different graph structures based on the drug molecular structure, and it lacks the expressive power [21,22,23] needed to model symmetric relationships in the biomedical KG. Thus, this work employs a graph isomorphism network (GIN) [24] to distinguish different drug molecular structures and RotatE [22] to obtain rich semantic features from the biomedical KG. In addition, MUFFIN directly extracts the node embeddings from the KG and is therefore limited in obtaining rich latent semantic features for each entity in the KG.
To address this limitation, this work designs a latent feature fusion module to obtain rich latent semantic features of each drug in the KG. In a nutshell, this work presents a novel end-to-end framework called the multi-view feature representation and fusion (MuFRF) model, which couples drug molecular structure with the biomedical KG for DDIs prediction. The framework consists of three major building blocks. The first block is multi-view feature extraction and representation, comprising the graph-view feature representation obtained by the graph isomorphism network and the KG-view feature representation extracted by RotatE. The second is a multi-level latent feature fusion block, which fuses the drugs’ internal (molecular structure) and external (biomedical KG) feature representations from concatenate-level and scalar-level perspectives. The concatenate level excavates latent features from different concatenation operations between the molecular structure representation and the KG representation. The scalar level explores fine-grained latent fusion features using the element-wise add and element-wise product between the structure representation and the KG representation. This multi-level architecture further utilizes a multi-head attention module [25] to optimize the multi-granularity latent feature fusion process. The final block predicts potential DDIs for binary and multi-class classification tasks. Experiments show that MuFRF achieves the best results on both tasks, which also verifies that integrating molecular structure with semantic information from the biomedical KG is essential. The main contributions of this work are as follows:

We present a multi-view feature representation and fusion (MuFRF) architecture for potential drug-drug interactions prediction. It can effectively extract the drug molecular structure information and rich semantic features from the biomedical KG.

Based on the multi-head attention mechanism, we propose a concatenate-level and scalar-level feature fusion method to fuse internal and external features through operations of different granularity.

We conduct extensive experiments on binary and multi-class prediction tasks. The experimental results illustrate that MuFRF is superior to classic and state-of-the-art DDI prediction models.
Related work
Over the years, a number of studies have proposed predicting potential DDIs from the drug molecular structure, which determines all of its pharmacokinetic (how it is handled by an organism) and pharmacodynamic (how it affects an organism) properties, and ultimately all of its interactions. Vilar et al. [7] utilized molecular structure similarity to identify new DDIs. Cheng et al. [26] combined drug phenotypic, therapeutic, structural and genomic similarities and fed these similarity features into the HNAI framework for DDIs prediction. Ryu et al. [15] employed the names and structure information of drug-related component pairs to accurately predict important DDI types and output them as human-understandable sentences. CASTER [13] developed a sequence pattern mining module that decomposes the SMILES strings of drugs into discrete sets of common substructures, improving the performance of DDIs prediction.
The above models all imply that drug molecular structure is vital for DDI prediction. However, they ignore the rich semantic information in KGs constructed from drugs and drug-related entities. Abdelaziz et al. [27] constructed a comprehensive knowledge graph from drug attributes and the relations among drug-related entities, derived drug similarities from this knowledge graph, and built a linear regression learning model in Apache Spark for predicting DDIs. Karim et al. [17] extracted large-scale DDIs to construct a knowledge graph and trained ComplEx [28] to obtain drug embedding features, which a ConvLSTM network then processed for DDIs prediction. Dai et al. [29] first introduced an adversarial autoencoder framework for DDIs prediction. The autoencoders guaranteed high-quality negative samples and the discriminator further extracted the embeddings of drugs. Meanwhile, Gumbel-Softmax relaxation was employed to solve the vanishing gradient problem. Lin et al. [18] constructed a KG capturing semantic relations among entities and employed a graph neural network that aggregates neighbor information to obtain powerful entity embeddings for DDIs prediction.
Although these KG-based models perform well, they ignore the combination of the drug molecular structure and the KG, which creates a bottleneck in their predictive ability. In addition, most classical works treat this prediction as a binary classification task, whereas predicting multi-typed DDIs is more valuable but also more challenging. MUFFIN therefore adopted MPNN [30] to obtain the molecular structure information from the molecular map and employed TransE [31] to obtain semantic features from the knowledge graph, for both binary and multi-class classification tasks. The significant difference between our work and MUFFIN is that our method can distinguish different drug molecular structures. Meanwhile, we present a novel latent feature fusion block that captures not only drug molecular structure information but also rich semantic features from the large-scale biomedical KG.
Methods
In the following, we first formulate drug-drug interaction prediction as supervised binary-class and multi-class classification problems and introduce the notation used throughout this paper. Then we present the specific procedures of our algorithm.
Problem formulation
We are given \(G_{kg}\), representing the semantic features in the knowledge graph, and the DDI matrix Y, denoting the molecular structure information for the DDI prediction problem. Our purpose is to learn a prediction function \({\hat{y}}_{ij} = F\left( \left( d_{i}, d_{j} \right) \big| \theta ,~G_{kg},~Y \right)\) that judges how likely the drug pair \(\left( d_{i}, d_{j} \right)\) is to belong to each class in the binary-class or multi-class setting, where \(\theta\) denotes the parameters of our proposed model. The DDI matrix Y and the knowledge graph (KG) are described as follows.
DDI matrix
We denote the set of drugs as \(D=\left\{ d_{1},d_{2},\ldots ,d_{N_{d}} \right\}\) and the corresponding set of molecular structure diagrams as \(G_{drug} = \left\{ g_{1},g_{2},\ldots ,g_{N_{d}} \right\}\), where \(N_{d}\) denotes the number of drugs in the DDI matrix. For the binary classification task, the DDI matrix Y is constructed as the set of \(y_{ij} \in \left\{ 0,1 \right\}\), where \(y_{ij} = 1\) indicates that there exists an interaction between drug \(d_{i}\) and drug \(d_{j}\). Note that \(y_{ij} = 0\) does not necessarily mean that there is no interaction between these two drugs in the KG; it may be a potential interaction that has not yet been found. In the multi-class prediction task, all relation types in our DDI pairs, 81 in total, are considered.
Knowledge graph
In addition to the interactions between drug pairs, we consider the semantic information of drug-related entities (e.g., targets), represented by a knowledge graph. Formally, the knowledge graph (KG) is represented as \(G_{kg} = \left\{ (h,r,t) \big| h,t \in E,r \in R \right\}\), where E and R are the sets of entities and relations. Each triple \(\left( h_{i},r_{i},t_{i} \right)\) indicates that there is a relation \(r_{i}\) (such as drug-disease or drug-target) between \(h_{i}\) and \(t_{i}\), where \(i \in \left\{ 1,2,\ldots ,N_{kg} \right\}\) and \(N_{kg}\) is the number of triples in the constructed KG.
Overview
Figure 1 illustrates the overview of the Multi-view Feature Representation and Fusion (MuFRF) framework. MuFRF consists of three modules for DDIs prediction. In the multi-view feature extraction and representation module, we employ a graph isomorphism network (GIN) to extract the molecular structure information from the molecular map. Meanwhile, we utilize RotatE to obtain the semantic features from the KG (KG refers to the biomedical KG in this work). In the feature fusion module, we design a multi-level strategy from concatenate-level and scalar-level perspectives. The concatenate level uses convolutional neural networks (CNNs) to extract latent features from the different concatenation operations between the molecular structure representation and the KG representation. The scalar level utilizes an autoencoder to excavate fine-grained latent fusion features from the element-wise add and element-wise product between the structure representation and the KG representation. The multi-level strategy then employs a multi-head attention module to optimize this multi-granularity latent feature fusion process. In the classifier module, MuFRF obtains the final latent representations of a given drug pair \((d_{i}, d_{j})\). For the binary classification task, we use various classifiers to compute the probability of a DDI; for multi-class DDIs prediction, the classifier module outputs the probability score of each relation. Next, we present the details of the proposed model.
Multi-view feature representation module
Graph-based representation
The RDKit [32] tool converts SMILES into molecular objects. DGL [33] is then used to convert the molecular objects into bi-directional DGL molecular maps, which existing models can process to extract the structural information of the molecules. Classical methods use the MPNN [30] framework to extract the structural information of molecules. However, methods under the MPNN framework cannot distinguish different graph structures from the generated graph embeddings. Thus, this work adopts the graph isomorphism network (GIN) to generate the structural representation of drugs. Like MPNN, it mainly consists of four parts: a message function (M), an aggregation function (SIGMA), an update function (U), and a readout function (R). The difference is that the aggregation, update, and readout functions in GIN are all injective, which guarantees that GIN can distinguish different drug molecular structures. Figure 2 illustrates why.
The message function is the built-in binary message function \(u\_add\_e\) of DGL, where u is the source node, v is the target node and e is the edge. The \(u\_add\_e\) operation combines the source node features with the edge weights, and the results are aggregated into the target node. The aggregation and readout functions are sum aggregators and the update function is a multi-layer perceptron (MLP); all of them are injective. Thus, we obtain a GIN framework based on MLP+SUM.
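For reference, the node update described in the next paragraph matches the standard GIN layer of [24]. Written in that paper's notation (an assumption here: \(h_{v}^{(k)}\) is the feature of node v after iteration k and \(\mathcal{N}(v)\) is its neighbor set, neither of which is defined elsewhere in this text), it reads:

\[
h_{v}^{(k)} = \mathrm{MLP}^{(k)}\left( \left( 1 + \varepsilon^{k} \right) h_{v}^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_{u}^{(k-1)} \right)
\]

With the \(u\_add\_e\) message function, each summand would presumably be the source node feature plus the edge feature rather than \(h_{u}^{(k-1)}\) alone.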
where GIN adjusts the weight of the target node in each iteration through a learnable parameter \(\varepsilon ^k\) and merges the current node information with the aggregated neighbor information to update the current node features. Node embeddings from GIN can be applied to node classification and link prediction tasks. For the graph classification task, a “readout” function is used in this work: the embedding of the entire graph is derived from the embeddings of the individual nodes. The READOUT layer uses “concat+sum”: “sum” adds up the node representations output by each GIN layer to obtain one graph-level feature per iteration, and “concat” then stitches these per-iteration features together.
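The sum-based update and the “concat+sum” readout can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the fixed `relu_mlp`, the toy graphs and all function names are hypothetical stand-ins, and edge features are left out for brevity.

```python
def gin_layer(features, neighbors, eps, mlp):
    """One GIN update: mlp((1 + eps) * h_v + sum of neighbor features)."""
    updated = []
    for v, h_v in enumerate(features):
        agg = [(1.0 + eps) * x for x in h_v]
        for u in neighbors[v]:
            agg = [a + x for a, x in zip(agg, features[u])]
        updated.append(mlp(agg))
    return updated


def relu_mlp(vec):
    """Stand-in for a trained MLP: a fixed linear mix followed by ReLU."""
    mixed = [vec[0] + 0.5 * vec[1], vec[1] - 0.5 * vec[0]]
    return [max(0.0, x) for x in mixed]


def readout(layer_outputs):
    """concat+sum: sum node features per layer, then concatenate the sums."""
    graph_vec = []
    for feats in layer_outputs:
        graph_vec.extend(sum(col) for col in zip(*feats))
    return graph_vec


def embed(features, neighbors, num_layers=2, eps=0.0):
    """Run the GIN layers and read out one vector for the whole graph."""
    outputs = [features]
    for _ in range(num_layers):
        outputs.append(gin_layer(outputs[-1], neighbors, eps, relu_mlp))
    return readout(outputs)


# Two toy "molecules" with identical atom features but different bonding:
# a 3-node path and a 3-node triangle.
feats = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
path = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

path_vec = embed(feats, path)
triangle_vec = embed(feats, triangle)
```

In this toy case the two graph vectors agree on the input-layer sum and differ from the first GIN layer onward. A mean aggregator would assign every node the same value in both graphs here, because all neighbor features are identical, which is the kind of collapse the injective sum avoids.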
KG-based representation
In this work, we consider that the constructed KG contains various relation patterns, for instance symmetry, antisymmetry, inversion, and composition. Previous models such as TransE, RESCAL and ConvE cannot model all of these patterns; Table 1 shows the details. Therefore, we use RotatE to obtain vector representations of entities and relations, which is inspired by Euler's formula: \(e^{i\theta } = \cos \theta + i\sin \theta\). Specifically, the embedding of each entity or relation is \(e_{h} = \left( e_{h}^{(1)},e_{h}^{(2)},\cdots ,e_{h}^{(d)} \right) \in C^{d}\), where \(C^{d}\) is the d-dimensional complex space. Every element satisfies \(e_{h}^{(k)} = a_{k} + b_{k}i\), with \(a_{k},b_{k} \in R\) and \(k = 1,\ldots ,d\). Here, we give the formula of the score function for a triple (h, r, t).
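The score function of RotatE [22] is commonly written as below. The symbol \(d_{r}(h,t)\) is borrowed from [22] and is an assumption here; \(\circ\) denotes the element-wise (Hadamard) product, and each component of \(e_{r}\) has modulus 1, so that it acts as a rotation in the complex plane.

\[
d_{r}(h,t) = \left\| e_{h} \circ e_{r} - e_{t} \right\| , \qquad \left| e_{r}^{(k)} \right| = 1
\]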
The minimum value of this score function is 0, which means that \(e_h\circ e_r\) coincides exactly with \(e_t\). The lower the score, the closer h lies to t after being rotated by the relation r in the complex embedding space, and the greater the possibility that an edge of relation r exists between h and t. We utilize RotatE to extract multi-relational information because it can model all of these relation patterns. Negative sampling has achieved good results in knowledge graph embedding. Thus RotatE uses a similar negative sampling loss \(L_{kg}\):
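A sketch of this loss in the uniformly weighted form given in [22] follows. Here \(d_{r}(h,t) = \left\| e_{h} \circ e_{r} - e_{t} \right\|\) denotes the score function and n the number of negative samples; both symbols are assumed notation, and the self-adversarial weighting of [22] is omitted because the text does not say whether it is used.

\[
L_{kg} = -\log \sigma\left( \gamma_{1} - d_{r}(h,t) \right) - \sum_{i=1}^{n} \frac{1}{n} \log \sigma\left( d_{r}\left( h_{i}^{\prime},t_{i}^{\prime} \right) - \gamma_{1} \right)
\]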
where \(\gamma _1\) is a fixed margin, \(\sigma\) represents the sigmoid function, and \((h', r', t')\) denotes the i-th negative sampled triple. RotatE embeds the multi-relational information of all drugs through iterative training, and the resulting embeddings are then used as input to the feature fusion module to continue mining latent features.
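The rotation-based score and a negative sampling loss can be tried out with Python's built-in complex numbers. This is a minimal sketch under stated assumptions: the norm is taken as the sum of complex moduli, negatives are weighted uniformly, and the two-dimensional embeddings and the margin of 6.0 are made-up toy values rather than anything reported in this work.

```python
import cmath
import math


def rotate_score(e_h, phases, e_t):
    """Distance between the rotated head and the tail, summed over dimensions."""
    return sum(abs(h * cmath.exp(1j * p) - t) for h, p, t in zip(e_h, phases, e_t))


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def negative_sampling_loss(pos_score, neg_scores, gamma):
    """Uniformly weighted negative sampling loss with a fixed margin gamma."""
    loss = -log_sigmoid(gamma - pos_score)
    loss -= sum(log_sigmoid(s - gamma) for s in neg_scores) / len(neg_scores)
    return loss


# A relation whose phases are all pi is symmetric: rotating twice by pi
# returns to the starting point, so (h, r, t) and (t, r, h) both score ~0.
e_h = [1 + 0j, 0 + 1j]
phases = [math.pi, math.pi]
e_t = [h * cmath.exp(1j * p) for h, p in zip(e_h, phases)]  # exact rotation of e_h

forward = rotate_score(e_h, phases, e_t)
backward = rotate_score(e_t, phases, e_h)
corrupted = rotate_score(e_h, phases, [0.5 + 0.5j, -1 + 0j])
loss = negative_sampling_loss(forward, [corrupted], gamma=6.0)
```

The forward and backward scores are both (numerically) zero, which is the symmetric-pattern behaviour that motivates choosing RotatE, while the corrupted tail receives a clearly larger distance.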
Feature fusion module
This work adopts a multi-level strategy with a multi-head attention module to integrate the graph-view and KG-view feature representations. The combined latent features carry interactive information on multifaceted drug features. After the feature extraction and representation module, we obtain the initial structural information \(h_G\) and the semantic feature \(e_h\) from the knowledge graph. We combine the structural information with the semantic features using concatenate-level and scalar-level strategies to obtain different latent features, and then optimize these latent features using the multi-head attention mechanism. Finally, residual connections concatenate the fused latent features with the initial drug structure information and semantic features. The resulting final feature representation of all drugs is sent into an MLP to predict the DDI probability score, which represents whether there is an interaction between two drugs.
Concatenate-level
We first apply different concatenations to the initial drug structural information \(h_G\) and the semantic feature \(e_h\) of the drug from the knowledge graph. Concatenate (dim = 0) stacks the two matrices along the row dimension, and concatenate (dim = 1) joins them along the column dimension. The initial drug structure information and drug semantic features have different dimensions, so a convolution operation is first applied to the structural information and the semantic features, respectively, to bring both to dimension 100. Compared with a fully connected layer, the parameter sharing of a CNN saves computing resources, and its translation-invariant nature guarantees the extraction of location-insensitive feature information. In this work, the structural information of all approved drugs can be expressed as an \(n \times k\) matrix, where n is the total number of approved drugs (2322) and k is the dimension (100). All approved drugs are entities in the KG, so the entity vectors are also represented as an \(n \times k\) matrix. Stacking the drug’s structural information and semantic features along rows gives a feature matrix of size \(2n \times k\), and joining them along columns gives a feature matrix of size \(n \times 2k\). A 2D CNN then convolves these row-wise and column-wise concatenations, with a convolution kernel of size \(2 \times p\) (p = 10). The resulting matrix, which combines the drug structure and semantic information, serves as a latent vector. We then feed this latent vector into a 1D CNN with kernel size p (p = 5), followed by 1D adaptive average pooling, so that the latent vectors LF1 and LF2 are uniformly obtained with dimension 20.
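The two concatenation modes and the final pooling step can be checked on toy shapes. The following is a minimal sketch with hypothetical helper names (`concat`, `shape`, `adaptive_avg_pool1d`); the convolution layers are omitted, and the pooling uses the usual floor/ceil bin boundaries, which is an assumption about the implementation.

```python
def concat(a, b, dim):
    """Concatenate two row-major matrices along rows (dim=0) or columns (dim=1)."""
    if dim == 0:
        return a + b
    if dim == 1:
        return [row_a + row_b for row_a, row_b in zip(a, b)]
    raise ValueError("dim must be 0 or 1")


def shape(m):
    """Return (rows, columns) of a row-major matrix."""
    return (len(m), len(m[0]))


def adaptive_avg_pool1d(values, out_size):
    """Average `values` into `out_size` bins using floor/ceil bin boundaries."""
    length = len(values)
    pooled = []
    for i in range(out_size):
        start = (i * length) // out_size
        end = -((-(i + 1) * length) // out_size)  # ceiling division
        pooled.append(sum(values[start:end]) / (end - start))
    return pooled


n, k = 4, 3  # toy sizes; the text uses n = 2322 and k = 100
h_g = [[1.0] * k for _ in range(n)]  # stand-in structural features
e_h = [[2.0] * k for _ in range(n)]  # stand-in KG features

rows = concat(h_g, e_h, dim=0)  # 2n x k
cols = concat(h_g, e_h, dim=1)  # n x 2k

# Pooling a length-40 feature down to the stated latent dimension of 20.
pooled = adaptive_avg_pool1d(list(range(40)), 20)
```

Adaptive pooling fixes the output length regardless of the input length, which is what lets LF1 and LF2 end up with the same dimension even though the two concatenations have different shapes.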
Scalar-level
We apply element-wise add and element-wise product to the initial graph-based and KG-based representations, respectively. Then we utilize an autoencoder to excavate the latent features. As shown in Fig. 3, the autoencoder includes two parts: the encoder compresses the given feature vector into a latent-space representation, and the decoder tries to reconstruct the given feature vector from this latent-space representation. Our main purpose is to obtain the latent vectors. We perform the element-wise add and element-wise product operations on the drug structure feature vectors and drug semantic feature vectors, both of dimension 100, to obtain two fused feature matrices of size \(n \times k\), and the autoencoder then captures the latent features from each fused feature matrix. The autoencoder in our model has three hidden layers, of which the second hidden layer provides the final latent features we want, denoted LF3 and LF4.
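The scalar-level path can be sketched for a single drug as follows. This is only an illustration: the layer widths (100, 50, 20, 50, 100), the tanh activation, the untrained random weights and the sharing of one autoencoder between the two fused vectors are all assumptions, and the helper names are hypothetical.

```python
import math
import random


def elementwise(a, b, op):
    """Apply a binary operation position by position."""
    return [op(x, y) for x, y in zip(a, b)]


def dense(vec, weights):
    """Fully connected layer with tanh activation; weights is out_dim x in_dim."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in weights]


def init(out_dim, in_dim, rng):
    """Small random weight matrix (untrained)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def autoencoder(vec, layers):
    """Return (latent, reconstruction); latent is the second hidden layer."""
    activations = []
    for weights in layers:
        vec = dense(vec, weights)
        activations.append(vec)
    return activations[1], activations[-1]


rng = random.Random(0)
dims = [100, 50, 20, 50, 100]  # input, three hidden layers, output (hypothetical widths)
layers = [init(dims[i + 1], dims[i], rng) for i in range(len(dims) - 1)]

h_g = [rng.random() for _ in range(100)]  # stand-in structural feature of one drug
e_h = [rng.random() for _ in range(100)]  # stand-in KG feature of the same drug

fused_add = elementwise(h_g, e_h, lambda x, y: x + y)
fused_mul = elementwise(h_g, e_h, lambda x, y: x * y)
lf3, recon_add = autoencoder(fused_add, layers)
lf4, recon_mul = autoencoder(fused_mul, layers)
```

In a trained model the weights would be fitted by minimizing the reconstruction error between the input and the output layer; only the middle activation is kept afterwards.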
Multi-head attention module
Through the CNN and the autoencoder, we obtain four latent features denoted LF1, LF2, LF3, and LF4. They present different scales of drug structural and semantic features at the concatenate level and the scalar level, respectively. Furthermore, we apply element-wise add to these four latent features to capture more latent information, which gives the fifth latent feature LF5. The latent features are concatenated and sent to an encoder, which mainly includes scaled dot-product attention, add and normalization, and feed-forward operations, as shown in Fig. 4. This mechanism is mainly calculated as follows.
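In the standard formulation of [25], which the symbol definitions in the next paragraph appear to follow, the attention computation is:

\[
Q_{i} = XW_{i}^{Q}, \qquad K_{i} = XW_{i}^{K}, \qquad V_{i} = XW_{i}^{V}
\]

\[
head_{i} = \mathrm{softmax}\left( \frac{Q_{i}K_{i}^{T}}{\sqrt{d_{k}}} \right) V_{i}, \qquad i = 1,\ldots ,h
\]

\[
X_{MultiHead\_att} = \mathrm{Concat}\left( head_{1},\ldots ,head_{h} \right) W^{o}
\]

The add-and-normalization and feed-forward steps of the encoder are not spelled out here, since their exact configuration is not stated.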
where X denotes the latent feature vector obtained by concatenating LF1, LF2, LF3, LF4 and LF5; \(W_{i}^{Q} \in R^{d_{in} \times d_{k}}\), \(W_{i}^{K} \in R^{d_{in} \times d_{k}}\), \(W_{i}^{V} \in R^{d_{in} \times d_{v}}\), and \(W^{o} \in R^{hd_{v} \times d_{in}}\) are parameter matrices; and \(Q_i\), \(K_i\) and \(V_i\) represent the Q (Query), K (Key) and V (Value) matrices. In this work, we use 4 attention heads for the multi-class classification task, so \(d_{k} = d_{v} = d_{in}/h = 25\). Because the dimension of each head is reduced, the total computational cost does not exceed that of full-dimension single-head attention.
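The dimension bookkeeping (four heads of size 25 on a 100-dimensional input) can be verified with a small pure-Python sketch of multi-head scaled dot-product attention. The weights are random and untrained, the input is a made-up batch of five drugs, and add-and-normalization and feed-forward layers are left out; all helper names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Plain matrix product of two row-major lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d_k)) v."""
    scale = math.sqrt(len(k[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def multi_head(x, heads, rng):
    """Project x per head, attend, concatenate the heads, project back to d_in."""
    d_in = len(x[0])
    d_head = d_in // heads
    head_outputs = []
    for _ in range(heads):
        w_q, w_k, w_v = (rand_matrix(d_in, d_head, rng) for _ in range(3))
        out, _ = attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))
        head_outputs.append(out)
    concat = [[val for out in head_outputs for val in out[i]] for i in range(len(x))]
    w_o = rand_matrix(heads * d_head, d_in, rng)
    return matmul(concat, w_o)


rng = random.Random(0)
x = rand_matrix(5, 100, rng)  # 5 hypothetical drugs, d_in = 100
fused = multi_head(x, heads=4, rng=rng)
_, weights = attention(x, x, x)
```

Note that \(d_{in} = 100\) is consistent with concatenating five latent features of dimension 20 each, assuming LF3 to LF5 share the dimension stated for LF1 and LF2.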
Classification
We concatenate the fused latent feature \(X_{MultiHead\_att}\) with the initial drug structure feature \(h_G\) and the initial drug entity feature \(e_h\), where \(X_{MultiHead\_att}\) carries the local features and \(h_G\) and \(e_h\) carry the global features. Thus, both the global and local features of all approved drugs are obtained.
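Read literally, this concatenation can be written as follows, where the operator name Concat is an assumption:

\[
D = \mathrm{Concat}\left( X_{MultiHead\_att},~h_{G},~e_{h} \right)
\]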
D is the feature representation of all approved drugs. For the DDIs prediction task, we send the final drug feature representation to a dense layer to determine the DDI probability value.
\({\hat{y}}_{ij}\) represents the probability value of DDIs for binary classification, where \(\sigma\) refers to the sigmoid function. For the multi-class prediction task, it is the probability score of each relation type, and \(\sigma\) is the softmax function.
Training
We minimize the cross-entropy loss to optimize the parameters in the MuFRF framework for binary classification, described as follows:
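In its standard form, the binary cross-entropy over the labelled drug pairs is given below. Whether the original sums or averages over pairs is not recoverable from the text, so the plain sum and the symbol \(L_{bin}\) are assumptions.

\[
L_{bin} = -\sum_{(i,j)} \left[ y_{ij} \log {\hat{y}}_{ij} + \left( 1 - y_{ij} \right) \log \left( 1 - {\hat{y}}_{ij} \right) \right]
\]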
where \(y_{ij} \in \left\{ 0,1 \right\}\) represents whether there exists an interaction between the drug pair \(( d_{i},d_{j})\) in binary classification. We employ label smoothing cross-entropy loss for multi-class prediction. Label smoothing sets the minimum of the target vector to \(\varepsilon\). Therefore, the classification targets are no longer just 1 or 0 but \(1 - \varepsilon\) and \(\varepsilon\). The following formula gives the cross-entropy loss function with label smoothing.
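One widely used form that matches the symbols defined in the next paragraph is shown here as an assumption, since the exact formula is not recoverable from the text. The names \(L_{ls}\) and \(ce(j)\) are introduced only for this sketch.

\[
L_{ls} = \left( 1 - \varepsilon \right) ce(i) + \varepsilon \sum_{j=0}^{N_{c} - 1} \frac{ce(j)}{N_{c}}, \qquad ce(j) = -\log p_{j}
\]

Under this form the effective target is \(1 - \varepsilon + \varepsilon / N_{c}\) for the correct class and \(\varepsilon / N_{c}\) for every other class.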
where \(\varepsilon\) is a small positive number (0.15 is selected in the experimental part), i is the correct class, and \(N_c\) is the number of classes. \(p = \left[p_{0},\ldots ,p_{N_{c} - 1} \right]\) denotes a probability distribution, and each element \(p_i\) is the probability that the sample belongs to the i-th class. \(y = \left[y_{0},\ldots ,y_{N_{c} - 1} \right]\) refers to the one-hot representation of the sample label: when the sample belongs to class i, \(y_{i} = 1\), and otherwise \(y_{i} = 0\). Intuitively, label smoothing restricts the logit value of the correct class, pulling it closer to the logit values of the other classes. Thus, to a certain extent, it serves as a regularization technique and a way to combat model overconfidence. The entire MuFRF training procedure is shown in Algorithm 1.
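The effect of label smoothing on an overconfident prediction can be seen in a few lines. The sketch assumes one common formulation, \((1 - \varepsilon)\,ce(i) + \varepsilon \sum_j ce(j)/N_c\) with \(ce(j) = -\log p_j\), which may differ in detail from the loss used in this work; the function name and the probability vectors are made up for illustration.

```python
import math


def label_smoothing_ce(probs, correct, eps):
    """(1 - eps) * ce(correct) + eps * mean of ce over all classes."""
    ce = [-math.log(p) for p in probs]
    return (1.0 - eps) * ce[correct] + eps * sum(ce) / len(ce)


confident = [0.97, 0.01, 0.01, 0.01]  # near one-hot prediction, class 0 is correct
uniform = [0.25, 0.25, 0.25, 0.25]

plain = label_smoothing_ce(confident, correct=0, eps=0.0)      # ordinary cross-entropy
smoothed = label_smoothing_ce(confident, correct=0, eps=0.15)  # eps as in the text
flat = label_smoothing_ce(uniform, correct=2, eps=0.15)
```

With smoothing, the near one-hot prediction is penalized for the very small probabilities it assigns to the other classes, so `smoothed` exceeds `plain`; for a uniform prediction the loss equals \(\log N_c\) whatever the value of \(\varepsilon\).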
Results
In this section, we present the experiments that demonstrate the effectiveness of the proposed model.
Datasets
Binary-class DDIs and KG dataset
DRKG [34] is a comprehensive biological knowledge graph relating genes, compounds, diseases, biological processes, side effects, and symptoms. It is made up of 97,238 entities and 5,874,261 triples. In this work, we extract the binary-class DDI data from the DrugBank [35] and Hetionet data sets in DRKG, where the relation type is \(<ddi-interactor-in::Compound:Compound>\). The drugs in the data are all approved, which ensures that each drug has graph-based embedding information. The number of such triples is 1,170,940. The remaining triples are used as the KG dataset in this work, as shown in Table 2.
Multi-class DDIs
The multi-relational DDI data was collected from DeepDDI [15], which extracted it from DrugBank [35]. It is made up of 192,284 DDIs and 86 relation types. To stay consistent with the KG derived from DRKG, and after eliminating relations with fewer than 10 samples, this work retains 172,426 DDI pairs and 81 relations.
The DrugBank subset in DRKG only extracts a triple set of 6 relation types (target, enzyme, carrier, ddi-interactor-in, x-atc, treats) from the original XML format of DrugBank, and part of the binary DDIs dataset is derived from the relation “ddi-interactor-in”, that is, whether there is an interaction between the entity pair. The relationships between entity pairs in the multi-class dataset extracted from the original DrugBank are divided into 86 relation types. For example, label 38 represents “the diuretic activity of the latter may be decreased when drug a and b are taken together.”
Baselines
We compare the proposed model with the following representative works.

DeepWalk [36] treats the sequences generated by random walks as sentences and feeds them to the skip-gram algorithm [37] to learn node embeddings. We concatenate the node representations and use a classifier to predict DDIs.

LINE [38] is a network embedding model that combines local and global network structure information to model the node embeddings. This model only uses existing DDIs.

DeepDDI [15] utilizes SMILES, which describe the chemical structure of drugs, to precisely predict important DDI types and outputs them as sentences that humans can understand. It provides guidance for drug development.

KG-DDI [17] trains ComplEx to obtain the drug embeddings and then trains a ConvLSTM [39] network, in which the LSTM decodes global relationships from the features processed by the CNN, to predict DDIs.

KGNN [18] utilizes the rich semantic features from the KG and employs a GNN to aggregate neighborhood information for updating the representation of the current entity in the KG.

MUFFIN [20] combines a message-passing neural network with TransE to capture the drug structure representation from the molecular map and semantic features from the KG, which yields a powerful drug representation for DDIs prediction.

RANEDDI [40] captures the multi-relational information between drug entities in the DDI network and considers both the original information of neighbors and the information after relation transformation to obtain the final embedded representation of drugs.

GRPMF [41] introduces an original regularization strategy to jointly encode prior expert knowledge and graph similarity inference for DDIs prediction.
To validate the significance of each component of MuFRF, we designed seven variants to implement the ablation study:

MuFRF_ST only employs GIN to extract drug representations with structural features.

MuFRF_KG only uses RotatE to learn the feature representation of each node in the knowledge graph.

MuFRF-c0 removes the concatenate-level (dim = 0) feature and fuses the other latent features.

MuFRF-c1 removes the concatenate-level (dim = 1) feature from the final drug representation.

MuFRF-add drops the element-wise add operation when obtaining the final latent feature embedding.

MuFRF-p removes the element-wise product operation for DDIs prediction.

MuFRF-attn removes the multi-head attention module from feature fusion.
Evaluation metrics
To test the performance of the proposed model, we use the following four metrics: the overall classification accuracy (Acc), Precision, Recall, and the F1 score. TP denotes true positives, TN true negatives, FP false positives and FN false negatives.
Acc is the ratio of correctly predicted samples to the total number of samples, Precision is the proportion of correctly predicted interactions among all predicted interactions, and Recall is the proportion of correctly predicted DDIs among the existing DDIs. In addition, the F1 score is used as a comprehensive criterion; it can be regarded as the harmonic mean of precision and recall. Label 0 means that no interaction is known between the two drugs, although a potential interaction may simply not have been found yet. We therefore pay more attention to true positive and false negative predictions when evaluating model performance. Here a true positive means that the model correctly predicts an existing DDI, and a false negative means that the model fails to predict an existing drug-drug interaction. Among the evaluation metrics, recall depends directly on the number of false negatives, and F1 is calculated from precision and recall. The more true positives and the fewer false negatives, the better the model performance. In this work, besides accuracy and precision, recall and F1 are the vital metrics for evaluating our proposed model. They are calculated according to formulas 17-20.
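In their standard forms, which agree with the verbal definitions above, the four metrics are:

\[
Acc = \frac{TP + TN}{TP + TN + FP + FN}
\]

\[
Precision = \frac{TP}{TP + FP}
\]

\[
Recall = \frac{TP}{TP + FN}
\]

\[
F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}
\]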
Experimental setup
This work uses 100-dimensional vectors to represent KG entities, relations and drug structure embeddings. For the drug molecular map, we adopt the pre-trained GIN to capture the graph-based representation [42]. For the KG of drugs, we use RotatE to model entity and relation representations in complex space. Then, for the feature fusion component, we construct a 2D CNN and a 1D CNN to extract further latent information, with kernel sizes set to 10 and 5, respectively. We set the hidden size of the last dense layer to 2048, and the number of output neurons to 1 and 81 for the binary and multi-class classification tasks, respectively. The comparison experiments between MuFRF and the baseline models are shown in Table 3, which also reports the ablation experiments. Moreover, we design parameter optimization experiments, whose results are given in Tables 4, 5 and 6. The learning rate is set to 0.0015 for the binary classification task and 0.001 for the multi-class task; a detailed explanation is given in the parameter analysis section. The model is trained for 200 epochs in PyTorch with the Adam optimizer on an Intel Core i7 CPU and a GeForce GTX 1080 Ti graphics card. The hyperparameters of the baselines remain unchanged from their published work, and these works all employ five-fold cross-validation. We divide the dataset into a training set and a test set, with the test set accounting for 20%.
Overall evaluation results
The row for MuFRF and the rows preceding it in Table 3 give the experimental performance of MuFRF and of all baselines described above. Compared with all baselines, MuFRF achieves the highest results on every metric for both the binary and multiclass classification tasks. For example, compared with MUFFIN on these two classification tasks, the accuracy of MuFRF is improved by at least 0.322%, and the precision, recall and F1 are each increased by 0.332%; the mean accuracy is increased by 0.75%, the macro-precision and macro-recall are both improved by at least 2%, and the macro-F1 is increased by 2.1%. These findings demonstrate the effectiveness of MuFRF. Note that DeepWalk and LINE perform worst among the baselines because they rely only on the known DDIs and do not consider any other drug information. DeepDDI performs worse than MuFRF because it adopts structure similarity information as auxiliary information for drug representation. The KG-DDI and KGNN models also do not outperform MuFRF, because they do not exploit the information in the molecular structure graph. In the binary-class task, our model has only a slight advantage over MUFFIN, whereas in the multiclass classification task MuFRF has a more obvious advantage over MUFFIN, GRPMF and RANEDDI. Unlike the baselines, MuFRF considers a multiview feature representation, including the KG view and the graph view, and fully exploits the latent features, which makes it perform well on each metric for these two tasks.
Ablation study
We conduct an ablation study by comparing MuFRF with its seven variants; the lower part of Table 3 shows the experimental results. MuFRF_ST and MuFRF_KG perform relatively poorly compared with the other variants, which extract both molecular structure information and semantic features from the KG. Moreover, MuFRF_ST scores better than MuFRF_KG, which verifies that molecular structure information is vital for DDI prediction. The other variants show that concatenation (dim = 0 or dim = 1), element-wise addition, and element-wise product each bring some improvement on all evaluation criteria in the binary and multiclass classification tasks. MuFRFattn shows a loss on both prediction tasks in comparison with MuFRF, which verifies that the attention mechanism further optimizes the features captured by the multilevel strategy.
In a nutshell, the MuFRF model outperforms all baselines and all of its variants, which indicates that integrating molecular structure information and semantic features from the KG is vital in all prediction tasks. The proposed multilevel latent feature fusion between structure information and semantic features is also essential for DDI prediction.
Parameter analysis
Here, we study whether varying the key parameters influences the performance of MuFRF. To show how different parameters affect model performance, experimental results on each metric are given for the binary and multiclass classification tasks. The specific experimental results and analysis are as follows.
Impact of negative sample size
Figures 5 and 6 illustrate the effect of the number of negative triplets sampled for each positive triplet during KG training. For the binary classification task, we fix the evaluation batch size at 2500 and show the influence with confusion matrices. Compared with sample sizes of 32, 64 and 256, every evaluation criterion of MuFRF peaks when the sample size is 128. For the multiclass classification task, line graphs plot the performance of the different negative sample sizes across all metrics. We can see that MuFRF obtains more valuable information when there are enough negative samples. However, more noise is introduced into the KG representation process as the number of negative samples increases; this will be further investigated in future work.
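To make the notion of "negative triplets per positive triplet" concrete, the following is a minimal sketch of uniform negative sampling by head or tail corruption, a common scheme in KG embedding training. It is an assumption for illustration only: the paper does not describe its sampler here, and the function name `corrupt_triplet`, the entity names and the relation name are hypothetical.

```python
import random


def corrupt_triplet(positive, entities, num_negatives, known_triplets, rng):
    """Draw negative triplets by replacing the head or the tail of a positive one.

    Candidates that already appear in known_triplets are rejected, so the
    function assumes enough entities exist to find num_negatives negatives.
    """
    head, relation, tail = positive
    negatives = []
    while len(negatives) < num_negatives:
        replacement = rng.choice(entities)
        if rng.random() < 0.5:
            candidate = (replacement, relation, tail)  # corrupt the head
        else:
            candidate = (head, relation, replacement)  # corrupt the tail
        if candidate not in known_triplets:
            negatives.append(candidate)
    return negatives


# Hypothetical toy graph: ten entities and one known positive triplet.
entities = [f"drug_{i}" for i in range(10)]
positive = ("drug_0", "interacts_with", "drug_1")
known = {positive}
negatives = corrupt_triplet(positive, entities, 4, known, random.Random(0))
```

In this sketch, `num_negatives` plays the role of the negative sample size varied in the experiment (32, 64, 128 or 256). Randomly corrupted triplets may still be true but unrecorded facts, which is one way that a larger sample size can introduce noise.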
Impact of the embedding dimension
Table 4 shows how the embedding dimension influences model performance. Specifically, this work investigates the effect of changing the embedding dimension from 32 to 128. The performance of MuFRF rises as the embedding dimension increases from 32 to 100. However, when the embedding dimension is 128, the performance declines compared with 100 dimensions. Thus, we fix the embedding size at 100 dimensions in our experiments.
Impact of the nheads
To verify how the number of heads in the multi-head attention module influences each evaluation criterion on the two tasks, we conduct experiments with different values of nheads, with the embedding size fixed at 100 and the number of epochs at 50. Table 5 shows that the results on all criteria are optimal when we employ two heads for binary classification and four heads for multiclass classification. Thus, in this work, we select two and four heads, respectively, to train the best DDI prediction models.
Impact of the hyperparameters
Learning rate and batch size are critical for good model performance. We first determine the learning rate for the binary classification task, with the number of epochs and the batch size set to 50 and 1024, respectively. Figure 7a shows that the results on all criteria peak when the learning rate is 0.0005. Thus, we set the learning rate to 0.0005. Then, we train the model for only 1 epoch to quickly compare different batch sizes. Figure 7b shows that the results on all criteria are best when the batch size is 128. However, given that our training data contain 1873504 samples, a batch size as small as 128 costs too much training time. To reduce the training time, we design several experiments to examine the relation among batch size, performance and time cost. According to the results shown in Fig. 7, we employ a larger batch size such as 1024, at which the model still reaches an F1 score of 87.64. The learning rate should be set in proportion to the batch size, which is the so-called linear scaling rule, whose rationale is explained in [43]: when the batch size increases by a factor of k, the learning rate also increases by a factor of k. We then continue to increase the batch size linearly; the final batch size is 3072, limited by the memory of our lab server. With a batch size of 3072, three times 1024, the learning rate should therefore be 0.0015 for the binary classification task. For the multiclass classification task, Fig. 7c and d show that the scores on all criteria are highest with a learning rate of 0.001 and a batch size of 1000.
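The arithmetic behind the linear scaling rule can be checked with a short sketch. The function name `scale_learning_rate` is illustrative; the numbers are the ones reported above for the binary classification task.

```python
def scale_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Apply the linear scaling rule: multiply the learning rate by the
    same factor k by which the batch size changes."""
    k = new_batch_size / base_batch_size
    return base_lr * k


# Tuned at batch size 1024 with learning rate 0.0005, then moved to 3072 (k = 3).
scaled_lr = scale_learning_rate(0.0005, 1024, 3072)
# scaled_lr is 0.0015 up to floating-point rounding.
```

The rule is a heuristic rather than a guarantee, so the scaled value is best treated as a starting point that the reported experiments then confirm.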
Impact of the smooth value
We employ the label-smoothing cross-entropy function for the multiclass classification task. When the smooth value is 0, this loss function reduces to the standard cross-entropy function. To avoid overfitting and alleviate the impact of wrong labels, this work sets the smooth value to a small constant. With the learning rate fixed at 0.001 and the batch size at 1000, six smooth values, which are 0.0, 0.1, 0.15, 0.25, and 0.3, are tested, and the prediction model is trained for 50 epochs for each. Table 6 indicates that the results on all criteria are best when the smooth value is 0.15. Thus, the best prediction model is obtained with a learning rate of 0.001, a batch size of 1000, and a smooth value of 0.15.
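The following is a minimal sketch of a label-smoothing cross-entropy for a single sample. It assumes one common convention, in which the target distribution is a mix of the one-hot label (weight 1 - smoothing) and a uniform distribution over all classes (weight smoothing); the paper does not state which variant it uses, and the function name and example logits are hypothetical.

```python
import math


def label_smoothing_cross_entropy(logits, target, smoothing):
    """Cross-entropy against a smoothed target distribution for one sample.

    logits: list of raw class scores; target: index of the labelled class;
    smoothing: value in [0, 1), where 0 gives the standard cross-entropy.
    """
    num_classes = len(logits)
    # Numerically stable log-softmax.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    nll = -log_probs[target]                 # standard cross-entropy term
    uniform = -sum(log_probs) / num_classes  # cross-entropy against uniform targets
    return (1.0 - smoothing) * nll + smoothing * uniform


# Hypothetical logits for a 3-class toy example with class 0 as the label.
logits = [2.0, 0.5, -1.0]
plain_loss = label_smoothing_cross_entropy(logits, 0, 0.0)
smoothed_loss = label_smoothing_cross_entropy(logits, 0, 0.15)
```

With a smooth value of 0 the second term vanishes, matching the statement above that the loss reduces to the standard cross-entropy. A positive smooth value penalizes over-confident predictions, which is how it mitigates overfitting and label noise.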
Discussion
In this section, we examine the practical value of MuFRF and discuss its real-world application through a case study. Table 7 lists the predicted results for two common drugs, Selexipag and Vorapaxar. Selexipag is a non-prostanoid IP prostacyclin receptor agonist used to treat pulmonary arterial hypertension. Vorapaxar is a platelet aggregation inhibitor used to reduce thrombotic cardiovascular events in patients with a history of myocardial infarction (MI) or peripheral arterial disease (PAD). For these drug pairs, we look for supporting evidence in DrugBank, PubMed, and the Drug Interactions Checker tool provided by Drugs.com. Table 7 shows that some of the DDI pairs have supporting evidence, which indicates the effectiveness of MuFRF. For the entirely new DDIs predicted by MuFRF, we expect the predictions to provide guidance for future exploration and experimental validation.
Conclusions
This work develops a multiview feature representation and fusion (MuFRF) framework for drug-drug interaction prediction on both binary-class and multiclass classification tasks. MuFRF designs a multilevel latent feature fusion strategy and uses a multi-head attention block to fully exploit and optimize the latent features from the graph view and the KG view. The ablation study demonstrates that the multilevel feature fusion between structure information in the molecular graph and semantic features in the biomedical KG is effective. Moreover, the attention mechanism effectively optimizes the latent features of all drugs. Compared with the baselines on DDI prediction, experiments show that our model is effective on two real-world datasets. This work mainly focuses on binary and multiclass classification tasks. However, DDI pairs in TWOSIDES [10] from DRKG can carry multiple labels. Thus, we will further investigate multi-label classification in future work.
Availability of data and materials
The datasets used and analysed during the current study are available from the corresponding author on reasonable request.
Abbreviations
MuFRF: Multiview feature representation and fusion
DDIs: Drug-drug interactions
ADRs: Adverse drug reactions
KG: Knowledge graph
GIN: Graph isomorphism network
CNNs: Convolutional neural networks
SMILES: Simplified Molecular Input Line Entry System
MLP: Multilayer perceptron
References
Giacomini KM, Krauss RM, Roden DM, Eichelbaum M, Hayden MR, Nakamura Y. When good drugs go bad. Nature. 2007;446:975–7.
Plumpton CO, Roberts D, Pirmohamed M, Hughes DA. A systematic review of economic evaluations of pharmacogenetic testing for prevention of adverse drug reactions. Pharmacoeconomics. 2016;34:771–93.
Clark MA, Harvey RA, Finkel R, Rey JA, Whalen K. Pharmacology. Philadelphia: Lippincott Williams & Wilkins; 2011.
Lee G, Park C, Ahn J. Novel deep learning model for more accurate prediction of drug-drug interaction effects. BMC Bioinform. 2019;20:1–8.
Whitebread S, Hamon J, Bojanic D, Urban L. Keynote review: in vitro safety pharmacology profiling: an essential tool for successful drug development. Drug Discov Today. 2005;10:1421–33.
Vilar S, Harpaz R, Uriarte E, Santana L, Rabadan R, Friedman C. Drug-drug interaction through molecular structure similarity analysis. J Am Med Inform Assoc. 2012;19:1066–74.
Vilar S, Uriarte E, Santana L, Lorberbaum T, Hripcsak G, Friedman C, Tatonetti NP. Similarity-based modeling in large-scale prediction of drug-drug interactions. Nat Protoc. 2014;9:2147–63.
Takeda T, Hao M, Cheng T, Bryant SH, Wang Y. Predicting drug-drug interactions through drug structural similarities and interaction networks incorporating pharmacokinetics and pharmacodynamics knowledge. J Cheminformatics. 2017;9:1–9.
Jin B, Yang H, Xiao C, Zhang P, Wei X, Wang F. Multitask dyadic prediction and its application in prediction of adverse drug-drug interaction. In: Proceedings of the thirty-first AAAI conference on artificial intelligence. 2017. vol. 31, p. 367–373.
Tatonetti NP, Fernald GH, Altman RB. A novel signal detection algorithm for identifying hidden drug-drug interactions in adverse event reports. J Am Med Inform Assoc. 2012;19:79–85.
Zhou Y, Hou Y, Shen J, Huang Y, Martin W, Cheng F. Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2. Cell Discov. 2020;6:1–18.
Deac A, Huang YH, Veličković P, Liò P, Tang J. Drug-drug adverse effect prediction with graph co-attention. arXiv preprint arXiv:1905.00534 2019.
Huang K, Xiao C, Hoang T, Glass L, Sun J. Caster: Predicting drug interactions with chemical substructure representation. In: Proceedings of the AAAI conference on artificial intelligence. 2020. vol. 34, p. 702–709.
Mohamed SK, Nováček V, Nounu A. Discovering protein drug targets using knowledge graph embeddings. Bioinformatics. 2020;36(2):603–10.
Ryu JY, Kim HU, Lee SY. Deep learning improves prediction of drug-drug and drug-food interactions. In: Proceedings of the national academy of sciences. 2018. vol. 115, p. 4304–4311.
Toropov AA, Toropova AP, Mukhamedzhanoval DV, Gutman I. Simplified molecular input line entry system (SMILES) as an alternative for constructing quantitative structure-property relationships (QSPR). Indian J Chem Sect Inorg Phys Theor Anal Chem. 2005;44:1545–52.
Karim MR, Cochez M, Jares JB, Uddin M, Beyan O, Decker S. Drug-drug interaction prediction based on knowledge graph embeddings and convolutional-LSTM network. In: Proceedings of the 10th ACM international conference on bioinformatics, computational biology and health informatics (BCB ’19). 2019. p. 113–123.
Lin X, Quan Z, Wang ZJ, Ma T, Zeng X. KGNN: Knowledge graph neural network for drug-drug interaction prediction. In: Proceedings of the twenty-ninth international joint conference on artificial intelligence. 2020. p. 2739–2745.
Zitnik M, Agrawal M, Leskovec J. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics. 2018;34:457–66.
Chen Y, Ma T, Yang X, Wang J, Song B, Zeng X. MUFFIN: multi-scale feature fusion for drug-drug interaction prediction. Bioinformatics. 2021;37:2651–8.
Li M, Sun Z, Zhang S, Zhang W. Enhancing knowledge graph embedding with relational constraints. Neurocomputing. 2021;429:77–88.
Sun Z, Deng Z, Nie J, Tang J. RotatE: Knowledge graph embedding by relational rotation in complex space. arXiv preprint arXiv:1902.10197 2019.
Vashishth S, Sanyal S, Nitin V, Agrawal N, Talukdar P. InteractE: Improving convolution-based knowledge graph embeddings by increasing feature interactions. In: Proceedings of the AAAI conference on artificial intelligence. 2020. vol. 34, p. 3009–3016.
Xu K, Hu W, Leskovec J, Jegelka S. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826 2018.
Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: Proceedings of the 31st international conference on neural information processing systems. 2017. p. 6000–6010.
Cheng F, Zhao Z. Machine learning-based prediction of drug-drug interactions by integrating drug phenotypic, therapeutic, chemical, and genomic properties. J Am Med Inform Assoc. 2014;21:278–86.
Abdelaziz I, Fokoue A, Hassanzadeh O, Zhang P, Sadoghi M. Large-scale structural and textual similarity-based mining of knowledge graph to predict drug-drug interactions. J Web Semant. 2017;44:104–17.
Trouillon T, Welbl J, Riedel S, Gaussier R, Bouchard G. Complex embeddings for simple link prediction. In: International conference on machine learning. 2016. p. 2071–2080.
Dai Y, Guo C, Guo W, Eickhoff C. Drug-drug interaction prediction with Wasserstein adversarial autoencoder-based knowledge graph embeddings. Brief Bioinform. 2021;22(4):256.
Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE. Neural message passing for quantum chemistry. In: International conference on machine learning, 2017. p. 1263–1272.
Bordes A, Usunier N, Garcia-Duran A, Weston J, Yakhnenko O. Translating embeddings for modeling multi-relational data. In: Advances in neural information processing systems 26: annual conference on neural information processing systems 2013. vol. 26.
Landrum G. Rdkit documentation. Release. 2013;1(1–79):4.
Wang MY. Deep graph library: towards efficient and scalable deep learning on graphs. In: ICLR workshop on representation learning on graphs and manifolds. 2019.
Ioannidis VN, Song X, Manchanda S, Li M, Pan X, Zheng D, Ning X, Zeng X, Karypis G. DRKG - drug repurposing knowledge graph for Covid-19. arXiv preprint arXiv:2010.09600 2020.
Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z, et al. Drugbank 5.0: a major update to the drugbank database for 2018. Nucleic Acids Res. 2018;46(D1):1074–82.
Perozzi B, Al-Rfou R, Skiena S. DeepWalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining. 2014. p. 701–710.
Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 2013.
Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q. LINE: Large-scale information network embedding. In: Proceedings of the 24th international conference on world wide web. 2015. p. 1067–1077.
Shi X, Chen Z, Wang H, Yeung DY, Wong WK, Woo WC. Convolutional lstm network: A machine learning approach for precipitation nowcasting. In: Proceedings of the 28th international conference on neural information processing systems. 2015. vol. 1. p. 802–810.
Yu H, Dong WM, Shi JY. RANEDDI: relation-aware network embedding for drug-drug interactions prediction. Inf Sci. 2022;582:167–80.
Jain S, Chouzenoux E, Kumar K, Majumdar A. Graph regularized probabilistic matrix factorization for drug-drug interactions prediction. arXiv preprint arXiv:2210.10784 2022.
Hu W, Liu B, Gomes J, Zitnik M, Liang P, Pande V, Leskovec J. Strategies for pre-training graph neural networks. arXiv preprint arXiv:1905.12265 2019.
Goyal P, Dollár P, Girshick R, Noordhuis P, Wesolowski L, Kyrola A, Tulloch A, Jia Y, He K. Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677 2017.
Acknowledgements
Not applicable.
Funding
This work was partly supported by the Science and Technology Innovation 2030 “New Generation of Artificial Intelligence” Major Project under Grant No. 2021ZD0111000 and partly by the Key Science and Research Program of Henan Province under Grant 21A520044.
Author information
Contributions
JW: Conceptualization, Methodology, Validation, Investigation, Writing—original draft. SZ: Software, Data curation, Validation. RL: Conceptualization, Methodology, Writing—review & editing. GC: Methodology, Writing—review & editing. SY: Supervision, Methodology, Writing—review & editing. LM: Project administration, Conceptualization. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Wang, J., Zhang, S., Li, R. et al. Multiview feature representation and fusion for drug-drug interactions prediction. BMC Bioinformatics 24, 93 (2023). https://doi.org/10.1186/s12859-023-05212-4
DOI: https://doi.org/10.1186/s12859-023-05212-4
Keywords
 Drug-drug interactions
 Graph representation
 Drug molecular structure
 Semantic information extraction
 Feature fusion