Multi-type feature fusion based on graph neural network for drug-drug interaction prediction
BMC Bioinformatics volume 23, Article number: 224 (2022)
Abstract
Background
Drug-drug interactions (DDIs) are a challenging problem in drug research. Drug combination therapy is an effective way to treat diseases, but it can also cause serious side effects. Therefore, DDI prediction is critical in pharmacology. Recently, researchers have been using deep learning techniques to predict DDIs. However, these methods consider only a single type of drug information and have shortcomings in robustness and scalability.
Results
In this paper, we propose a multi-type feature fusion model based on graph neural networks (MFFGNN) for DDI prediction, which can effectively fuse the topological information in molecular graphs, the interaction information between drugs and the local chemical context in SMILES sequences. In MFFGNN, to fully learn the topological information of drugs, we propose a novel feature extraction module to capture the global features of the molecular graph and the local features of each atom. In addition, in the multi-type feature fusion module, we use a gating mechanism in each graph convolution layer to alleviate the over-smoothing problem during information delivery. We perform extensive experiments on multiple real datasets. The results show that MFFGNN outperforms several state-of-the-art models for DDI prediction. Moreover, the cross-dataset experiments further show that MFFGNN has good generalization performance.
Conclusions
Our proposed model can efficiently integrate the information from SMILES sequences, molecular graphs and drug-drug interaction networks. We find that a multi-type feature fusion model can accurately predict DDIs and may contribute to discovering novel DDIs.
Introduction
Drug-drug interactions (DDIs) occur when one drug changes the pharmacological activity of another, which may produce side effects and even cause injury or death. At the same time, combining multiple drugs to treat a disease is often unavoidable, so it is crucial to predict potential DDIs. Traditional approaches to DDI identification depend on in vivo and in vitro experiments. However, the limited experimental environment, small scale, and cumbersome, expensive procedures greatly restrict their ability to detect DDIs. Therefore, efficient computational methods are needed to predict DDIs.
In the past several years, machine learning methods [1,2,3,4] have been proposed to solve this problem. Qiu et al. [5] summarized some of these machine learning based methods. Deng et al. [6] used chemical structures to learn DDI representations in a representation module, and then predicted rare events with few examples in a comparison module. Deng et al. [7] constructed deep neural networks (DNNs) to predict DDIs using different drug features. Zhang et al. [8] predicted DDIs using manifold regularization.
Recently, graph-based representation learning has been applied to drug-drug interaction prediction. Drugs are compounds, each of which can be represented either by a molecular graph, with atoms as nodes and chemical bonds as edges, or by a Simplified Molecular Input Line Entry System (SMILES) sequence. In drug-drug interaction networks, by treating drugs as nodes and interactions as edges, DDI prediction can be regarded as a link prediction task. Graph neural networks (GNNs) have made progress in DDI prediction [9,10,11,12,13]. Feng et al. [14] predicted DDIs using a Graph Convolutional Network (GCN) and a DNN. In addition, there are also many methods for multi-type DDI prediction [15,16,17]. Nyamabo et al. [18] proposed to predict DDIs through the interactions between drug substructures. Subsequently, Nyamabo et al. [19] used gating devices to learn the chemical substructures of drugs. Chen et al. [20] used a bi-level cross strategy to fuse the structural information and knowledge graph information of drugs.
Although the models mentioned above have achieved significant results, they still have some limitations: (i) they are generally limited to considering only the structure, sequence or interaction information of drugs, without considering the synergistic effects between these sources; (ii) for molecular graphs, applying a GNN alone can extract local features for the atoms, but it is difficult to propagate information across the graph over long ranges to capture global features of the molecular graph; (iii) in drug-drug interaction networks, node features obtained by stacking multiple GNN layers become smoothed and blurred, which loses the diversity of node features.
To address the above issues, this paper proposes an end-to-end learning framework for DDI prediction, namely MFFGNN. In MFFGNN, we first utilize deep neural networks to capture the intra-drug features from SMILES sequences and molecular graphs. For SMILES sequences, MFFGNN applies a bidirectional gated recurrent unit network [21] to extract local chemical context information from the sequences. For molecular graphs, MFFGNN utilizes both graph interaction networks [22] and the graph warp unit [23] to extract the global features of the molecular graph and the local features of each atom. In addition, MFFGNN takes the intra-drug features as the initial node features in the DDI network and uses a GCN encoder to fuse the intra-drug features with external DDI features to update the drug representations. Finally, we predict the missing interactions in the DDI graph through a multilayer perceptron (MLP).
Overall, the main contributions of this paper are summarized as follows:

We propose a novel model MFFGNN for DDI prediction, which fuses the topological information in molecular graphs, the interaction information between drugs and the local chemical context in SMILES sequences.

To better learn the topological structure of drugs, we propose a molecular graph feature extraction module (MGFEM) to extract the global features for the molecular graph and the local features for each atom of the molecular graph.

We conduct extensive experiments on three real datasets with different scales to demonstrate the superiority of our model.
Related works
Drug-drug interaction prediction
Drug-drug interaction prediction has always been a worthwhile research direction in pharmacology. Most previous work depended on in vivo and in vitro experiments, which do not scale well due to the limitations of the laboratory environment [24]. Subsequently, machine learning was proposed to solve this problem. Similarity-based methods calculated specific similarity measures [25,26,27,28,29], e.g., on drug structures, targets, side effects, genomic properties and therapeutic properties, and combined them with machine learning models for DDI prediction. Ryu et al. [30] predicted the types of drug-drug interactions using a DNN based on the similarity of the chemical structures of drugs. Graph-based methods predicted drug-drug interactions by learning from the molecular graph [31] or the interaction graph [32]. Shang et al. [33] modeled drugs as nodes and DDIs as links, thus casting the task as a link prediction problem.
Graph neural network
Recently, as a neural network method for the graph domain, the graph neural network (GNN) has received great attention, and many GNN variants have been proposed [34,35,36]. Rahimi et al. [37] proposed to control the transmission of neighbourhood information through a gating operation. With the increasing popularity of GNNs, researchers have applied GNN models to DDIs [38]. For example, Duvenaud et al. [39] used a GNN to perform molecular modeling by extracting circular molecular fingerprints. Lin et al. [40] used a knowledge graph neural network (KGNN) to mine associated relations in a knowledge graph for DDI prediction. Bai et al. [41] proposed to learn drug feature representations with a Bi-level Graph Neural Network (BI-GNN) to solve biological link prediction tasks. Among these methods, MIRACLE [42] is the most relevant to our work.
Methods
Preliminaries
We define the drug set as \(D\!\!=\!\!\{d_{1}, \ldots , d_{n}\}\) and its corresponding SMILES sequence set as \(Q=\left\{ q_{1}, q_{2}, \ldots , q_{n}\right\}\), where n represents the number of drugs. We define the molecular graph as \({\mathbb {G}}=({\mathbb {V}}, {\mathbb {E}})\), where \({\mathbb {V}}\) and \({\mathbb {E}}\) represent the sets of atoms and chemical bonds, respectively, and interaction graph as \({\mathcal {G}}=({\mathbb {G}}, {\mathbb {L}})\), where \({\mathbb {L}}\) represents the links between drugs. We use \(d_{h}\) to define the dimension of the representation of the atom and chemical bond and \(d_{g}\) to define the dimension of the representation of the drug.
Problem description The DDI prediction problem is regarded as a link prediction task on the graph. The interaction graph \({\mathcal {G}}\) can be represented by an adjacency matrix \({\mathbf {A}} \in {\mathbb {R}}^{n \times n}\) with each element \(a_{i j} \in \{0,1\}\). Given two drug nodes, the DDI prediction problem is to predict whether there is an interaction between them.
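As a concrete illustration, the adjacency-matrix formulation can be sketched as follows (a minimal NumPy toy with hypothetical drugs and interactions, not the paper's code):

```python
import numpy as np

# Hypothetical toy setting: 4 drugs, 3 known interactions.
n = 4
known_ddis = [(0, 1), (0, 2), (2, 3)]

# Build the symmetric adjacency matrix A with a_ij = 1 iff an
# interaction between drug i and drug j is known.
A = np.zeros((n, n), dtype=int)
for i, j in known_ddis:
    A[i, j] = A[j, i] = 1

# Candidate pairs for link prediction are the zero entries above
# the diagonal, i.e. unobserved drug pairs.
candidates = [(i, j) for i in range(n) for j in range(i + 1, n) if A[i, j] == 0]
```

The model's task is then to score each candidate pair and decide whether the corresponding entry of A should be 1.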
Overview of MFFGNN
The framework of MFFGNN is shown in Fig. 1 and is divided into the following four modules. In the Molecular Graph Feature Extraction Module (MGFEM), we use the graph interaction network with the graph warp unit to extract the topological structure features of a drug from its molecular graph. In the SMILES Sequence Feature Extraction Module (SSFEM), we employ the bidirectional gated recurrent unit to extract local chemical context from a given SMILES sequence. In the Multi-type Feature Fusion Module (MFFM), we apply a GCN encoder to fuse the intra-drug features with external DDI features to update the drug representations. Finally, we predict the missing interactions in the DDI graph through an MLP.
Molecular graph feature extraction module
The Molecular Graph Feature Extraction Module (MGFEM) is shown in Fig. 2. Molecular graphs are an important representation of drugs. We use the RDKit [43] tool to construct the molecular graph \({\mathbb {G}}\) from the SMILES sequence. First, we obtain the initial features \({\mathbf {v}}_{i}^{(in)}\) of each atom according to the atom symbol, formal charge, whether the atom is aromatic, its hybridization, chirality, etc. Similarly, we obtain the initial features \({\mathbf {e}}_{ij}^{(in)}\) of each bond according to the type of bond, whether the bond is in a ring, whether it is conjugated, etc. Then, the initial atom and chemical bond features are transformed to \({\mathbb {R}}^{d_{h}}\) through a single-layer neural network, and the calculation process is as follows:
where \({\text {ReLU}}\) is the activation function, and \({\mathbf {W}}_{v}^{(0)}\) and \({\mathbf {W}}_{e}^{(0)}\) are learnable weight matrices. To fully extract atom and chemical bond features, we apply graph interaction networks [22]. In the graph interaction network, the features of each edge \(e_{ij}\) are first updated according to the features of its two endpoint nodes and the edge itself, as follows:
where \(\Vert\) is the concatenation operation, and \({\mathbf {W}}_{e}^{(l)}\) and \({\mathbf {b}}_{e}^{(l)}\) are the learnable weight matrix and the bias of the edge update, respectively. Then, the features of each node are updated according to the features of its connected edges and itself, and the calculation process is as follows:
where N(i) represents the set of neighbors of node i.
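A minimal NumPy sketch of one such message-passing round, with hypothetical dimensions and random matrices standing in for learned parameters (the exact update equations follow [22]):

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0)

d_h = 8                       # hidden size (hypothetical)
num_atoms = 3
edges = [(0, 1), (1, 2)]      # bonds of a toy 3-atom molecule

v = rng.normal(size=(num_atoms, d_h))            # atom features v_i
e = {ij: rng.normal(size=d_h) for ij in edges}   # bond features e_ij

W_e = rng.normal(size=(3 * d_h, d_h))  # edge-update weights (learned in practice)
W_v = rng.normal(size=(2 * d_h, d_h))  # node-update weights

# Edge update: each bond looks at its two endpoint atoms and itself.
e_new = {(i, j): relu(np.concatenate([v[i], v[j], e[(i, j)]]) @ W_e)
         for (i, j) in edges}

# Node update: each atom aggregates its incident (updated) bonds
# together with its own features.
v_new = np.zeros_like(v)
for i in range(num_atoms):
    incident = [e_new[(a, b)] for (a, b) in edges if i in (a, b)]
    agg = np.sum(incident, axis=0) if incident else np.zeros(d_h)
    v_new[i] = relu(np.concatenate([v[i], agg]) @ W_v)
```

Stacking several such rounds lets information travel a few hops, but, as the next paragraph notes, it still propagates only locally.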
The above processes spread the features of atoms and chemical bonds only locally and cannot propagate information globally, yet the properties of the whole drug often influence drug-drug interaction prediction. Therefore, we propose to extract the global features of the molecular graph by applying the graph warp unit (GWU) [23]. The GWU consists of three parts: a supernode, a transmitter and a warp gate.
Supernode: We add a supernode to the graph, which is connected to every atom in the molecular graph. The sum of all atom features is taken as the initial feature of the supernode, \({\mathbf {g}}^{(0)}\in {\mathbb {R}}^{d_{h}}\), that is:
\({\mathbf {g}}^{(0)}=\sum _{i \in {\mathbb {V}}} {\mathbf {v}}_{i}^{(0)}\)
Then, the features of the supernode are updated by a single-layer neural network:
where \({\mathbf {W}}_{g}^{(l)}\) is a learnable weight matrix.
Transmitter: The transmitter gathers information from the atoms and the supernode. Before propagating the atom features to the supernode, we need to transform the form of the information. Different atom features have different degrees of importance relative to the global features. Therefore, the transmitter applies a multi-head attention mechanism to aggregate the atom features. The calculation process is as follows:
where \({\mathbf {v}}_{v \rightarrow s}^{(l)}\) represents the information propagated from the atoms to the supernode at the \(l^{th}\) layer, \(\alpha _{v, i}^{(k, l)}\) represents the significance score of node i at the \(k^{th}\) head and the \(l^{th}\) layer, \(\odot\) represents the element-wise product, and \(k =1,2, \ldots , K\), where K is the number of heads. The information propagated from the supernode to each atom is calculated by the following formula:
where \({\mathbf {g}}_{s \rightarrow v}^{(l)}\) represents the information propagated from the supernode to each atom at the \(l^{th}\) layer.
Warp Gate: The warp gate combines the transmitted information and sets the gating coefficients to control the fusion of information. For each atom, gated interpolation is used to fuse the information from the supernode \({\mathbf {g}}_{s \rightarrow v}^{(l)}\) with the updated atom features \({\mathbf {v}}_{i}^{(l)}\):
where \(\alpha _{s \rightarrow i}^{(l)}\) represents the gating coefficient during the transmission from supernode to each atom and \({\mathbf {v}}_{s \rightarrow i}^{(l)}\) represents the information transmitted to each atom. For supernode, gated interpolation is used to fuse information from atoms \({\mathbf {v}}_{v \rightarrow s}^{(l)}\) with updated supernode features \(\tilde{{\mathbf {g}}}^{(l)}\):
where \(\alpha _{i \rightarrow s}^{(l)}\) represents the gating coefficient during the transmission from atom to supernode and \({\mathbf {g}}_{i \rightarrow s}^{(l)}\) represents the information transmitted to supernode. Finally, the updated features of each atom and supernode are calculated through the gated recurrent units (GRU) [44]:
By applying this module to the whole dataset, we obtain the feature matrix \({\mathbf {G}}\in {\mathbb {R}}^{n \times d_{g}}\) based on the molecular graph.
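The supernode, transmitter and warp-gate steps above can be sketched as follows (a simplified single-head NumPy toy; random vectors stand in for learned gates and attention weights, and the final GRU update of atom and supernode states is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

d_h, num_atoms = 8, 3
v = rng.normal(size=(num_atoms, d_h))      # atom features

# Supernode: initialized as the sum of all atom features.
g = v.sum(axis=0)

# Transmitter (single head for brevity; the model uses K heads):
# attention scores decide how much each atom contributes to the supernode.
w_att = rng.normal(size=d_h)
scores = v @ w_att
alpha = np.exp(scores) / np.exp(scores).sum()   # softmax over atoms
v_to_s = (alpha[:, None] * v).sum(axis=0)       # atoms -> supernode message
g_to_v = np.tile(g, (num_atoms, 1))             # supernode -> atoms message

# Warp gate: gated interpolation between local and global information.
gate_v = sigmoid(rng.normal(size=(num_atoms, d_h)))  # stands in for learned gates
gate_g = sigmoid(rng.normal(size=d_h))
v_fused = gate_v * g_to_v + (1.0 - gate_v) * v       # per-atom fusion
g_fused = gate_g * v_to_s + (1.0 - gate_g) * g       # supernode fusion
```

In the actual module the gating coefficients are computed from the features themselves, and `v_fused` and `g_fused` are passed through a GRU before the next layer.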
SMILES sequence feature extraction module
Drugs are commonly represented by SMILES sequences, which are composed of molecular symbols. SMILES sequences contain rich features complementary to molecular graphs: the molecular graph of a drug describes how the atoms are connected, while the SMILES sequence provides functional information about the atoms and long-term dependency representations. To capture the local chemical context in SMILES sequences, we first utilize an embedding method to construct an atomic embedding matrix, and then input it into a Bidirectional Gated Recurrent Unit (BiGRU) network to obtain the representation of the entire drug. The SMILES Sequence Feature Extraction Module (SSFEM) is shown in Fig. 3.
Currently, most methods encode SMILES sequences by label encoding or one-hot encoding. However, both encodings ignore the context information of the atoms. Therefore, to explore the function of an atom in its context, we propose to encode SMILES sequences with an advanced embedding method, Smi2Vec [45]. Specifically, for a SMILES sequence \(q_{i}\), we split it into a series of atomic symbols by spaces. Then, we map each atom to an embedding vector according to the pre-trained embedding dictionary. Finally, we stack the embedding vectors of the atoms to obtain an embedding matrix \({\mathbf {X}} \in {\mathbb {R}} ^{{m} \times d_{h}}\), in which m is the number of atoms and each row is the embedding of one atom.
We apply a layer of BiGRU [21] on the embedding matrix \({\mathbf {X}}\). BiGRU processes the input with two GRUs running in opposite directions, as shown in Fig. 3. The current hidden states of the BiGRU can be described as \(\overrightarrow{{\mathbf {s}}_{t}}={\text {GRU}}({\mathbf {x}}_{t}, \overrightarrow{{\mathbf {s}}_{t-1}})\) and \(\overleftarrow{{\mathbf {s}}_{t}}={\text {GRU}}({\mathbf {x}}_{t}, \overleftarrow{{\mathbf {s}}_{t+1}})\), where \({\text {GRU}}(\cdot )\) represents a nonlinear transformation of the input vector. Therefore, the hidden state \({\mathbf {s}}_{t}\) at time t can be expressed as the weighted sum of \(\overrightarrow{{\mathbf {s}}_{t}}\) and \(\overleftarrow{{\mathbf {s}}_{t}}\), as follows:
\({\mathbf {s}}_{t}={\mathbf {W}}_{t} \overrightarrow{{\mathbf {s}}_{t}}+{\mathbf {V}}_{t} \overleftarrow{{\mathbf {s}}_{t}}+{\mathbf {b}}_{t}\)
where \({\mathbf {W}}_{t}\) and \({\mathbf {V}}_{t}\) represent the weights, and \({\mathbf {b}}_{t}\) represents the bias. Then, we use a fully connected layer as the readout layer to obtain the drug representation. By applying this module to the whole dataset, we obtain the sequence-based feature matrix \({\mathbf {S}} \in {\mathbb {R}}^{n \times d_{g}}\).
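The bidirectional scan can be sketched with a minimal NumPy GRU cell (hypothetical shapes and random weights; for brevity the two directional states are concatenated rather than combined by the learned weighted sum described above):

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

d_in, d_hid, T = 4, 3, 5
X = rng.normal(size=(T, d_in))  # embedded SMILES tokens x_1..x_T

# One set of GRU parameters, shared across time steps.
Wz, Uz = rng.normal(size=(d_in, d_hid)), rng.normal(size=(d_hid, d_hid))
Wr, Ur = rng.normal(size=(d_in, d_hid)), rng.normal(size=(d_hid, d_hid))
Wh, Uh = rng.normal(size=(d_in, d_hid)), rng.normal(size=(d_hid, d_hid))

def gru_scan(X):
    s = np.zeros(d_hid)
    out = []
    for x in X:
        z = sigmoid(x @ Wz + s @ Uz)       # update gate
        r = sigmoid(x @ Wr + s @ Ur)       # reset gate
        h = np.tanh(x @ Wh + (r * s) @ Uh) # candidate state
        s = (1 - z) * s + z * h
        out.append(s)
    return np.array(out)

fwd = gru_scan(X)               # left-to-right pass
bwd = gru_scan(X[::-1])[::-1]   # right-to-left pass, re-aligned to time order
states = np.concatenate([fwd, bwd], axis=1)  # one bidirectional state per token
```

A readout layer over `states` would then produce the drug-level representation.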
Note that the BiGRU layer requires a fixed-size input matrix, whereas the length of SMILES sequences varies. We therefore use approximately the average sequence length in the dataset as the fixed length and apply zero-padding and cutting operations.
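A minimal sketch of the padding-and-cutting step, assuming integer token ids and a hypothetical pad id of 0:

```python
PAD = 0  # hypothetical padding token id

def to_fixed_length(token_ids, L_s):
    """Force a tokenized SMILES sequence to length L_s:
    cut long sequences, zero-pad short ones."""
    if len(token_ids) >= L_s:
        return token_ids[:L_s]                         # cut
    return token_ids + [PAD] * (L_s - len(token_ids))  # zero-pad

seqs = [[5, 3, 7], [1, 2, 3, 4, 5, 6]]
batch = [to_fixed_length(s, 4) for s in seqs]
# batch == [[5, 3, 7, 0], [1, 2, 3, 4]]
```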
Multi-type feature fusion module
We combine the feature matrices \({\mathbf {G}}\) and \({\mathbf {S}}\) obtained above into the intra-drug features, namely \({\mathbf {H}}={\mathbf {G}}\oplus {\mathbf {S}}\). In order to fuse the intra-drug features with the external DDI features, we design a GCN encoder with a gating mechanism. Specifically, we take the intra-drug features as the initial node features in the interaction graph, and then update the node representations with a multi-layer GCN. The Multi-type Feature Fusion Module (MFFM) is shown in Fig. 4.
For drug \(d_{i}\), the output of the \(r^{th}\) layer is as follows:
where \({\mathbf {W}}_{u}^{r}\) is a learnable weight matrix, and \(\tilde{{\mathbf {A}}}_{ij}\) is the component of the normalized adjacency matrix \(\tilde{{\mathbf {A}}}=\hat{{\mathbf {K}}}^{-\frac{1}{2}}\left( {\mathbf {A}}+{\mathbf {I}}_{n}\right) \hat{{\mathbf {K}}}^{-\frac{1}{2}}\), where \(\hat{{\mathbf {K}}}_{ii}=\sum _{j}\left( {\mathbf {A}}+{\mathbf {I}}_{n}\right) _{ij}\). We can add multiple GCN layers to expand the neighborhood of label propagation, but this may also introduce more noisy information. Meanwhile, neighborhoods of different orders contain different information. Therefore, we utilize a gating mechanism [37] to control how much neighborhood information is passed to each node. The process is as follows:
where \(T({\mathbf {c}}^{r-1})\) represents the gating weight of the \((r-1)^{th}\) layer, and \(({\mathbf {W}}^{r-1}, {\mathbf {b}}^{r-1})\) are the weight matrix and bias of the \((r-1)^{th}\) layer. After the multi-layer GCN, we finally obtain the feature matrix \({\mathbf {Z}} \in {\mathbb {R}}^{n \times d_{g}}\) for the drugs in the DDI network.
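A minimal NumPy sketch of one gated GCN layer as described above, with a toy 4-drug interaction graph and random matrices standing in for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
relu = lambda x: np.maximum(x, 0)

n, d_g = 4, 6
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # toy DDI adjacency matrix
H = rng.normal(size=(n, d_g))               # intra-drug features as initial nodes

# Symmetric normalization: A_tilde = K^{-1/2} (A + I) K^{-1/2}
A_hat = A + np.eye(n)
deg = A_hat.sum(axis=1)
K_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
A_tilde = K_inv_sqrt @ A_hat @ K_inv_sqrt

# One GCN layer followed by a gate T(c) = sigmoid(W c + b) that controls
# how much neighbourhood information is passed on to each node.
W_u = rng.normal(size=(d_g, d_g))
W_gate, b_gate = rng.normal(size=(d_g, d_g)), rng.normal(size=d_g)

H_conv = relu(A_tilde @ H @ W_u)        # neighbourhood aggregation
T = sigmoid(H @ W_gate + b_gate)        # per-node, per-dimension gate
H_next = T * H_conv + (1.0 - T) * H     # gated mixture of new and old features
```

Because each layer mixes new neighbourhood information with the previous features through the gate, stacking layers no longer forces all node features toward the same smoothed value.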
In addition, inspired by MIRACLE, this module uses a graph contrastive learning approach to balance the information inside and outside the drug. For drug \(d_{i}\), we take the drug itself and its first-order neighbors as positive samples P, and the nodes outside its first-order neighborhood as negative samples N. We design a learning objective that makes the external features of drug \(d_{i}\) consistent with the internal features of the positive samples and distinct from the internal features of the negative samples, defined as follows:
where \(f_{\mathrm {D}}(\cdot )\!: \!{\mathbb {R}}^{d_{g}} \!\times \!{\mathbb {R}}^{d_{g}} \!\longmapsto \!{\mathbb {R}}\) is the discriminator function, which scores the agreement between its two input vectors. Here we instantiate it as the dot product.
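A minimal sketch of the dot-product discriminator with one positive and one negative sample; the binary cross-entropy form below is one common instantiation of such a contrastive objective, not necessarily the paper's exact loss:

```python
import numpy as np

rng = np.random.default_rng(4)
d_g = 6
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# External (DDI-graph) feature of drug i, and internal features of one
# positive and one negative sample; all values are hypothetical.
z_i = rng.normal(size=d_g)
h_pos, h_neg = rng.normal(size=d_g), rng.normal(size=d_g)

f_D = lambda a, b: float(a @ b)  # dot-product discriminator

# Agree with positives, disagree with negatives.
loss = -np.log(sigmoid(f_D(z_i, h_pos))) - np.log(1 - sigmoid(f_D(z_i, h_neg)))
```

In practice the loss is averaged over all positives in P and negatives in N for every drug.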
DDI prediction
First, we obtain an interaction link representation by multiplying the representations of the two drugs. Then, we input it into an MLP to get the prediction score:
where MLP consists of two fully connected layers.
Our learning objective is to minimize the distance between the predictions and the true labels. The specific formula is as follows:
where \(y_{i j}\) is the true label for the drug pair \((d_{i},d_{j})\). Then, we unify the DDI prediction task and the contrastive learning task into one learning framework. Formally, the learning objective of our model is:
where \(\alpha\) is a hyperparameter that controls the magnitude of the contrastive task.
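The joint objective can be sketched as follows, with hypothetical prediction scores and a placeholder value for the contrastive term:

```python
import numpy as np

# Hypothetical predicted scores and true labels for three drug pairs.
p = np.array([0.9, 0.2, 0.7])
y = np.array([1.0, 0.0, 1.0])

# Binary cross-entropy between predictions and true labels.
L_pred = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

L_con = 0.35   # stands in for the contrastive loss term
alpha = 0.9    # trade-off hyperparameter (the best value found in the paper)
L_total = L_pred + alpha * L_con
```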
Results
In this section, we design various experiments to demonstrate the superiority of the model MFFGNN.
Experimental setup
Datasets. To verify the validity of our model on datasets of different scales, we evaluate the proposed model on small, medium and large datasets. In the small-scale dataset, the number of drugs is relatively small, but the fingerprints of all drugs are available. In the medium-scale dataset, although the number of drugs is relatively large, the number of labeled DDI links is similar to that of the small-scale dataset. In the large-scale dataset, fingerprint information is missing for many drugs. Detailed information about the datasets is given in Table 1.
Note that we removed from the datasets the SMILES sequences from which a molecular graph cannot be constructed.
Baselines To demonstrate the superiority of our model, we compare MFFGNN with the following state-of-the-art models:

SSP-MLP [30]: This approach used the names and structural information of drug-drug or drug-food pairs as inputs and applied the Structural Similarity Profile (SSP) and an MLP for classification. We name this model SSP-MLP.

Multi-Feature Ensemble [46]: This approach combined multiple types of data in a collective framework. We name this model Ens.

GCN [48]: This approach applied GCN to semi-supervised node classification. We use GCN to extract the structural information of drugs for DDI prediction.

GAT [35]: This approach used graph attention networks for node classification. We apply GAT to extract drug features from the interaction graph for DDI prediction.

SEAL-C/AI [49]: This approach performed semi-supervised graph classification from a hierarchical graph perspective. We apply this model to obtain drug features for DDI prediction.

NFP-GCN [39]: This approach designed a GCN for learning molecular fingerprints. We name this model NFP-GCN.

MIRACLE [42]: This approach simultaneously learned the intra-view molecular structure information and the inter-view interaction information of drugs for DDI prediction.

MFs [50]: This approach only used molecular fingerprints as input to the DDI network to predict DDIs. We name this model MFs.

We also consider several multi-type DDI prediction methods and apply them to the binary classification task, i.e., DPDDI [14], SSI-DDI [18], DDIMDL [7] and MUFFIN [20].
Implementation details For dataset division, we follow the same splitting method as MIRACLE [42]: 80% of each dataset forms the training set, 20% the test set, and 20% of the training set is randomly sampled as the validation set. The datasets contain only positive drug pairs; for negative training samples, we select the same number of negative drug pairs [51].
We utilize the Adam [52] optimizer to train the model and Xavier [53] initialization to initialize it. We use an exponential decay schedule for the learning rate, with an initial learning rate of 0.0001 and a multiplication factor of 0.96. The model applies a dropout [54] layer to the output of each intermediate layer, with a dropout rate of 0.3. We set the dimension of the atom-level and drug-level representations to 256, and \(K=2\) in the multi-head attention mechanism. To evaluate the effectiveness of MFFGNN, we consider three metrics: the Area Under the Receiver Operating Characteristic curve (AUROC), the Area Under the Precision-Recall Curve (AUPRC) and the F1 score.
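The exponential learning-rate schedule described above can be sketched as:

```python
# Exponential learning-rate decay: start at 1e-4 and multiply
# by 0.96 after each epoch (update granularity is an assumption).
initial_lr, gamma = 1e-4, 0.96

def lr_at_epoch(epoch):
    return initial_lr * gamma ** epoch

lrs = [lr_at_epoch(e) for e in range(3)]
# lrs[0] == 1e-4, and each subsequent value is 0.96x the previous one.
```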
Comparison results
To verify the validity of the proposed MFFGNN, we compare it with state-of-the-art models for DDI prediction on three datasets of different scales. We report the mean and standard deviation over ten repeated experiments. The best results are highlighted in bold.
Comparison on the ZhangDDI dataset We compare MFFGNN with state-of-the-art models on the ZhangDDI dataset; the results are shown in Table 2. The results of these baselines are taken from Table 2 in Ref. [42]. As can be seen, methods considering multiple features, such as Ens, SEAL-C/AI, NFP-GCN and MIRACLE, perform better than methods considering only one feature. MFFGNN performs best of all: it considers not only the topological structure information in molecular graphs and the interaction information between drugs, but also the local chemical context in SMILES sequences. This indicates that multi-type feature fusion can improve the performance of the model.
Comparison on the ChChMiner dataset Because the ChChMiner dataset lacks fingerprint and side-effect information, we only compare MFFGNN with the graph-based models; the results are shown in Table 3. The results of these baselines are taken from Table 3 in Ref. [42]. As shown in Table 3, MFFGNN outperforms all baselines on all metrics, indicating that MFFGNN maintains its effectiveness on a dataset with little labeled data. In addition, we obtain different amounts of labeled training data by adjusting the proportion of the training set on the ChChMiner dataset, which allows us to analyze the robustness of MFFGNN. We compare MFFGNN with the other methods, and the results are shown in Fig. 5a. The results show that MFFGNN performs well even with a small amount of labeled data. The reasons could be that (i) our model fuses topological structure, local chemical context and DDI relationships; (ii) our model extracts both the global features of the molecular graph and the local features of its atoms; (iii) our model uses a gating mechanism in each graph convolution layer to prevent over-smoothing when stacking multiple GCN layers.
Comparison on the DeepDDI dataset To verify the scalability of MFFGNN, we perform comparative experiments on the DeepDDI dataset; the results are shown in Table 4. Because some information is missing in this large-scale dataset, we only include the SSP-MLP model among the similarity-based baselines, and we omit the results of the NFP-GCN model owing to its poor performance and space limitations. We use 881-dimensional molecular fingerprints as the initial node features in the DDI graph for DDI prediction. Meanwhile, we reduce the multi-type DDI prediction methods to binary prediction and report their binary results on the DeepDDI dataset.
As shown in Table 4, MFFGNN achieves high AUROC, AUPRC and F1 scores. The MFs model, which uses only one drug feature, performs relatively poorly on all metrics. A single feature cannot comprehensively represent drug information, which ultimately affects the prediction results. In contrast, MFFGNN integrates the features from drug sequences and molecular graphs as input to the DDI graph, so more comprehensive drug information can be learned. Although the SSI-DDI and MIRACLE models have a higher AUROC than MFFGNN, MFFGNN has the highest AUPRC and F1 values. In general, the AUPRC metric is more important than AUROC because it better penalizes false positive DDIs, while F1 reflects the proportion of correctly predicted DDIs. The class imbalance in the DeepDDI dataset may negatively affect the AUROC of our model, but this does not undermine the overall performance of MFFGNN.
Cross-dataset evaluations To further evaluate the generalization performance of MFFGNN, we perform cross-dataset evaluations: one dataset serves as the training set, while the other two serve as test sets. Because of the poor performance of the other methods, we compare MFFGNN with three methods, GAT, SEAL-C/AI and MIRACLE; the results are shown in Fig. 6. MFFGNN outperforms the other methods in AUROC, AUPRC and F1. These results show that our model predicts drug-drug interactions with stable accuracy, independent of the scale of the datasets, and further confirm that MFFGNN has good generalization performance.
Ablation study
In order to verify the validity of each type of drug feature, we carry out DDI prediction using each type of feature and each combination of features on the ChChMiner dataset. The experimental results are shown in Table 5. The best results are highlighted in bold.
As shown in Table 5, deleting any one of the three types of features damages performance, and performance is best when the three types of features are considered simultaneously. Among the single features, the model performs well when considering only the interaction information between drugs or only the topological information of the molecular graph. Among the pairwise combinations, considering the interaction information between drugs together with the topological information of the molecular graph performs best, and pairwise combinations significantly improve performance over single features. This suggests that multi-feature integration can better represent drugs and improve prediction results.
Our model considers both the global features of the molecular graph and the local features of its atoms. To study the effectiveness of this design, we build a variant, -GWU, which ignores the global information in molecular graphs. As shown in Table 6, deleting the global features damages performance. To study the validity of contrastive learning, we build a variant, -Contrastive, which removes contrastive learning from the framework. As shown in Table 6, -Contrastive is inferior to MFFGNN on all metrics, indicating that contrastive learning is beneficial for drug feature learning.
MFFGNN contains a GCN encoder with a gating mechanism to fully utilize neighborhood information of different orders. To study its effectiveness, we conduct a comparative experiment with and without gating; the results are shown in Table 6. The model without gating performs worse than the model with gating, which shows that the gated GCN encoder is beneficial for DDI prediction. Figure 5b gives an intuitive view of the effectiveness of each component of the proposed MFFGNN.
Parameter analysis
In this section, we analyze several key parameters of the model through experiments on the ZhangDDI dataset: \(\alpha\) in the objective function, the dimensionality of the drug representation \(d_{g}\), the sequence length \(L_{s}\), the learning rate \(l_{r}\), the number of GCN layers \(L_{m}\), and the number of heads k of the multi-head attention in the MGFEM module. We study the influence of each parameter on MFFGNN while fixing the other parameters.
To find the optimal setting of \(\alpha\) in the objective function, we vary \(\alpha\) from 0.1 to 1.0 while fixing the other parameters; the results are shown in Fig. 7a. We observe that all three metrics are optimal when \(\alpha\) is set to 0.9. The fact that the best \(\alpha\) is non-zero confirms the importance of contrastive learning in the model.
When training the BiGRU, we need a fixed-size input matrix, whereas the length of SMILES sequences varies. Therefore, we fix the length of the input sequence at some value and apply zero-padding and cutting operations. To find the optimal sequence length, we vary \(L_{s}\) from 50 to 250 while fixing the other parameters; the results are shown in Fig. 7b. Because most SMILES sequences in the dataset have lengths between 100 and 150, the model performance is optimal when \(L_{s}=150\): most sequences need not be cut, and little information is lost. In contrast, when \(L_{s}=100\), most sequences lose information and the performance is low. When the sequence length is greater than 150, even though zero-padding is applied, the performance degradation is trivial because the input contains enough sequence information.
To find the optimal setting of \(d_{g}\), we vary it from 2 to 1024 while fixing the other parameters; the results are shown in Fig. 7c. All three metrics are optimal when \(d_{g}\) is set to 256. As the dimensionality of the drug representation increases, MFFGNN can extract more useful information; however, an excessively high dimensionality may introduce noise and degrade performance. Similarly, to find the optimal learning rate, we vary \(l_{r}\) over \(\{0.01, 0.001, 0.0001, 0.00001\}\) while fixing the other parameters; the results are shown in Fig. 7d. The model performs best at \(l_{r} = 0.0001\).
To find the optimal settings of \(L_{m}\) and k of the k-head attention in the MGFEM module, we vary each from 1 to 4 while fixing the other parameters; the results are shown in Fig. 7e, f. For the k-head attention, the model performs best at \(k=2\). As the figure shows, MFFGNN performance improves as \(L_{m}\) increases, and all three metrics are optimal at \(L_{m}=3\). However, too many layers may cause overfitting and degrade performance.
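A k-head attention readout of the kind used to pool node features into a global graph representation might be sketched as follows. This is a hypothetical NumPy sketch, not the exact MGFEM formulation; `W_att` (one attention projection per head) is an assumed parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def k_head_attention_readout(H, W_att):
    """Pool node features into k global graph vectors.

    H     : (n, d) node features
    W_att : (d, k) one attention projection per head
    Returns a (k, d) matrix: one attention-weighted sum of nodes per head.
    """
    scores = softmax(H @ W_att, axis=0)  # (n, k) attention weights over nodes
    return scores.T @ H                  # (k, d) one pooled vector per head

rng = np.random.default_rng(1)
H = rng.standard_normal((5, 16))
W = rng.standard_normal((16, 2))  # k = 2 heads, the best setting in Fig. 7
G = k_head_attention_readout(H, W)
```

Each head learns its own weighting over atoms, so with \(k=2\) the global feature can attend to two different parts of the molecule at once.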
Discussions
Drug-drug interaction prediction has long been an important research direction in pharmacology. Most existing methods for predicting drug-drug interactions consider only a single drug feature, which cannot comprehensively represent drug information and ultimately limits prediction quality. Our proposed model takes into account not only the topological structure information in molecular graphs and the interaction information between drugs, but also the local chemical context in SMILES sequences; multiple drug features represent drug information more comprehensively. We perform DDI prediction using each type of feature and each combination of features, and the experimental results are shown in Table 5. The model performs best when all three types of features are considered simultaneously.
When extracting information from the molecular graph, we extract local features of the atoms and a global feature of the whole molecular graph, which facilitates the remote propagation of information in the graph. We demonstrate the importance of the global features of the molecular graphs in the ablation experiments, and the results are given in Table 6. In addition, to verify that MFFGNN has good generalization performance, we perform cross-dataset evaluations, and the results are given in Fig. 6. As the figures show, our model predicts drug-drug interactions with stable accuracy regardless of the scale of the dataset. However, our model also has limitations; for example, it does not extend to multi-type DDI prediction tasks. In future work, we will generalize the model to predict multi-type DDI events.
Conclusions
In this paper, we propose a novel end-to-end learning framework for DDI prediction, MFFGNN, which efficiently fuses information from drug molecular graphs, SMILES sequences and DDI graphs. MFFGNN uses the molecular graph feature extraction module to extract global and local features from molecular graphs. Moreover, in the multi-type feature fusion module, we employ a gating mechanism to control how much neighborhood information is passed to each node. Extensive experiments on multiple real datasets show that MFFGNN consistently outperforms other state-of-the-art models.
Availability of data and materials
The datasets generated and/or analysed during the current study are available in the GitHub repository, https://github.com/kaola111/mff
Abbreviations
MFFGNN: Multi-type Feature Fusion based on Graph Neural Network
DDIs: Drug-drug interactions
SMILES: Simplified Molecular Input Line Entry System
GNN: Graph neural network
GCN: Graph convolution network
MLP: Multi-layer perceptron
SSFEM: SMILES sequence feature extraction module
MGFEM: Molecular graph feature extraction module
MFFM: Multi-type feature fusion module
GWU: Graph warp unit
BiGRU: Bidirectional gated recurrent unit
Acknowledgments
Not applicable.
Funding
This work was supported by the Artificial Intelligence Program of Shanghai (2019RGZN01077), and the National Nature Science Foundation of China (12001370).
Contributions
CH, YL, HL and XQ conceived the experiments, CH and YL conducted the experiments, HL, HZ, YM, LL and XZ analysed the results. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Cite this article
He, C., Liu, Y., Li, H. et al. Multitype feature fusion based on graph neural network for drugdrug interaction prediction. BMC Bioinformatics 23, 224 (2022). https://doi.org/10.1186/s12859022047632
Keywords
 Multi-type feature fusion
 Graph neural network
 Gating mechanism
 Link prediction