A novel method for drug-target interaction prediction based on graph transformers model
BMC Bioinformatics volume 23, Article number: 459 (2022)
Abstract
Background
Drug-target interactions (DTIs) prediction is becoming increasingly important for accelerating drug research and drug repositioning. The drug-target interaction network is a typical model for DTIs prediction. Because many different types of relationships exist between drugs and targets, a drug-target interaction network can be used to model these relationships. Recent works on drug-target interaction networks mostly concentrate on drug nodes or target nodes and neglect the relationships between drugs and targets.
Results
We propose a novel prediction method that models the relationship between drug and target independently. First, we use relationships of drugs and targets at different levels to construct features of drug-target interactions. Then, we use a line graph to model drug-target interactions. Finally, we introduce a graph transformer network to predict drug-target interactions.
Conclusions
This method introduces a line graph to model the relationship between drug and target. After transforming drug-target interactions from links to nodes, a graph transformer network is used to accomplish the task of predicting drug-target interactions.
Background
It is well known that there are tens of thousands of diseases that threaten human health. Drug discovery is an important research area that urgently needs to be explored. At the same time, the rapid development of computer technology has sparked a wave of interdisciplinary collaboration. In particular, with the aid of machine learning and deep learning, bioinformatics can effectively improve the efficiency of drug discovery. Drug-target interactions (DTIs) prediction aims to identify the targets of drug molecules, which plays a crucial role in the drug discovery process and has become a hot topic in computer-aided drug discovery [1]. Compared with traditional drug discovery models, DTIs prediction can effectively reduce the cost of drug discovery [2].
Traditional methods of DTIs prediction are mostly based on machine learning. Recent works, such as the fuzzy bipartite local model [3], the multi-output prediction method [4], and the superior Bayesian personalized ranking method [5], are representative methods. Ding et al. [3] developed a fuzzy bipartite local model based on a fuzzy least squares support vector machine and multicore learning to predict DTIs. They first applied multicore learning to fuse multiple drugs and targets, and finally used fuzzy bipartite local models to infer unknown DTIs. Pliakos et al. [4] formulated DTIs prediction as a multi-output prediction problem and solved it by learning an ensemble of multi-output bi-clustering trees on a reconfigured network. Ye et al. [5] proposed an Adversarial Bayesian Personalized Ranking model that first generated ternary biased-order relations for drug targets, then used the biased-order relations to train a drug and target latent factor matrix, and finally obtained the score ranking for DTIs prediction from the inner product of latent factors.
With the rapid development of deep learning, DTIs prediction methods based on deep learning have been proposed for target prediction and drug repurposing in heterogeneous drug-gene-disease networks, which greatly facilitate target identification and advance the process of drug repurposing. Sun et al. [6] proposed an autoencoder-based DTI prediction method that projects drug features to the protein space via a multilayer encoder and then to the disease space via a decoder. Xuan et al. [7] proposed methods to integrate multi-scale adjacent topologies, multiple similarities, associations, and drug and protein-related interactions, which used a fully connected autoencoder learning framework to learn low-dimensional feature representations of nodes in heterogeneous networks, and then applied a multilayer convolutional neural network to generate the final predictions. However, traditional methods are typically suited to small samples and have difficulty extracting complex graph structure information.
Since DTIs can be modeled as networks, many network-based methods have emerged to predict DTIs. Manoochehri et al. [8] proposed a network-topology-based framework for predicting interacting and non-interacting drug-target pairs that is capable of learning complex drug-target topological features. Jin et al. [9] proposed the multi-resolutional collaborative heterogeneous graph convolutional autoencoder method for DTIs prediction, which fused and assigned weights to embeddings of various types of links and continuously added adjacent embeddings by gated recurrent units before fusing them together to form the final embedding. Yue et al. [10] proposed a method for bipartite DTI relations based on heterogeneous network embedding that decomposed a heterogeneous DTI network into three subnetworks. A random forest model was used to predict new DTIs by combining the features of a bipartite DTI network for drug-target interactions, a drug-based similarity network, and a target-based similarity network. However, current deep learning methods model drug-protein interactions in a relatively simple way and cannot extract deep-level interaction information.
Existing DTIs prediction methods perform well, but some problems remain. Researchers usually consider only individual drug-protein interactions and overlook the relationships between drug-protein pairs. In addition, the relationship between nodes and the whole heterogeneous graph is often neglected. In this paper, we introduce a line graph with drug-protein pairs as vertices and propose a drug-target interaction prediction method based on a graph transformer network (DTI-GTN). The main contributions of our method are as follows.

Current approaches for DTIs prediction are limited to simple drug-protein interactions. To address this problem, we constructed a drug-protein pair interaction line graph with the drug-protein interactions as vertices, which allows us to extract more information about drug-protein pair interactions.

Traditional models place more emphasis on the node’s neighbor relationship and less emphasis on the node’s relationship to the whole heterogeneous graph. To solve this problem, we employ the GTN model to determine the relationship between each interaction node and the entire heterogeneous graph.

Our method contributes to increasing the efficiency of DTIs prediction. The experimental results on the Peng et al. [11] dataset show that our method performs well on both AUROC and AUPR metrics.
The paper is divided into five parts, organized as follows. In Background, we introduce the background of the study and present the main research contents and contributions of this paper in view of current problems in drug-target interaction prediction and the current state of research. In Related works, we review related work in the field of drug-target prediction and its shortcomings. In Methods, we propose a drug-target prediction method based on the GTN model, which transforms the drug-target graph into a drug-protein pair line graph and performs prediction and evaluation with the GTN model; we also describe each module of our method in detail. In Experiment, we present the dataset used in this paper, the validation metrics, and the final results of multiple experiments. After extensive review, it is found that the drug-protein pairs with the highest prediction scores have practical significance, thus confirming the effectiveness of this method. In Conclusion, we summarize the entire work and point out the limitations of this study and the outlook for future work.
Related works
Drug-target interactions (DTIs) prediction plays an important role in finding potential therapeutic compounds. Moreover, DTIs prediction is an indispensable step in drug repositioning [12] and drug discovery [13]. DTIs prediction is also helpful to identify new ligands for new drugs and targets by identifying the interactions between drug compounds and protein targets. DTIs prediction methods can be roughly divided into traditional methods and deep learning methods. Among a large number of deep learning methods, network-based methods perform well in predicting DTIs. Therefore, the following focuses on the traditional methods and the network-based methods in deep learning.
Traditional DTIs prediction methods
Traditional DTIs prediction methods are mainly divided into two categories: (1) methods based on molecular docking simulation [14] and (2) ligand-based approaches [15]. Based on basic biophysical principles and the crystal structure of the target binding site, molecular docking methods often yield good prediction of druggability. In contrast to conventional ligand-protein docking, reverse ligand-protein docking aims to seek potential protein targets by screening an appropriate protein database [16]. Ligand-based approaches are often designed based on the principle of structure-dependent properties. These methods use structural similarity to search for compounds that are similar in terms of activities or treatment mechanisms. Although the above-mentioned methods have shown high prediction accuracy, they have limitations. Molecular docking methods rely on the three-dimensional structure of the target protein [17], and the results of ligand-based methods may be less than ideal when there are insufficient data on known ligands [18]. In addition, most current network-based approaches ignore the relationship between the nodes and the whole heterogeneous graph.
Network-based methods
In recent years, many network-based methods have been proposed to predict potential DTIs because DTIs can be modeled as networks. An advantage of these methods is that they make better use of network structure information [19]. Manoochehri et al. [8] proposed a semi-supervised bipartite graph model that integrated drug-drug and protein-protein relationships into a bipartite graph. Jin et al. [9] proposed a multi-resolution collaborative heterogeneous graph convolutional autoencoder for DTIs prediction that collaboratively aggregated the learned embeddings from different types of links in heterogeneous drug-target networks, thus leading to more interpretable embeddings for each drug and target node. Tang et al. [20] proposed a heterogeneous network edge denoising model based on an association exponential kernel matrix and potential global association. This method transformed the DTIs prediction problem into a noise reduction problem on heterogeneous networks. The heterogeneous network was constructed by combining drug and target kernel matrices and the existing DTIs network. Furthermore, the method not only used the associations of the nearest neighbors to perform DTIs prediction, but also incorporated the global association between drugs and targets to reduce the sparsity of the DTIs network and improve prediction accuracy. Yue et al. [10] proposed a heterogeneous network embedding DTIs model, which can extract distinct features from every subnetwork of the heterogeneous DTIs network and concatenate these features using the topological information between the subnetworks. This method makes better use of the bipartite nature of DTI relations and of the auxiliary similarity information related to drugs and targets.
In recent years, graph neural networks have become another hot topic in graph mining. Owing to the rapid development of graph machine learning, different graph neural networks have been proposed [21]. Among them, the heterogeneous graph neural network (HGN) [22], graph attention networks (GAT) [23], topology adaptive graph convolutional networks (TAG) [24], and residual gated graph convnets (RGG) [25] are representative models. Graphs provide a universal way to represent data, and many other types of data can also be transformed into graphs. Drug side effect prediction and DTIs identification are essentially edge prediction problems. Cheng et al. [26] proposed an end-to-end deep learning approach based on a graph attention network and multiple self-attention mechanisms to predict DTIs. The feature extraction of drugs and proteins is improved by using a graph attention network and a multi-head self-attention mechanism. However, they only use one-dimensional data to represent the structural characteristic information of drugs and proteins, and much high-level characteristic information of drugs and proteins is lost in prediction. Peng et al. [11] improved the prediction method by learning low-dimensional vector representations of features from heterogeneous networks and adopting convolutional neural networks (CNN) as classification models. Wang et al. [27] proposed a simple and efficient ligand-protein binding prediction model based on a residual graph neural network (GNN) and attention. In this network, the complex graph features are learned through the residual GNN. They integrate these features into the attention module to form a complex protein vector for multilayer perceptron processing. However, most graph-neural-network-based models only examine the relationships between drugs and proteins and ignore many of the relationships between each group of drugs and proteins.
Based on these shortcomings, this paper proposes a graph-transformer-based method for predicting DTIs that takes into account the relationship between each group of drug-protein pairs as well as the information of nodes and the full graph.
Methods
We propose a drug-target interaction prediction method based on a graph transformer network (DTI-GTN). It not only introduces line graphs that fuse the relationships between each group of drug-protein pairs, but also allows the GTN model to extract relationships between nodes and the entire heterogeneous graph. Figure 1 depicts the DTI-GTN workflow. In the first stage, we aggregate multiple drug and protein information sources using Jaccard similarity coefficients to generate similarity matrices for multiple drug and protein networks, and then run the random walk with restart (RWR) method on the similarity matrices to generate high-dimensional feature vectors for drugs and proteins. Finally, we use principal component analysis (PCA) to reduce the dimensionality of these high-dimensional feature vectors.
The second stage is to create the drug-protein pair interaction line graph. To do so, we first create the drug-protein pairs by selecting the drugs and proteins that have an interaction relationship according to the drug-protein adjacency matrix. Then we generate the edges between the drug-protein pairs, which serve as nodes: an edge is formed between two drug-protein pairs if they share the same drug or the same protein. After this process, the drug-protein pair interaction line graph and the node features are combined and input into the GTN model for feature extraction, and the fully connected layer predicts the association for each drug-protein pair to generate prediction results and prediction probabilities.
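The edge rule for the line graph can be illustrated with a minimal Python sketch. The function name `build_line_graph` and the toy adjacency matrix are hypothetical and introduced only for illustration; the quadratic loop is written for readability, not for a graph with tens of thousands of pairs.

```python
import numpy as np

def build_line_graph(dti):
    """Turn a drug-protein adjacency matrix into a line graph.

    Each known interaction (drug i, protein j) becomes one node. Two nodes
    are linked when their pairs share the same drug or the same protein.
    """
    pairs = [(int(i), int(j)) for i, j in zip(*np.nonzero(dti))]
    n = len(pairs)
    adj = np.zeros((n, n), dtype=int)
    for a in range(n):
        for b in range(a + 1, n):
            same_drug = pairs[a][0] == pairs[b][0]
            same_protein = pairs[a][1] == pairs[b][1]
            if same_drug or same_protein:
                adj[a, b] = adj[b, a] = 1
    return pairs, adj

# Toy example (hypothetical data): 2 drugs x 3 proteins.
dti = np.array([[1, 1, 0],
                [0, 1, 1]])
pairs, adj = build_line_graph(dti)
print(pairs)  # nodes of the line graph, one per known interaction
print(adj)    # adjacency between drug-protein pair nodes
```

In this toy case the four interaction nodes form a chain, because consecutive pairs share either a drug or a protein.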
Heterogeneous-network-based feature extractor
Heterogeneous networks are constructed based on the following two types of networks. The first type is drug-related networks, including drug-drug interactions, drug-disease associations, drug-side effect associations, and drug similarity (based on the chemical structure of the drug). The other type is protein-related networks, including protein-disease associations, protein-protein interactions, and protein similarity (based on the primary sequence of the protein). First, we apply the Jaccard similarity method to each association matrix and interaction matrix to construct a similarity matrix.
In the drug-disease interaction matrix, for example, two rows of the adjacency matrix represent sets A and B, which contain the interactions between two different drugs and all diseases. The Jaccard coefficient of these two sets is the ratio of the size of the intersection of A and B to the size of the union of A and B. It is a measure of the similarity of two sets and is defined as \(J(A,B) = \frac{|A \cap B|}{|A \cup B|}\).
The Jaccard similarity coefficient is used to compare the similarity and difference between finite sample sets. The greater the value of the Jaccard coefficient, the greater the similarity of the samples. The similarity matrix represents the similarity between each drug or protein node and every other node of the same type. For example, element \(S_{i,j}\) of the similarity matrix represents the similarity between row i and row j of the original adjacency matrix.
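As an illustration, the row-wise Jaccard similarity can be computed with a few matrix operations. This is a minimal sketch: the function name and the toy drug-disease matrix are hypothetical, and setting the similarity of two empty rows to 0 is an assumption rather than something the text specifies.

```python
import numpy as np

def jaccard_similarity_matrix(assoc):
    """Pairwise Jaccard similarity between the rows of a 0/1 matrix."""
    assoc = np.asarray(assoc, dtype=float)
    intersection = assoc @ assoc.T          # shared entries for every row pair
    row_sizes = assoc.sum(axis=1)
    union = row_sizes[:, None] + row_sizes[None, :] - intersection
    sim = np.zeros_like(intersection)
    # Assumption: similarity stays 0 when both rows are empty (union == 0).
    np.divide(intersection, union, out=sim, where=union > 0)
    return sim

# Hypothetical toy matrix: 3 drugs x 4 diseases.
drug_disease = np.array([[1, 1, 0, 0],
                         [1, 0, 1, 0],
                         [0, 0, 0, 1]])
S = jaccard_similarity_matrix(drug_disease)
print(S)  # S[i, j] compares the disease sets of drug i and drug j
```

Drugs 0 and 1 share one disease out of three distinct diseases, so their similarity is 1/3, while drug 2 shares nothing with either.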
In the next step, the RWR method [28] is applied to each similarity matrix. The basic idea of the random walk method is to traverse a graph starting from a vertex or a series of vertices. At any vertex, the walker randomly jumps to any vertex in the graph with probability P, which is called the jump probability. A probability distribution is derived after each walk, which gives the probability that each vertex in the graph will be visited.
The RWR method is an improvement on the random walk method. The walker starts from a node in the graph and faces two choices at each step: moving to a randomly selected adjacent node or returning to the starting node. The RWR method captures the multifaceted relationships between two nodes and the entire graph structure.
According to the RWR principle, the greater the similarity between two nodes, the greater the transition probability between them [29]. Thus, if the diffusion states of two nodes are similar, the nodes can be considered to be in similar positions with respect to the other nodes in the network.
Taking the drug-disease similarity matrix \(A\) as an example, we can obtain the drug-disease transition matrix \(B\) based on \(A\), where the element \(B_{i,j}\) describes the transition probability from drug node i to disease node j, defined as follows:
Then, the final drug-disease diffusion state matrix is obtained by iterating to convergence as follows:
During the random walk, each element stores the probability of reaching the disease node from drug node i after iteration; \(s_i^t\) is the result after t iterations, \(p_r\) denotes the restart probability, and \(e_i\) is an n-dimensional unit vector.
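The iteration can be sketched in Python under a stated assumption: the code uses the standard RWR formulation, in which the transition matrix is the row-normalized similarity matrix, \(B_{i,j} = A_{i,j} / \sum_{k} A_{i,k}\), and the update is \(s_i^{t+1} = (1 - p_r)\, s_i^{t} B + p_r e_i\). This form is consistent with the symbols defined here but is assumed rather than confirmed as the paper's exact equations. The function name, the square toy similarity matrix, and the handling of all-zero rows are likewise hypothetical.

```python
import numpy as np

def rwr_diffusion(sim, restart_prob=0.5, n_iter=20):
    """Random walk with restart on a square similarity matrix.

    Row i of the result is the diffusion state of node i after n_iter steps.
    """
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    row_sums = sim.sum(axis=1, keepdims=True)
    # Row-normalize; rows that sum to zero stay zero (an assumption).
    B = np.divide(sim, row_sums, out=np.zeros_like(sim), where=row_sums > 0)
    E = np.eye(n)      # row i is the restart vector e_i
    S = E.copy()       # every walker starts at its own node
    for _ in range(n_iter):
        S = (1.0 - restart_prob) * S @ B + restart_prob * E
    return S

# Hypothetical 3-node similarity matrix.
sim = np.array([[1.0, 0.5, 0.0],
                [0.5, 1.0, 0.2],
                [0.0, 0.2, 1.0]])
D = rwr_diffusion(sim, restart_prob=0.5, n_iter=20)
print(D)  # each row is a probability distribution over nodes
```

The defaults of 0.5 and 20 mirror the restart probability and iteration count reported later in the parameter settings.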
After transforming all similarity matrices into diffusion state matrices, the diffusion state matrices of the drug networks and of the protein networks are concatenated, yielding one diffusion state matrix for drugs and one for proteins. The rows of the drug diffusion matrix represent different drugs, and the columns represent the four nodes of drug, disease, side effect, and drug, with the element \(d_{i,j}\) representing the transition probability between the drug and node j. The rows of the protein diffusion state matrix represent different proteins, and the columns represent protein, disease, and protein nodes, with the element \(p_{i,j}\) representing the transition probability between the protein and node j.
Principal component analysis feature selector
The diffusion state matrices obtained in the previous step are high-dimensional, noisy, and incomplete. To obtain the essential features, we process the data using the PCA model [30]; the main steps of the PCA model are shown in the supplementary information.
The goal of PCA is to map high-dimensional data into a low-dimensional space by linear projection while maximizing the information content of the data in the projected dimensions, so that fewer dimensions are used while the characteristics of the original data points are retained as far as possible. In this study, we reduce both drug and protein features to 125 dimensions.
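A minimal sketch of this projection, written with a singular value decomposition in NumPy, is shown below. The function name is hypothetical and the input is a random stand-in with the shape reported for the drug features (708 drugs, 2832 columns); the paper's own PCA steps are in its supplementary information and may differ in detail.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project row vectors onto their first n_components principal axes."""
    X = np.asarray(features, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
drug_diffusion = rng.random((708, 2832))   # random stand-in, not real data
drug_features = pca_reduce(drug_diffusion, 125)
print(drug_features.shape)
```

The retained components are ordered by variance, so truncating to the first 125 keeps the directions that carry the most information in this sense.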
Graph transformer based interaction predictor
The transformer model, introduced by Google in 2017, is still widely used today. This model was first used for machine translation tasks, and it allowed for fast parallelism using the self-attention mechanism. The most criticized drawback of RNNs is slow training, and the transformer model can improve on this drawback. Dwivedi et al. [31] extended the transformer model to graphs to preserve the properties of the graph. Specifically, given the node feature \(H^{(l)} = \{ H^{(l)}_1,H^{(l)}_2,\cdots ,H^{(l)}_n \}\), the multi-head attention of each edge from j to i is calculated as follows.
In Formula (4), \(\langle q,k\rangle = \exp \left( \frac{q^{T}k}{\sqrt{d}}\right)\) is the exponential scaled dot-product function and d is the hidden size of each head. For the c-th attention head, we first transform the source feature and the distant feature into the query vector \(q_{c,i}^{(l)}\in \mathbb {R}^{d}\) and the key vector \(k_{c,j}^{(l)}\in \mathbb {R}^{d}\) using different trainable parameters \(W_{c,q}^{(l)}, W_{c,k}^{(l)},b_{c,q}^{(l)},b_{c,k}^{(l)}\), and then encode the edge features \(e_{i,j}\) and add them to the key vector as additional information in each layer.
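For reference, the attention computation described here can be written out explicitly. The block below is a sketch that follows the formulation of Shi et al. [32], which this description matches; the edge projection parameters \(W_{c,e}\) and \(b_{c,e}\), the neighborhood notation \(\mathcal{N}(i)\), and the exact layout are assumptions rather than a confirmed copy of Formula (4).

```latex
\begin{aligned}
q_{c,i}^{(l)} &= W_{c,q}^{(l)} H_i^{(l)} + b_{c,q}^{(l)}, \\
k_{c,j}^{(l)} &= W_{c,k}^{(l)} H_j^{(l)} + b_{c,k}^{(l)}, \\
e_{c,ij} &= W_{c,e}\, e_{i,j} + b_{c,e}, \\
\alpha_{c,ij}^{(l)} &=
  \frac{\langle q_{c,i}^{(l)},\, k_{c,j}^{(l)} + e_{c,ij} \rangle}
       {\sum_{u \in \mathcal{N}(i)} \langle q_{c,i}^{(l)},\, k_{c,u}^{(l)} + e_{c,iu} \rangle},
\qquad
\langle q, k \rangle = \exp\left( \frac{q^{T} k}{\sqrt{d}} \right).
\end{aligned}
```

Because the numerator and denominator both use the exponential scaled dot product, \(\alpha_{c,ij}^{(l)}\) is a softmax over the neighbors of node i.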
After obtaining the multi-head attention of the graph, message aggregation is performed from the distant node j to the source node i:
where C is the number of attention heads, \(\Vert\) denotes the concatenation of the C attention heads, and the distant feature \(h_j\) is transformed into \(v_{c,j}\in \mathbb {R}^{d}\) for the weighted sum.
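The aggregation step can be sketched in NumPy as follows. This is a simplified illustration, not the model used in the experiments: it assumes no edge features and no bias terms, it assumes every node has at least one incoming neighbor (self-loops are added in the example), and the function name and random weights are hypothetical.

```python
import numpy as np

def graph_attention_layer(H, adj, Wq, Wk, Wv):
    """One multi-head attention step restricted to graph edges.

    H: (n, f) node features; adj[i, j] > 0 if node j sends a message to i.
    Wq, Wk, Wv: (C, f, d) per-head projection weights (biases omitted).
    Returns the concatenated head outputs, shape (n, C * d).
    """
    C, _, d = Wq.shape
    outputs = []
    for c in range(C):
        q, k, v = H @ Wq[c], H @ Wk[c], H @ Wv[c]
        scores = (q @ k.T) / np.sqrt(d)
        scores = np.where(adj > 0, scores, -np.inf)     # keep neighbors only
        scores -= scores.max(axis=1, keepdims=True)     # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)   # softmax over neighbors
        outputs.append(weights @ v)                     # weighted sum of values
    return np.concatenate(outputs, axis=1)              # concatenate the heads

rng = np.random.default_rng(0)
n, f, C, d = 4, 6, 2, 3
H = rng.normal(size=(n, f))
Wq, Wk, Wv = (rng.normal(size=(C, f, d)) for _ in range(3))
# Chain graph with self-loops so that no node is isolated.
adj = np.eye(n) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
out = graph_attention_layer(H, adj, Wq, Wk, Wv)
print(out.shape)
```

Masking the scores with the adjacency matrix is what distinguishes this from ordinary transformer attention: each node attends only to its neighbors in the line graph.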
Furthermore, following Shi et al. [32], we use the multi-head attention matrix instead of the original normalized adjacency matrix as the transition matrix for message passing, use a gated residual connection between layers to prevent the model from over-smoothing, and, for the final output layer, average the multi-head output and remove the nonlinear transformation.
Experiment
Dataset
We evaluated the performance of the DTI-GTN method using a drug-target interaction prediction task.
We obtain the dataset from Peng et al. [11], which contains 12,015 nodes and 1,895,445 edges. In this dataset, all isolated nodes are excluded. This heterogeneous network integrates four types of nodes (drug, protein, disease and side effect) and six types of edges (drug-protein interaction, drug-drug interaction, drug-disease association, drug-side effect association, protein-disease association and protein-protein interaction). Peng et al. extracted known DTIs, drug-drug interactions and other information from multiple databases: drug nodes from the DrugBank database [33], and protein nodes and protein-protein interactions from the Human Protein Reference Database [34]. Disease nodes, drug-disease associations and protein-disease associations were extracted from the Comparative Toxicogenomics Database [35]. Side effect nodes and drug-side effect associations were obtained from the Side Effect Resource [36].
First, we create several drug-related and protein-related similarity matrices. Drug-related similarity matrices include the drug-drug similarity matrix, drug-disease similarity matrix, drug-side effect similarity matrix and drug similarity matrix. Protein-related similarity matrices include the protein-disease similarity matrix, protein-protein similarity matrix and protein similarity matrix.
We next apply the RWR algorithm to each similarity matrix and concatenate the resulting diffusion state matrices of the drug and protein networks, obtaining two diffusion state matrices that represent the drugs and the proteins, respectively. The rows of the drug diffusion matrix represent different drugs, the columns represent proteins, diseases, side effects and drug nodes, and the values in the matrix represent the associations between the drugs and the four biological entities. The rows of the protein diffusion state matrix represent the different proteins. The columns indicate protein, disease, and drug nodes, and the values in the matrix show the associations between the proteins and the three biological entities. We then use the PCA model to reduce the dimensionality of the drug and protein diffusion state matrices, yielding a 125-dimensional feature vector for each of the 708 drugs and each of the 1512 proteins.
In the next step, we construct the line graph. First, each drug node and protein node connected by an edge in the drug-protein interaction network are combined into a new drug-protein pair node, so that each pair contains information about both the drug and the protein. Next, the edges of the line graph are constructed based on the relationship between each group of drug-protein pairs, and a new adjacency matrix representing the relationships between the drug-protein pair nodes is obtained. Finally, we obtain the features of each drug-protein pair node by concatenating the 125-dimensional drug features with the 125-dimensional protein features.
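The final concatenation step amounts to a lookup followed by a horizontal stack, as in this minimal sketch. The function name, the example pairs, and the random feature matrices are hypothetical stand-ins with the shapes reported in the text.

```python
import numpy as np

def pair_features(pairs, drug_feats, protein_feats):
    """Concatenate drug and protein vectors for every (drug, protein) pair."""
    drug_idx = np.array([d for d, _ in pairs])
    protein_idx = np.array([p for _, p in pairs])
    return np.hstack([drug_feats[drug_idx], protein_feats[protein_idx]])

rng = np.random.default_rng(0)
drug_feats = rng.normal(size=(708, 125))       # stand-in for PCA drug features
protein_feats = rng.normal(size=(1512, 125))   # stand-in for PCA protein features
example_pairs = [(0, 5), (0, 7), (3, 5)]       # hypothetical interacting pairs
X = pair_features(example_pairs, drug_feats, protein_feats)
print(X.shape)  # one 250-dimensional vector per drug-protein pair node
```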
Following completion of the preceding steps, the data were divided into training, validation, and test sets, with 80% of the positive and negative samples used as the training set, 10% as the validation set, and 10% as the test set. The known drug-protein interaction pairs in the drug-protein interaction matrix were used as positive samples, with a total of 40,058 positive samples, and an equal number of negative samples was randomly selected. The final experimental results are reported as the mean plus or minus the standard deviation over five training runs, which reduces the influence of random variation.
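The sampling and splitting procedure can be sketched as follows. The function name, the random seed, and the toy interaction matrix are hypothetical; splitting positives and negatives separately is one way to realize the "80% of the positive and negative samples" wording and is an assumption about the exact procedure.

```python
import numpy as np

def sample_and_split(dti, ratios=(0.8, 0.1, 0.1), seed=0):
    """Balanced negative sampling followed by a per-class 80/10/10 split."""
    rng = np.random.default_rng(seed)
    pos = np.argwhere(dti == 1)
    neg_all = np.argwhere(dti == 0)
    # Draw as many unknown pairs as there are known interactions.
    neg = neg_all[rng.choice(len(neg_all), size=len(pos), replace=False)]

    splits = {"train": [], "val": [], "test": []}
    for pairs, label in ((pos, 1), (neg, 0)):
        order = rng.permutation(len(pairs))
        n_train = int(ratios[0] * len(pairs))
        n_val = int(ratios[1] * len(pairs))
        parts = np.split(order, [n_train, n_train + n_val])
        for name, idx in zip(splits, parts):
            splits[name] += [(int(d), int(p), label) for d, p in pairs[idx]]
    return splits

# Hypothetical toy matrix: 20 drugs x 30 proteins with 50 known interactions.
rng = np.random.default_rng(1)
toy_dti = np.zeros((20, 30), dtype=int)
toy_dti.flat[rng.choice(toy_dti.size, size=50, replace=False)] = 1
splits = sample_and_split(toy_dti)
print({name: len(part) for name, part in splits.items()})
```

Each split then contains the same number of positive and negative pairs, which keeps the class balance identical across training, validation, and test sets.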
Parameters of models
For the RWR model, following the parameters of Peng et al. [11], we set the restart probability to 0.5 and the number of iterations to 20. The original drug feature input dimension is 2832 and the protein feature input dimension is 4536, and we use the PCA model to reduce the dimensionality. The dimensionality is chosen as shown in Fig. 2: the value of AUROC/time varies with dimensionality, with 125 dimensions providing the best balance, so the final dimension was set to 125. The GTN model was run for 2000 batches and optimized with the Adam method at an initial learning rate of 0.001, with the loss calculated as a cross-entropy loss.
Evaluation metrics
Model testing and comparison are performed using AUROC [37] and AUPR [38] scores, which are commonly used evaluation criteria for machine learning and represent the area under the ROC curve and PR curve, respectively. The higher the score is, the higher the prediction accuracy of the model and the better the performance of the model.
The ROC curve plots the false positive rate (FPR) on the horizontal axis against the true positive rate (TPR) on the vertical axis.
The classifier gives the probability of the positive class for each instance. Then, by setting a threshold such as 0.6, an instance with a probability greater than or equal to 0.6 is considered positive, and an instance with a probability less than 0.6 is considered negative. The corresponding (FPR, TPR) pair can then be calculated, giving a coordinate point in the plane. As the threshold decreases, an increasing number of instances are classified as positive, but these also include true negative instances, i.e., TPR and FPR both increase. The coordinate point (0,0) corresponds to the maximum threshold value, and the coordinate point (1,1) corresponds to the minimum threshold value. The ROC curves are depicted in the Supplementary Material.
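The threshold sweep can be made concrete with a short sketch that computes the (FPR, TPR) points and the area under them with the trapezoidal rule. The function names and the six example scores are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

def roc_points(labels, scores):
    """Return (FPR, TPR) arrays obtained by lowering the decision threshold."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    # Start above every score so the first point is (0, 0).
    thresholds = np.concatenate(([np.inf], np.sort(np.unique(scores))[::-1]))
    n_pos = (labels == 1).sum()
    n_neg = (labels == 0).sum()
    fpr, tpr = [], []
    for t in thresholds:
        predicted_pos = scores >= t
        tpr.append((predicted_pos & (labels == 1)).sum() / n_pos)
        fpr.append((predicted_pos & (labels == 0)).sum() / n_neg)
    return np.array(fpr), np.array(tpr)

def auroc(labels, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    fpr, tpr = roc_points(labels, scores)
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))

labels = [1, 1, 0, 1, 0, 0]               # hypothetical ground truth
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # hypothetical predicted probabilities
fpr, tpr = roc_points(labels, scores)
print(auroc(labels, scores))
```

In this example 8 of the 9 positive-negative pairs are ranked correctly, so the area equals 8/9.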
The PR curve is a curve with recall as the horizontal axis and precision as the vertical axis.
The PR curves still reflect the classification performance well in the case of large differences in positive and negative sample proportions, as shown in the AUPR schematic in the Supplementary Information.
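The area under the PR curve can be estimated in several ways. The sketch below uses average precision, the mean of the precision values at the ranks where a positive sample is retrieved, which is one common estimator and an assumption here, since the text does not state which one was used. The function name and example data are hypothetical, and tied scores are not treated specially.

```python
import numpy as np

def average_precision(labels, scores):
    """Average precision: a step-wise estimate of the area under the PR curve."""
    labels = np.asarray(labels)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = labels[order] == 1                     # positives in ranked order
    true_positives = np.cumsum(hits)
    precision = true_positives / np.arange(1, len(hits) + 1)
    # Each retrieved positive raises recall by 1 / (number of positives).
    return float(precision[hits].sum() / hits.sum())

pr_labels = [1, 1, 0, 1, 0, 0]               # hypothetical ground truth
pr_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # hypothetical predicted probabilities
print(average_precision(pr_labels, pr_scores))
```

Here the positives are retrieved at ranks 1, 2, and 4 with precisions 1, 1, and 0.75, giving an average precision of 2.75/3.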
Baselines
The DTIs prediction task can be viewed as a binary classification problem, where known drug-protein pair interactions are considered positive samples and unknown drug-protein pair interactions are considered negative samples. In the experimental procedure, all positive samples were collected first, and then an equal number of negative samples was randomly sampled. Next, 80% of the positive and negative sample pairs in the dataset were randomly selected as the training set to train the model parameters, 10% of the data were used as the validation set to adjust the hyperparameters of the model and for initial evaluation of the model capabilities, and finally the remaining 10% of the data were used as the test set to evaluate the generalization ability of the final model. In our experiments we compared DTI-GTN with six state-of-the-art graph neural network methods, including:
(1) SSCGCN: Instead of using the Laplacian matrix to convolve the graph, this model uses Chebyshev polynomials as the convolution kernel; its main feature is that it does not require eigenvector decomposition.
(2) GAT: The shortcomings of previous graph-convolution-based methods are addressed by using masked self-attentional layers. By stacking layers in which nodes are able to aggregate the features of their neighbors, different weights can be assigned to different nodes in the neighborhood without any expensive matrix operations or prior knowledge of the graph structure.
(3) GCN: This model proposes a scalable semi-supervised learning method for graph-structured data, based on an efficient variant of convolutional neural networks that operates directly on graphs.
(4) EGC: This model uses a new adaptive filtering method that achieves lower memory consumption and latency and is suitable for accelerator implementation.
(5) Hypergraph: This model introduces hypergraph convolution and hypergraph attention into the family of graph neural networks. Hypergraph convolution defines the basic formulation for performing convolution on hypergraphs, while hypergraph attention further enhances representation learning by utilizing an attention module.
(6) ResGatedGraphConv: This model proposes LSTM and ConvNet models for graphs, iterating over the graph multiple times and introducing the idea of residual networks to enable the model to scale to graphs of arbitrary size.
(7) GNNFiLM: The representation of the target node of an edge is used to compute a transformation that is applied to all incoming messages, allowing feature-wise modulation of the passed information.
Performance evaluation on predicting drug-target interactions
To ensure the reliability of the experimental results and avoid chance results, all models are trained five times under the same conditions, and the results are reported as the mean plus or minus the standard deviation. The final AUROC and AUPR values for each model are shown in Table 1. The AUROC value of DTI-GTN is 0.9973, which is 0.0017 higher than that of the next best model, DTI-Film. The AUPR value is 0.0018 higher than that of DTI-Film. In the drug-target interaction prediction task, DTI-GTN outperformed the other six state-of-the-art DTIs prediction methods.
Meanwhile, Fig. 3 depicts the trends in training loss and ROC values for the various methods during training. According to the two figures, the training loss of all seven models gradually decreases and the ROC value gradually increases as the number of epochs increases, but their convergence speeds differ. The DTI-GTN method converges faster and reaches better values than the other models, and begins to converge after approximately 200 rounds.
In addition, we compare DTI-GTN with other DTIs prediction models and classical machine learning methods. (1) DTI-CDF: In this method, the prediction performance of DTIs is further improved by using path-classification-based multi-similarity features of DTIs heterogeneous graphs and a cascade deep forest (CDF) model. (2) DTI-CNN: In this method, random walk with restart and a denoising autoencoder are used to handle the incomplete, high-dimensional heterogeneous features of the data sources. A deep CNN model is used to process the low-dimensional feature vectors and predict the probability of interaction between each pair of drugs and proteins. (3) Random forest: In this method, m features are randomly selected for each node, and the decision of each node in the decision tree is determined based on these features. Based on these m features, the best split is calculated, and each tree is constructed in this way. (4) K-nearest neighbors: Given the training dataset, for a new input instance this method finds the K instances that are closest to it and assigns the new instance to the class to which the majority of these K instances belong.
Similarly, the experimental results were averaged over five trials plus or minus the standard deviation, and the final results are shown in Table 2. DTICDF performed the best among the other classical methods, but GTIGTN outperformed it by 0.0075 and 0.0075 for AUROC and AUPR, respectively, when compared to the other four classical methods, and DTIGTN also performed the best in the drugtarget interaction prediction task.
Table 3 compares the prediction results of our DTIGTN method with those of other models that do not use line graphs, and in this experiment we add two representative graph neural network models for comparison. (1) NEDTP [39]: This method uses 15 heterogeneous information networks to build a similarity network and, after extracting topological information using random walks, uses a gradient boosting decision tree model to complete the classification task. (2) Moltrans [40]: This method uses a knowledge-inspired sub-structural pattern mining algorithm and an augmented transformer encoder to capture the relationships between substructures for more accurate DTI prediction. The AUROC and AUPR values in Table 3 for the models without the line graph are lower than those of the models using the line graph in Tables 1 and 2, demonstrating the effectiveness of our use of the line graph.
Figure 4 depicts the AUROC curves of the training, validation, and test sets during model training. The figure shows that the AUROC of the test set, which is not used in training at all, and the AUROC of the validation set are roughly equal, indicating that the model generalizes well and shows no sign of overfitting.
Ablation study
The ablation experiments of our model are shown in Table 4, in which the GTN module and the line graph transformation are removed in turn. As shown in the table, the mean AUROC and mean AUPR of the model decreased by 0.0237 and 0.0300 with the GTN module removed, and by 0.1147 and 0.0812 with the line graph transformation removed. This demonstrates that both the GTN module and the line graph conversion module contribute to the performance of the model.
Case study
We divided the dataset into a training set and a test set, and predictions were made for all drug-target pairs in the test set. From the prediction results, we selected the three groups of drug-protein pairs with the top 3 prediction scores for validation; the results and scores are shown in Table 5. Each group includes two drug-protein interactions and therefore corresponds to two predicted results. For example, if A and C are drugs, B and D are proteins, and AB and CD form a group of predicted drug-target pairs, the prediction indicates that drug A interacts with protein D and drug C interacts with protein B. Our prediction results were checked against DrugBank, and the results were further analyzed.
5-Hydroxytryptamine receptor 1A is abbreviated as HTR1A. The group of drug-protein pairs with the highest prediction score was Propranolol-HTR1A and Pindolol-HTR1A. Propranolol [41] has a significant affinity for HTR1A. Pindolol [42] is a beta-adrenoceptor antagonist. It facilitates frontocortical dopaminergic and adrenergic transmission primarily by activating beta 1/2-ARs and, to a lesser degree, by stimulating HTR1A receptors. In addition, a selective HTR1A receptor antagonist can slightly attenuate the pindolol-induced increase in DA and NAD levels.
The second group of drug-protein pairs in the prediction results is Zolmitriptan-HTR1A and Naratriptan-HTR1A. Zolmitriptan [42] is a novel 5-hydroxytryptamine receptor agonist with proven efficacy in the acute treatment of migraine with or without preceding aura. Naratriptan [43] has a central effect in the trigeminovascular system, selectively inhibiting afferent activity in cardiovascular neurones via HTR1B, HTR1D and HTR1A receptors.
Muscarinic acetylcholine receptor M3 is abbreviated as CHRM3. The last group of drug-protein pairs is Mivacurium-CHRM3 and Pancuronium-CHRM3. Mivacurium [44] is a short-acting non-depolarizing neuromuscular blocking agent. Muscle relaxants can cause bronchospasm via histamine release or by acting on muscarinic receptors. Pancuronium [45] is a neuromuscular blocker used as an adjunct to general anesthesia to facilitate tracheal intubation. Neuromuscular blocking drugs can inhibit not only nicotinic but also muscarinic receptors and thereby affect not only skeletal but also smooth muscle tone.
Conclusion
In this paper, we propose a novel drug-target interaction prediction model based on a graph transformer network (DTIGTN). First, we use seven different levels of relationships between drugs and targets to construct the features of drug-target interactions with Jaccard similarity and the random walk with restart method. Then, we use a line graph to transform drug-target interactions from links into nodes of a new graph. After that, we introduce a graph transformer network to predict drug-target interactions. We compare our model with other representative models on AUROC and AUPR values. The experimental results on the DTI network show that our model compares favorably with the other models. Our DTIGTN method can provide a new pattern for understanding drug-target interaction relationships.
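Two of the building blocks named in this summary, Jaccard similarity and the line graph transformation, can be sketched in a few lines. The sketch uses the textbook definitions only: the Jaccard index of two sets, and a line graph in which every drug-target link becomes a node and two nodes are connected when their links share a drug or a target. The function names and toy data are hypothetical, and how DTIGTN actually builds its features and graph (including the random walk with restart step, which is omitted here) should be taken from the authors' repository rather than from this example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def line_graph(dti_edges):
    """Turn drug-target links into nodes of a new graph.

    Each (drug, target) link becomes a node; two nodes are connected
    when the underlying links share a drug or share a target.
    """
    nodes = sorted(set(dti_edges))
    links = [
        (e1, e2)
        for e1, e2 in combinations(nodes, 2)
        if e1[0] == e2[0] or e1[1] == e2[1]
    ]
    return nodes, links


if __name__ == "__main__":
    # Toy drug-target links (illustrative subset of the case-study names).
    edges = [
        ("Propranolol", "HTR1A"),
        ("Pindolol", "HTR1A"),
        ("Mivacurium", "CHRM3"),
    ]
    nodes, links = line_graph(edges)
    print(len(nodes), "nodes;", len(links), "link(s) in the line graph")
    print(jaccard({"HTR1A", "HTR1B"}, {"HTR1A", "HTR1D"}))
```

After this transformation, predicting an interaction becomes a node-level task on the new graph, which is the setting in which a graph transformer network can then be applied.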
Availability of data and materials
The dataset and code used in the current study are available in the GitHub repository [https://github.com/q498756498/DTIGTN].
Abbreviations
 DTI: Drug-target interactions
 AUROC: The area under the receiver operating characteristics curve
 AUPR: The area under the Precision and Recall curve
 TPR: True positive rate
 FPR: False positive rate
 PCA: Principal component analysis
 GTN: Graph transformer network
 RWR: Random walk with restart
References
Kim IW, Jang H, Kim JH, Kim MG, Kim S, Oh JM. Computational drug repositioning for gastric cancer using reversal gene expression profiles. Sci Rep. 2019;9(1):1–10.
Ganotra GK, Wade RC. Prediction of drug-target binding kinetics by comparative binding energy analysis. ACS Med Chem Lett. 2018;9(11):1134–9.
Ding Y, Tang J, Guo F. Identification of drug-target interactions via fuzzy bipartite local model. Neural Comput Appl. 2020;32(14):10303–19.
Pliakos K, Vens C. Drug-target interaction prediction with tree-ensemble learning and output space reconstruction. BMC Bioinform. 2020;21(1):1–11.
Ye Y, Wen Y, Zhang Z, He S, Bo X. Drug-target interaction prediction based on adversarial Bayesian personalized ranking. BioMed Res Int. 2021;2021:6690154.
Sun C, Cao Y, Wei JM, Liu J. Autoencoder-based drug-target interaction prediction by preserving the consistency of chemical properties and functions of drugs. Bioinformatics. 2021;37(20):3618–25.
Xuan P, Zhang Y, Cui H, Zhang T, Guo M, Nakaguchi T. Integrating multi-scale neighbouring topologies and cross-modal similarities for drug-protein interaction prediction. Brief Bioinform. 2021;22(5):119.
Manoochehri HE, Nourani M. Drug-target interaction prediction using semi-bipartite graph model and deep learning. BMC Bioinform. 2020;21(4):1–16.
Jin X, Liu M, Wang L, He W, Huang Y, Xie M. Multiresolutional collaborative heterogeneous graph convolutional autoencoder for drug-target interaction prediction. In: 2020 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE. 2020. p. 150–3.
Yue Y, He S. DTI-HeNE: a novel method for drug-target interaction prediction based on heterogeneous network embedding. BMC Bioinform. 2021;22(1):1–20.
Peng J, Li J, Shang X. A learning-based method for drug-target interaction prediction based on feature representation learning and deep neural network. BMC Bioinform. 2020;21(13):1–13.
Luo Y, Zhao X, Zhou J, Yang J, Zhang Y, Kuang W, Peng J, Chen L, Zeng J. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat Commun. 2017;8(1):1–13.
Huang Y, Zhu L, Tan H, Tian F, Zheng F. Predicting drug-target on heterogeneous network with co-rank. In: International conference on computer engineering and networks. Springer; 2018. p. 571–81.
Pagadala NS, Syed K, Tuszynski J. Software for molecular docking: a review. Biophys Rev. 2017;9(2):91–102.
Boyles F, Deane CM, Morris GM. Learning from the ligand: using ligand-based features to improve binding affinity prediction. Bioinformatics. 2020;36(3):758–64.
Kolben T, Jeschke U, Reimer T, Karsten N, Schmoeckel E, Semmlinger A, Mahner S, Harbeck N, Kolben TM. Induction of apoptosis in breast cancer cells in vitro by Fas ligand reverse signaling. J Cancer Res Clin Oncol. 2018;144(2):249–56.
Wan F, Hong L, Xiao A, Jiang T, Zeng J. NeoDTI: neural integration of neighbor information from a heterogeneous network for discovering new drug-target interactions. Bioinformatics. 2019;35(1):104–11.
D’Souza S, Prema K, Balaji S. Machine learning models for drug-target interactions: current knowledge and future directions. Drug Discov Today. 2020;25(4):748–56.
Wu Z, Li W, Liu G, Tang Y. Network-based methods for prediction of drug-target interactions. Front Pharmacol. 2018;9:1134.
Tang C, Zhong C, Chen D, Wang J. Drug-target interactions prediction using marginalized denoising model on heterogeneous networks. BMC Bioinform. 2020;21(1):1–29.
Asif NA, Sarker Y, Chakrabortty RK, Ryan MJ, Ahamed MH, Saha DK, Badal FR, Das SK, Ali MF, Moyeen SI, et al. Graph neural network: a comprehensive review on non-Euclidean space. IEEE Access. 2021;9:60588–606.
Zhang C, Song D, Huang C, Swami A, Chawla NV. Heterogeneous graph neural network. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining. 2019. p. 793–803.
Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y. Graph attention networks. 2017. arXiv preprint arXiv:1710.10903.
Du J, Zhang S, Wu G, Moura JM, Kar S. Topology adaptive graph convolutional networks. 2017. arXiv preprint arXiv:1710.10370.
Bresson X, Laurent T. Residual gated graph convnets. 2017. arXiv preprint arXiv:1711.07553.
Cheng Z, Yan C, Wu F, Wang J. Drug-target interaction prediction using multi-head self-attention and graph attention network. IEEE/ACM Trans Comput Biol Bioinform. 2021. https://doi.org/10.1109/TCBB.2021.3077905.
Wang S, Shan P, Zhao Y, Zuo L. Gandti: a multi-task neural network for drug-target interaction prediction. Comput Biol Chem. 2021;92: 107476.
Lee I, Nam H. Identification of drug-target interaction by a random walk with restart method on an interactome network. BMC Bioinform. 2018;19(8):9–18.
Buraczewski D, Dyszewski P, Iksanov A, Marynych A. Random walks in a strongly sparse random environment. Stoch Process Appl. 2020;130(7):3990–4027.
Shlens J. A tutorial on principal component analysis. 2014. arXiv preprint arXiv:1404.1100.
Dwivedi VP, Bresson X. A generalization of transformer networks to graphs. 2020. arXiv preprint arXiv:2012.09699.
Shi Y, Huang Z, Feng S, Zhong H, Wang W, Sun Y. Masked label prediction: unified message passing model for semi-supervised classification. 2020. arXiv preprint arXiv:2009.03509.
Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z, et al. Drugbank 5.0: a major update to the drugbank database for 2018. Nucleic Acids Res. 2018;46(D1):1074–82.
Xu WF, Ma YC, Ma HS, Shi L, Mu H, Ou WB, Peng J, Li TT, Qin T, Zhou HM, et al. Co-targeting ck2\(\alpha\) and ybx1 suppresses tumor progression by coordinated inhibition of the pi3k/akt signaling pathway. Cell Cycle. 2019;18(24):3472–90.
Davis AP, Grondin CJ, Johnson RJ, Sciaky D, McMorran R, Wiegers J, Wiegers TC, Mattingly CJ. The comparative toxicogenomics database: update 2019. Nucleic Acids Res. 2019;47(D1):948–54.
Wadhwa S, Gupta A, Dokania S, Kanji R, Bagler G. A hierarchical anatomical classification schema for prediction of phenotypic side effects. PLoS ONE. 2018;13(3):0193959.
Sachdev K, Gupta MK. A comprehensive review of feature based methods for drug target interaction prediction. J Biomed Inform. 2019;93: 103159.
Donghua Y, Maozu G, Xiaoyan L, Shuang C. Predicted results evaluation and query verification of drug-target interaction. J Comput Res Dev. 2019;56(9):1881.
An Q, Yu L. A heterogeneous network embedding framework for predicting similarity-based drug-target interactions. Brief Bioinform. 2021;22(6):275.
Huang K, Xiao C, Glass LM, Sun J. Moltrans: molecular interaction transformer for drug-target interaction prediction. Bioinformatics. 2021;37(6):830–6.
Srinivasan AV. Propranolol: a 50-year historical perspective. Ann Indian Acad Neurol. 2019;22(1):21.
Yaman B, Bal R. Pindolol potentiates the antidepressant effect of venlafaxine by inhibiting 5-HT1A receptor in DRN neurons of mice. Int J Neurosci. 2021;132(1):23–30.
Tfelt-Hansen P. Naratriptan is as effective as sumatriptan for the treatment of migraine attacks when used properly. A mini-review. Cephalalgia. 2021;41(14):1499–505.
Abdel-Moety EM, Abou Al-Alamein AM, Fawaz E, Abd El-Rahman MK. A companion diagnostic for personalizing mivacurium at the point-of-care. J Electrochem Soc. 2020;167(8): 087510.
Shaya D, Isaacs L. Acyclic cucurbit[n]uril-type containers as receptors for neuromuscular blocking agents: structure-binding affinity relationships. Croat Chem Acta Arhiv za kemiju. 2019;92(2):163.
Acknowledgements
The authors thank Ms. Hongmei Wang for validation and Dr. Guishen Wang and Dr. Chen Cao for revising the paper, as well as the editors and reviewers for their valuable comments and suggestions.
Funding
The research is supported by the Educational Department of Jilin Province of China (Grant No. JJKH20210752KJ). The research is also supported by the natural science foundation free orientation general project of the Jilin Provincial Department of Science and Technology, “Research of the method of named entity recognition and summary generation of judicial proceedings based on graph attention mechanism”.
Author information
Contributions
WHM verified the work, GF implemented the algorithm and wrote the paper, DMY wrote the paper, WGS designed the algorithm and revised the paper, and CC contributed the idea and revised the paper. All authors reviewed the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Wang, H., Guo, F., Du, M. et al. A novel method for drugtarget interaction prediction based on graph transformers model. BMC Bioinformatics 23, 459 (2022). https://doi.org/10.1186/s1285902204812w
DOI: https://doi.org/10.1186/s1285902204812w
Keywords
 Drug-target interaction
 Graph attention network
 Line graph