
DGDTA: dynamic graph attention network for predicting drug–target binding affinity



Obtaining accurate drug–target binding affinity (DTA) information is significant for drug discovery and drug repositioning. Although some methods have been proposed for predicting DTA, the features of proteins and drugs still need to be further analyzed. Recently, deep learning has been successfully used in many fields. Hence, designing a more effective deep learning method for predicting DTA remains attractive.


Dynamic graph DTA (DGDTA), which uses a dynamic graph attention network combined with a bidirectional long short-term memory (Bi-LSTM) network to predict DTA, is proposed in this paper. DGDTA takes as inputs the simplified molecular input line entry system (SMILES) string of a drug compound and the amino acid sequence of a protein. First, each drug is considered a graph of interactions between atoms and edges, and dynamic attention scores are used to determine which atoms and edges in the drug are most important for predicting DTA. Then, Bi-LSTM is used to better extract the contextual information features of protein amino acid sequences. Finally, after combining the obtained drug and protein feature vectors, the DTA is predicted by a fully connected layer. The source code is available from GitHub at


The experimental results show that DGDTA can predict DTA more accurately than some other methods.



Drug–target interaction (DTI) prediction is a critical task in drug discovery and drug repositioning [1, 2]. Structural changes to a drug can significantly alter its binding affinity with proteins [3], making it important to predict whether a drug can bind to a specific protein. However, the traditional high-throughput screening experiments used to detect this activity are expensive and time-consuming [4]. Therefore, computing methods for DTI prediction have become popular and effective [5, 6].

DTI calculation methods focus on binary classification [2, 7], and the main goal is to determine whether a drug and a target interact with each other [8]. However, the binding strength between a given protein and drug compound is a continuous value, referred to as their binding affinity. The drug–target pair prediction task can therefore be described as an affinity prediction problem [8] in which the binding affinity score is used directly, thus creating a more realistic experiment. In addition, regression-based models are more advantageous in approximating the strength of DTIs [9], making them more conducive to the discovery of new drug compounds in the limited drug research space.

Recently, some methods [10, 11] for predicting drug–target affinity (DTA) have been developed. SimBoost [11] enhances the performance of learning-based methods by extracting features from drugs, targets, and drug–target pairs and providing them to gradient-enhanced supervised learning methods. Affinity is characterized by an inhibition constant (\({K}_{i}\)), dissociation constant (\({K}_{d}\)), changes in free energy measures (\(\delta G\),\(\delta H\)), half-maximal inhibition constant (\(I{C}_{50}\)) [12], half-maximal activity concentration (\(A{C}_{50}\)) [13], KIBA score [14] and scoring. Stronger affinity readings indicate greater DTIs [15]. In the KronRLS [10, 16] model, the Kronecker products of a drug and target are constructed by drug and protein pairs to calculate the kernel K of the pairs, which is entered into a regularized least-squares regression model (RLS) to predict the binding affinity.

With the success of deep learning, various deep networks have been used for DTA prediction [8, 13], and have achieved better performance than traditional machine learning methods. Some prediction methods are summarized in Table 1. In the DeepDTA [8] model, one-dimensional sequences of drugs and proteins are fed into a convolutional neural network (CNN) to extract the features of drugs and their targets through the simplified molecular input line entry system (SMILES) string representations of the drugs, and good results have been achieved. The PADME [13] model combines molecular graph convolution of compounds and protein features and uses fixed-rule descriptors to represent proteins, improving the predictive performance of the model. The model is more scalable than traditional machine learning models. WideDTA [17] builds on DeepDTA [8] by representing drugs and proteins as words, learning more potential characteristics of drugs and proteins. However, since the convolution window of a CNN is fixed, this network is unable to extract the features of contextual information. To represent molecules in a natural way that preserves as much molecular structure information as possible, thus allowing the model to better learn the relevance of the underlying space, an increasing number of approaches are utilizing graph neural networks to predict DTA. MT-DTI [18] introduces the attention mechanism in drug representation and takes more account of the correlation between different molecules, which improves the prediction performance of DTA and greatly increases the interpretability. In DeepGS [19], the topological structure information of a drug is extracted by using a graph attention network (GAT) [20], while the local chemical background of the drug is captured by using a bidirectional gated recurrent unit (Bi-GRU) [21] and combined with the protein sequence features extracted by a CNN for prediction.
rzMLP [22] uses a gMLP model to aggregate input features with constant size, and uses a ReZero layer to smooth the training process for that block. The model is able to learn more complex global features while avoiding poor predictions due to an overly deep model. EnsembleDLM [23] aggregates predictions from multiple deep neural networks, not only obtaining better predictions, but also exploring how much data deep learning networks need to achieve better prediction performance. GANsDTA [24] employs a generative adversarial network (GAN) [25] to extract features of protein sequences and compound SMILES in an unsupervised manner. Because the GAN's feature extractor does not require labeled data, the model is able to accommodate unlabeled data for training. The model can use more datasets to learn protein and drug features, thus achieving correspondingly better feature representation and prediction performance. GraphDTA [26] modelled drugs as molecular graphs built from one-dimensional drug sequences, then fed the graphs into several graph network models and obtained deep learning models that were excellent at the time. GraphDTA [26] demonstrated that representing drugs as graphs can further improve the prediction capabilities of deep learning models in terms of DTA.

Table 1 Prediction methods

However, two problems remain that prevent accurate DTA prediction. (1) The GAT model used by some contemporary methods is a restricted form of static attention, and the attention coefficient function of the nodes in the drug graph is monotonic, which leads to the inability to comprehensively extract drug features. (2) When processing protein sequences, the contextual association information of amino acid sequences is not acquired, and the protein association features are thus ignored.

To solve the above problems, this paper proposes a method named dynamic graph DTA (DGDTA). In DGDTA, each drug is considered a graph of interactions between atoms and edges, and a dynamic attention score is used to determine which atoms and edges in the drug graph play more critical roles in predicting DTA. Compared with static attention, DGDTA is able to extract a more comprehensive drug signature. To better obtain the contextual features of amino acid sequences in proteins, DGDTA introduces bidirectional long short-term memory (Bi-LSTM) [27] to extract more comprehensive amino acid sequence features in combination with drugs. In validations conducted on the Davis [28] and KIBA [14] datasets, DGDTA achieves better performance than the competing methods. In this paper, a dynamic graph attention network example is given to further improve the representativeness and effectiveness of drug molecular graphs. The experimental results demonstrate the effectiveness of DGDTA.


DGDTA is a method for predicting DTA based on a deep learning network, and its architecture (shown in Fig. 1) is divided into three main steps. (1) Obtaining drug features. DGDTA uses the SMILES [29] string as the drug compound input, and transforms the drug into a drug graph consisting of atoms and edges with reference to the natural properties of the drug. According to the literature, a two-layer graph network structure has better feature extraction performance. DGDTA uses either a two-layer dynamic graph attention network (GATv2) [30] or a combination of GATv2 and a graph convolutional network (GCN) to obtain drug graph features, and DGDTA is accordingly divided into two versions: DGDTA-AL and DGDTA-CL. (2) Extracting protein features. DGDTA uses a combination of Bi-LSTM and multilayer convolutional networks to obtain more comprehensive protein amino acid sequence information while considering the contextual relationships among the amino acid sequences. (3) Performing DTA prediction. The extracted drug features and protein features are combined and used to determine the DTA via a fully connected layer. The details of DGDTA are described in the following parts.

Fig. 1 General architecture of DGDTA

Obtaining drug features

With the development of graph neural networks for DTA, many approaches have been presented. When using a graph to represent a drug, it is difficult to accurately extract graph features due to the complexity of drug graphs. DGDTA adopts a dynamic GAT to obtain drug features. From the SMILES code, the atomic composition of a drug and the valence of its atoms can be inferred; these in turn indicate drug information such as the number of hydrogen bonds, which is then used in the drug's feature representation for affinity prediction. To better extract drug features, DGDTA uses the SMILES [29] sequences of drugs as inputs, and uses RDKit to extract the atoms and interactions from the SMILES sequences. Then, DGDTA constructs a graph for each drug based on its SMILES sequence. A drug graph is denoted as \(G=(V,E)\), where \(V\) is the set of nodes, each representing a drug atom, and \(E\) represents the set of edges between nodes. Each node is represented by an n-dimensional vector from DeepChem [31]. This n-dimensional vector includes the atomic symbols, the number of adjacent hydrogen atoms, the number of adjacent atoms, the implicit valence of the atoms and whether the bonds are aromatic. One node is represented as \(\text{d}=\left\{{\text{f}}_{1},{\text{f}}_{2},{\text{f}}_{3}\dots ,{\text{f}}_{n}\right\}\). By representing the atoms d of each drug as the vertices of the drug graph, the features \(D=\left\{{\text{d}}_{1},{\text{d}}_{2},{\text{d}}_{3}\dots ,{\text{d}}_{D}\right\}\) of each drug are obtained. To obtain more information about the graph structure in n-dimensional space, this paper adopts a dynamic attention mechanism for the graph:

$$e\left({d}_{i},{d}_{j}\right)={a}^{T}LeakyReLU\left(W\left[{d}_{i}\parallel {d}_{j}\right]\right) \quad j\in {\mathcal{N}}_{\mathcalligra{i}}$$

\(e\left( {d_{i} ,d_{j} } \right)\) denotes the importance of the features of neighbour node \(j\) to node \(i\), where \({\mathcal{N}}_{\mathcalligra{i}}\) represents the neighbours of node \(i\), \(a\in {\mathbb{R}}^{{2d}^{{\prime }}}\) and \(W\in {\mathbb{R}}^{{2d}^{{\prime }}\times d}\) are learned, and \(\parallel\) denotes vector concatenation. Utilizing the \(softmax\) function to normalize over all neighbours, we obtain the following attention function:

$${a}_{ij}=softmax\left(e\left({d}_{i},{d}_{j}\right)\right)=\frac{exp\left(e\left({d}_{i},{d}_{j}\right)\right)}{{\sum }_{k\in {\mathcal{N}}_{\mathcalligra{i}}}exp\left(e\left({d}_{i},{d}_{k}\right)\right)}$$

Combining Eqs. (1) and (2), the coefficients of attention are expressed as:

$${a}_{ij}=\frac{exp\left({a}^{T}LeakyReLU\left(W\left[{d}_{i}\parallel {d}_{j}\right]\right)\right)}{{\sum }_{k\in {\mathcal{N}}_{\mathcalligra{i}}}exp\left({a}^{T}LeakyReLU\left(W\left[{d}_{i}\parallel {d}_{k}\right]\right)\right)}$$

After integrating the feature information of the neighbouring nodes, we apply the nonlinear activation function \(\sigma\) to obtain the output features of each node:

$${d}_{i}^{{\prime }}=\sigma \left({\sum }_{j\in {\mathcal{N}}_{\mathcalligra{i}}}{a}_{ij}W{d}_{j}\right)$$
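Equations (1), (2) and (4) can be traced on a toy graph with the following minimal sketch. It is not the authors' implementation: it assumes a single attention head, splits \(W\) into a left block `W_l` (applied to \({d}_{i}\)) and a right block `W_r` (applied to \({d}_{j}\)) so that \(W\left[{d}_{i}\parallel {d}_{j}\right]\) is the sum of the two products, reuses `W_r` in the aggregation step, and takes \(\sigma\) to be ReLU. All numbers, and the names `gatv2_layer`, `W_l` and `W_r`, are illustrative.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(M, v):
    # Plain matrix-vector product on nested lists.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def gatv2_layer(feats, neighbours, W_l, W_r, a):
    """One single-head dynamic-attention layer on a small graph (sketch)."""
    out, alphas = [], []
    for i, d_i in enumerate(feats):
        # Eq. (1): e(d_i, d_j) = a^T LeakyReLU(W [d_i || d_j]), with W = [W_l | W_r]
        scores = []
        for j in neighbours[i]:
            hidden = [l + r for l, r in zip(matvec(W_l, d_i), matvec(W_r, feats[j]))]
            scores.append(sum(a_k * leaky_relu(h) for a_k, h in zip(a, hidden)))
        # Eq. (2): softmax over the neighbourhood of node i (shifted for stability)
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        # Eq. (4): attention-weighted sum of transformed neighbours, then ReLU
        agg = [0.0] * len(a)
        for weight, j in zip(alpha, neighbours[i]):
            for k, v in enumerate(matvec(W_r, feats[j])):
                agg[k] += weight * v
        out.append([max(0.0, v) for v in agg])
        alphas.append(alpha)
    return out, alphas

# Toy 3-atom chain 0-1-2 with self-loops; every value here is made up.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbours = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
W_l = [[0.5, -0.2], [0.1, 0.3]]
W_r = [[0.4, 0.6], [-0.3, 0.2]]
a = [1.0, -1.0]
new_feats, attention = gatv2_layer(feats, neighbours, W_l, W_r, a)
```

Each entry of `attention` is the normalized coefficient vector of one node over its own neighbourhood, so it sums to 1; `new_feats` holds the updated node features.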

Nodes are represented as the weighted averages of their neighbouring feature vectors. To further solidify the learning process of dynamic graph self-attention and improve the learning effect, the attention is extended to multiheaded attention.

$${d}_{i}^{{\prime }{\prime }}=\sigma \left(\frac{1}{H}\sum _{h=1}^{H}{\sum }_{j\in {\mathcal{N}}_{\mathcalligra{i}}}{{a}_{ij}}^{h}{W}^{h}{d}_{j}\right)$$

\(H\) independent attention mechanisms connect the semantic feature vectors of the nodes through Eq. (5), and obtain an updated drug feature representation \({\text{D}}^{\left(1\right)}=\left\{{\text{d}}_{1}^{\left(1\right)},{\text{d}}_{2}^{\left(1\right)},{\text{d}}_{3}^{\left(1\right)}\dots ,{\text{d}}_{D}^{\left(1\right)}\right\}\). Based on a combination of research and experiments, a two-layer graph network structure is able to obtain more accurate prediction results. First, the graph network in the second layer uses a dynamic graph neural network and obtains the drug feature representations \({\text{D}}^{\left(2\right)}=\left\{{\text{d}}_{1}^{\left(2\right)},{\text{d}}_{2}^{\left(2\right)},{\text{d}}_{3}^{\left(2\right)}\dots ,{\text{d}}_{D}^{\left(2\right)}\right\}\); this version is named DGDTA-AL. After many experiments and comparisons, the graph network in the second layer is replaced with a GCN, whose propagation rules are as follows:

$${H}^{(l+1)}=\sigma \left({\stackrel{\sim}{D}}^{-\frac{1}{2}}\stackrel{\sim}{A}{\stackrel{\sim}{D}}^{-\frac{1}{2}}{H}^{\left(l\right)}{W}^{\left(l\right)}\right)$$

\({H}^{\left(l\right)}\) denotes the node feature matrix of the \(l\)th layer, where \(\stackrel{\sim}{A}=A+I\), \(A\) is the adjacency matrix, \(I\) is the identity matrix, \(\stackrel{\sim}{D}=D+I\), \(D\) is the degree matrix, and \({W}^{\left(l\right)}\) is a trainable weight matrix. A drug feature representation \({\text{D}}^{{\left(2\right)}^{{\prime }}}=\left\{{\text{d}}_{1}^{{\left(2\right)}^{{\prime }}},{\text{d}}_{2}^{{\left(2\right)}^{{\prime }}},{\text{d}}_{3}^{{\left(2\right)}^{{\prime }}}\dots ,{\text{d}}_{D}^{{\left(2\right)}^{{\prime }}}\right\}\) is obtained. The GCN is applied to the full graph via the Laplacian matrix, which captures the connectivity relationships between the graph nodes and updates the node features of the full graph. In this paper, this version is named DGDTA-CL. We use the rectified linear unit (\(ReLU\)) activation function after each layer and use global maximum pooling in the last layer to obtain the vector representation of the drug.
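The GCN propagation rule and the final pooling step can be illustrated with a small standard-library sketch. It is only a worked instance of the formula above on a hypothetical three-atom chain, with ReLU as \(\sigma\); the function names `gcn_layer` and `global_max_pool` are assumptions for illustration and are not taken from the DGDTA code.

```python
import math

def gcn_layer(A, H, W):
    """H_next = ReLU(D~^(-1/2) A~ D~^(-1/2) H W) on nested lists (sketch)."""
    n = len(A)
    # A~ = A + I adds a self-loop to every node.
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Diagonal of D~: row sums of A~.
    deg = [sum(row) for row in A_tilde]
    # Symmetric normalization D~^(-1/2) A~ D~^(-1/2).
    A_hat = [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    n_in, n_out = len(W), len(W[0])
    AH = [[sum(A_hat[i][k] * H[k][f] for k in range(n)) for f in range(n_in)] for i in range(n)]
    return [[max(0.0, sum(AH[i][f] * W[f][o] for f in range(n_in))) for o in range(n_out)]
            for i in range(n)]

def global_max_pool(H):
    # Column-wise maximum over all nodes gives one vector per graph.
    return [max(col) for col in zip(*H)]

# Hypothetical chain 0-1-2 with one feature per atom.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
H = [[1.0], [1.0], [1.0]]
W = [[1.0]]
H_next = gcn_layer(A, H, W)
drug_vector = global_max_pool(H_next)
```

In this toy case the middle atom has the largest normalized neighbourhood sum, so global max pooling selects its feature as the graph-level representation.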

Extracting protein features

A protein sequence is a string of ASCII characters, each of which represents an amino acid. In many methods, one-hot codes are used to represent drugs and proteins, as well as other biological sequences, such as DNA and RNA. We use one-hot encoding to represent the atoms of the drug and incorporate atomic properties for drug initialization. Because drug molecules are shorter and simpler in structure than proteins, we utilize one-hot encoding to expand the dimensionality of the drug’s representation. This enables the model to capture specific information associated with each drug atom. For proteins, in order to prevent feature singularity, we employ a different approach for the initialization. In this paper, we map each amino acid to a numerical value and represent one protein as a sequence of integers. An embedding layer is then applied to the sequence, where each character is represented by a 128-dimensional vector. For training purposes, the sequences are cut or padded to a fixed length of 1000. If the sequence is short, it is padded with 0 values. In this paper, the embedding representation (\(\text{c}\in {\mathbb{R}}^{{\mathcalligra{d}}_{p}}\), where \({d}_{p}\) is the dimensionality of the protein embedding) is fed into a Bi-LSTM layer that captures the dependencies between the characters in a sequence of length \(n\) (\(C=\left[{\text{c}}_{1},{c}_{2}\dots {\text{c}}_{n}\right]\)). We obtain \({p}_{i}\in {\mathbb{R}}^{{2\mathcalligra{d}}_{1}}\), where \({d}_{1}\) denotes the number of output units used in each LSTM cell.

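$$\overrightarrow{{p}_{i}}= \overrightarrow{LSTM}({\text{c}}_{\text{i}},\overrightarrow{{p}_{i-1}})$$
$$\overleftarrow{{p}_{i}}= \overleftarrow{LSTM}({\text{c}}_{\text{i}},\overleftarrow{{p}_{i+1}})$$
$${p}_{\text{i}}= \overrightarrow{{p}_{i}}\parallel \overleftarrow{{p}_{i}}$$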

The matrix \(P\) is composed of the output vectors generated by the Bi-LSTM; i.e., \(P=\left[{p}_{1},{p}_{2}\dots {p}_{n}\right]\). Finally, we use a one-dimensional convolutional layer to learn different levels of abstract features and obtain a vector representation of the protein sequence.
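The integer encoding and fixed-length padding that precede the embedding layer can be sketched as follows. The 20-letter vocabulary, its ordering, and the choice to map unknown characters to the padding index 0 are assumptions made for illustration; the text only states that each amino acid is mapped to a numerical value and that sequences are cut or zero-padded to length 1000.

```python
# Assumed vocabulary: the 20 standard amino acids, in an arbitrary fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Index 0 is reserved for padding, so residues start at 1.
AA_TO_INT = {aa: idx + 1 for idx, aa in enumerate(AMINO_ACIDS)}

def encode_protein(sequence, max_len=1000):
    """Map a protein string to a fixed-length list of integers (sketch)."""
    # Cut long sequences; unknown characters fall back to 0 (an assumption).
    ids = [AA_TO_INT.get(ch, 0) for ch in sequence[:max_len]]
    # Pad short sequences with zeros up to max_len.
    return ids + [0] * (max_len - len(ids))

encoded = encode_protein("ACDY")
```

The resulting integer list is what an embedding layer would turn into a sequence of 128-dimensional vectors before the Bi-LSTM is applied.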

Performing DTA prediction

The prediction layer concatenates the learned drug vector representation with the vector representation of the protein sequence. The concatenated vector, written here as \(x\), is used as the input of the fully connected layer, from which the output \(y\) is obtained:

$$y={W}_{output}x+{b}_{output}$$

where \({W}_{output}\) denotes the weight matrix of the fully connected layer and \({b}_{output}\) denotes the bias of the fully connected layer.
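As a rough illustration of the prediction step, the sketch below concatenates a drug vector and a protein vector and applies one affine map with a single output unit. It is a simplification under stated assumptions: the actual number and width of fully connected layers are not given here, and `predict_affinity` and all numbers are hypothetical.

```python
def predict_affinity(drug_vec, protein_vec, w_output, b_output):
    """Concatenate the two representations and apply one linear output unit (sketch)."""
    joint = list(drug_vec) + list(protein_vec)
    if len(joint) != len(w_output):
        raise ValueError("weight vector must match the concatenated feature length")
    return sum(w * x for w, x in zip(w_output, joint)) + b_output

# Made-up 2-dimensional drug vector and 1-dimensional protein vector.
y = predict_affinity([1.0, 2.0], [3.0], w_output=[0.5, 0.5, 1.0], b_output=0.25)
```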

We choose the mean square error (MSE) loss as the loss function, which has the advantage of a function curve that is smooth, continuous and differentiable everywhere, making it convenient for use in the gradient descent algorithm. As the error decreases, the gradient also decreases, which makes training more stable and more conducive to convergence.

$$MSE=\frac{1}{n}\sum _{i=1}^{n}{({Y}_{i}-{y}_{i})}^{2}$$

where \({Y}_{i}\) and \({y}_{i}\) denote the predicted affinity value and the affinity label of the \(i\)th sample, respectively, and the sum runs over the \(n=\text{B}\) samples of a batch, with \(\text{B}\) denoting the batch size.
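A short worked instance of the MSE formula, using made-up affinity values:

```python
def mse(predicted, labels):
    """Mean squared error over paired predictions and labels."""
    if len(predicted) != len(labels) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, labels)) / len(labels)

# Two hypothetical samples: squared errors are 0.25 and 1.0, so the mean is 0.625.
loss = mse([5.0, 6.0], [5.5, 7.0])
```

Because the difference is squared, swapping the roles of prediction and label leaves the value unchanged.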

Model training

DGDTA takes drug SMILES strings and protein amino acid sequences as inputs. Python 3.9, PyTorch 1.12.1 and PyG 2.1 are used to implement the dynamic GAT and the Bi-LSTM. The number of layers in the graph neural network is set to 2, a Bi-LSTM is applied with the number of hidden states set to 10, and the dropout parameter is set to 0.2. The proposed method is then trained on the Davis and KIBA datasets for 1000 epochs, and the adaptive moment estimation (Adam) optimizer is used with a learning rate of 0.0005. The devices that are used for the experiments are an Intel(R) Xeon(R) Platinum 8260 CPU @ 2.30 GHz and an NVIDIA GeForce RTX 3090 GPU.


In this section, we present the dataset used, the evaluation metrics, an ablation study and the results of a comparison with state-of-the-art methods. This section also illustrates the advantage of the dynamic GAT and gives an example of a real drug–target combination.

Dataset and evaluation metrics

We use the Davis [28] and KIBA [14] datasets to evaluate the performance of the method proposed in this paper. The numbers of drugs and targets in the dataset, and the sample sizes for training and testing during the experiments are shown in Table 2. In this paper, the concordance index (CI; the larger the better) [32] and MSE (the smaller the better) are also used as the main indicators for evaluating the performance of the tested models. In this paper, the GAT and GAT_GCN models are chosen as baseline1 and baseline2 of the ablation study, respectively.
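The concordance index can be computed as in the sketch below, which follows the usual pairwise definition: over all pairs of samples whose true affinities differ, count how often the predictions are ordered the same way, with tied predictions counted as half. This is an illustrative \(O({n}^{2})\) version with hypothetical values, not the evaluation script used for the reported results.

```python
def concordance_index(labels, preds):
    """Fraction of comparable pairs whose predicted order matches the true order."""
    concordant = 0.0
    comparable = 0
    n = len(labels)
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:  # pair with distinct true affinities
                comparable += 1
                if preds[i] > preds[j]:
                    concordant += 1.0
                elif preds[i] == preds[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs: all labels are equal")
    return concordant / comparable

# Three hypothetical samples; one of the three pairs is ordered wrongly.
ci = concordance_index([5.0, 6.0, 7.0], [5.1, 7.2, 6.9])
```

A CI of 1.0 means every comparable pair is ranked correctly, while 0.5 corresponds to an uninformative ranking.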

Table 2 Datasets

Ablation study

In the ablation study, we analyse the effectiveness of the innovative elements of our method. In this section, to be as fair as possible, we use the same training and testing sets as those employed by the baselines and the same evaluation metrics. In this paper, a dynamic graph neural network is incorporated into the drug graph, and Bi-LSTM is added to extract protein amino acid sequence features to further improve the model accuracy. The popular GRU model is added as a comparison method. GRU and LSTM are important variants of recurrent neural networks, and they have strong memory and long-distance dependence capturing ability when processing sequence data. GRU has higher computational efficiency with reduced parameter settings compared to LSTM, but this also leads to some loss of information at longer distances in some cases. In order to better capture the contextual association information of amino acid sequences and further prove the effectiveness of LSTM method, GRU is introduced as a comparison in the ablation study. And the results of the ablation study are shown in Figs. 2 and 3.

Fig. 2 Comparison between baseline1 and different models at 200 and 1000 epochs

Fig. 3 Comparison between baseline2 and different models at 200 and 1000 epochs

Figure 2 shows that on the Davis and KIBA datasets, the DTA prediction results obtained by Model-2 using the dynamic GAT achieve a higher CI and a smaller MSE than those of baseline 1 in the same number of epochs. Model-1 with the addition of Bi-LSTM method is also better than baseline1. Based on Model-2, Bi-LSTM is used to improve the ability to extract contextual protein amino acid sequence features. The evaluation score of Model-4 is improved further, while the prediction result is better than that of the GRU in Model-3 with the same parameters. Model-4 achieves the best results in the 200-epoch and 1000-epoch comparisons conducted on both datasets, and Model-4 is the DGDTA-AL method illustrated in 2.1. As shown in Fig. 3, Model-8 obtains the highest CI and the lowest MSE in the comparison with baseline 2 over the same number of epochs; Model-8 is the DGDTA-CL method.

The results obtained by different models in the ablation study are presented in Table 3. On the Davis dataset, DGDTA-AL achieves the best results (in bold), reaching a CI of 0.899 and an MSE of 0.225, which are improvements of 0.7% and 0.7%, respectively, over those of baseline 1. DGDTA-CL achieves a CI of 0.902 and an MSE of 0.125 on the KIBA dataset, which are improvements of 1.1% and 1.4% over those of baseline 2, respectively. The results of the ablation study demonstrate the effectiveness of the innovative elements proposed in this paper.

Table 3 Ablation study on the Davis and KIBA datasets

Comparison with the state-of-the-art methods

In this section, Table 4 shows the experimental results obtained by DGDTA and the comparison methods. To be consistent with the ablation experiment in 3.2, we use the same datasets and evaluation metrics. Based on this, we added the \({r}_{m}^{2}\) evaluation metric. As shown in Table 4, DGDTA-AL is better than the mainstream DTA methods in terms of the CI, MSE and \({r}_{m}^{2}\) on the Davis dataset. Compared with DeepGLSTM [33], which has the best results among the comparison methods, the CI and MSE of the proposed approach are improved by 0.6% and 1.1%, respectively. Additionally, the CI and MSE are improved by 0.9% and 0.4%, respectively, over those of the excellent MATT-DTI [34] method. And, \({r}_{m}^{2}\) reaches 0.707. As shown in Table 4, DGDTA-CL achieves a more significant improvement in its results on the KIBA dataset. Compared with the DeepGLSTM [33] method, DGDTA-CL attains 1.2% and 1.8% performance improvements in terms of the CI and MSE metrics, and 1.3% and 2.5% CI and MSE improvements are achieved over the MATT-DTI [34] method, respectively. And, \({r}_{m}^{2}\) reaches 0.809. Figure 4 plots the CI scores obtained by the methods in the table for both datasets to further demonstrate the performance improvement provided by the DGDTA method. The experimental results show that DGDTA is better than the comparative methods, and the use of a dynamic graph with attention to extract drug features and effective contextual protein information is significant for predicting DTA.

Table 4 Comparison with the state-of-the-art methods

Fig. 4 CI comparison among the experimental methods on the Davis and KIBA datasets

Advantages of the DGDTA model

The dynamic GAT work [30] points out that a traditional GAT computes only a restricted, “static” form of attention: for any query node, the attention function is monotonic with respect to the key scores [30]. As shown in the GAT heatmap presented in Fig. 5, the ordering of the attention coefficients is global, that is, shared by all queries, and all queries focus primarily on the 7th key.

$$e\left({d}_{i},{d}_{j}\right)=LeakyReLU\left({a}^{T}\left[W{d}_{i}\parallel W{d}_{j}\right]\right) \quad j\in {\mathcal{N}}_{\mathcalligra{i}}$$

Formula (12) is the method for calculating the attention coefficients in the GAT, indicating the importance of the features of node \(j\) to node \(i\). As \({\mathcal{N}}_{\mathcalligra{i}}\) is finite, there exists a node \({j}_{max}\) whose key score is the maximum, and the attention distribution of every query is dominated by \({j}_{max}\); the attention is therefore static. To overcome the monotonicity restriction of the key score, Formula (12) is transformed into Formula (1), in which \({a}^{T}\) is applied after the \(LeakyReLU\) nonlinearity rather than before it. This variant is more expressive than the GAT, as shown in the attention maps of GATv2 in Fig. 5. Since static attention cannot assign different rankings of the keys to different queries, if there is one key that has a higher attention score than the others, then no query can ignore the score of this key, which makes static attention very limited.
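The difference between static and dynamic attention can be made concrete with a toy example on scalar node features. In the GAT-style score, \(LeakyReLU\) is monotonic, so the ranking of keys depends only on the key term and every query picks the same key. In the GATv2-style score, the hand-picked weights below make the score behave like a negative distance between query and key, so each query prefers a different key. The weights, the three feature values and the function names are all assumptions chosen to make the effect visible; they are not learned parameters from DGDTA.

```python
def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_score(q, k, w=1.0, a=(0.5, 1.0)):
    # Static form: LeakyReLU(a^T [W d_i || W d_j]) with scalar features.
    return leaky_relu(a[0] * w * q + a[1] * w * k)

def gatv2_score(q, k, W=((1.0, -1.0), (-1.0, 1.0)), a=(-1.0, -1.0)):
    # Dynamic form: a^T LeakyReLU(W [d_i || d_j]); here it equals -0.8 * |q - k|.
    hidden = [row[0] * q + row[1] * k for row in W]
    return sum(a_k * leaky_relu(h) for a_k, h in zip(a, hidden))

nodes = [-1.0, 0.0, 1.0]  # each node acts as a query and as a key

def top_key(score, q):
    # Index of the key with the highest unnormalized attention score for query q.
    return max(range(len(nodes)), key=lambda j: score(q, nodes[j]))

gat_top = [top_key(gat_score, q) for q in nodes]      # same key for every query
gatv2_top = [top_key(gatv2_score, q) for q in nodes]  # key depends on the query
```

Since softmax preserves the ordering of scores, the same conclusion holds for the normalized attention coefficients.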

Fig. 5 Attention coefficients of the GAT and GATv2

Among the datasets, Davis contains 2457 positive samples and 27,599 negative samples, the total number of samples is small, and the label distribution in the dataset is unbalanced. KIBA has 22,729 positive samples and 95,525 negative samples, so it contains more samples than Davis, but most of the labels in KIBA are very concentrated, and the label distribution is relatively normal. These problems create barriers for the model in terms of affinity prediction. Dynamic graph attention pays different amounts of attention to different queries in the attention score, enabling it to better distinguish the similarities and differences between samples. It is more discriminative during drug graph extraction and alleviates the imbalance problem in the given dataset. Figure 6 shows the MSE changes exhibited by the DGDTA-AL, DGDTA-CL, baseline 1 and baseline 2 models on Davis and KIBA at 200 and 500 epochs. Blue and green represent our proposed models with faster decreasing trends. The results demonstrate the more significant improvement yielded by the dynamic GAT in terms of predicting DTA.

Fig. 6 MSE trend

Example of a realistic drug–target combination

To further demonstrate the validity of the proposed method, this paper gives an example to show the 3D model produced for a tested sample in reality. As shown in Fig. 7, the targeted drug (sunitinib) inhibits receptor tyrosine kinases (RTKs), where certain receptor tyrosine kinases are involved in tumour growth, pathological blood vessel formation and tumour metastasis. In biological and cytometric assays, sunitinib has been shown to inhibit tumour growth, cause tumour regression and inhibit tumour metastasis. In this paper, the bound small drug molecules are scaled up on the right side, and the drug and its binding target correspond to the drug ‘DB5329102’ and the target ‘ITK’ in the test set, respectively; this is done to verify the validity and practicality of the model proposed in this paper in practical applications through known drug–target binding examples.

Fig. 7 Visualization of the binding of a drug ‘DB5329102’ and a target ‘ITK’


In this paper, DGDTA is proposed based on the dynamic graph attention model and is divided into two versions, DGDTA-AL and DGDTA-CL, to predict the affinity values between drugs and proteins. Ablation experiments are performed on the Davis and KIBA datasets, and the proposed approach is compared with the DTA models that are popular today. The experimental results show that DGDTA can achieve better prediction performance and demonstrate that the dynamic graph attention model can extract more comprehensive feature representations from drug molecular graphs.


DGDTA can effectively predict DTA via deep learning, and it obtains high CI values and low MSE values on experimental datasets, but it still has shortcomings. First, while dynamic graph attention models attain good prediction performance, they also require increased prediction time and computational cost. Second, drugs and proteins have very complex spatial structures, and much characteristic drug and protein information is lost in one-dimensional sequences.

In the future, further consideration will be given to fusing other characteristic drug information, such as their side effects, physicochemical properties, and deep structures. This will contribute to improving the performance of drug–target binding prediction models from various aspects.

Availability of data and materials

The Davis and KIBA data can be downloaded from The software and sample results of this project are readily available from GitHub at Project name: DGDTA. Project home page: Operating system(s): Linux or other Unix-like systems. Programming language: Python 3.x. License: GNU GPL v3. Any restrictions to use by non-academics: license needed.



DGDTA: Dynamic graph DTA

DTA: Drug–target binding affinity

GAT: Graph attention network

Bi-LSTM: Bidirectional long short-term memory

SMILES: Simplified molecular input line entry system

CNN: Convolutional neural network

Bi-GRU: Bidirectional gated recurrent unit

GAN: Generative adversarial network

GATv2: Dynamic graph attention network

RTKs: Receptor tyrosine kinases

GCN: Graph convolutional network


  1. Strittmatter SM. Old drugs learn new tricks. Nat Med. 2014;20(6):590.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  2. Affinity2Vec. drug–target binding affinity prediction through representation learning, graph mining, and machine learning. Sci Rep. 2022;12(1):1–18.

    Google Scholar 

  3. Wan S, Kumar D, Ilyin V, Homsi UA, Coveney PV. The effect of protein mutations on drug binding suggests ensuing personalised drug selection. Sci Rep. 2021;11(1):13452.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Ashburn TT, Thor KB. Drug repositioning: identifying and developing new uses for existing drugs. Nat Rev Drug Discov. 2004;3(8):673–83.

    Article  CAS  PubMed  Google Scholar 

  5. Guan J, Tian K, Wang Y, Shao M, Zhou S. Boosting compound-protein interaction prediction by deep learning. Methods Companion Methods Enzymol. 2016;110:64–72.

    Article  Google Scholar 

  6. Hua Y, Song X, Feng Z, Wu XJ, Kittler J, Yu DJ. CPInformer for efficient and robust compound–protein interaction prediction. IEEE/ACM Trans Comput Biol Bioinform. 2022;20:285–96.

    Google Scholar 

  7. Thomas KL. Predicting new molecular targets for known drugs. Nature. 2009;462(7270):175–81.

    Article  PubMed  PubMed Central  Google Scholar 

  8. Hakime Z. Arzucan Z, Elif O. DeepDTA: deep drug–target binding affinity prediction. Bioinformatics. 2018;34(17):i821–9.

    Article  Google Scholar 

  9. Krisztian B, Ladislav P, Júlia K. Modified linear regression predicts drug–target interactions accurately. PLoS ONE. 2020;15(4):e0230726.

    Article  Google Scholar 

  10. Cichonska A, Ravikumar B, Parri E, Timonen S, Pahikkala T, Airola A, Wennerberg K, Rousu J, Aittokallio T. Computational-experimental approach to drug–target interaction mapping: a case study on kinase inhibitors. PLoS Comput Biol. 2017;13(8):e1005678.

    Article  PubMed  PubMed Central  Google Scholar 

  11. He T. SimBoost: a read-across approach for drug–target interaction prediction using gradient boosting machines. J. Cheminform. 2016;9:1-14.

  12. Wang YB, Yi HC, Yang S, Chen ZH, Zheng K. A deep learning-based method for drug–target interaction prediction based on long short-term memory neural network. BMC Med Inf Decis Mak. 2020;20(Suppl 2):49.

  13. Feng Q, Dueva E, Cherkasov A, Ester M. PADME: a deep learning-based framework for drug–target interaction prediction. 2018.

  14. Tang J, Szwajda A, Shakyawar S, Xu T, Aittokallio T. Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis. J Chem Inf Model. 2014;54(3):735–43.

  15. Cer RZ, Mudunuri U, Stephens R, Lebeda FJ. IC50-to-Ki: a web-based tool for converting IC50 to Ki values for inhibitors of enzyme activity and ligand binding. Nucleic Acids Res. 2009;37(Web Server issue):W441–445.

  16. Cichonska A, Pahikkala T, Szedmak S, Julkunen H, Airola A, Heinonen M, Aittokallio T, Rousu J. Learning with multiple pairwise kernels for drug bioactivity prediction. Bioinformatics. 2018;34:509–18.

  17. Öztürk H, Ozkirimli E, Özgür A. WideDTA: prediction of drug–target binding affinity. 2019.

  18. Shin B, Park S, Kang K, Ho JC. Self-attention based molecule representation for predicting drug–target interaction. In: Machine learning for healthcare conference: 2019. PMLR: p. 230–248.

  19. Lin X. DeepGS: deep representation learning of graphs and sequences for drug–target binding affinity prediction. 2020.

  20. Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y. Graph attention networks. 2017.

  21. Cho K, Merrienboer BV, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078. 2014.

  22. Qiu Z, Jiao Q, Wang Y, Chen C, Zhu D, Cui X. rzMLP-DTA: gMLP network with ReZero for sequence-based drug–target affinity prediction. In: 2021 IEEE international conference on bioinformatics and biomedicine (BIBM): 2021. IEEE. p. 308–313.

  23. Kao P-Y, Kao S-M, Huang N-L, Lin Y-C. Toward drug–target interaction prediction via ensemble modeling and transfer learning. In: 2021 IEEE international conference on bioinformatics and biomedicine (BIBM): 2021. IEEE. p. 2384–2391.

  24. Lingling Z, Junjie W, Long P, Yang L, Jun Z. GANsDTA: predicting drug–target binding affinity using GANs. Front Genet. 2019;10:1243.

  25. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial networks. Commun ACM. 2020;63(11):139–44.

  26. Nguyen T, Le H, Quinn TP, Nguyen T, Venkatesh S. GraphDTA: predicting drug–target binding affinity with graph neural networks. Bioinformatics. 2020;37:1140–7.

  27. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.

  28. Davis MI, Hunt JP, Herrgard S, Ciceri P, Wodicka LM, Pallares G, Hocker M, Treiber DK, Zarrinkar PP. Comprehensive analysis of kinase inhibitor selectivity. Nat Biotechnol. 2011;29(11):1046–51.

  29. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inform Comput Sci. 1988;28(1):31–6.

  30. Brody S, Alon U, Yahav E. How attentive are graph attention networks? 2021.

  31. Ramsundar B. Deep learning for the life sciences: applying deep learning to genomics, microscopy, drug discovery, and more. O’Reilly Media, Inc. 2019.

  32. Mithat G, Glenn H. Concordance probability and discriminatory power in proportional hazards regression. Biometrika. 2005;92(4):965–70.

  33. Mukherjee S, Ghosh M, Basuchowdhuri P. Deep graph convolutional network and LSTM based approach for predicting drug–target binding affinity. 2022.

  34. Yuni Z, Xiangru C, Yujie L, Xuedong L, Dezhong P. Deep drug–target binding affinity prediction with multiple attention blocks. Brief Bioinform. 2021;22(5):bbab117.

  35. Karim A, Parvin R, Antti P, Massoud A, Ghasemi JB, Ali MN. DeepCDA: deep cross-domain compound–protein affinity prediction through LSTM and convolutional neural networks. Bioinformatics. 2020;36:4633–42.


Not applicable.


The authors declare that they have no competing interests. This work was supported in part by the National Natural Science Foundation of China under Grant No. 61972134, Young Elite Teachers in Henan Province under Grant No. 2020GGJS050, the Doctor Foundation of Henan Polytechnic University under Grant No. B2018-36, the Innovative and Scientific Research Team of Henan Polytechnic University under Grant No. T2021-3, and the Innovation Project of New Generation Information Technology under Grant No. 2021ITA09021.

Author information

HZ, HH and JL participated in the design of the study and the analysis of the experimental results. HH and XL performed the implementation, prepared the tables and figures, and summarized the results of the study. ZW and JW checked the format of the manuscript. All authors have read and approved the final manuscript for publication.

Corresponding author

Correspondence to Junwei Luo.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. A copy of this licence is available from Creative Commons. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Zhai, H., Hou, H., Luo, J. et al. DGDTA: dynamic graph attention network for predicting drug–target binding affinity. BMC Bioinformatics 24, 367 (2023).
