Drug repositioning based on residual attention network and free multiscale adversarial training
BMC Bioinformatics volume 25, Article number: 261 (2024)
Abstract
Background
Conducting traditional wet experiments to guide drug development is an expensive, time-consuming and risky process. Analyzing drug function and repositioning plays a key role in identifying new therapeutic potential of approved drugs and discovering therapeutic approaches for untreated diseases. Exploring drug-disease associations has far-reaching implications for identifying disease pathogenesis and treatment. However, reliable detection of drug-disease relationships via traditional methods is costly and slow. Therefore, investigations into computational methods for predicting drug-disease associations are currently needed.
Results
This paper presents a novel drug-disease association prediction method, RAFGAE. First, RAFGAE integrates known associations between diseases and drugs into a bipartite network. Second, RAFGAE designs the Re_GAT framework, which includes multilayer graph attention networks (GATs) and two residual networks. The multilayer GATs are utilized for learning the node embeddings, which is achieved by aggregating information from multihop neighbors. The two residual networks are used to alleviate the deep network oversmoothing problem, and an attention mechanism is introduced to combine the node embeddings from different attention layers. Third, two graph autoencoders (GAEs) with collaborative training are constructed to simulate label propagation to predict potential associations. On this basis, free multiscale adversarial training (FMAT) is introduced. FMAT enhances node feature quality through small gradient adversarial perturbation iterations, improving the prediction performance. Finally, tenfold cross-validations on two benchmark datasets show that RAFGAE outperforms current methods. In addition, case studies have confirmed that RAFGAE can detect novel drug-disease associations.
Conclusions
The comprehensive experimental results validate the utility and accuracy of RAFGAE. We believe that this method may serve as an excellent predictor for identifying unobserved disease-drug associations.
Background
Drugs play important roles in treating diseases and promoting the health of organisms [1]. However, traditional drug development is an extremely lengthy and expensive process [2]. Recent studies have estimated that the average development cost to approve a new drug is $2.6 billion and the average development time is 10 years [3]. Drug repositioning, which involves discovering new therapeutic outcomes for previously approved drugs, is considered an important alternative to traditional drug development [4,5,6,7,8]. This approach shortens drug development and research cycles to 7 years, reduces costs to $295 million, and is more reliable than novel drug development [9]. Therefore, using known drugs for new disease treatments is gaining popularity [10, 11]. Traditional methods of discovering abnormal clinical manifestations through manual screening of clinical drug databases require extensive experimentation. With the continuous accumulation of a wide variety of biological data, numerous computational methods based on data mining techniques have gained traction [12].
Matrix factorization aims to approximate the initial matrix by decomposing it into the product of two low-rank matrices, whose rows are hidden factor vectors in a k-dimensional space. The inner product of the drug and disease vectors represents the association between them. Previous studies have shown that matrix decomposition methods are effective computational methods for drug-disease association prediction [13,14,15,16,17]. For example, the similarity constrained matrix factorization method for drug-disease association prediction (SCMFDD), proposed by Zhang et al., maps the associations between diseases and drugs into two low-rank spaces and reveals the basic features. Then, drug similarity and disease similarity are introduced as constraints [18]. Furthermore, Yang et al. proposed the multi-similarities bilinear matrix factorization (MSBMF) approach, which connects multiple disease and drug similarity matrices and extracts the effective latent features in the similarity matrix to infer associations between diseases and drugs [19]. In addition, Zhang et al. proposed DRIMC, a drug repositioning method based on Bayesian inductive matrix completion. This method integrates multiple similarities into a fused similarity matrix, where similarity information is described by similarity values between a drug or disease and its k-nearest neighbors. Finally, the disease-drug association is predicted via inductive matrix completion [20].
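The low-rank idea described above can be illustrated with a minimal sketch. It uses a truncated SVD as a stand-in factorization, which is not the algorithm of SCMFDD, MSBMF or DRIMC; the toy association matrix and the function name `low_rank_factors` are assumptions made for illustration only.

```python
import numpy as np

def low_rank_factors(A, k):
    """Factor A (drugs x diseases) into two rank-k matrices via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    root = np.sqrt(s[:k])
    drug_factors = U[:, :k] * root          # one k-dimensional vector per drug
    disease_factors = Vt[:k, :].T * root    # one k-dimensional vector per disease
    return drug_factors, disease_factors

# Toy association matrix (hypothetical): 3 drugs x 3 diseases, rank 2.
A = np.array([[1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])

P, Q = low_rank_factors(A, k=2)
# The inner product of a drug vector and a disease vector is the predicted score.
scores = P @ Q.T
```

Because the toy matrix has rank 2, the two-factor product reproduces it exactly; on real data the product only approximates the observed associations, and the approximation error on unobserved entries is what the cited methods exploit for prediction.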
Networks can represent the complex relationships among entities, and the methods used to construct biological networks can effectively utilize information from multiple biological entities to represent the degree of association between them [21]. The network-based method has produced good results in drug repositioning [22,23,24]. For instance, Zhao et al. first constructed a heterogeneous information network by combining drug-disease, protein-disease and drug-protein bioinformatics networks with disease and drug biology information. Then, the combined features of the nodes were learned from a biological and topological perspective via different representations. Finally, random forest classifiers were used to predict unknown associations [25]. Zhang et al. proposed a multiscale neighborhood topology learning method for drug repositioning (MTRD) to learn and integrate multiscale neighborhood topologies. This method involves the construction of different drug-disease heterogeneous networks to discover new drug-disease associations [26]. In addition, Luo et al. proposed a method named MBiRW that uses similarity matrices and known associations to construct heterogeneous networks and predicts unknown associations via the bi-random walk algorithm [27].
Although matrix factorization methods achieve good performance, they are weak in the interpretability of associations between diseases and drugs, whereas network methods are biased in representing higher-order networks. To solve these problems, several pioneering studies have focused on developing deep learning-based drug repositioning models [28,29,30,31,32,33]. For example, Zeng et al. first integrated multiple disease-drug biological networks and designed a multimodal deep autoencoder named deep learning-based drug repositioning (deepDR) for learning higher-order neighborhood information of drug-disease associations [34]. Subsequently, Yu et al. constructed a graph convolutional network (GCN) architecture with attention mechanisms, i.e., the label-aware GCN (LAGCN). First, this method uses known drug-disease associations, disease-disease similarities and drug-drug similarities to construct heterogeneous networks and applies GCNs to the network. Next, the embeddings from multiple GCN layers are integrated via layer attention mechanisms. Finally, drug-disease pairs are scored on the basis of the integrated embeddings [35]. Feng et al. proposed Protein And Drug Molecule interaction prEdiction (PADME), a novel method to combine molecular GCNs for compound featurization with protein descriptors for drug-target interaction prediction [36]. Moreover, Meng et al. proposed a drug repositioning approach based on weighted bilinear neural collaborative filtering (DRWBNCF) on the basis of neighborhood interaction and collaborative filtering. Instead of using all neighbors, this method uses only the nearest neighbors, thus filtering out noise and yielding more precise results [37]. Recently, Gu et al. proposed a method named relations-enhanced drug-disease association (REDDA) for learning node features of heterogeneous networks and topological subnetworks. This method employs heterogeneous networks as the backbone and combines the backbone with three attention mechanisms [38].
Deep learning-based methods mainly construct heterogeneous networks by using supplementary information about diseases and drugs and learn the features of diseases and drugs by applying deep learning algorithms to these networks.
However, these deep learning-based approaches tend to have oversmoothing problems caused by the homogenization of node embeddings and are highly dependent on the input quality. In this paper, we present a novel method of drug repositioning named RAFGAE. This method combines residual networks, graph attention networks (GATs), graph autoencoders (GAEs) and adversarial training to predict unknown associations between diseases and drugs. First, we use disease semantic similarity, drug structural similarity and disease-drug associations to construct the initial input features. GATs are used to facilitate the learning of disease and drug embeddings in each layer and combine the embedding of different layers via attention mechanisms. Moreover, the initial residual and adaptive residual connections are adopted to alleviate the oversmoothing problem. Then, two GAEs are constructed on the basis of the disease space and drug space, and the information in these spaces can be integrated through synergistic training. Finally, the scores of the two GAEs are linearly combined by a balancing parameter to calculate the final prediction scores. On this basis, adversarial training is introduced to reduce invalid information and data noise, improving the input quality. The main contributions of RAFGAE can be summarized as follows:

RAFGAE is a complete deep learning approach that can effectively predict the associations between diseases and drugs.

RAFGAE designs the Re_GAT framework, which includes multilayer GATs and two residual networks. Multilayer GATs are utilized to learn the node embeddings by aggregating information from multihop neighbors, and two residual networks are used to alleviate the deep network oversmoothing problem. Then, an attention mechanism is introduced to combine the node embeddings of different attention layers.

RAFGAE performs adversarial training that may eliminate abnormal values, missing values and noise, increasing the input quality and prediction accuracy when extracting associations between diseases and drugs.

Our comprehensive experimental results demonstrate that the proposed RAFGAE method significantly outperforms five state-of-the-art methods on the benchmark dataset.
Results and discussion
Algorithm performance comparison
To verify the performance of RAFGAE, we compare it with five recently proposed methods.

DRWBNCF [37], a method for drug repositioning on the basis of neighborhood interaction and collaborative filtering, uses only the nearest neighbors, rather than all neighbors, to filter out noisy information. A new weighted bilinear GCN encoder is then proposed.

LAGCN [35], a layer attention GCN method for drug repositioning, encodes a heterogeneous network combining known drug-disease associations, disease similarity and drug similarity information. To integrate all useful information, a layer attention mechanism is introduced into multiple GCN layers.

In bounded nuclear norm regularization (BNNR) [39], a heterogeneous network is constructed. This network combines known drug-disease associations, disease similarity and drug similarity information. The method tolerates noise by adding a regularization term to balance the rank properties and approximation error.

NIMCGCN [40] (neural inductive matrix completion with GCN, a method for the prediction of miRNA-disease associations) first employs GCNs to learn the features of diseases and miRNAs from the disease and miRNA similarity networks. Then, neural inductive matrix completion is applied for association matrix completion.

SCMFDD [18] (a similarity constrained matrix factorization method for the prediction of drug-disease associations) projects known drug-disease association information into two low-rank spaces, revealing potential disease and drug embeddings, and then introduces drug feature-based and disease semantic similarities as constraints for drugs and diseases in the low-rank spaces.
The above methods also involve similarity-based graph neural network models. The parameters in these methods are set to either the optimal values via a grid search (for DRWBNCF, λ is selected from {0.1, 0.2, ..., 0.9}; for BNNR, α and β are chosen from {0.01, 0.1, 1, 10}; and for SCMFDD, k is selected from {5%, 10%, ..., 50%}) or the values recommended by the authors (for LAGCN, α = 4000, β = 0.6, and γ = 0.4; and for NIMCGCN, α = 0.4, l = 3, and t = 2). Furthermore, to ensure a meaningful and relevant comparison, each of the comparison methods is evaluated via the same 10-fold cross-validation approach and on the same benchmarking sets as those for our proposed method, RAFGAE. This approach allows us to conduct a comprehensive and rigorous assessment of the performance of all the methods.
The area under the curve (AUC) values in Fig. 1 and Table 1 show a comparison of the model performance. On the Fdataset, RAFGAE achieves the highest AUC score of 0.9343, which is 7.28%, 4.50%, 3.13%, 4.31%, and 4.01% higher than those of SCMFDD, LAGCN, BNNR, NIMCGCN, and DRWBNCF, respectively. Similarly, on the Cdataset, RAFGAE achieves the highest AUC score of 0.9346. By comparing the model proposed in this paper with other models, it is evident that introducing residual connections and adversarial training can enhance the predictive performance of our model. Overall, the above experiments show that RAFGAE is an excellent predictor of disease-drug relationships.
Ablation study
To quantitatively evaluate the importance of the two modules (the Re_GAT framework and the FMAT module) to RAFGAE, ablation experiments are conducted. The details of these variants of RAFGAE are listed below:

RAFGAE: The comprehensive RAFGAE framework consists of three main components: the Re_GAT framework, the FMAT module, and the GAE module.

GAE: The RAFGAE variant that includes only the GAE module.

FGAE: The RAFGAE variant that includes the FMAT and GAE modules but excludes the Re_GAT framework.

RAGAE: The RAFGAE variant that includes the Re_GAT framework and the GAE module but excludes the FMAT module.
According to Fig. 2 and Table 2, it is clear that RAFGAE achieved the highest AUC and area under the precision–recall (AUPR) curve values on both the Fdataset and the Cdataset. The RAGAE and FGAE results show the impacts of global neighborhood node information aggregation and adversarial feature enhancement on the RAFGAE performance, respectively. In addition, the GAE results demonstrate that combining the Re_GAT framework and the FMAT module can improve the predictive performance of the RAFGAE model. In comparing FGAE and RAGAE to GAE, the performance results imply that both the Re_GAT framework and the FMAT module can improve the model performance. The poor performance of GAE suggests that the use of multilayer attention networks to aggregate global information and the incorporation of residual architectures to address the potential oversmoothing problem can enhance the accuracy of drug-disease association prediction. Furthermore, the results indicate that the inclusion of the adversarial training module improves the input quality, thereby satisfying the requirements of deep neural networks for high-quality input features. These results demonstrate that the RAFGAE structure is reasonable.
Performance evaluation
To assess the effectiveness of RAFGAE in recovering known associations, tenfold cross-validation (CV) is applied. In tenfold CV, the dataset is divided into ten folds. Nine folds are used as the training set, and the remaining fold is used to validate the performance of RAFGAE. This process is repeated 10 times, with each fold used as the testing fold once. Several important indicators are used to evaluate the performance of RAFGAE. The receiver operating characteristic (ROC) curve, which is based on the false-positive rate (FPR) and the true-positive rate (TPR), is utilized. As the benchmark datasets used in this experiment are imbalanced, we also use the precision–recall (PR) curve and the area under the PR curve (AUPR) as two additional indicators. To further evaluate the overall performance of the prediction model from multiple perspectives, the F1 score and the Matthews correlation coefficient (MCC) are calculated.
The ROC and PR curves for the Fdataset are shown in Fig. 3. RAFGAE achieves mean AUC and AUPR values of 0.9343 and 0.5270, respectively. The detailed results, including the F1 score and MCC, are presented in Table 3. The results based on the Cdataset are shown in Table 4. As shown in Tables 1 and 2, the newly proposed RAFGAE model obtains good performance on the above two datasets, proving the effectiveness and robustness of this model.
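The evaluation protocol above can be sketched in a few lines. This is a minimal illustration, not the evaluation code used in the paper: the fold assignment, the pairwise AUC estimate and the helper names (`ten_fold_indices`, `roc_auc`, `f1_and_mcc`) are assumptions introduced here.

```python
import math
import random

def ten_fold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1_and_mcc(y_true, y_pred):
    """F1 score and Matthews correlation coefficient from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc

folds = ten_fold_indices(25)
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
f1, mcc = f1_and_mcc([1, 1, 0, 0], [1, 0, 0, 0])
```

In each round, one element of `folds` would serve as the test set and the remaining nine as the training set; the metrics are then averaged over the ten rounds.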
Parameter adjustment
Since the hyperparameter settings can influence the performance of RAFGAE, we used tenfold CV on the Fdataset to analyze the impact of different parameter settings. In the Re_GAT framework, the weight α of the initial residual connection and the weight β of the adaptive residual connection can directly affect the result of feature fusion. To fully integrate adjacent node information and mitigate the oversmoothing problem, we adjust the α and β values within the following range: α ∈ {0.1 ~ 0.9} and β ∈ {0.1 ~ 0.9}. As shown in Fig. 4, when α = 0.3 and β = 0.7, the AUC reaches its maximum value.
In addition, the features of diseases and drugs are extracted via GATs. The Re_GAT framework computes and aggregates different multilayer features via the GAT. We discuss the impact of GATs with different numbers of layers on association prediction. Figure 5 presents the results of the ROC curve analysis on the basis of tenfold CV.
To optimize the initial parameters, we use the Adam optimizer [41]. As in previous studies [42, 43], we set the dropout and weight decay parameters to 0.5 and 10^{–5}, respectively. We also evaluate the model performance by changing the dimensions of the GAE hidden layers. With the other parameters unchanged, the AUC value of RAFGAE generally increases as the embedding dimension of the GAE hidden layer increases and tends to stabilize when the dimension reaches 256. We therefore set the embedding dimension of the hidden layer to 256. These results are shown in Fig. 6.
Case studies
To evaluate the practical ability of RAFGAE to predict unknown indications of approved drugs as well as new therapies for existing diseases, we train the RAFGAE model using all known associations as training data, and predict potential associations for known diseases or drugs. The predicted ranking of unknown indications of approved drugs and unknown therapies for existing diseases is validated on the public database, namely, the Comparative Toxicogenomics Database (CTD) [44].
To assess the ability of RAFGAE to discover new indications, we select two representative medicinal products. Table 5 shows the confirmation information for the top 10 candidate diseases and the known drug-disease associations. Among them, doxorubicin is a cytotoxic anthracycline antibiotic that is widely used to treat various cancers, including Kaposi sarcoma and metastatic cancer related to AIDS. Of the top 10 positive predictions, there were 7 tumor-related diseases that have been verified via reliable databases. Levodopa is a precursor of dopamine and is commonly used in the treatment of Parkinson's syndrome and Parkinson's syndrome-related disorders because of its ability to cross the blood–brain barrier. As shown in Table 5, reliable sources have identified 7 of the top 10 associated diseases. This evidence suggests that RAFGAE can be trained on and can learn from existing biological information and can identify association markers that are not captured in the training set.
To validate the practical ability of RAFGAE to discover novel therapies, we select breast neoplasms and small-cell lung cancer as experimental cases. On the basis of the RAFGAE prediction results, the 10 drugs with the highest prediction scores are validated via the CTD. Table 6 shows similar results for the top 10 positive predictions. Breast neoplasms are among the most common malignancies in women and the leading cause of cancer-related disease in women. As shown in Table 6, 9 of the top 10 drugs were verified via reliable sources. The high incidence rate and high mortality of small-cell lung cancer worldwide make this complex tumor a difficult medical problem. For this disease, 6 of the top 10 predicted drugs ranked by prediction score have been confirmed by evidence from authoritative sources. In summary, case studies have shown that RAFGAE can identify the associations between diseases and drugs that are unknown in training datasets but that have been validated in real-world studies. Moreover, RAFGAE can make reliable predictions regarding unconfirmed potential associations between diseases and drugs. Therefore, RAFGAE has a noteworthy ability to uncover novel therapies/indications for existing diseases/drugs.
Conclusions
In this paper, a deep-learning methodology named RAFGAE is developed for elucidating drug-disease associations. The key innovation of RAFGAE is that it combines the Re_GAT framework and the FMAT algorithm, facilitating the learning of neighbor node information and enhancing the initial node features in the disease-drug bipartite network. Then, two GAEs with collaborative training are applied to integrate the disease and drug spaces for association prediction. Notably, unlike some previous predictors that consider only low-order neighbor information, the Re_GAT framework can account for both high-order and low-order neighbor information by using multilayer GATs. Moreover, residual networks are introduced to mitigate oversmoothing, enabling the full employment of graph structure information hidden in the bipartite network. To enhance the initial features of nodes and make the model more robust, the FMAT algorithm is employed. This algorithm adds gradient-based adversarial perturbation to the input features. In addition, we construct two GAEs with collaborative training for label propagation, enabling the full integration of the drug and disease space information for association prediction and improving the robustness of the RAFGAE model.
With tenfold CV, the RAFGAE model achieves an AUC score of 0.9343, which is better than the AUC scores of five state-of-the-art predictors. Furthermore, the case study results show that RAFGAE can reposition several representative drugs for human diseases and can be applied as a reasonable and effective tool for predicting the relationships between diseases and drugs.
We propose a computational drug repurposing method. This method can effectively identify candidate drugs with potential for treating different diseases and has the potential to uncover new indications for approved drugs that were previously unexplored. RAFGAE can guide wet laboratory experiments, accelerating drug development, reducing costs, and expanding treatment options. The method combines multilayer neural networks with residual connections to capture global information and alleviate oversmoothing problems. We also employ adversarial perturbations to improve the input quality. This novel combination of techniques provides a new perspective for future research and can also serve as a valuable reference for similar studies, such as predicting the associations between ncRNAs and diseases, microbiome-disease associations, and screening ncRNA drug targets.
However, RAFGAE has certain limitations. In this study, the negative and positive samples of the benchmark dataset are unbalanced, and we use all the unknown samples as negative samples for training the proposed model. However, some of these unknown samples may be true associations that have not yet been observed, which greatly impacts the prediction accuracy of the model. In the future, we will select negative samples more carefully to further improve the model accuracy. In terms of biological data, we simply apply the interaction network between drugs and diseases without establishing a more informative biological regulatory network, which may further improve performance. In future research, we will introduce other biological entities, such as proteins, pathways, and genes. In scenarios where drugs share the same or similar indications but lack structural similarity, the transmission of structural similarity information through a multilayer neural network can give rise to an "information leakage" problem, leading to a distorted view of the algorithm's performance in realistic drug repurposing settings. In our future research, we plan to address the problem of information leakage further by incorporating multiple drug similarities, such as target protein domain similarity, GO target protein annotation similarity, side effect similarity, and GIP similarity. This broader range of drug similarities can provide a more comprehensive set of features for drug repurposing. Similarly, incorporating disease similarities, such as disease ontology similarity, can help improve the accuracy and reliability of repositioning predictions by leveraging additional disease-related information.
Methods
Data preparation
We employ two benchmark datasets established by investigators. The first dataset is the Fdataset, which corresponds to Gottlieb's gold standard dataset [45]. The Fdataset contains 1933 known associations between diseases and drugs, including 313 diseases collected from the OMIM database [46] and 593 drugs obtained from the DrugBank database [47]. The second dataset is the Cdataset [24], which includes 2532 known associations between 409 diseases collected from the OMIM database and 663 drugs obtained from the DrugBank database. Table 7 summarizes the benchmark datasets in our proposal.
In this study, we calculated the drug structure similarity matrix X_{dr} from the simplified molecular input line entry system (SMILES) chemical structures [48]; each entry is the Tanimoto index of the chemical fingerprints of a drug pair, computed via the Chemistry Development Kit [49]. The disease semantic similarity matrix X_{di} is computed from the semantic similarity of the disease phenotypes via information from the medical descriptions of the disease pairs [50].
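The Tanimoto index itself is simple to state in code. The sketch below assumes each fingerprint is given as a set of "on" bit positions; it does not call the Chemistry Development Kit, and the example fingerprints are made up for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto index: shared on-bits divided by on-bits present in either fingerprint."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def tanimoto_matrix(fingerprints):
    """Pairwise similarity matrix, playing the role of X_dr."""
    return [[tanimoto(a, b) for b in fingerprints] for a in fingerprints]

# Hypothetical fingerprints for three drugs (sets of on-bit indices).
fingerprints = [{1, 2, 3}, {2, 3, 4}, {7}]
X_dr = tanimoto_matrix(fingerprints)
```

Here the first two drugs share two of four distinct bits, giving a similarity of 0.5, while the third drug shares none and scores 0.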
RAFGAE
After collecting the required data from different sources, we propose a prediction model with three individual modules to predict potential candidate diseases for drugs of interest. We first design the Re_GAT framework, which captures global structural information from a bipartite network. For the second module, we employ GAEs that use known associations between diseases and drugs to simulate label propagation to guide and predict unknown associations. On the basis of the above, we utilize the FMAT module for adversarial training to improve the input quality and increase the prediction accuracy. Figure 7 shows the overall workflow of RAFGAE.
Re_GAT framework
Graph attention networks use a self-attention hidden layer to assign different attention scores to different neighbors, thus extracting the features of neighboring nodes more effectively.
The initial input to the Re_GAT framework can be described as follows:
where N represents the node count, F represents the feature dimension and h_{i} ∈ R^{F} represents the initial feature vector of node i. GATs calculate attention scores on the basis of the importance of neighbors and then aggregate neighbor features on the basis of the attention score.
The attention score is calculated as follows:
To adjust for the influence of different nodes, we use the softmax function to normalize the attention scores:
By combining Formulas (3) and (4), the calculation formula for the attention score can be expressed as:
where a_{ij} is the attention score, W is a learnable linear transformation matrix, a denotes the weight vector, σ() represents the LeakyReLU activation function, and ║ denotes the concatenation operation. After normalization, the following formula can be used to calculate the final output feature:
In this study, the drug-disease association matrix is given by matrix A, where the columns represent diseases and the rows represent drugs. The entry A(j, k) = 1 if drug j is associated with disease k and 0 otherwise. Matrix A and its transpose A^{T} define the adjacency matrix of the bipartite network G:
G = \begin{bmatrix} 0 & A \\ A^{T} & 0 \end{bmatrix}
We create the initial input embedding H^{(0)} as follows:
When combined with the bipartite network adjacency matrix G above, the graph attention network is defined as:
where H^{(l)} represents the node embedding of the l-th layer, where l = 1, …, L, and GATs() represents a single attention layer, whereas the entire Re_GAT framework consists of multiple attention layers.
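A single attention layer on the bipartite network can be sketched as follows. This is a minimal NumPy illustration of the standard GAT attention (LeakyReLU of a^T[Wh_i ║ Wh_j], followed by a softmax over neighbors), not the authors' implementation. The toy matrix A, the random features standing in for H^{(0)}, the added self-loops and the omission of the output nonlinearity are all assumptions.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(H, G, W, a):
    """One attention layer. H: (N, F) features, G: (N, N) adjacency,
    W: (F, Fp) linear map, a: (2 * Fp,) attention weight vector."""
    Z = H @ W
    Fp = Z.shape[1]
    # a^T [Wh_i || Wh_j] splits into a_left . Wh_i + a_right . Wh_j
    e = leaky_relu((Z @ a[:Fp])[:, None] + (Z @ a[Fp:])[None, :])
    # Attend only to neighbors; self-loops are added here as an assumption.
    mask = (G + np.eye(len(G))) > 0
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)      # numerical stability
    att = np.exp(e)
    att = att / att.sum(axis=1, keepdims=True)  # softmax over each node's neighbors
    return att @ Z, att

# Toy association matrix (hypothetical): 3 drugs x 2 diseases.
A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
n_dr, n_di = A.shape
G = np.block([[np.zeros((n_dr, n_dr)), A],
              [A.T, np.zeros((n_di, n_di))]])

rng = np.random.default_rng(0)
H0 = rng.normal(size=(n_dr + n_di, 4))   # placeholder for the initial embedding
W = rng.normal(size=(4, 3))
a = rng.normal(size=6)
H1, att = gat_layer(H0, G, W, a)
```

Each row of `att` sums to one and is zero for node pairs that are not connected in G, so a drug node aggregates only from its associated diseases (and itself), and vice versa.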
This study proposes a Re_GAT framework through two main strategies for forward propagation: (I) initial residual connection and adaptive residual connection; and (II) attention mechanism layer aggregation.
To facilitate the learning of feature information from higher-order neighbors, multiple attention layers are typically used, which tends to homogenize the node embeddings and thus leads to the oversmoothing problem. Residual connections, also known as skip connections, were first proposed in ResNet [51] to address the degradation problem of deep CNNs. Inspired by ResNet, recent studies have attempted to apply various residual connections to GATs to alleviate the oversmoothing problem. Several studies have shown that residual connections are necessary for deep GATs [52], not only to alleviate the oversmoothing problem, but also to give GATs a more stable gradient.
We combine the output of the l-th attention layer with H^{(0)} and H^{(l−1)}, weighted by the scale coefficients α and β, respectively. We use the initial skip connection and the adaptive skip connection to mitigate the oversmoothing problem and accelerate the convergence of the GATs. The GAT formula of our model can be rewritten as:
where α and β are hyperparameters.
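The two skip connections can be illustrated with a short sketch. The exact update rule is not reproduced here; the code assumes an additive form, H^{(l)} = layer(H^{(l−1)}) + αH^{(0)} + βH^{(l−1)}, and uses simple mean aggregation (`propagate`) as a stand-in for the attention layer. Both choices are assumptions for illustration.

```python
import numpy as np

def propagate(H, G):
    """Stand-in for one attention layer: average over neighbors and self."""
    P = G + np.eye(len(G))
    P = P / P.sum(axis=1, keepdims=True)
    return P @ H

def residual_stack(H0, G, alpha, beta, n_layers):
    """Stack layers with an initial (alpha) and an adaptive (beta) skip connection."""
    layers = [H0]
    for _ in range(n_layers):
        H_prev = layers[-1]
        # Assumed additive form: layer output + alpha * H^(0) + beta * H^(l-1)
        H_new = propagate(H_prev, G) + alpha * H0 + beta * H_prev
        layers.append(H_new)
    return layers

# Toy bipartite network (hypothetical): 3 drugs, 2 diseases.
A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
G = np.block([[np.zeros((3, 3)), A], [A.T, np.zeros((2, 2))]])
H0 = np.eye(5)

layers = residual_stack(H0, G, alpha=0.3, beta=0.7, n_layers=3)
plain = residual_stack(H0, G, alpha=0.0, beta=0.0, n_layers=2)
```

With α = β = 0 the stack reduces to repeated propagation, the setting in which embeddings tend to homogenize; the skip terms re-inject the initial and previous-layer features at every step.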
As in LAGCN [35], the embedding of each layer captures structural information from different orders of the heterogeneous network. For instance, the initial layer obtains direct connection information, whereas the higher-order layers collect information about multihop neighbors through iterative update embedding. To fuse all useful information from multiple GAT layers, we use the attention mechanism. Since the Re_GAT framework calculates the embedding of different layers and the embeddings contain different information, we define the resulting GAT layer embedding as:
where H_{dr}^{l} ∈ R^{N_{dr}×k_{l}} is the embedding of the drugs in layer l and H_{di}^{l} ∈ R^{N_{di}×k_{l}} is the embedding of the diseases in layer l. We use attention mechanism layer aggregation to integrate multiple embedding matrices, and the final fused embedding matrix is as follows:
where H_{dr}^{i} and H_{di}^{i} are the i-th layer embeddings of drugs and diseases, respectively, a_{i} and b_{i} are the attention factors that can be calculated via Formulas (2), (3) and (4), and L is the number of layers.
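Layer aggregation amounts to a weighted sum of the per-layer embeddings. The sketch below assumes that the attention factors are obtained by a softmax over one hypothetical logit per layer and that all layers share the same embedding width; how the factors are actually learned is not shown.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def fuse_layers(layer_embeddings, logits):
    """Weighted sum over layers: C = sum_i a_i * H^i, with a = softmax(logits)."""
    weights = softmax(np.asarray(logits, dtype=float))
    stacked = np.stack(layer_embeddings)            # (L, N, k)
    fused = np.tensordot(weights, stacked, axes=1)  # (N, k)
    return fused, weights

# Two hypothetical layer embeddings for 3 drugs with width 2.
H_dr_1 = np.ones((3, 2))
H_dr_2 = 3.0 * np.ones((3, 2))
C_dr, a_factors = fuse_layers([H_dr_1, H_dr_2], logits=[0.0, 0.0])
```

With equal logits both layers receive weight 0.5, so the fused embedding is their mean; unequal logits shift the fused embedding toward the layers judged more informative. The same function would be applied to the disease embeddings with the factors b_{i}.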
Constructing the feature similarity graph
A previous study showed that a similarity graph constructed using drug and disease features can be used to propagate labels [53]. We use the features C_{dr} and C_{di} to construct feature similarity graphs for diseases and drugs, respectively. These features are used for label propagation in the disease and drug spaces. The feature similarity graphs are constructed as follows. First, the Euclidean distance between nodes is calculated and ranked. Second, for each node i, its 10 nearest neighbors are selected. Finally, the adjacency matrix is defined as M, and the set of neighbors of node i is defined as N(i). The matrix M satisfies M_{ij} = 1 when j belongs to N(i); otherwise, M_{ij} = 0.
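The three construction steps can be sketched directly. In this minimal example the node itself is excluded from its own neighbor set, which is an assumption, and k is reduced to 1 for a four-node toy input; the paper uses the 10 nearest neighbors.

```python
import numpy as np

def knn_graph(C, k=10):
    """Adjacency M with M[i, j] = 1 when j is among the k nearest neighbors of i."""
    diff = C[:, None, :] - C[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))    # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)              # do not pick a node as its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    M = np.zeros((len(C), len(C)))
    M[np.arange(len(C))[:, None], neighbors] = 1.0
    return M

# Hypothetical one-dimensional features for four nodes.
C = np.array([[0.0], [1.0], [3.0], [10.0]])
M = knn_graph(C, k=1)
```

Note that M is generally not symmetric: node 2 selects node 1 as its nearest neighbor, but node 1 selects node 0.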
The self-loop adjacency matrix for the similarity graph S is constructed as follows:
where ⊙ is the Hadamard product. This method can be used to obtain both the drug similarity graph S_{dr} and the disease similarity graph S_{di}.
Graph autoencoder
Previous studies have shown that the graph autoencoder may simulate label propagation by iteratively propagating label information on the graph [54,55,56]. The association matrix A can be considered initial label information. The initial label information and the similarity graph S calculated via the above method are input to the GAE. The encoder layer produces a hidden layer Z, whereas the decoder outputs the score F. The encoder of the GAE can be defined as:
where Φ denotes the weight matrix. Here, we use two GAEs to propagate label information on the drug and disease graphs. We can obtain the drug hidden layer Z_{dr} and the disease hidden layer Z_{di}, which are expressed as follows:
where S_{dr} and S_{di} denote the drug similarity graph and the disease similarity graph, respectively, and A denotes the association matrix.
The decoder of the GAE is applied to decode the hidden layer representation, which is defined as follows:
Therefore, the score matrices F_{dr} and F_{di} can be obtained by decoding Z_{dr} and Z_{di}, respectively.
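The shapes involved in the two GAEs can be made concrete with a small sketch. It assumes a one-layer encoder Z = ReLU(S·A·Φ) and a one-layer decoder F = sigmoid(Z·Φ'), where the decoder weight Φ' (`w_dec`) and the choice of activations are hypothetical and not taken from the text. The disease-side GAE is assumed to receive the transposed association matrix so that its rows match S_{di}.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gae_forward(sim, labels, w_enc, w_dec):
    """One encoder/decoder pass of a GAE used for label propagation.

    sim    : similarity graph S, shape (n, n)
    labels : initial label information, shape (n, m)
    w_enc  : encoder weight (the role of Phi in the text), shape (m, k)
    w_dec  : decoder weight, shape (k, m); HYPOTHETICAL, not named in the text
    """
    z = np.maximum(sim @ labels @ w_enc, 0.0)  # hidden layer Z (ReLU assumed)
    f = sigmoid(z @ w_dec)                     # score matrix F in [0, 1]
    return z, f

rng = np.random.default_rng(0)
n_dr, n_di, k = 5, 4, 3
A = rng.integers(0, 2, size=(n_dr, n_di)).astype(float)  # toy associations
S_dr, S_di = np.eye(n_dr), np.eye(n_di)                   # placeholder graphs

Z_dr, F_dr = gae_forward(S_dr, A, rng.normal(size=(n_di, k)), rng.normal(size=(k, n_di)))
Z_di, F_di = gae_forward(S_di, A.T, rng.normal(size=(n_dr, k)), rng.normal(size=(k, n_dr)))
```

Under these assumptions F_{dr} has shape N_{dr} × N_{di} and F_{di} has shape N_{di} × N_{dr}, so one of them has to be transposed before the two score matrices can be compared or combined.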
Since F_{dr} and F_{di} are both low-rank matrices [57], they need to satisfy the rank-sum inequality:
By performing a linear combination of F_{dr} and F_{di}, the final integrated score is obtained as follows:
where α ∈ (0,1) represents the balancing weight between the drug space and the disease space.
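Written out, a linear combination of this kind would take the following form. The transpose on F_{di} is an assumption made here so that both terms have shape N_{dr} × N_{di}; the text does not state the orientation of the two score matrices.

```latex
F = \alpha F_{dr} + (1 - \alpha) F_{di}^{\top}, \qquad \alpha \in (0, 1)
```

Since matrix rank is subadditive, a combination of this form satisfies rank(F) ≤ rank(F_{dr}) + rank(F_{di}), so the integrated score remains low rank whenever both inputs are.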
The GAE reconstruction error is the cross-entropy loss between the final prediction and the true value:
As the information from the disease space and the drug space influences the predicted outcome, we use a co-training approach to train the above two GAEs. The co-training loss L_{co} is defined as:
The combined loss function can be rewritten as:
where L_{rdr} and L_{rdi} denote the reconstruction errors of the two GAEs in the drug space and the disease space, respectively.
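How the three loss terms could fit together is sketched below. Only the use of cross-entropy for the reconstruction errors comes from the text. The co-training term is assumed here to be the mean squared disagreement between the two score matrices, and the combined loss is assumed to be a plain sum with a hypothetical weight `weight_co`; both are illustrative choices, not the authors' definitions.

```python
import numpy as np

def bce(pred, target, eps=1e-12):
    """Mean binary cross-entropy between predicted scores and 0/1 labels."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(pred)
                          + (1.0 - target) * np.log(1.0 - pred)))

def co_training_loss(f_dr, f_di):
    # ASSUMPTION: penalize disagreement between the drug-space scores and
    # the (transposed) disease-space scores with a mean squared difference.
    return float(np.mean((f_dr - f_di.T) ** 2))

def combined_loss(f_dr, f_di, a, weight_co=1.0):
    l_rdr = bce(f_dr, a)                 # reconstruction error, drug space
    l_rdi = bce(f_di.T, a)               # reconstruction error, disease space
    l_co = co_training_loss(f_dr, f_di)  # agreement between the two GAEs
    return l_rdr + l_rdi + weight_co * l_co  # ASSUMPTION: simple sum

# Toy check: uninformative scores of 0.5 everywhere.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
F_dr = np.full((2, 2), 0.5)
F_di = np.full((2, 2), 0.5)
loss = combined_loss(F_dr, F_di, A)
```

In the toy check the two GAEs agree perfectly, so the co-training term vanishes and the total equals two cross-entropy terms of ln 2 each.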
Free multiscale adversarial training
In this section, we investigate how to effectively improve the input quality through data augmentation [58]. When neural networks are trained, the quality of the data is far more important than the quantity. By searching for and stamping out small perturbations that cause the classifier to fail, one may hope that adversarial training could benefit standard accuracy. Adversarial training is a well-studied method that increases the robustness and interpretability of neural networks. When the data distribution is sparse and discrete, the beneficial effect of adversarial perturbations on generalizability is prominent [59]. Inspired by this, we introduce free multiscale adversarial training (FMAT) to augment the node features [60].
Adversarial training first generates adversarial perturbations, which are then integrated into the training node features. Given a learning model f_{θ} with parameters θ, we denote the perturbed feature as H_{adv} = H + δ. Adversarial learning follows the min–max formulation:
where A represents the real value, D represents the data distribution, L represents the objective loss function, ε represents the perturbation budget, and ‖·‖_{p} represents an l_{p}-norm distance measure.
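With the symbols defined above, the standard min–max objective of adversarial training can be written as follows. This is the usual textbook form of the saddle-point problem, given here as a reading aid; the exact typesetting in the original article may differ.

```latex
\min_{\theta} \; \mathbb{E}_{(H, A) \sim D}
\left[ \max_{\|\delta\|_{p} \le \varepsilon}
L\left( f_{\theta}(H + \delta), A \right) \right]
```

The inner maximization searches for the worst-case perturbation δ within the budget ε, and the outer minimization fits the parameters θ against that perturbation.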
The saddle-point optimization problem can be solved via projected gradient descent (PGD), which implements the inner maximization, and stochastic gradient descent (SGD), which implements the outer minimization. The perturbation δ is updated after each step:
where ∏_{║δ║_{∞}≤ε} denotes projection onto the ε-ball under the l_{∞}-norm. The initial layer of the Re_GAT framework can be rewritten as:
To effectively exploit the generalizability of adversarial perturbations and improve their diversity and quality, Chen et al. emphasized the importance of adapting to different types of data enhancements [61]. To achieve this, we introduce a 'free' training approach [62].
Computing δ with standard PGD is inefficient because an N-step update requires N forward and backward passes. The full forward and backward computation is run N times to obtain the worst-case perturbation δ_{N}, yet the model weights θ are updated only once, using δ_{N} alone. This makes model training N times slower. In contrast, 'free' training computes the gradient with respect to the model weights θ in the same backward pass that computes the gradient with respect to δ, which allows weight updates to be calculated in parallel with perturbation updates.
'Free' training has the same robustness and accuracy as standard adversarial training does, while its training cost is the same as that of clean training. The 'free' strategy accumulates the gradient \(\nabla_{\theta } L\) in each iteration and updates the model weights θ with this accumulated gradient. During the training process, the model runs the inner loop T times, each time calculating the gradients with respect to θ_{t−1} and δ_{t}; the weights then take a step along the average gradient at H^{(l)} + δ_{0}, …, H^{(l)} + δ_{T−1}. Formally, the optimization step is
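The accumulate-then-step pattern can be sketched on a toy problem. A logistic-regression model stands in for the Re_GAT network so that both gradients can be written in closed form; the step size `step`, the learning rate `lr`, the uniform initialization of δ_{0} and all function names are assumptions of this sketch, and the multiscale part of FMAT is not modeled.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce_loss(w, x, y, eps=1e-12):
    p = np.clip(sigmoid(x @ w), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def grads(w, x, y):
    """Gradients of the mean cross-entropy w.r.t. the weights and the inputs,
    both obtained from the same forward computation."""
    err = (sigmoid(x @ w) - y) / x.shape[0]  # d(loss)/d(logit)
    return x.T @ err, np.outer(err, w)

def free_adversarial_step(w, h, y, eps=0.1, step=0.05, inner_steps=3,
                          lr=0.5, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    delta = rng.uniform(-eps, eps, size=h.shape)  # delta_0 (assumed uniform)
    grad_accum = np.zeros_like(w)
    for _ in range(inner_steps):
        g_w, g_delta = grads(w, h + delta, y)
        grad_accum += g_w / inner_steps           # average gradient for theta
        # Ascent step on delta, then projection onto the l_inf ball of radius eps.
        delta = np.clip(delta + step * np.sign(g_delta), -eps, eps)
    return w - lr * grad_accum, delta             # one weight update per outer step

rng = np.random.default_rng(0)
H = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])
w = np.zeros(2)
loss_before = bce_loss(w, H, y)
for _ in range(50):
    w, delta = free_adversarial_step(w, H, y, rng=rng)
loss_after = bce_loss(w, H, y)
```

Each inner iteration reuses a single gradient computation for both δ and the weights, so the weight gradient comes at no extra cost relative to computing the perturbation alone.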
Availability of data and materials
We acquired the Cdataset of disease-drug associations from the Comparative Toxicogenomics Database [44] (http://ctdbase.org/). We screened the Fdataset of disease-drug interactions from the OMIM database [46] (https://www.omim.org/) and the DrugBank database [47] (https://www.drugbank.ca/). These two datasets and the source code are available at: https://github.com/ghli16/RAFGAE.
Abbreviations
GAT: Graph attention network
GAE: Graph autoencoder
FMAT: Free multiscale adversarial training
TPR: True positive rate
FPR: False-positive rate
ROC: Receiver operating characteristic
AUC: Area under ROC curve
CV: Cross validation
References
Rifaioglu AS, Atas H, Martin MJ, Cetin-Atalay R, Atalay V, Doğan T. Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Brief Bioinform. 2019;20(5):1878–912.
Ashburn TT, Thor KB. Drug repositioning: identifying and developing new uses for existing drugs. Nat Rev Drug Discov. 2004;3(8):673–83.
Dickson M, Gagnon JP. Key factors in the rising cost of new drug discovery and development. Nat Rev Drug Discov. 2004;3(5):417–29.
Padhy BM, Gupta YK. Drug repositioning: reinvestigating existing drugs for new therapeutic indications. J Postgrad Med. 2011;57(2):153.
Xue H, Li J, Xie H, Wang Y. Review of drug repositioning approaches and resources. Int J Biol Sci. 2018;14(10):1232.
Pushpakom S, Iorio F, Eyers PA, Escott KJ, Hopper S, Wells A, Doig A, Guilliams T, Latimer J, McNamee C, Norris A, Sanseau P, Cavalla C, Pirmohamed M. Drug repurposing: progress, challenges and recommendations. Nat Rev Drug Discov. 2019;18(1):41–58.
Baker NC, Ekins S, Williams AJ, Tropsha A. A bibliometric review of drug repurposing. Drug Discov Today. 2018;23(3):661–72.
Nosengo N. New tricks for old drugs. Nature. 2016;534(7607):314–6.
Jarada TN, Rokne JG, Alhajj R. A review of computational drug repositioning: strategies, approaches, opportunities, challenges, and directions. J Cheminform. 2020;12(1):1–23.
Mohamed K, Yazdanpanah N, Saghazadeh A, Rezaei N. Computational drug discovery and repurposing for the treatment of COVID-19: a systematic review. Bioorg Chem. 2021;106:104490.
Fahimian G, Zahiri J, Arab SS, Sajedi RH. RepCOOL: computational drug repositioning via integrating heterogeneous biological networks. J Transl Med. 2020;18(1):1–10.
Traylor JI, Sheppard HE, Ravikumar V, Breshears J, Raza SM, Lin CY, Patel SR, DeMonte F. Computational drug repositioning identifies potentially active therapies for chordoma. Neurosurgery. 2021;88(2):428.
Bai L, Scott MK, Steinberg E, Kalesinskas L, Habtezion A, Shah NH, Khatri P. Computational drug repositioning of atorvastatin for ulcerative colitis. J Am Med Inform Assoc. 2021;28(11):2325–35.
Dai W, Liu X, Gao Y, Chen L, Song J, Chen D, Gao K, Jiang YS, Yang YP, Chen JX, Lu P. Matrix factorization-based prediction of novel drug indications by integrating genomic space. Comput Math Methods Med. 2015;2015:275045.
Zhang W, Zou H, Luo L, Liu Q, Wu W, Xiao W. Predicting potential side effects of drugs by recommender methods and ensemble learning. Neurocomputing. 2016;173:979–87.
Huang F, Qiu Y, Li Q, Liu S, Ni F. Predicting drug-disease associations via multi-task learning based on collective matrix factorization. Front Bioeng Biotechnol. 2020;8:218.
Luo H, Li M, Wang S, Liu Q, Li Y, Wang J. Computational drug repositioning using low-rank matrix approximation and randomized algorithms. Bioinformatics. 2018;34(11):1904–12.
Zhang W, Yue X, Lin W, Wu W, Liu R, Huang F, Liu F. Predicting drug-disease associations by using similarity constrained matrix factorization. BMC Bioinform. 2018;19:1–12.
Yang M, Wu G, Zhao Q, Li Y, Wang J. Computational drug repositioning based on multi-similarities bilinear matrix factorization. Brief Bioinform. 2021;22(4):bbaa267.
Zhang W, Xu H, Li X, Gao Q, Wang L. DRIMC: an improved drug repositioning approach using Bayesian inductive matrix completion. Bioinformatics. 2020;36(9):2839–47.
Hu L, Zhang J, Pan X, Yan H, You ZH. HiSCF: leveraging higher-order structures for clustering analysis in biological networks. Bioinformatics. 2021;37(4):542–50.
Chu Y, Kaushik AC, Wang X, Wang W, Zhang Y, Shan X, Salahub DR, Xiong Y, Wei DQ. DTI-CDF: a cascade deep forest model towards the prediction of drug-target interactions based on hybrid features. Brief Bioinform. 2021;22(1):451–62.
Yang K, Zhao X, Waxman D, Zhao XM. Predicting drug-disease associations with heterogeneous network embedding. Chaos Interdiscip J Nonlinear Sci. 2019;29(12):123109.
Luo Y, Zhao X, Zhou J, Yang J, Zhang Y, Kuang W, Peng J, Chen L, Zeng J. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat Commun. 2017;8(1):573.
Zhao BW, Hu L, You ZH, Wang L, Su XR. HINGRL: predicting drug–disease associations with graph representation learning on heterogeneous information networks. Brief Bioinform. 2022;23(1):bbab515.
Zhang H, Cui H, Zhang T, Cao Y, Xuan P. Learning multiscale heterogenous network topologies and various pairwise attributes for drug–disease association prediction. Brief Bioinform. 2022;23(2):bbac009.
Luo H, Wang J, Li M, Luo J, Peng X, Wu FX, Pan Y. Drug repositioning based on comprehensive similarity measures and bi-random walk algorithm. Bioinformatics. 2016;32(17):2664–71.
Cai L, Lu C, Xu J, Meng Y, Wang P, Fu X, Su Y. Drug repositioning based on the heterogeneous information fusion graph convolutional network. Brief Bioinform. 2021;22(6):bbab319.
Xuan P, Ye Y, Zhang T, Zhao L, Sun C. Convolutional neural network and bidirectional long short-term memory-based method for predicting drug–disease associations. Cells. 2019;8(7):705.
Liu H, Zhang W, Song Y, Deng L, Zhou S. HNetDNN: inferring new drug–disease associations with deep neural network based on heterogeneous network features. J Chem Inf Model. 2020;60(4):2367–76.
Peng L, Tan J, Xiong W, Zhang L, Wang Z, Yuan R, Li Z, Chen X. Deciphering ligand–receptor-mediated intercellular communication based on ensemble deep learning and the joint scoring strategy from single-cell transcriptomic data. Comput Biol Med. 2023;2023:107137.
Xuan P, Gao L, Sheng N, Zhang T, Nakaguchi T. Graph convolutional autoencoder and fully-connected autoencoder with attention mechanism based method for predicting drug-disease associations. IEEE J Biomed Health Inform. 2020;25(5):1793–804.
Coşkun M, Koyutürk M. Node similarity-based graph convolution for link prediction in biological networks. Bioinformatics. 2021;37(23):4501–8.
Zeng X, Zhu S, Liu X, Zhou Y, Nussinov R, Cheng F. deepDR: a network-based deep learning approach to in silico drug repositioning. Bioinformatics. 2019;35(24):5191–8.
Yu Z, Huang F, Zhao X, Xiao W, Zhang W. Predicting drug–disease associations through layer attention graph convolutional network. Brief Bioinform. 2021;22(4):bbaa243.
Feng Q, Dueva E, Cherkasov A, Ester M. PADME: a deep learning-based framework for drug–target interaction prediction. https://arxiv.org/abs/1807.09741 (2019).
Meng Y, Lu C, Jin M, Xu J, Zeng X, Yang J. A weighted bilinear neural collaborative filtering approach for drug repositioning. Brief Bioinform. 2022;23(2):bbab581.
Gu Y, Zheng S, Yin Q, Jiang R, Li J. REDDA: integrating multiple biological relations to heterogeneous graph neural network for drug-disease association prediction. Comput Biol Med. 2022;150:106127.
Yang M, Luo H, Li Y, et al. Drug repositioning based on bounded nuclear norm regularization. Bioinformatics. 2019;35(14):i455–63.
Li J, Zhang S, Liu T, et al. Neural inductive matrix completion with graph convolutional networks for miRNA-disease association prediction. Bioinformatics. 2020;36(8):2538–46.
Kingma DP. A method for stochastic optimization. ArXiv Prepr. 2014.
Niu M, Zou Q, Wang C. GMNN2CD: identification of circRNA–disease associations based on variational inference and graph Markov neural networks. Bioinformatics. 2022;38(8):2246–53.
Shi Z, Zhang H, Jin C, Quan X, Yin Y. A representation learning model based on variational inference and graph autoencoder for predicting lncRNA-disease associations. BMC Bioinform. 2021;22(1):1–20.
Davis AP, Murphy CG, Johnson R, Lay JM, Lennon-Hopkins K, Saraceni-Richards C, Sciaky D, King BL, Rosenstein MC, Wiegers TC, Mattingly CJ. The comparative toxicogenomics database: update 2013. Nucleic Acids Res. 2013;41(D1):D1104–14.
Gottlieb A, Stein GY, Ruppin E, Sharan R. PREDICT: a method for inferring novel drug indications with application to personalized medicine. Mol Syst Biol. 2011;7(1):496.
Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006;34(suppl_1):D668–72.
Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005;33(suppl_1):D514–7.
Vidal D, Thormann M, Pons M. LINGO, an efficient holographic text based method to calculate biophysical properties and intermolecular similarities. J Chem Inf Model. 2005;45(2):386–93.
Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E. The Chemistry Development Kit (CDK): an open-source Java library for chemo- and bioinformatics. J Chem Inf Comput Sci. 2003;43(2):493–500.
Van Driel MA, Bruggeman J, Vriend G, Brunner HG, Leunissen JA. A text-mining analysis of the human phenome. Eur J Hum Genet. 2006;14(5):535–42.
Kaiming H, Shaoqing R, Jian S. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2016:770–778.
Sharma V, Dyreson C. Covid-19 screening using residual attention network an artificial intelligence approach. 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE. 2020:1354–1361.
Belkin M, Niyogi P, Sindhwani V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res. 2006;7(11).
Kipf TN, Welling M. Variational graph autoencoders. https://arxiv.org/abs/1611.07308 (2016).
Li G, Luo J, Xiao Q, Liang C, Ding P. Predicting microRNA-disease associations using label propagation based on linear neighborhood similarity. J Biomed Inform. 2018;82:169–77.
Wang F, Zhang C. Label propagation through linear neighborhoods. Proceedings of the 23rd international conference on Machine learning. 2006:985–992.
Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. https://arxiv.org/abs/1409.0473 (2014).
Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Commun ACM. 2017;60(6):84–90.
Gan Z, Chen YC, Li L, et al. Large-scale adversarial training for vision-and-language representation learning. Adv Neural Inf Process Syst. 2020;33:6616–28.
Kong K, Li G, Ding M, Wu Z, Zhu C, Ghanem B, Taylor G, Goldstein T. Robust optimization as data augmentation for large-scale graphs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022:60–69.
Chen T, Kornblith S, Norouzi M, Hinton G. A simple framework for contrastive learning of visual representations. In International conference on machine learning. PMLR. 2020:1597–1607.
Shafahi A, Najibi M, Ghiasi MA, Xu Z, Dickerson J, Studer C, Davis LS, Taylor G, Goldstein T. Adversarial training for free!. Adv Neural Inf Process Syst. 2019;32.
Acknowledgements
Not applicable.
Funding
This work is supported by the National Natural Science Foundation of China (Grant Nos. 62362034, 61862025, 62372279, and 62002116), the Natural Science Foundation of Jiangxi Province (Grant Nos. 20232ACB202010, 20212BAB202009, 20181BAB211016), and the Natural Science Foundation of Shandong Province (Grant No. ZR2023MF119).
Contributions
GL and JL conceived and designed the study. GL and SL implemented the experiments and drafted the manuscript. CL and QX analyzed the results. All the authors have read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, G., Li, S., Liang, C. et al. Drug repositioning based on residual attention network and free multiscale adversarial training. BMC Bioinformatics 25, 261 (2024). https://doi.org/10.1186/s12859-024-05893-5