 Research
 Open access
CCL-DTI: contributing the contrastive loss in drug–target interaction prediction
BMC Bioinformatics volume 25, Article number: 48 (2024)
Abstract
Background
Drug–Target Interaction (DTI) prediction takes a drug molecule and a protein sequence as inputs and predicts their binding affinity value. In recent years, deep learning-based models have received increasing attention. These methods consist of two modules: a feature extraction module and a task prediction module. Most deep learning-based approaches train the model with a simple task prediction loss (i.e., categorical cross-entropy for the classification task and mean squared error for the regression task). In machine learning, contrastive loss functions have been developed to learn more discriminative feature spaces. In a deep learning-based model, a more discriminative feature space improves the performance of the task prediction module.
Results
In this paper, we use multimodal knowledge as input and propose an attention-based fusion technique to combine it. We also investigate how using a contrastive loss function alongside the task prediction loss helps the approach learn a more powerful model. Four contrastive loss functions are considered: (1) the max-margin contrastive loss, (2) the triplet loss, (3) the Multi-class N-pair Loss Objective, and (4) the NT-Xent loss. The proposed model is evaluated on four well-known datasets: the Wang et al. dataset, Luo's dataset, Davis, and KIBA.
Conclusions
Accordingly, after reviewing state-of-the-art methods, we developed a multimodal feature extraction network that combines protein sequences and drug molecules with protein–protein interaction networks and drug–drug interaction networks. The results show that it performs significantly better than comparable state-of-the-art approaches.
Introduction
Drug–target interaction (DTI) prediction is vital to drug discovery, as it helps identify potential interactions between drugs and targets [1,2,3,4]. In particular, DTI prediction focuses on determining whether specific proteins interact with a drug compound [5]. It also offers guidance on drug repurposing, multi-drug pharmacology, drug resistance, and side-effect prediction [6, 7]. Measuring DTI through in vitro experiments is considered reliable, but it is costly, time-consuming, and inefficient, particularly for large-scale datasets [8,9,10,11]. For this reason, computational methods for DTI prediction have received increasing attention [12,13,14]. Current techniques for predicting DTI fall into three groups: ligand-based [15], docking-based [16], and machine learning-based approaches [10].
In recent years, DTI prediction has received growing attention [17,18,19]. Existing methods can be divided into two categories: feature-based methods and similarity-based methods. Zhang and Xie [20] introduced a DTI model based on non-negative matrix factorization, with a new \(L_{2,1}\) regularization term that guarantees the sparsity of the feature matrices derived from the factorization; they proved that the obtained solution converges to a KKT point. Feature-based methods include two main modules: the feature extraction module and the task prediction module. In the feature extraction module, raw protein sequences and drug molecules are mapped to discriminative feature spaces. Ozturk et al. [21] introduced DeepDTA, which uses two 1D convolutional networks to learn feature spaces for drugs and proteins; the drug and protein feature vectors are then concatenated and fed into the task prediction model. Karimi et al. [22] introduced a semi-supervised method that first trains two sequence-to-sequence models to learn an initial representation of a drug–target pair, which is then used to initialize an RNN-CNN network that extracts features for the pair. Li et al. [13] introduced a co-contrastive learning-based method for DTI prediction that learns more discriminative representations of drug–target pairs using an inhomogeneous graph representation. Qian et al. [23] introduced an approach that takes the drug's chemical text information and 2D structure image as input and uses a bidirectional multi-head cross-attention module to encode drug and target interaction features. Zhang et al. [24] used a transformer-based model with graph-based layers to extract features from drug molecules and a convolutional network to extract features from protein sequences. Yazdani-Jahromi et al. [25] introduced a method called AttentionSiteDTI. 
They treat the drug–target complex as a sentence to identify the protein binding sites that contribute to the drug–target interaction. In the task prediction module, the goal is to map the feature descriptor of the drug–target pair to the task label. Many approaches use a simple multilayer perceptron as the task prediction network. Tayebi et al. [26] introduced UnbiasedDTI, which addresses the imbalance between the active and inactive classes in DTI using an ensemble of deep learning models. He et al. [27] extract cross-view knowledge, including the sequence and network views of drugs and targets, and use contrastive losses to learn better feature vectors for drugs and targets. To do so, they define four auxiliary contrastive losses that contrast similar and dissimilar (1) drug feature vectors in the sequence view, (2) drug feature vectors in the network view, (3) target feature vectors in the sequence view, and (4) target feature vectors in the network view. Li et al. [13] introduced Supervised Graph Co-contrastive Learning for Drug–Target Interaction Prediction (SGCL-DTI). They define two graphs, a topological graph and a semantic graph, whose nodes are drug–target pairs, and then apply a supervised contrastive loss over these feature representations. Zhang et al. [28] introduced a DTI method called MRBDTA, which uses a modified transformer encoder with skip connections together with an effective approach for encoding knowledge of the interaction site between drug and protein. In [29], a graph convolutional network (GCN) extracts features from proteins and drugs; the protein 2D graph is built from the protein contact matrix and the physicochemical properties of its residues. 
To capture intermolecular interactions, they use cross-attention layers; the inter- and intramolecular features are then fused and fed into an MLP.
In this paper, the research question is: "How do different contrastive loss functions affect the performance of a drug–target interaction prediction model?" To investigate this question, we present an approach with two stages: (1) an architecture that extracts appropriate features for proteins and drugs, and (2) a combined loss function that includes both the task prediction loss and a contrastive loss. For the feature extractor network (the first stage), we use multimodal knowledge as input, including drug molecules, protein sequences, protein–protein interaction networks, and drug–drug interaction networks. Features are extracted from the protein–protein and drug–drug interaction graphs with Node2vec, and from protein sequences and drug molecules with 1D convolutional neural networks. A two-sided attention mechanism fuses the knowledge of these modalities. Finally, the outputs of these networks are concatenated and fed into a multilayer perceptron (MLP) to predict the affinity value. Together, these components aim to capture the complex relationship between drugs and their targets more completely, which can lead to more accurate predictions. To investigate the effect of different contrastive loss functions, we consider four important ones: (1) the triplet loss, (2) the max-margin contrastive loss, (3) the Multi-class N-pair Loss Objective, and (4) the NT-Xent loss. The overall architecture of the proposed model is shown in Fig. 1. The model is trained with two loss functions: (1) the task prediction loss and (2) the contrastive loss. In each training step, the model is first trained with the contrastive loss and then with the prediction loss. 
This procedure is then repeated until convergence. Providing suitable input data for the contrastive loss functions is important. For the max-margin contrastive loss, each input contains two drug–target pairs. For the triplet loss, each input contains three drug–target pairs: an anchor, a positive, and a negative. For the Multi-class N-pair Loss Objective and the NT-Xent loss, each input sample contains N drug–target pairs.
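As an illustration of how such inputs might be assembled from labeled drug–target pairs, the following minimal Python sketch samples a pair, a triplet, and an (N + 1)-tuple by index. It assumes binary activity labels and uniform random sampling; the function names (`make_pair`, `make_triplet`, `make_n_tuple`) are hypothetical and do not reflect the authors' sampling strategy.

```python
import random


def positives_and_negatives(labels, anchor):
    """Split indices into same-label (excluding the anchor) and different-label sets."""
    pos = [k for k, y in enumerate(labels) if y == labels[anchor] and k != anchor]
    neg = [k for k, y in enumerate(labels) if y != labels[anchor]]
    return pos, neg


def make_pair(labels, rng):
    """Max-margin input: two distinct indices plus 1 if they share a label, else 0."""
    i, j = rng.sample(range(len(labels)), 2)
    return i, j, int(labels[i] == labels[j])


def make_triplet(labels, anchor, rng):
    """Triplet input: (anchor, positive, negative) indices."""
    pos, neg = positives_and_negatives(labels, anchor)
    return anchor, rng.choice(pos), rng.choice(neg)


def make_n_tuple(labels, anchor, n, rng):
    """(N + 1)-tuple: anchor, one positive, and n - 1 distinct negatives."""
    pos, neg = positives_and_negatives(labels, anchor)
    return [anchor, rng.choice(pos)] + rng.sample(neg, n - 1)


# Hypothetical activity labels for six drug-target pairs (1 = active, 0 = inactive).
labels = [1, 1, 0, 0, 1, 0]
rng = random.Random(0)
print(make_pair(labels, rng))
print(make_triplet(labels, 0, rng))
print(make_n_tuple(labels, 0, 3, rng))
```

With binary labels, every negative comes from the single opposite class; richer label sets would allow negatives from several classes.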
We evaluated the proposed approach on four well-known datasets: Wang et al. [30], Luo's dataset [31], KIBA [32], and Davis [33]. The results show significant improvements over state-of-the-art approaches and the base approach. This confirms that learning a discriminative feature space for the drug–target pair helps the task prediction model predict the affinity value accurately.
To recap, the contributions of this paper are as follows:

1.
We use a multimodal feature extractor network: beyond the drug molecule and protein sequence, the proposed method leverages the drug–drug interaction network and the protein–protein interaction network, giving a broader view of the interplay between drugs and their targets.

2.
We propose an attention-based fusion technique, built on a two-sided attention mechanism, to combine the knowledge of the different modalities.

3.
We use four powerful contrastive loss functions alongside the task prediction loss to learn a more discriminative feature space.

4.
We conduct extensive experiments comparing how well the contrastive loss functions learn a discriminative feature space.

5.
The results confirm the effectiveness of using contrastive loss functions alongside the task prediction loss.
The paper is organized as follows: first, the problem is formulated, and the proposed method is explained in detail. Next, the evaluation of the method's performance is presented. Finally, the paper summarizes its effectiveness and suggests directions for further research.
Proposed method
In this section, we give the details of the proposed method. The main contributions of this paper are (1) fusing multimodal knowledge with an attention-based module and (2) evaluating how different contrastive loss functions affect drug–target interaction prediction. We first give the problem formulation, then describe the model architecture, and finally define the contrastive loss functions.
Problem formulation
We are given a dataset \(\left\{ \left( \left( d^{\left( i \right)}, p^{\left( i \right)} \right), l^{\left( i \right)} \right) \right\}\), where \(\left( d^{\left( i \right)}, p^{\left( i \right)} \right)\) is a drug–target pair and \(l^{\left( i \right)}\) is its affinity value or activity label (active or inactive). Each drug \(d^{\left( i \right)}\) is represented by its Simplified Molecular Input Line Entry System (SMILES) string, and each protein \(p^{\left( i \right)}\) by its amino-acid sequence. SMILES is a line notation that encodes the structure of a chemical molecule as a string of symbols. The goal is to design a system that takes a drug–target pair as input and predicts its affinity value as output.
Model architecture
This section presents the architecture of the proposed approach. It consists of three subnetworks: a protein feature encoder, a drug feature encoder, and an affinity value predictor (the task predictor). The inputs are the protein–protein interaction (PPI) network, the drug–drug interaction (DDI) network, the protein sequence, and the drug molecule. The PPI network is fed into node2vec to extract feature vectors, and the same procedure is applied to the DDI network. Two 1D CNNs extract features from drug molecules and protein sequences. To combine the knowledge of drugs and proteins, we use a two-sided attention mechanism. First, the drug features serve as the query and the protein features as the key and value; conceptually, this weights how much each local substructure of the protein sequence contributes to the drug features. Then, the protein features serve as the query and the drug features as the key and value, which determines the contribution of each local substructure of the drug molecule to the updated protein features. Finally, the drug molecule features, DDI graph features, protein sequence features, and PPI graph features are concatenated and fed into the task prediction network, which is a multilayer perceptron. A schematic view of the model architecture is shown in Fig. 1. In what follows, the whole feature encoder is denoted by \(N_{E}\): it takes the drug SMILES, protein sequence, PPI, and DDI as input and returns the feature descriptor as output.
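To make the two-sided attention step concrete, the following minimal Python sketch applies scaled dot-product attention in both directions and then concatenates the results in the order described above. It is a simplification under stated assumptions: learned query/key/value projections, multiple heads, and the CNN and node2vec encoders are omitted; the per-position outputs are mean-pooled (an assumption, since the pooling step is not specified here); and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention without learned projections.

    Each query row becomes a weighted average of the value rows,
    with weights softmax(q . k / sqrt(d)) over all key rows.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def mean_pool(rows):
    """Average per-position feature vectors into one vector (assumed pooling)."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fuse(drug_local, protein_local, ddi_vec, ppi_vec):
    """Hypothetical fusion: two-sided attention, pooling, then concatenation.

    drug_local / protein_local: per-position features (e.g., 1D CNN outputs).
    ddi_vec / ppi_vec: graph embeddings (e.g., node2vec outputs).
    """
    # Drug features as query; protein features as key and value.
    drug_updated = attend(drug_local, protein_local, protein_local)
    # Protein features as query; drug features as key and value.
    protein_updated = attend(protein_local, drug_local, drug_local)
    # Concatenate: drug molecule, DDI, protein sequence, PPI.
    return mean_pool(drug_updated) + ddi_vec + mean_pool(protein_updated) + ppi_vec


drug_local = [[1.0, 0.0], [0.0, 1.0]]                 # 2 positions, dim 2
protein_local = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # 3 positions, dim 2
features = fuse(drug_local, protein_local, [0.5, 0.5, 0.5], [0.1])
print(len(features))  # 8 = 2 + 3 + 2 + 1
```

In a trained model, the fused vector would then be passed to the MLP task predictor; here it only shows how the shapes combine.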
Contrastive loss function
In this section, the different contrastive loss functions are introduced and defined. In metric learning, metrics are learned to measure the similarity or dissimilarity between objects. Contrastive loss functions were introduced for metric learning, with the metric parameterized by a deep neural network whose parameters are optimized. The resulting model can capture complex relationships between features and generate high-quality representations by embedding data points into a lower-dimensional space. The objective is a model that makes a pair of examples with the same label more similar than a pair of examples with different labels. In this paper, four contrastive loss functions are used as auxiliary losses to learn a better model, and the experimental section evaluates how each of them performs.
Max-margin contrastive loss
The max-margin contrastive loss was introduced by Hadsell et al. [34]. It aims to maximize the distance between pairs of samples that belong to different classes. The max-margin contrastive loss function is defined as follows:
where \(z_{i} = N_{E} \left( {d^{\left( i \right)} ,p^{\left( i \right)} } \right)\) denotes the output of the feature encoder network for the \(i\)-th sample. For samples with the same label, this loss minimizes the Euclidean distance between their feature vectors; for dissimilar samples (with different class labels), it penalizes Euclidean distances smaller than the predefined margin \(m\).
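A common form of this loss, following Hadsell et al., is \(\mathcal{L} = y_{ij} D_{ij}^{2} + \left( 1 - y_{ij} \right) \max\left( 0, m - D_{ij} \right)^{2}\), where \(D_{ij} = \left\| z_{i} - z_{j} \right\|_{2}\) and \(y_{ij} = 1\) if the two samples share a label and 0 otherwise. The authors' exact constants (for example, factors of 1/2) may differ. The minimal Python sketch below computes this form for one pair of feature vectors; the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_margin_contrastive(z_i, z_j, same_label, margin=1.0):
    """Max-margin contrastive loss for a single pair of embeddings (a sketch)."""
    d = euclidean(z_i, z_j)
    if same_label:
        return d ** 2                     # pull same-label pairs together
    return max(0.0, margin - d) ** 2      # push different-label pairs beyond the margin


print(max_margin_contrastive([0.0, 0.0], [3.0, 4.0], True))    # 25.0
print(max_margin_contrastive([0.0, 0.0], [3.0, 4.0], False))   # 0.0 (already beyond margin)
print(max_margin_contrastive([0.0, 0.0], [0.5, 0.0], False))   # 0.25
```

Different-label pairs that are already farther apart than \(m\) contribute nothing, so the gradient concentrates on pairs that violate the margin.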
Triplet loss function
The triplet loss was first introduced by Weinberger [35] and was later used by FaceNet to train deep neural networks [36]. This loss operates on triplets \(\left\{ \left( d,p \right), \left( d,p \right)^{+}, \left( d,p \right)^{-} \right\}\), each consisting of an anchor sample \(\left( d,p \right)\), a positive sample \(\left( d,p \right)^{+}\) with the same class label as the anchor, and a negative sample \(\left( d,p \right)^{-}\) with a different class label from the anchor. The loss is defined as follows:
where \(m\) is the margin. This loss minimizes the distance between the feature embeddings of the anchor and positive samples and maximizes the distance between the anchor and negative samples.
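For reference, the FaceNet formulation of this loss is \(\mathcal{L} = \max\left( 0, \left\| z - z^{+} \right\|_{2}^{2} - \left\| z - z^{-} \right\|_{2}^{2} + m \right)\), where \(z\), \(z^{+}\), and \(z^{-}\) are the encoder outputs for the anchor, positive, and negative samples; whether the authors use squared or plain distances is an assumption here. The minimal Python sketch below evaluates it for one triplet and averages it over a batch; the function names and margin value are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(z, z_pos, z_neg, margin=0.2):
    """Hinge on (anchor-positive distance) - (anchor-negative distance) + margin."""
    return max(0.0, squared_distance(z, z_pos) - squared_distance(z, z_neg) + margin)


def batch_triplet_loss(triplets, margin=0.2):
    """Mean triplet loss over a list of (anchor, positive, negative) embeddings."""
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)


easy = ([0.0, 0.0], [1.0, 0.0], [3.0, 0.0])   # negative far away: zero loss
hard = ([0.0, 0.0], [1.0, 0.0], [1.0, 0.0])   # negative as close as the positive
print(triplet_loss(*easy))                     # 0.0
print(triplet_loss(*hard))                     # 0.2
print(batch_triplet_loss([easy, hard]))        # 0.1
```

Only the hard triplet contributes, which illustrates why the single negative per triplet matters for how quickly training converges.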
One of the main disadvantages of the triplet loss is that each sample considers only one negative example, so the relation of that negative to other negative samples (especially those from other negative classes) is ignored. This leads to slow convergence for the triplet loss.
Multi-class N-pair loss objective
This loss function was first introduced by Sohn [37]. Let \(\left\{ \left( d,p \right), \left( d,p \right)^{+}, \left( d,p \right)^{-,1}, \left( d,p \right)^{-,2}, \ldots, \left( d,p \right)^{-,N-1} \right\}\) be an (N + 1)-tuple of training samples, where \(\left( d,p \right)\) is the anchor sample, \(\left( d,p \right)^{+}\) is a positive sample for \(\left( d,p \right)\), and \(\left( d,p \right)^{-,i}\) is the \(i\)-th negative sample for \(\left( d,p \right)\). The N-pair loss function is then defined as follows:
where \(z\) and \(z^{+}\) denote the outputs of the feature encoder network for the anchor and positive samples, and \(z_{k}\) denotes the output for the \(k\)-th negative sample. This loss generalizes the triplet loss by considering more than one negative example; when N is set to two, it closely resembles the triplet loss. A major disadvantage of minimizing the loss in Eq. (3) is that generating batches is expensive, because each batch sample requires an (N + 1)-tuple. Sohn [37] addressed this issue by introducing an efficient batch construction strategy.
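Following Sohn's formulation, the loss for one (N + 1)-tuple is commonly written as \(\mathcal{L} = \log\left( 1 + \sum_{k=1}^{N-1} \exp\left( z^{\top} z_{k} - z^{\top} z^{+} \right) \right)\), which equals a softmax cross-entropy in which the positive must outscore the \(N - 1\) negatives. The minimal Python sketch below computes it with dot-product similarity and checks it against the cross-entropy form; the function names are hypothetical, and Sohn's efficient batch construction is not reproduced.

```python
import math


def dot(a, b):
    """Inner product of two embeddings."""
    return sum(x * y for x, y in zip(a, b))


def n_pair_loss(z, z_pos, z_negs):
    """(N + 1)-tuple loss: log(1 + sum_k exp(z.z_k - z.z_pos))."""
    s_pos = dot(z, z_pos)
    return math.log1p(sum(math.exp(dot(z, z_k) - s_pos) for z_k in z_negs))


def softmax_cross_entropy_form(z, z_pos, z_negs):
    """Same quantity written as -log of the softmax probability of the positive."""
    logits = [dot(z, z_pos)] + [dot(z, z_k) for z_k in z_negs]
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(s - m) for s in logits))
    return log_sum - logits[0]


z, z_pos = [1.0, 0.0], [1.0, 0.0]
z_negs = [[0.0, 1.0], [-1.0, 0.0]]          # N = 3: two negatives
print(n_pair_loss(z, z_pos, z_negs))
print(softmax_cross_entropy_form(z, z_pos, z_negs))  # same value as the line above
```

With a single negative, the expression becomes a softplus of the similarity gap, a smooth counterpart of the triplet hinge.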
NT-Xent loss function
NT-Xent (normalized temperature-scaled cross-entropy loss) was introduced by Chen et al. [38]. It is similar to the multi-class N-pair loss, except that a temperature parameter is introduced to scale the similarity values. Chen et al. [38] introduced NT-Xent for self-supervised learning; Khosla et al. [39] adapted it to the supervised setting, defined as follows:
where \(\tau\) denotes the temperature parameter. An important finding about the temperature is that it helps the model learn from hard samples. Chen et al. [38] showed that the best temperature value depends on the batch size and the number of training epochs. In addition, \(A\left( i \right)\) denotes all samples in the batch other than \(i\), and \(P\left( i \right)\) is the set of samples in \(A\left( i \right)\) that have the same label as the \(i\)-th sample.
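The supervised form of Khosla et al. is commonly written as \(\mathcal{L} = \sum_{i} \frac{-1}{\left| P\left( i \right) \right|} \sum_{p \in P\left( i \right)} \log \frac{\exp\left( z_{i} \cdot z_{p} / \tau \right)}{\sum_{a \in A\left( i \right)} \exp\left( z_{i} \cdot z_{a} / \tau \right)}\), with each \(z\) L2-normalized. The minimal Python sketch below implements that expression over one batch; averaging over anchors instead of summing, and skipping anchors with an empty \(P\left( i \right)\), are assumptions of this sketch, and the function names are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes it is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sup_nt_xent(embeddings, labels, tau=0.1):
    """Supervised NT-Xent over one batch, averaged over anchors that have positives."""
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        others = [a for a in range(n) if a != i]                    # A(i)
        positives = [p for p in others if labels[p] == labels[i]]   # P(i)
        if not positives:
            continue
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / tau for a in others}
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


labels = [1, 1, 0, 0]
aligned = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]   # same label, same direction
mixed = [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]     # same label, opposite directions
print(sup_nt_xent(aligned, labels))   # close to 0
print(sup_nt_xent(mixed, labels))     # close to 20
```

Lowering \(\tau\) sharpens the softmax, so poorly placed (hard) samples dominate the loss more strongly.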
The proposed approach uses these contrastive loss functions alongside the task-specific loss function to learn a better model; that is, the overall loss function of the proposed model is defined as follows:
where \({\mathcal{L}}_{contrastive}\) is one of the four contrastive loss functions introduced above and \({\mathcal{L}}_{task\ prediction}\) is the task-specific loss. If the affinity value is continuous, the task-specific loss is the mean squared error; if it is discrete, it is the categorical cross-entropy. All of the contrastive loss functions introduced here are supervised and rely on discrete class labels, so for the regression task the continuous labels must be converted to discrete ones before they can be used in the contrastive losses.
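As a minimal sketch of this combined objective in the regression setting, the Python code below binarizes continuous affinities with a dataset threshold, computes the mean squared error, and adds a contrastive term computed on the binarized labels. The helper names are hypothetical; reusing the evaluation thresholds from the Datasets section (7 for Davis, 12.1 for KIBA) for the contrastive labels, and treating values strictly above the threshold as active, are assumptions. The contrastive term is passed in as a function so that any of the four losses can be plugged in.

```python
def binarize(affinities, threshold):
    """Map continuous affinities to class labels: 1 if above threshold (assumed rule)."""
    return [1 if a > threshold else 0 for a in affinities]


def mse(pred, target):
    """Mean squared error: the task prediction loss for regression."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(pred, target, embeddings, threshold, contrastive_fn):
    """L = L_task_prediction + L_contrastive, with contrastive labels from thresholding."""
    labels = binarize(target, threshold)
    return mse(pred, target) + contrastive_fn(embeddings, labels)


def dummy_contrastive(embeddings, labels):
    """Hypothetical placeholder; replace with one of the four contrastive losses."""
    return 0.5


pred = [7.0, 8.0]
target = [7.0, 6.0]
print(binarize([6.5, 7.2, 12.3], 7.0))                                   # [0, 1, 1]
print(total_loss(pred, target, [[0.0], [1.0]], 7.0, dummy_contrastive))  # 2.5
```

Because the training procedure described in the Introduction alternates between the contrastive and prediction losses, an implementation may step the two terms separately rather than summing them in a single backward pass.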
Experiments
In this section, the experimental results are presented. Four well-known datasets are used to evaluate the proposed method: Wang et al. [30], Luo's dataset [31], KIBA [32], and Davis [33]. We first introduce the datasets and the experimental setting, then the evaluation metrics, and finally present and analyze the obtained results.
Datasets
Wang et al. dataset: this dataset [30] includes six heterogeneous networks: (1) a drug–drug interaction network, (2) a protein–protein interaction network, (3) a drug–protein interaction network, (4) drug–disease associations, (5) protein–disease associations, and (6) drug–side-effect associations. The drug–target interaction network contains 1923 edges extracted from DrugBank version 3.0 [40,41,42,43]. In this paper, we use only the drug–drug interaction network, the protein–protein interaction network, and the drug–protein interaction network.
KIBA dataset: a well-known DTI dataset containing 117,657 interaction pairs from 2,068 unique drugs and 229 unique target proteins. The affinity of each pair is measured by the KIBA score, which integrates \(IC_{50}\), \(K_{i}\), and \(K_{d}\) scores [44]. KIBA is a large dataset covering a wide variety of drugs and proteins. Following [44], a threshold of 12.1 is used to convert the predicted continuous values into binary values.
Davis dataset: another well-known DTI dataset, containing 25,772 interaction pairs from 68 unique drugs and 442 unique target proteins. In this dataset, binding affinity is measured by the \(K_{d}\) value. To obtain a more stable learned model, the \(K_{d}\) value is transformed into log space as follows:
This study also converts the predicted continuous values into binary values by thresholding; following [44], the threshold for Davis is set to 7.
Luo's dataset: This dataset is a heterogeneous graph [31] with four node types: proteins (1512 nodes), drugs (708 nodes), side effects (4192 nodes), and diseases (5603 nodes). It also has eight edge (i.e., interaction) types, including drug–protein interactions (1923 edges), protein–protein interactions (7363 edges), drug–drug interactions (10,036 edges), drug–disease interactions (199,214 edges), drug–side-effect interactions (80,164 edges), and protein–disease interactions (1,596,745 edges).
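The log transform mentioned above is commonly taken as \(pK_{d} = -\log_{10}\left( K_{d} / 10^{9} \right)\) with \(K_{d}\) in nM, the convention used by DeepDTA; that the authors use exactly this form is an assumption. Under it, the Davis threshold of 7 corresponds to \(K_{d} = 100\) nM. A minimal Python sketch with hypothetical function names:

```python
import math


def kd_to_pkd(kd_nm):
    """Convert a dissociation constant in nM to pKd = -log10(Kd in molar)."""
    return -math.log10(kd_nm / 1e9)


def is_binder(kd_nm, threshold=7.0):
    """Binary label from the Davis threshold (strict inequality is an assumption)."""
    return kd_to_pkd(kd_nm) > threshold


for kd in (10.0, 100.0, 10000.0):
    print(kd, round(kd_to_pkd(kd), 3))   # pKd values 8.0, 7.0, 5.0
```

The log scale compresses the several orders of magnitude spanned by raw \(K_{d}\) values, which is why it tends to stabilize regression training.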
Evaluation metrics
The proposed approach requires appropriate evaluation metrics for both regression and classification tasks. For the regression task, we use two metrics: (1) the Concordance Index (CI), which measures the degree of ranking agreement between predicted and ground-truth values, and (2) \(R^{2}\), which gives the proportion of the variance of the dependent variable that the model explains. For the classification task, we consider five measures: (1) Recall, the proportion of positive samples that are correctly classified; (2) Precision, the proportion of samples predicted as positive that are truly positive, i.e., how well the classifier avoids false alarms; (3) Accuracy, the proportion of correctly classified samples; (4) the area under the ROC curve (AUC-ROC); and (5) the area under the precision–recall curve (AUC-PR).
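For concreteness, the minimal Python sketch below computes the concordance index in the form commonly used for DTI regression (over all pairs with different ground-truth affinities, the fraction that the model ranks in the same order, counting tied predictions as 1/2) together with the coefficient of determination \(R^{2} = 1 - SS_{res} / SS_{tot}\). The quadratic-time loop and the function names are illustrative choices, not the authors' implementation.

```python
def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs (y_true[i] > y_true[j]) ranked correctly; prediction ties count 0.5."""
    concordant, comparable = 0.0, 0
    for i in range(len(y_true)):
        for j in range(len(y_true)):
            if y_true[i] > y_true[j]:
                comparable += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0
                elif y_pred[i] == y_pred[j]:
                    concordant += 0.5
    return concordant / comparable


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [5.0, 6.0, 7.0, 8.0]
print(concordance_index(y_true, [5.1, 6.2, 6.9, 8.3]))  # 1.0: ranking fully preserved
print(concordance_index(y_true, [8.0, 7.0, 6.0, 5.0]))  # 0.0: ranking fully reversed
print(r_squared(y_true, y_true))                        # 1.0
```

CI depends only on the ordering of predictions, whereas \(R^{2}\) also penalizes errors in their magnitude, so the two metrics complement each other.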
Results
This section presents the results obtained on the four datasets. First, the results of the ablation study on the Wang et al. dataset are shown in Fig. 2. The ablation study evaluates six versions of the proposed method: (1) v1: the network is trained without attention-based fusion or contrastive loss, and simple concatenation is used to fuse the multimodal knowledge; (2) v2: the architecture is the same as the proposed model, but no contrastive loss is used. In the remaining versions, the architecture is the same as the proposed one, and the effect of each contrastive loss function is evaluated: (3) Triplet loss: the overall loss is the sum of the task prediction loss and the triplet loss; (4) Max-margin loss: the overall loss is the sum of the task prediction loss and the max-margin loss; (5) Multi-class N-pair loss: the overall loss is the sum of the task prediction loss and the multi-class N-pair loss; and (6) NT-Xent loss: the overall loss is the sum of the task prediction loss and the NT-Xent loss. In the proposed approach, the contrastive loss is set to one of these four losses, and the corresponding results are reported.
A comparison of the proposed method with state-of-the-art methods is shown in Fig. 3. Our approach is compared with five state-of-the-art approaches: MultiDTI [45], DTINet [31], NeoDTI [46], HNM [30], and TripletMultiDTI [3]. On all four metrics shown, the proposed method outperforms the comparable approaches. This confirms that using an appropriate contrastive loss alongside the task prediction loss helps the model learn a more discriminative feature space, which improves performance.
The results on Luo's dataset are given in Table 1. In terms of accuracy and AUC-ROC, the proposed method outperforms the other approaches, and on the other metrics it achieves performance comparable to the best state-of-the-art approaches. Notably, MOVE also uses a contrastive loss function [47], and our approach improves on it in three of the six measures.
Table 2 shows the results of the proposed method on the Davis dataset. For the Davis and KIBA datasets, we compare the proposed method with the following approaches: KronRLS [48], SimBoost [44], DeepDTA [21], DeepCDA [1], SimCNN-DTA [49], GraphDTA [50], NerLTR-DTA [51], and TripletMultiDTI [3]. Results are reported for the four contrastive loss functions and for a model trained with only the task prediction loss. With the NT-Xent loss as the contrastive loss, the results are significantly better than those of TripletMultiDTI [3]. To evaluate the proposed method statistically, we use the paired t-test, whose null hypothesis states that there is no significant difference between the proposed approach and the compared method. Based on the p-values reported in Table 2, we reject the null hypothesis with a p-value below 30% for all state-of-the-art methods except TripletMultiDTI.
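The paired t-test statistic can be sketched in a few lines of standard-library Python. For matched scores of two methods (for example, per-run or per-fold results; the pairing unit is an assumption, since it is not stated here), \(t = \bar{\delta} / \left( s_{\delta} / \sqrt{n} \right)\) with \(n - 1\) degrees of freedom, where \(\delta\) are the per-pair score differences. The function name and example scores are hypothetical; in practice, `scipy.stats.ttest_rel` returns both the statistic and the p-value.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for matched score lists."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical per-run scores for two methods evaluated on the same splits.
proposed = [0.90, 0.91, 0.89]
baseline = [0.88, 0.88, 0.88]
t, df = paired_t_statistic(proposed, baseline)
print(round(t, 3), df)   # 3.464 2
```

Pairing removes the variation shared by both methods on each split, which is why this test suits comparisons on identical data partitions.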
Table 3 shows the results of applying the proposed method to the KIBA dataset. The proposed method improves performance relative to the comparable approaches. Note that for the Davis and KIBA datasets the task is regression, i.e., the model predicts a continuous value, so the CI measure is used for both datasets. In addition, as in [1, 21, 44], the continuous affinity value is converted to a binary label by thresholding. The CI and AUC-PR increase by 2.9% and 5.6%, respectively, over the best state-of-the-art method, which indicates that combining an appropriate contrastive loss with the prediction loss yields a strong model. Based on the p-values reported in Table 3, we reject the null hypothesis with a p-value below 20% for most state-of-the-art methods.
Conclusion
This paper addresses the research question: "How can a contrastive loss function used alongside the task prediction loss help the approach learn a more discriminative model?" We selected four important contrastive loss functions and used them as auxiliary losses. In addition, we believe that a well-designed feature extraction network is beneficial for learning a strong model. Accordingly, after reviewing state-of-the-art methods, we developed a multimodal feature extraction network that combines protein sequences and drug molecules with protein–protein interaction networks and drug–drug interaction networks. To fuse this multimodal knowledge, we proposed an attention-based fusion technique.
One advantage of the proposed method that leads to improved performance is its more powerful loss function. The loss function guides the optimization process during backpropagation, so a more powerful loss function improves both the performance and the generalization of trained models. In most DTI approaches, the loss is based only on the error between predicted outputs and ground-truth labels, without considering the representation vector of the drug–target pair. In this work, we introduce a novel loss function that combines the task prediction loss with a contrastive loss.
To evaluate the proposed method, we applied it to four well-known datasets: the Wang et al. dataset, Luo's dataset, Davis, and KIBA. Extensive experiments show the effectiveness of the proposed method, and the results indicate that it improves performance.
One limitation of the proposed method is its computational cost. In the Multi-class N-pair and NT-Xent losses, each batch sample needs an (N + 1)-tuple, which is expensive to construct. Although we use the batch generation approach introduced by Sohn [37], it still requires more computing power. Another limitation is finding the best strategy for generating batches; providing more informative batches for DTI will be considered in future work.
In recent years, ncRNAs have been recognized as a new class of drug targets, given evidence of their role in gene expression and disease progression [52, 53]. In future work, by providing protein–disease and ncRNA–disease graphs as additional inputs, the approach could be modified to predict small molecule–ncRNA associations.
Availability of data and materials
The sample data and code for CCL-DTI: https://github.com/dehghan1401/CCLDTI.
Abbreviations
 DTI:

Drug–target interactions
 PPI:

Protein–protein interaction
 DDI:

Drug–drug interaction
 CNN:

Convolutional neural network
 SMILES:

Simplified molecular-input line-entry system
 CI:

Concordance Index
 AUC-PR:

Area under the precision–recall curve
 AUC-ROC:

Area under the ROC curve
References
Abbasi K, Razzaghi P, Poso A, Amanlou M, Ghasemi JB, Masoudi-Nejad A. DeepCDA: deep cross-domain compound–protein affinity prediction through LSTM and convolutional neural networks. Bioinformatics. 2020;36(17):4633–42.
Xia X, Zhu C, Zhong F, Liu L. MDTips: a multimodal-data based drug–target interaction prediction system fusing knowledge, gene expression profile and structural data. Bioinformatics. 2023;39:btad411.
Dehghan A, Razzaghi P, Abbasi K, Gharaghani S. TripletMultiDTI: multimodal representation learning in drug–target interaction prediction with triplet loss function. Expert Syst Appl. 2023;232:120754.
Zhang Y, Hu Y, Han N, Yang A, Liu X, Cai H. A survey of drug–target interaction and affinity prediction methods via graph neural networks. Comput Biol Med. 2023;163:107136.
Palhamkhani F, Alipour M, Dehnad A, Abbasi K, Razzaghi P, Ghasemi JB. DeepCompoundNet: enhancing compound–protein interaction prediction with multimodal convolutional neural networks. J Biomol Struct Dyn. 2023. https://doi.org/10.1080/07391102.2023.2291829.
Xue H, Li J, Xie H, Wang Y. Review of drug repositioning approaches and resources. Int J Biol Sci. 2018;14(10):1232.
Mongia A, Majumdar A. Drug–target interaction prediction using multi graph regularized nuclear norm minimization. PLoS ONE. 2020;15(1):e0226484.
Li F, Zhang Z, Guan J, Zhou S. Effective drug–target interaction prediction with mutual interaction neural network. Bioinformatics. 2022;38(14):3582–9.
Chen X, Yan CC, Zhang X, Zhang X, Dai F, Yin J, Zhang Y. Drug–target interaction prediction: databases, web servers and computational models. Brief Bioinform. 2016;17(4):696–712.
Wen M, Zhang Z, Niu S, Sha H, Yang R, Yun Y, Lu H. Deep-learning-based drug–target interaction prediction. J Proteome Res. 2017;16(4):1401–9.
Huang K, Fu T, Glass LM, Zitnik M, Xiao C, Sun J. DeepPurpose: a deep learning library for drug–target interaction prediction. Bioinformatics. 2020;36(22–23):5545–7.
Hu L, Fu C, Ren Z, Cai Y, Yang J, Xu S, Xu W, Tang D. SSELM-neg: spherical search-based extreme learning machine for drug–target interaction prediction. BMC Bioinform. 2023;24(1):38.
Li Y, Qiao G, Gao X, Wang G. Supervised graph co-contrastive learning for drug–target interaction prediction. Bioinformatics. 2022;38(10):2847–54.
Tanoori B, Zolghadri Jahromi M, Mansoori EG. Binding affinity prediction for binary drug–target interactions using semi-supervised transfer learning. J Comput Aided Mol Des. 2021;35:883–900.
Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, Shoichet BK. Relating protein pharmacology by ligand chemistry. Nat Biotechnol. 2007;25(2):197–206.
Shaikh N, Sharma M, Garg P. An improved approach for predicting drug–target interaction: proteochemometrics to molecular docking. Mol BioSyst. 2016;12(3):1006–14.
Bagherian M, Sabeti E, Wang K, Sartor MA, Nikolovska-Coleska Z, Najarian K. Machine learning approaches and databases for prediction of drug–target interaction: a survey paper. Brief Bioinform. 2021;22(1):247–69.
Ezzat A, Wu M, Li XL, Kwoh CK. Computational prediction of drug–target interactions using chemogenomic approaches: an empirical survey. Brief Bioinform. 2019;20(4):1337–57.
Tanoori B, Jahromi MZ, Mansoori EG. Drug–target continuous binding affinity prediction using multiple sources of information. Expert Syst Appl. 2021;186:115810.
Zhang J, Xie M. Graph regularized non-negative matrix factorization with L2,1 norm regularization terms for drug–target interactions prediction. BMC Bioinform. 2023;24(1):375.
Öztürk H, Özgür A, Ozkirimli E. DeepDTA: deep drug–target binding affinity prediction. Bioinformatics. 2018;34(17):i821–9.
Karimi M, Wu D, Wang Z, Shen Y. DeepAffinity: interpretable deep learning of compound–protein affinity through unified recurrent and convolutional neural networks. Bioinformatics. 2019;35(18):3329–38.
Qian Y, Li X, Wu J, Zhang Q. MCL-DTI: using drug multimodal information and bidirectional cross-attention learning method for predicting drug–target interaction. BMC Bioinform. 2023;24(1):323.
Zhang P, Wei Z, Che C, Jin B. DeepMGT-DTI: transformer network incorporating multilayer graph information for drug–target interaction prediction. Comput Biol Med. 2022;142:105214.
Yazdani-Jahromi M, Yousefi N, Tayebi A, Kolanthai E, Neal CJ, Seal S, Garibay OO. AttentionSiteDTI: an interpretable graph-based model for drug–target interaction prediction using NLP sentence-level relation classification. Brief Bioinform. 2022;23(4):bbac272.
Tayebi A, Yousefi N, Yazdani-Jahromi M, Kolanthai E, Neal CJ, Seal S, Garibay OO. UnbiasedDTI: mitigating real-world bias of drug–target interaction prediction by using deep ensemble-balanced learning. Molecules. 2022;27(9):2980.
He C, Qu Y, Yin J, Zhao Z, Ma R, Duan L. Cross-view contrastive representation learning approach to predicting DTIs via integrating multi-source information. Methods. 2023;218:176–88.
Zhang L, Wang CC, Chen X. Predicting drug–target binding affinity through molecule representation block based on multi-head attention and skip connection. Brief Bioinform. 2022;23(6):bbac468.
Zhang L, Wang CC, Zhang Y, Chen X. GPCNDTA: prediction of drug–target binding affinity through cross-attention networks augmented with graph features and pharmacophores. Comput Biol Med. 2023;166:107512.
Wang W, Yang S, Zhang X, Li J. Drug repositioning by integrating target information through a heterogeneous network model. Bioinformatics. 2014;30(20):2923–30.
Luo Y, Zhao X, Zhou J, Yang J, Zhang Y, Kuang W, Peng J, Chen L, Zeng J. A network integration approach for drug–target interaction prediction and computational drug repositioning from heterogeneous information. Nat Commun. 2017;8(1):573.
Tang J, Szwajda A, Shakyawar S, Xu T, Hintsanen P, Wennerberg K, Aittokallio T. Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis. J Chem Inf Model. 2014;54(3):735–43.
Davis MI, Hunt JP, Herrgard S, Ciceri P, Wodicka LM, Pallares G, Hocker M, Treiber DK, Zarrinkar PP. Comprehensive analysis of kinase inhibitor selectivity. Nat Biotechnol. 2011;29(11):1046–51.
Hadsell R, Chopra S, LeCun Y. Dimensionality reduction by learning an invariant mapping. Comput Vis Pattern Recognit. 2006;2:1735–42.
Weinberger KQ, Blitzer J, Saul LK. Distance metric learning for large margin nearest neighbor classification. In: Advances in neural information processing systems; 2006. p. 1473–80.
Schroff F, Kalenichenko D, Philbin J. FaceNet: a unified embedding for face recognition and clustering. In: IEEE conference on computer vision and pattern recognition; 2015. p. 815–23.
Sohn K. Improved deep metric learning with multi-class N-pair loss objective. Adv Neural Inf Process Syst. 2016;29:1857–65.
Chen T, Kornblith S, Norouzi M, Hinton G. A simple framework for contrastive learning of visual representations. In: International conference on machine learning; 2020. p. 1597–607.
Khosla P, Teterwak P, Wang C, Sarna A, Tian Y, Isola P, Maschinot A, Liu C, Krishnan D. Supervised contrastive learning. Adv Neural Inf Process Syst. 2020;33:18661–73.
Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V, Djoumbou Y. DrugBank 3.0: a comprehensive resource for ‘omics’ research on drugs. Nucleic Acids Res. 2010;39:D1035–41.
Keshava Prasad T, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A. Human protein reference database—2009 update. Nucleic Acids Res. 2009;37(suppl_1):D767–72.
Davis AP, Murphy CG, Johnson R, Lay JM, Lennon-Hopkins K, Saraceni-Richards C, Sciaky D, King BL, Rosenstein MC, Wiegers TC. The comparative toxicogenomics database: update 2013. Nucleic Acids Res. 2013;41(D1):D1104–14.
Kuhn M, Campillos M, Letunic I, Jensen LJ, Bork P. A side effect resource to capture phenotypic effects of drugs. Mol Syst Biol. 2010;6(1):343.
He T, Heidemeyer M, Ban F, Cherkasov A, Ester M. SimBoost: a read-across approach for predicting drug–target binding affinities using gradient boosting machines. J Cheminformatics. 2017;9(1):1–14.
Zhou D, Xu Z, Li W, Xie X, Peng S. MultiDTI: drug–target interaction prediction based on multimodal representation learning to bridge the gap between new chemical entities and known heterogeneous network. Bioinformatics. 2021;37(23):4485–92.
Wan F, Hong L, Xiao A, Jiang T, Zeng J. NeoDTI: neural integration of neighbor information from a heterogeneous network for discovering new drug–target interactions. Bioinformatics. 2019;35(1):104–11.
Qu Y, He C, Yin J, Zhao Z, Chen J, Duan L. MOVE: integrating multi-source information for predicting DTI via cross-view contrastive learning. In: 2022 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE; 2022. p. 535–40.
Pahikkala T, Airola A, Pietila S, Shakyawar S, Szwajda A, Tang J, Aittokallio T. Toward more realistic drug–target interaction predictions. Brief Bioinform. 2014;16(2):325–37.
Shim J, Hong ZY, Sohn I, Hwang C. Prediction of drug–target binding affinity using similarity-based convolutional neural network. Sci Rep. 2021;11(1):1–9.
Nguyen T, Le H, Quinn TP, Nguyen T, Le TD, Venkatesh S. GraphDTA: predicting drug–target binding affinity with graph neural networks. Bioinformatics. 2021;37(8):1140–7.
Ru X, Ye X, Sakurai T, Zou Q. NerLTR-DTA: drug–target binding affinity prediction based on neighbor relationship and learning to rank. Bioinformatics. 2022;38(7):1964–71.
Chen X, Guan NN, Sun YZ, Li JQ, Qu J. MicroRNA–small molecule association identification: from experimental results to computational models. Brief Bioinform. 2020;21(1):47–61.
Chen X, Zhou C, Wang CC, Zhao Y. Predicting potential small molecule–miRNA associations based on bounded nuclear norm regularization. Brief Bioinform. 2021;22(6):bbab328.
Acknowledgements
We thank the anonymous reviewers for their constructive comments on the original manuscript.
Funding
None.
Author information
Contributions
AD: Result analysis, conceptualization, methodology, data curation, writing, review and editing. KA: Methodology, formal analysis, conceptualization, writing original draft and editing. HB: Result analysis, programming, visualization and writing the initial draft. PR and SGH: Supervision, conceptualization, review, editing and project administration.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Dehghan, A., Abbasi, K., Razzaghi, P. et al. CCLDTI: contributing the contrastive loss in drug–target interaction prediction. BMC Bioinformatics 25, 48 (2024). https://doi.org/10.1186/s12859-024-05671-3
DOI: https://doi.org/10.1186/s12859-024-05671-3