Drug–target interaction prediction via multiple classification strategies

Abstract

Background

Computational prediction of interactions between drugs and protein targets is important for new drug discovery, because experimental determination of drug-target interactions (DTI) is expensive and time-consuming. However, different protein targets have very different numbers of interactions; specifically, most interactions are concentrated on only a few targets. As a result, targets with larger numbers of interactions may have enough positive samples for predicting their interactions, whereas targets with smaller numbers of interactions may not. A single classification strategy may not be able to deal with both cases at the same time. To overcome this problem, this paper proposes a drug-target interaction prediction method based on multiple classification strategies (MCSDTI). In MCSDTI, targets are first divided into two parts according to their numbers of interactions: one part contains targets with smaller numbers of interactions (TWSNI) and the other contains targets with larger numbers of interactions (TWLNI). Different classification strategies are then designed for TWSNI and TWLNI to predict their interactions. Furthermore, TWSNI and TWLNI are evaluated independently, which avoids the problem that the result is determined mainly by targets with large numbers of interactions when all targets are evaluated together.

Results

We propose a new drug-target interaction prediction method, MCSDTI, which uses multiple classification strategies. MCSDTI is tested on five DTI datasets: nuclear receptors (NR), ion channels (IC), G protein coupled receptors (GPCR), enzymes (E), and drug bank (DB). Experiments show that the AUCs of our method are respectively 3.31%, 1.27%, 2.02%, 2.02% and 1.04% higher than those of the second best methods on NR, IC, GPCR, E and DB for TWLNI, and respectively 1.00%, 3.20% and 2.70% higher than those of the second best methods on NR, IC, and E for TWSNI.

Conclusion

MCSDTI is competitive with previous methods for all target parts on most datasets, which demonstrates that using different classification strategies for different target parts is an effective way to improve DTI prediction.

Background

Drug development is a time-consuming and expensive process that is plagued by a high attrition rate. This has led to great interest among practitioners in drug repositioning, because of its potential to reduce the time, cost, risk and effort inherent in developing new drugs. Several drug-target interaction (DTI) prediction methods have been proposed in the past several years, which can be divided into two categories: similarity based methods and feature based methods.

Similarity based methods mainly use the similarity relationships between samples. Some similarity based methods proposed new optimization objective functions for similarity decomposition [1,2,3,4,5]. Ban et al. proposed a neighborhood regularized logistic matrix factorization [1], which can utilize the neighborhood information. Cui et al. proposed a L2,1 graph regularized matrix factorization to learn flow patterns in combination with the previous matrix-decomposition method [2]. Li et al. proposed a multi-view low rank embedding to integrate multi-view representations of drugs and proteins [3]. Mongia et al. proposed a multi-graph regularized nuclear norm minimization based method for DTI, which predicts the interactions between drugs and target proteins from three inputs [4]. Wang et al. proposed an effective computational model of dual Laplacian graph regularized matrix completion, where the drug and the target similarities can be fully exploited by using a dual Laplacian graph regularization term [5].

Although different optimization objective functions allow the decomposition factors to satisfy different conditions, these factors depend heavily on the similarity. Some similarity based methods therefore designed new ways to calculate the similarity [6,7,8]. Zong et al. calculated the similarities within a linked tripartite network, which enhanced existing association discovery methods by using a topology-based similarity measure [6]. Ding et al. developed a fuzzy bipartite local model, where multiple kernels are constructed in drug and target spaces [7]. Fan et al. introduced the similarity information of drugs/targets, and proposed a neighborhood constraint to regularize the unknown cases [8]. However, because the distributions of drugs and targets are very complex, it is hard to design a good similarity calculation method. To overcome this problem and to make better use of the information contained in the features, some feature based methods have also been proposed.

Firstly, features are very important for feature based methods, and some researchers proposed new feature extraction methods to extract more features from targets and drugs [10,11,12,13,14,15]. Li et al. used rotation forest in DTI, where local phase quantization descriptors are used to extract evolutionary information from the position-specific scoring matrix (PSSM) [10]. Farshid et al. used AdaBoost in DTI, where many feature extraction methods were used at the same time [11]. Jiang et al. proposed an ensemble system integrating a k nearest neighbor classifier with a novel feature encoding scheme to identify DTI [12]. Mahmud et al. predicted DTI based on drug chemical structure and protein sequence by using extreme gradient boosting (XGBoost) with the synthetic minority oversampling technique (SMOTE) [13]. Han et al. predicted DTI by using Lasso with random forest based on evolutionary information and chemical structure [14]. Xu et al. inferred DTI by using a graph isomorphic network and a word vector matrix [15].

Secondly, because it is unclear which feature is the best, many features can be extracted for the target and the drug at the same time [16,17,18], and some dimensionality reduction methods have been proposed for DTI [19,20,21,22,23]. Ezzat et al. proposed a framework for DTI prediction by leveraging both feature dimensionality reduction and ensemble learning [19]. Aman et al. proposed a bagging based ensemble framework for DTI prediction that uses dimensionality reduction and active learning to deal with class-imbalanced data [20]. Mahmud et al. predicted DTI based on protein features with under-sampling and feature selection techniques with boosting [21]. Feng et al. proposed a supervised discriminative sparse principal component analysis [22] and a graph Laplacian sparse principal component analysis for dimensionality reduction [23].

Thirdly, some new classifiers are also proposed for DTI [24,25,26,27,28,29,30]. He et al. presented a method called SimBoost that predicts continuous values of binding affinities of compounds and proteins and thus incorporates the whole interaction spectrum from true negative to true positive interactions [24]. Rayhan et al. proposed an ensemble model which uses extra tree as weak learners inside a boosting scheme while holding on to the best model per iteration [25]. Pliakos et al. proposed a new learning method which addresses DTI prediction as a multi-output prediction task by learning ensembles of multi-output bi-clustering trees on reconstructed networks [26]. Zhang et al. used several random projections to build an ensemble random projection tree system [27]. Buza et al. selected a random subset of features and used only the selected features when training the local models [28]. Ezzat et al. proposed another ensemble learning method that incorporates techniques to address the issues of between class imbalance and within-class imbalance [29]. Ye et al. proposed a multiple output deep neural network to enhance the deep neural network learning ability with a kind of auxiliary classifier layers [30].

Although the above methods address some problems from different sides, they do not address the problem that different targets have very different numbers of interactions. For targets with larger numbers of interactions (TWLNI), many positive samples can be generated. For targets with smaller numbers of interactions (TWSNI), however, the few known interactions produce only a small number of positive samples. As a result, different classification strategies should be designed for these two types of targets. Based on this idea, this paper proposes a new DTI prediction method based on multiple classification strategies (MCSDTI).

In MCSDTI, targets are first divided into TWLNI and TWSNI. For TWLNI, because drug-target interactions are very sparsely distributed in the drug-target pair space, predicting interactions for these targets together with their neighbors could introduce more negative samples than positive samples. Furthermore, these targets may have enough positive samples of their own for predicting their interactions. So interactions of TWLNI are predicted by using their own positive samples. For TWSNI, the numbers of positive samples are too small, so the positive samples of their neighbors are used together to predict their interactions. Using different classification strategies in different situations can thus make better use of the advantages of each strategy. What's more, TWLNI and TWSNI are evaluated independently, because the result could be determined mainly by TWLNI when TWLNI and TWSNI are evaluated together.

The contributions of this paper can be summarized as follows:

  1. As far as we know, this is the first time that interactions of TWLNI and TWSNI are predicted by different classification strategies, which can make better use of the advantages of these classification strategies in different situations.

  2. TWLNI and TWSNI are evaluated independently, which avoids the problem that the improvement for TWSNI is overwhelmed when TWLNI and TWSNI are evaluated together.

  3. A new classifier and a new evaluator are designed for TWLNI, which can overcome the negative impact of samples of the neighbors.

  4. A good classifier is found for TWSNI, whose effect for TWSNI had previously been overwhelmed by TWLNI.

  5. A new research idea is provided for DTI prediction, as interactions of TWLNI and TWSNI cannot be predicted well at the same time by a single strategy.

The remainder of this paper is organized as follows. Section 2 introduces the methods. Section 3 presents the results. Finally, Section 4 gives concluding remarks.

Methods

Data and motivation

Five datasets are used in this work: nuclear receptors (NR) [31], ion channels (IC) [31], G protein coupled receptors (GPCR) [31], enzymes (E) [31], and drug bank (DB) [32]. These datasets provide the simplified molecular input line entry system (SMILES) strings of drugs and the sequences of targets, which can be used to extract features for drugs and targets. Simple statistics for the five datasets are given in Table 1, where the 2nd to 4th rows give the numbers of drugs, targets and interactions respectively, and the 5th row gives the proportion of interactions in the drug-target pair space.

Table 1 Simple statistics for datasets

By analyzing these datasets, two conclusions can be drawn. Firstly, drug-target interactions are very sparsely distributed in the drug-target pair space, as shown in Table 1. It can be seen from the 5th row of Table 1 that the percentages of interactions in the drug-target pair space are only 6.4%, 3.0%, 3.4%, 0.99% and 0.064% on NR, GPCR, IC, E and DB respectively, which shows that the number of interactions is much smaller than the number of drug-target pairs.

Secondly, most of the interactions are concentrated on only a few targets, as shown in Fig. 1, which gives the distributions of interactions on the five datasets. Targets are sorted by their numbers of interactions and divided into five parts with the same number of targets each, so that targets in the 1st part have the smallest numbers of interactions, targets in the 2nd part have larger numbers of interactions, targets in the 3rd part have still larger numbers, and so on. It can be seen from Fig. 1 that more than 60% of interactions are concentrated on 20% of targets on GPCR, E and DB, and nearly 50% of interactions are concentrated on 20% of targets on NR and IC. Thus some targets have larger numbers of interactions, while other targets have smaller numbers of interactions.
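The five-part split of targets can be illustrated with a minimal sketch. The function name `interaction_share_by_part` and the toy matrix are assumptions made for illustration only; they are not taken from the datasets or from the authors' code.

```python
def interaction_share_by_part(Y, n_parts=5):
    """Share of all interactions held by each equally sized group of targets.

    Y is a drug-by-target 0/1 interaction matrix given as a list of rows.
    Targets are sorted by interaction count (ascending) and cut into
    n_parts groups of (nearly) equal size.
    """
    n_targets = len(Y[0])
    # Number of interactions per target, sorted from smallest to largest.
    counts = sorted(sum(row[j] for row in Y) for j in range(n_targets))
    total = sum(counts)
    shares = []
    for k in range(n_parts):
        lo = k * n_targets // n_parts
        hi = (k + 1) * n_targets // n_parts
        shares.append(sum(counts[lo:hi]) / total)
    return shares


# Toy matrix: 4 drugs x 5 targets (illustrative values, not real data).
Y_toy = [
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
shares = interaction_share_by_part(Y_toy)
print(shares)
```

In this toy case the last part (the 20% of targets with the most interactions) holds half of all interactions, which mirrors the kind of concentration the text describes.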

Fig. 1 The distribution of interactions on five datasets

As a result, it is difficult to design a single prediction strategy that can handle all these cases. So in this paper, different classification strategies are designed for these two types of targets.

Feature vector extraction

To predict the DTI for a drug-target pair, the feature vectors for the drug and the target should first be extracted. Several types of features have been proposed for drugs, such as molecular substructure fingerprints and constitutional, topological, quantum chemical and geometrical properties. Here the PubChem molecular substructure fingerprint is extracted for the drug by PaDEL [33], where the input of PaDEL is the SMILES of the drug. The extracted drug feature is denoted D. In this type of representation, each molecular structure is described by a Boolean vector, which is a fingerprint of structural keys according to the substructure patterns predefined in the PubChem database [34]. This feature gives a direct relationship between the molecular structure and its properties and retains the entire structure of the drug molecule [34].

More types of features have also been proposed for targets, such as amino acid composition, dipeptide composition, autocorrelation descriptors, composition, transition, distribution, quasi-sequence-order descriptors, pseudo-amino acid composition, amphiphilic pseudo-amino acid composition, topological descriptors for atom model, and total amino acid properties. In this paper, all of the above features are extracted for targets by Protein Features (PROFEAT) [35], where the input of PROFEAT is the sequence of the target. These features describe the target from different aspects, and their dimension is not very large. The extracted target feature is denoted T.

Summary information for the extracted features is given in Table 2. It can be seen from Table 2 that the dimensions of the drug feature, target feature and total feature are 1024, 1437 and 2461 respectively. Furthermore, it can be seen from the 4th row of Table 1 that the numbers of interactions of NR, GPCR, IC, E and DB are 90, 635, 1476, 2926 and 12,674 respectively. Obviously, this is a high-dimensional small-sample problem, which is taken into account when designing the classification strategies.

Table 2 Simple information of the extracted features

Overview of MCSDTI

Given drug features D, target features T, interaction matrix Y, drug similarity matrix \(S_d\) and target similarity matrix \(S_t\), the flowchart of MCSDTI is shown in Fig. 2. It can be seen from Fig. 2 that MCSDTI has 5 steps, where the 1st and 5th steps are the input step and the output step. Steps 2 to 4 are briefly introduced in the following.

Fig. 2 The flowchart of MCSDTI

In the preprocessing step, the targets are divided into TWLNI and TWSNI according to their numbers of interactions, where TWLNI contains targets with larger numbers of interactions and TWSNI contains targets with smaller numbers of interactions. In the classification step, the TWLNI classifier and the TWSNI classifier are designed for TWLNI and TWSNI respectively, which can make better use of the advantages of these classifiers in different situations. In the evaluation step, the TWLNI evaluator and the TWSNI evaluator are designed for TWLNI and TWSNI respectively. Two evaluators are used because the targets with the largest numbers of interactions account for a very large percentage of all interactions. The result could therefore be determined mainly by TWLNI when all targets are evaluated together, so that the improvement for TWSNI is overwhelmed.
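The preprocessing step can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the function name `split_targets` is hypothetical, and the rule that a target belongs to TWLNI when its number of interactions is strictly greater than the threshold \(\tau\) (and to TWSNI otherwise) is an assumption, since the exact comparison is defined in Algorithm 1.

```python
def split_targets(Y, tau):
    """Split target indices into (TWLNI, TWSNI) by interaction count.

    Y is a drug-by-target 0/1 interaction matrix given as a list of rows.
    Assumption: a target is TWLNI if it has more than tau interactions.
    """
    n_targets = len(Y[0])
    twlni, twsni = [], []
    for j in range(n_targets):
        n_interactions = sum(row[j] for row in Y)
        if n_interactions > tau:
            twlni.append(j)
        else:
            twsni.append(j)
    return twlni, twsni


# Toy matrix: 4 drugs x 5 targets (illustrative values, not real data).
Y_toy = [
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
twlni, twsni = split_targets(Y_toy, tau=1)
print(twlni, twsni)
```

With \(\tau = 1\) only the two targets with more than one interaction go to TWLNI; each group is then passed to its own classifier and evaluator.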

TWLNI classifier and evaluator

A larger number of positive samples can be generated for TWLNI, so there would be enough positive samples to predict the interactions of these targets. In this case, because drug-target interactions are very sparsely distributed in the drug-target pair space, adding the samples of neighbors would add many more negative samples than positive samples. The effect of predicting DTI for this target may then be worsened, as shown in Fig. 3, where Fig. 3a shows the samples of a target and Fig. 3b shows the samples after adding the samples of its neighbors; x is a testing sample of the target, x1 and x2 are two positive samples of this target, and x3 and x4 are two negative samples of its neighbors. It can be seen from Fig. 3b that many negative samples could be added around the positive samples of this target. As a result, the test sample x could be correctly predicted in Fig. 3a but wrongly predicted in Fig. 3b.

Fig. 3 An example used to show the negative impact of samples of the neighbors

To overcome the above problem, interactions of TWLNI are predicted by using their own positive samples in this paper. Suppose we are given a training drug feature set \(D = \{ d_1, d_2, \ldots, d_u\} \in \mathbb{R}^{u \times p}\), a training target feature set \(T = \{ t_1, t_2, \ldots, t_v\} \in \mathbb{R}^{v \times q}\), and the corresponding interaction matrix \(Y \in \mathbb{R}^{u \times v}\), where u is the number of drugs, p is the number of drug features, v is the number of targets, and q is the number of target features. To predict the interactions of \(t_j\), D can be seen as u samples and the j-th column \(Y(:,j)\) can be seen as the corresponding class labels. The pseudo code of the TWLNI classifier is given in Algorithm 1.

Algorithm 1 Pseudo code of the TWLNI classifier

In step 6 of Algorithm 1, various classifier models can be used. However, the number of positive samples is small and the dimension of the extracted features is high, which the chosen classifier model should take into account. Based on the principles of several classification models, the decision tree is able to deal with this problem. The decision tree is generated by a recursive method [36]. In each recursive step, the feature that yields the largest information gain is used to generate the child nodes of the decision tree. As a result, the decision tree is influenced by the number of useful features rather than the total number of features.
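One recursive step of such a tree can be sketched as follows, assuming Boolean drug features (as in a substructure fingerprint) and the 0/1 labels of a single target. The helper names `entropy`, `information_gain` and `best_feature` are hypothetical, and the split criterion actually used by a given decision tree implementation may differ from plain information gain.

```python
import math


def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p == 0.0 or p == 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a Boolean feature."""
    n = len(labels)
    on = [y for x, y in zip(feature, labels) if x == 1]
    off = [y for x, y in zip(feature, labels) if x == 0]
    remainder = len(on) / n * entropy(on) + len(off) / n * entropy(off)
    return entropy(labels) - remainder


def best_feature(D, labels):
    """Index of the feature with the largest information gain, plus all gains."""
    n_features = len(D[0])
    gains = [
        information_gain([row[f] for row in D], labels)
        for f in range(n_features)
    ]
    best = max(range(n_features), key=lambda f: gains[f])
    return best, gains


# Toy data: 4 drugs x 3 Boolean features, and the labels of one target,
# i.e. one column of the interaction matrix (illustrative values only).
D_toy = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
labels_j = [1, 1, 0, 0]
best, gains = best_feature(D_toy, labels_j)
print(best, gains)
```

Only the first feature separates the two classes here, so it is selected and the two uninformative features are ignored, which is the sense in which the tree depends on the useful features rather than on the total dimension.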

It can be seen from Algorithm 1 that a classifier is trained separately for each target. As a result, the evaluation criterion for each target should also be calculated separately. To describe the evaluator more clearly, the pseudo code of the TWLNI evaluator is shown in Algorithm 2. It can be seen from Algorithm 2 that the evaluation result of \(t_j\) is calculated by steps 4–10, and the mean of the evaluation results of all targets is calculated by step 12.
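The per-target evaluation followed by averaging can be sketched as below, with AUC as the criterion. This is a simplified illustration rather than a transcription of Algorithm 2: the function names are hypothetical, AUC is computed as the fraction of correctly ordered positive-negative pairs, and skipping targets whose test labels contain only one class is an assumption.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. Returns None when one class is missing,
    because the AUC is undefined in that case.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_per_target_auc(score_matrix, Y, targets):
    """Average of the AUCs computed separately for each listed target."""
    values = []
    for j in targets:
        a = auc([row[j] for row in score_matrix], [row[j] for row in Y])
        if a is not None:  # assumption: skip targets with undefined AUC
            values.append(a)
    return sum(values) / len(values) if values else None


# Toy test fold: 4 drugs x 2 targets (illustrative values only).
Y_test = [[1, 0], [0, 1], [1, 0], [0, 0]]
scores = [[0.9, 0.2], [0.1, 0.8], [0.4, 0.3], [0.6, 0.1]]
mean_auc = mean_per_target_auc(scores, Y_test, targets=[0, 1])
print(mean_auc)
```

Averaging per-target AUCs gives every target the same weight, whereas pooling all drug-target pairs into one AUC would let the targets with many interactions dominate.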

Algorithm 2 Pseudo code of the TWLNI evaluator

TWSNI classifier and evaluator

Too few positive samples can be generated for TWSNI. In this case, there are not enough positive samples for this target to predict DTI, so other positive samples should be utilized to improve the effect of DTI prediction. An optional method is to use the positive samples generated by its neighbors.

However, according to the principle of clustering, the neighbors of TWSNI would also have small numbers of interactions. As a result, a feature based classifier can hardly be trained in this case. To overcome this problem, a similarity based method is used to predict the interactions for these targets. However, because the distributions of drugs and targets are very complex, the similarity calculated by existing similarity calculation methods may not be reliable. In particular, the further away a drug or target is, the less reliable the similarity is. As a result, the nearest profile (NP) [31] is used to improve the DTI effect for TWSNI in this paper.

Given the drug similarity matrix \(S_d \in \mathbb{R}^{n_d \times n_d}\), the target similarity matrix \(S_t \in \mathbb{R}^{n_t \times n_t}\), and the interaction matrix \(Y \in \mathbb{R}^{n_d \times n_t}\), where \(n_d\) and \(n_t\) are the numbers of drugs and targets, the interaction profile \(Y(:,t_{new})\) of a new target \(t_{new}\) can be predicted as follows [31]:

$$Y(:,t_{new}) = S_t(t_{new},t_{nearest})\,Y(:,t_{nearest})$$
(1)

where \(t_{nearest}\) is the nearest target of \(t_{new}\) and \(Y(:,t_{nearest})\) is the interaction profile of \(t_{nearest}\).

The interaction profile \(Y(d_{new},:)\) of a new drug \(d_{new}\) can be predicted as follows [31]:

$$Y(d_{new},:) = S_d(d_{new},d_{nearest})\,Y(d_{nearest},:)$$
(2)

where \(d_{nearest}\) is the nearest drug of \(d_{new}\) and \(Y(d_{nearest},:)\) is the interaction profile of \(d_{nearest}\).

Finally, the interaction \(Y(d_{new},t_{new})\) of a drug-target pair \((d_{new},t_{new})\) can be predicted as the mean of these two scores.
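Equations (1) and (2) and the final averaging can be sketched for a single drug-target pair as follows. The function names and toy matrices are assumptions for illustration; in particular, taking the "nearest" entity to be the most similar other drug or target, and treating unknown entries of Y as 0, are assumptions rather than details stated in the text.

```python
def nearest(sim_row, self_index):
    """Index of the most similar entity other than the entity itself."""
    candidates = (c for c in range(len(sim_row)) if c != self_index)
    return max(candidates, key=lambda c: sim_row[c])


def np_pair_score(Y, Sd, St, d_new, t_new):
    """Nearest profile score for the pair (d_new, t_new).

    Y is the drug-by-target interaction matrix (unknown entries are 0),
    Sd and St are the drug and target similarity matrices.
    """
    t_near = nearest(St[t_new], t_new)
    d_near = nearest(Sd[d_new], d_new)
    # Eq. (1), entry for drug d_new: profile of the nearest target.
    target_side = St[t_new][t_near] * Y[d_new][t_near]
    # Eq. (2), entry for target t_new: profile of the nearest drug.
    drug_side = Sd[d_new][d_near] * Y[d_near][t_new]
    # Final score: mean of the two predictions.
    return (target_side + drug_side) / 2


# Toy data: 3 drugs x 3 targets (illustrative values only).
Y_toy = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
Sd_toy = [[1.0, 0.2, 0.7], [0.2, 1.0, 0.4], [0.7, 0.4, 1.0]]
St_toy = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.8], [0.3, 0.8, 1.0]]
score = np_pair_score(Y_toy, Sd_toy, St_toy, d_new=2, t_new=0)
print(score)
```

Here drug 2 has no known interactions, so the target-side term is zero and the score comes entirely from its nearest drug (drug 0), which interacts with target 0.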

NP is used here only to evaluate the DTI effect for TWSNI. To utilize the information offered by their neighbors, all targets are used to calculate \(Y(d_{new},t_{new})\), as there are not enough positive samples for TWSNI.

After all \(Y(d_{new},t_{new})\) are calculated, only the evaluation criterion of TWSNI is output, because otherwise the result could be determined mainly by TWLNI, which could overwhelm the improvement for TWSNI. To describe the process more clearly, the pseudo code of the TWSNI classifier and evaluator is shown in Algorithm 3. It can be seen from Algorithm 3 that the process is not divided into a training phase and a testing phase, as training and testing are carried out at the same time in a similarity based method.

Algorithm 3 Pseudo code of the TWSNI classifier and evaluator

Results

To verify the effectiveness of the proposed multiple classification strategies, our method is compared with the following methods: decision tree (DT) [36], random forest (RF) [36], nearest profile (NP) [31], weighted profile (WP) [31], network-based inference (NBI) [37], regularized least squares-avg (RLS) [38], regularized least squares-kron (RK) [9], ensemble decision tree (EDT) [19], and ensemble kernel ridge regression (EKRR) [19].

Experimental setting

A standard fivefold cross validation is performed and the AUC (i.e. the area under the receiver operating characteristic curve) is computed for each method. More precisely, the drugs are divided into 5 parts, where one part is used for testing and the other parts are used for training. For each of the methods being compared, 5 AUC scores are computed (one for each fold) and then averaged to give the final overall AUC score. The AUC score can be biased when the data is imbalanced. However, in this paper, TWLNI and TWSNI are evaluated independently, which means that only targets with similar imbalance are evaluated together. As a result, the imbalance does not undermine the AUC as a metric for each method. Furthermore, AUC is a good performance metric for binary classification problems. AUC is therefore used as the evaluation metric in this paper.
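The drug-wise fivefold protocol can be sketched as below. The function names, the random shuffling and the fixed seed are assumptions made for illustration; `evaluate_fold` stands for any hypothetical routine that trains on the training drugs and returns the AUC on the test drugs.

```python
import random


def drug_folds(n_drugs, n_folds=5, seed=0):
    """Partition drug indices into n_folds disjoint test sets."""
    idx = list(range(n_drugs))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[k::n_folds]) for k in range(n_folds)]


def cross_validated_auc(n_drugs, evaluate_fold, n_folds=5, seed=0):
    """Average the AUC returned by evaluate_fold(train, test) over the folds."""
    aucs = []
    for test in drug_folds(n_drugs, n_folds, seed):
        held_out = set(test)
        train = [i for i in range(n_drugs) if i not in held_out]
        aucs.append(evaluate_fold(train, test))
    return sum(aucs) / len(aucs)


# Usage with a placeholder evaluator that always reports the same AUC.
folds = drug_folds(12)
final_auc = cross_validated_auc(12, lambda train, test: 0.8)
print(folds, final_auc)
```

Because whole drugs are held out, every test pair involves a drug whose interactions were not seen during training.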

Many parameters should be set for the compared methods. The parameters of DT, RF, EDT and EKRR used in this paper are the same as those used in Ref. [19]. Default parameter values as defined in MATLAB’s fitctree are used for DT and MCSDTI. The number of trees for RF is set to 50. The dimensionality reduction parameter and the number of subspaces for EDT are set to 0.8 and 50. The dimensionality reduction parameter, the number of subspaces, the decay term, the Tikhonov regularization parameter, and an adjustable parameter for EKRR are set to 0.2, 20, 0.7, 1 and 0.5. The decay term, the Tikhonov regularization parameter, and an adjustable parameter for RLS and RK are set to 0.7, 1 and 0.5. NP, WP and NBI have no parameters to set.

All methods need the drug features D and target features T, which are extracted by the methods described in the subsection “Feature vector extraction”. For our method, the experimental results are obtained by Algorithm 1, Algorithm 2 and Algorithm 3. For the other compared methods, \(Y(d_{new},t_{new})\) is first calculated for all testing drugs. The experimental results for TWLNI are then calculated by removing TWSNI, and the experimental results for TWSNI are calculated by removing TWLNI.

The experiments for TWLNI

The experimental results are presented in Table 3. These experiments are used to answer the following questions:

  1. Which threshold \(\tau\) should be set for our method?

  2. Is our method better than the compared methods?

Table 3 AUCs for TWLNI, where \(\tau\) used in Algorithm 1 are respectively set to 1, 3 and 5

As to the first question, we compare the AUCs of all methods when the threshold \(\tau\) is set to 1, 3 and 5, which are given in columns 3–5 of Table 3. Setting \(\tau\) to different values shows the adaptability of our algorithm. It can be seen from Table 3 that the AUCs of our method are the best in all cases. Specifically, the AUCs of our method are respectively 2.47%, 0%, 1.49%, 0.83% and 1.38% higher than those of the second best method when \(\tau = 1\), where the second best methods are EDT, WP, EDT, WP and EKRR on NR, IC, GPCR, E and DB. The AUCs of our method are respectively 2.32%, 1.48%, 2.41%, 2.65% and 0.87% higher than those of the second best method when \(\tau = 3\), where the second best methods are RK, WP, EKRR, WP and EKRR on NR, IC, GPCR, E and DB. The AUCs of our method are respectively 1.84%, 2.31%, 2.05%, 2.60% and 0.85% higher than those of the second best method when \(\tau = 5\), where the second best methods are EKRR, WP, EDT, WP and EKRR on NR, IC, GPCR, E and DB. These results show that our method is better than the compared methods regardless of the value of \(\tau\), and much better than the compared methods when \(\tau\) is set to 3 and 5.

Furthermore, to better show the results of the methods with different \(\tau\), Table 3 is presented as a histogram in Fig. 4. It can be seen from Fig. 4 that the AUC of our method clearly increases with \(\tau\) on NR, IC, E and DB, whereas most of the compared algorithms show no similar trend. The reason may be that, as \(\tau\) increases, each remaining target has more positive samples, so there are enough positive samples to predict the interactions of this target. In that case, adding samples of neighbors may worsen the prediction of DTI for this target.

Fig. 4 Histogram of AUCs for TWLNI, where \(\tau\) used in Algorithm 1 is set to 1, 3 and 5 respectively

As to the second question, we answer it from three aspects. Firstly, it can be seen from Table 3 and Fig. 4 that our method is the best on all datasets regardless of the value of \(\tau\). Specifically, it can be seen from the last column of Table 3 that our method is the best method on all 5 datasets, and the AUCs of our method are respectively 3.31%, 1.27%, 2.02%, 2.02% and 1.04% higher than those of the second best methods on NR, IC, GPCR, E and DB, where the second best methods are respectively RK, WP, EDT, WP and EKRR. This shows that our method has the best DTI prediction performance. Secondly, it can be seen from Table 3 and Fig. 4 that the second best method varies across datasets and across values of \(\tau\), which indicates that our method is more stable than the compared methods. Thirdly, it can be seen from Fig. 4 that the AUC of our method clearly increases with \(\tau\) on most datasets, which provides a good guide to the scope of application of our algorithm. As a result, our method is much better than the compared methods.

The experiments for TWSNI

The experimental results are presented in Table 4. These experiments are used to answer the following questions:

  1. Which threshold \(\tau\) should be set for our method?

  2. Is our method better than the compared methods?

Table 4 AUCs for TWSNI, where \(\tau\) used in Algorithm 1 are respectively set to 1, 3 and 5

As to the first question, we compare the AUCs of all methods when the threshold \(\tau\) is set to 1, 3 and 5, which are given in columns 3–5 of Table 4. It can be seen from Table 4 that the AUCs of our method are the best on NR, IC and E, and the second best on DB, when \(\tau\) is set to 1. However, the AUCs of our method are worse than those of most compared methods when \(\tau\) is set to 3 and 5. In particular, our method is much worse than the compared methods when \(\tau\) is set to 5.

Furthermore, to better show the results of the algorithms with different \(\tau\), Table 4 is presented as a histogram in Fig. 5. It can be seen from Fig. 5 that the AUCs of almost all methods clearly increase with \(\tau\) on almost all datasets. However, the AUC of our method increases more slowly than those of the other methods. The reason may be that nearest profile is used to improve the DTI effect for TWSNI in this paper, and nearest profile may not work well for targets with a larger number of interactions.

Fig. 5 Histogram of AUCs for TWSNI, where \(\tau\) used in Algorithm 1 is set to 1, 3 and 5 respectively

As a result, \(\tau\) should be set to 1 for our method. Although \(\tau\) can only be set to 1, the TWSNI classifier and TWSNI evaluator are still useful and important. Firstly, the interactions of targets with larger numbers of interactions can be predicted by the TWLNI classifier and TWLNI evaluator. It can be seen from Table 3 that the TWLNI classifier and TWLNI evaluator obtain good results when \(\tau\) is set to 3 and 5. Secondly, the TWSNI classifier and TWSNI evaluator achieve good results when \(\tau\) is set to 1. It can be seen from Table 4 that our method is the best on NR, IC and E, and the second best on DB, when \(\tau\) is set to 1. Thirdly, the best compared methods for TWLNI and TWSNI are not the same, which suggests that using different classification strategies for different targets may be necessary, and that designing the TWSNI classifier and TWSNI evaluator separately for TWSNI is important. It can be seen from Tables 3 and 4 that the best compared method for TWSNI is RLS, but RLS is not the best compared method for TWLNI.

As to the second question, we compare the AUCs of all methods when \(\tau\) is set to 1, as the TWSNI classifier and TWSNI evaluator are only used to improve the DTI effect for targets with smaller numbers of interactions. It can be seen from the \(\tau = 1\) column of Table 4 that our method is the best method on NR, IC and E, and the AUCs of our method are respectively 1.00%, 3.20% and 2.70% higher than those of the second best method on NR, IC and E, where the second best method is RLS. This shows that our method is better than the compared methods on most datasets.

Furthermore, it can be seen from Table 4 that our method is worse than most of the compared methods on GPCR. The reason may be that NP is used in our TWSNI classifier. NP mitigates the problem that the similarities between drugs and between targets are not very precise, as only the nearest neighbor is used to predict the DTI. However, this characteristic also makes NP somewhat sensitive to the nearest neighbor. As a result, our method achieves good results on most datasets but a poor result on GPCR. So when using our method to predict the DTI for TWSNI, cross validation on the training data should first be performed. In fact, most of the compared algorithms are prone to the same phenomenon for TWSNI, as the positive samples are not enough for TWSNI. For example, RF is good on NR but bad on the other three datasets. EKRR is the best on GPCR but not very good on the other three datasets. RK is good on GPCR but not very good on NR and IC. As a result, our method can still be a good method for predicting the DTI of targets with a small number of interactions in real applications.

Discussion

Different targets have very different numbers of interactions, and most of the interactions are concentrated on only a few targets. Consequently, some targets have enough positive samples to predict their own interactions, whereas other targets cannot rely on their own positive samples alone. For targets that have enough positive samples, adding the samples of neighbors can make DTI prediction worse, because the neighbors may contribute many more negative samples than positive samples. For targets that do not have enough positive samples, however, positive samples from other targets should be used to improve DTI prediction. The interactions of different targets should therefore be predicted by different methods.
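
The division of targets by number of interactions can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `split_targets`, the toy interaction matrix, and the convention that a target with at most \(\tau\) known interactions is assigned to TWSNI are all assumptions made for the example.

```python
from typing import List, Tuple


def split_targets(interactions: List[List[int]], tau: int) -> Tuple[List[int], List[int]]:
    """Split target indices into (TWSNI, TWLNI) by number of known interactions.

    interactions[d][t] is 1 if drug d is known to interact with target t, else 0.
    Assumption: a target with at most `tau` known interactions belongs to TWSNI.
    """
    n_targets = len(interactions[0])
    # Column sums give the number of known interactions per target.
    counts = [sum(row[t] for row in interactions) for t in range(n_targets)]
    twsni = [t for t, c in enumerate(counts) if c <= tau]
    twlni = [t for t, c in enumerate(counts) if c > tau]
    return twsni, twlni


# Toy matrix (hypothetical): 4 drugs x 3 targets.
Y = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
]
twsni, twlni = split_targets(Y, tau=1)
```

With this toy matrix, target 0 has four known interactions and goes to TWLNI, while targets 1 and 2 have one each and go to TWSNI, so each group can then be handed to its own classifier.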

Furthermore, the very different numbers of interactions per target cause a second problem. If TWSNI and TWLNI are evaluated together, the result is mainly determined by TWLNI, because most of the interactions are concentrated on only a few targets. However, in real applications of DTI prediction, finding new interactions for TWSNI can be more important than finding new interactions for TWLNI. New evaluators should therefore be designed to increase the influence of TWSNI on the experimental results.
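
A small Python sketch shows how a pooled evaluation can hide poor performance on TWSNI. It is illustrative only: the toy labels and scores are invented, the helper names `auc` and `evaluate_group` are hypothetical, and pooling all drug–target pairs within a group before computing the AUC is an assumption about how a per-group evaluator might work.

```python
from typing import List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate_group(Y: List[List[int]], S: List[List[float]], targets: List[int]) -> float:
    """Pool every drug-target pair of the given targets and compute one AUC."""
    labels = [row[t] for row in Y for t in targets]
    scores = [row[t] for row in S for t in targets]
    return auc(labels, scores)


# Toy data (hypothetical): 6 drugs x 2 targets.
# Target 0 has three known interactions (TWLNI); target 1 has one (TWSNI).
Y = [[1, 1], [1, 0], [1, 0], [0, 0], [0, 0], [0, 0]]
S = [[0.9, 0.05], [0.8, 0.6], [0.7, 0.5], [0.3, 0.4], [0.2, 0.15], [0.1, 0.12]]

auc_twlni = evaluate_group(Y, S, [0])   # well-predicted target
auc_twsni = evaluate_group(Y, S, [1])   # badly predicted target
auc_all = evaluate_group(Y, S, [0, 1])  # both targets evaluated together
```

Here the pooled AUC is 0.75 even though the only known interaction of the TWSNI target is ranked last (AUC 0.0), because three of the four positive samples belong to the TWLNI target. Reporting the two groups separately makes this failure visible.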

In this study, MCSDTI is designed according to the above analyses and has the following advantages. Firstly, the interactions of TWLNI and TWSNI are predicted by different classification strategies, which makes better use of the strengths of each strategy in different situations and exploits the information contained in different targets more fully. Secondly, TWLNI and TWSNI are evaluated independently, so that the DTI prediction performance on TWSNI can be presented fairly, which provides a new research goal for DTI prediction. Tables 3 and 4 show that MCSDTI is better than the compared methods on most datasets. Specifically, most of the compared methods cannot obtain good results for TWLNI and TWSNI at the same time, and many methods obtain a good result for TWSNI but not for TWLNI. These results indicate that interactions for different targets should be predicted by different methods and that all targets should not be evaluated together.

Several interesting problems remain for future work. Firstly, in this paper an existing method is used to improve DTI prediction for TWSNI. Although this method benefits from our framework, its DTI prediction results are still not very good, so a better method could be designed for TWSNI in the future. Secondly, an adaptive MCSDTI framework could be designed, in which both the number of parts and the threshold used to divide the targets are chosen adaptively.

Conclusions

This paper presents a drug–target interaction prediction method based on multiple classification strategies (MCSDTI). In MCSDTI, targets are first divided into TWLNI and TWSNI, and then a classifier and an evaluator are designed for each of the two sets to predict and assess the corresponding DTIs. As a result, the information in the different target sets can be better exploited by different classification strategies, and the evaluation results obtained by the separate evaluators are fairer and more useful. The experiments show that MCSDTI is competitive with previous methods. Most methods cannot obtain good DTI prediction results for both TWLNI and TWSNI, whereas MCSDTI is better than the compared methods for both TWLNI and TWSNI on most datasets, which shows that designing different classification strategies for different targets is an effective way to improve DTI prediction.

Availability of data and materials

The datasets processed for this article are freely available, as described by [31] at http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/ and as described by [32] at https://go.drugbank.com/.

Abbreviations

DTI: Drug–target interaction

MCSDTI: Multiple classification strategies based drug–target interaction

TWSNI: Targets with smaller numbers of interactions

TWLNI: Targets with larger numbers of interactions

NR: Nuclear receptors

IC: Ion channels

GPCR: G protein coupled receptors

E: Enzymes

DB: Drug bank

AUC: Area under the receiver operating characteristic curve

PSSM: Position-specific scoring matrix

XGBoost: Extreme gradient boosting

SMOTE: Synthetic minority oversampling technique

SMILES: Simplified molecular input line entry system

PROFEAT: Protein features

NP: Nearest profile

DT: Decision tree

RF: Random forest

WP: Weighted profile

NBI: Network-based inference

RLS: Regularized least squares-avg

RK: Regularized least squares-kron

EDT: Ensemble decision tree

EKRR: Ensemble kernel ridge regression

MATLAB: Matrix laboratory

References

  1. Ban T, Ohue M, Akiyama Y, et al. NRLMFβ: beta-distribution-rescored neighborhood regularized logistic matrix factorization for improving the performance of drug-target interaction prediction. Biochem Biophys Rep, 2019.

  2. Cui Z, Gao Y, Liu J, et al. L2,1-GRMF: an improved graph regularized matrix factorization method to predict drug-target interactions. BMC Bioinform. 2019;20(8):1–13.

  3. Li L, Cai M. Drug target prediction by multi-view low rank embedding. IEEE/ACM Trans Comput Biol Bioinf. 2019;16(5):1712–21.

  4. Mongia A, Majumdar A. Drug-target interaction prediction using multi graph regularized nuclear norm minimization. bioRxiv. 2018.

  5. Wang M, Tang C, Chen J, et al. Drug-target interaction prediction via dual Laplacian graph regularized matrix completion. BioMed Res Int., 2018: 1–12.

  6. Zong N, Kim H, Ngo V, et al. Deep mining heterogeneous networks of biomedical linked data to predict novel drug-target associations. Bioinformatics. 2017;33(15):2337–44.

  7. Ding Y, Tang J, Guo F, et al. Identification of drug–target interactions via fuzzy bipartite local model. Neural Comput Appl. 2019: 1–17.

  8. Fan X, Hong Y, Liu X, et al. Neighborhood constraint matrix completion for drug-target interaction prediction. In: Pacific-asia conference on knowledge discovery and data mining. 2018: 348–60.

  9. Laarhoven TV, Marchiori E. Predicting drug-target interactions for new drug compounds using a weighted nearest neighbor profile. PLoS ONE. 2013;8(6):e66952.

  10. Li Y, Huang Y, You Z, et al. Drug-target interaction prediction based on drug fingerprint information and protein sequence. Molecules. 2019, 24(16).

  11. Rayhan F, Ahmed S, Shatabda S, et al. iDTI-ESBoost: identification of drug target interaction using evolutionary and structural features with boosting. Sci Rep. 2017;7(1):17731–17731.

  12. Jiang J, Wang N, Chen P, et al. DrugECs: an ensemble system with feature subspaces for accurate drug-target interaction prediction. BioMed Res Int. 2017: 1–10.

  13. Hasan MSM, Chen WY, Jahan H, et al. iDTi-CSsmoteB: identification of drug-target interaction based on drug chemical structure and protein sequence using XGBoost with over-sampling technique SMOTE. IEEE Access. 2019;7:48699–714.

  14. Shi H, Liu S, Chen J, et al. Predicting drug-target interactions using Lasso with random forest based on evolutionary information and chemical structure. Genomics. 2019;111(6):1839–52.

  15. Xu MQ, Zhang XL, Lin XL. Inferring Drug-target interactions using graph isomorphic network and word vector matrix. In: IEEE international conference on bioinformatics and biomedicine. 2020, p. B487.

  16. Zhang XL, Lin XL, Zhao JF, et al. Efficiently prediction hot spots in PPIs by combining random forest and synthetic minority over-sampling technique. IEEE/ACM Trans Comput Biol Bioinf. 2019;16(3):774–81.

  17. Lin XL, Zhang XL, Xu X. Efficient classification of hot spots and hub protein interfaces by recursive feature elimination and gradient boosting. IEEE/ACM Trans Comput Biol Bioinf. 2020;17(5):1525–34.

  18. Lin XL, Zhang XL. Prediction of hot regions in PPIs based on improved local community structure detecting. IEEE/ACM Trans Comput Biol Bioinf. 2018;15(5):1470–9.

  19. Ezzat A, Wu M, Li X, et al. Drug–target interaction prediction using ensemble learning and dimensionality reduction. Methods. 2017;129:81–8.

  20. Aman S, Rinkle R. BE-DTI: ensemble framework for drug target interaction prediction using dimensionality reduction and active learning. Comput Methods Prog Biomed., 2018, p. 151–162.

  21. Mahmud SM, Chen W, Meng H, et al. Prediction of drug-target interaction based on protein features using undersampling and feature selection techniques with boosting. Anal Biochem. 2020, p. 589.

  22. Feng CM, Xu Y, Liu JX, et al. Supervised discriminative sparse PCA for com-characteristic gene selection and tumor classification on multiview biological data. IEEE Trans Neural Netw Learn Syst. 2019;30(10):2926–67.

  23. Feng CM, Xu Y, Hou MX, et al. PCA via joint graph Laplacian and sparse constraint: identification of differentially expressed genes and sample clustering on gene expression data. BMC Bioinform. 2019;20:1–11.

  24. He T, Heidemeyer M, Ban F, et al. SimBoost: a read-across approach for predicting drug-target binding affinities using gradient boosting machines. J Cheminform. 2017;9(1):1–14.

  25. Rayhan F, Ahmed S, Farid D M, et al. CFSBoost: cumulative feature subspace boosting for drug-target interaction prediction. J Theor Biol. 2019, p. 1–8.

  26. Pliakos K, Vens C. Drug-target interaction prediction with tree-ensemble learning and output space reconstruction. BMC Bioinform. 2020;21(49):1–11.

  27. Zhang J, Zhu M, Chen P, et al. DrugRPE: random projection ensemble approach to drug-target interaction prediction. Neurocomputing. 2017, p. 256–62.

  28. Buza K, Peska L. ALADIN: a new approach for drug-target interaction prediction. Eur Conf Mach Learn. 2017, p. 322–337.

  29. Ezzat A, Wu M, Li XL, et al. Drug-target interaction prediction via class imbalance-aware ensemble learning. BMC Bioinform. 2016;17:267–76.

  30. Ye Q, Zhang XL, Lin XL. Drug-target interaction prediction via multiple output deep learning. In: IEEE international conference on bioinformatics and biomedicine. 2020, p. B615.

  31. Yamanishi Y, Araki M, Gutteridge A, et al. Prediction of drug–target interaction networks from the integration of chemical and genomic spaces. Bioinformatics. 2008;24(13):1232–40.

  32. Knox C, Law V, Jewison T, et al. DrugBank 3.0: a comprehensive resource for omics research on drugs. Nucl Acids Res. 2011, p. D1035–D1041.

  33. Yap CW. PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem. 2011;32(7):1466–74.

  34. Wang L, You ZH, Chen X, et al. A computational-based method for predicting drug-target interactions by using stacked autoencoder deep neural network. J Comput Biol. 2017;24:1–15.

  35. Zhang P, Tao L, Zeng X, et al. PROFEAT Update: a protein features web server with added facility to compute network descriptors for studying omics-derived networks. J Mol Biol. 2017;429(3):416–25.

  36. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.

  37. Cheng FX, Liu C, Jiang J, et al. Prediction of drug-target interactions and drug repositioning via network-based inference. PLoS Comput Biol. 2012;8(5):e1002503.

  38. Laarhoven TV, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drug-target interaction. Bioinformatics. 2011;21:3036–43.

Acknowledgements

The authors thank the members of Machine Learning and Artificial Intelligence Laboratory, School of Computer Science and Technology, Wuhan University of Science and Technology, for their helpful discussion within seminars.

About this supplement

This article has been published as part of BMC Bioinformatics Volume 22 Supplement 12 2021: Explainable AI methods in biomedical data science. The full contents of the supplement are available at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-22-supplement-12.

Funding

The National Natural Science Foundation of China (No. 61972299) funded all the research, materials and activities needed for the production and analysis of data. The National Natural Science Foundation of China (No. 61502356), the Zhejiang Provincial Natural Science Foundation (No. LQ18F020006) and the Hubei Province Natural Science Foundation of China (No. 2018CFB526) supported the intensive analyses of all simulations and the testing of the other tools. Publication costs are funded by the National Natural Science Foundation of China (No. 61972299).

Author information

Contributions

QY designed the algorithm and analysed the experimental results. XZ participated in the implementation of the algorithm and performed the experiments. QY and XL drafted the original version of the paper. XZ helped rewrite the paper based on the original version. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xiaolong Zhang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent to publish

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Ye, Q., Zhang, X. & Lin, X. Drug–target interaction prediction via multiple classification strategies. BMC Bioinformatics 22 (Suppl 12), 461 (2021). https://doi.org/10.1186/s12859-021-04366-3

  • DOI: https://doi.org/10.1186/s12859-021-04366-3

Keywords