Predicting effective drug combinations using gradient tree boosting based on features extracted from drug-protein heterogeneous network

Background: Although targeted drugs have contributed to impressive advances in the treatment of cancer patients, their clinical benefits in tumor therapy are greatly limited by intrinsic and acquired resistance of cancer cells to such drugs. Drug combinations synergistically interfere with protein networks to inhibit the activity of carcinogenic genes more effectively, and therefore play an increasingly important role in the treatment of complex diseases. Results: In this paper, we combined the drug similarity network, the protein similarity network and known drug-protein associations into a drug-protein heterogeneous network. Next, we ran random walk with restart (RWR) on the heterogeneous network using the combinatorial drug targets as the initial probability, and took the converged probability distribution as the feature vector of each drug combination. Taking these feature vectors as input, we trained a gradient tree boosting (GTB) classifier to predict new drug combinations. We evaluated performance on the widely used drug combination data set derived from the DCDB database. The experimental results show that our method outperforms seven typical classifiers and traditional boosting algorithms. Conclusions: The heterogeneous network-derived features introduced in our method are more informative and richer than the primary ontology features, which results in better performance. In addition, from the perspective of network pharmacology, our method effectively exploits the topological attributes and interactions of drug targets in the overall biological network, which proves to be a systematic and reliable approach for drug discovery.

Developing combinations of targeted agents is more difficult than developing a single agent [7], as inhibiting the cross-talk among multiple pathways depends on insight into the pathway interdependencies underlying cancer cell proliferation and survival in a specific cancer type [8,9]. The high-throughput screening (HTS) experiments currently used to evaluate drug combinations remain time-consuming and costly because they rely on searching a large number of possible target combinations [10][11][12]. There is therefore an urgent demand for rational and systematic in silico methods to narrow down the candidate drug combinations for wet-lab experimental validation [13].
Quite a few computational methods have been proposed to predict cancer sensitivity to combinatorial drugs [1,4,14-16]. The existing methods can be roughly divided into two categories: systems biology-based methods [17] and network-based analysis [15,18]. Systems biology-based methods mathematically model the perturbation of drugs using biochemical reactions and kinetic parameters, and are often limited to small-scale, well-studied signaling pathways. Network-based methods often exploit genomic, chemical and pharmacological properties to build an overall network composed of the associations among drugs, proteins and pathways, and then adopt scoring rules [19,20], optimal combination searches [1,16,21] or machine learning [4,18] to predict potential drug combinations. As network-based methods integrate various kinds of ontological features and interactions between different subjects of interest, some of them achieve remarkable performance in predicting drug combinations. For example, Ligeti et al. [20] proposed the Target Overlap Score (TOS) prioritization function, defined for two drugs as the number of jointly perturbed targets divided by the number of all targets potentially affected by the two drugs, to rank candidate drug combinations. Pang et al. [1] proposed mixed integer linear programming to find the balanced target set cover (BTSC) and minimum off-target set cover (MOTSC) for combination therapy. Huang et al. [18] proposed DrugComboRanker, which first builds a drug functional network based on the genomic profiles of drugs and disease-specific signaling networks based on patients' genomic profiles and interactome data, and then prioritizes synergistic drug combinations by searching for drugs whose targets are enriched in the complementary signaling modules of the disease signaling network. Matlock et al. [21] sought drug combinations that maximize sensitivity over tumor cell models while minimizing toxicity over normal cell models, and proposed a lexicographic search algorithm to find the optimal target set. In addition, some methods exploit the concept of synthetic lethality to discover combinatorial drugs [3,22,23]. However, most previous methods are limited in their ability to dissect potential molecular mechanisms, or to associate multiple drugs with one disease in the huge pharmacological space.
Many approaches integrate multiple heterogeneous networks to infer the associations between biological entities, including lncRNA functions [24][25][26], lncRNA-disease associations [27], drug-disease associations [28,29] and gene function inference [30]. Inspired by heterogeneous network-based inference, we ran random walk with restart on the drug-protein heterogeneous network to extract features for drug combinations, and then trained a gradient tree boosting classifier on the extracted features to predict new drug combinations. Concretely, we integrated a variety of data sources, including chemical structures of the drugs, protein sequences, and known drug-protein associations, to construct a drug-protein heterogeneous network. The random walk with restart procedure was run on the heterogeneous network using the combinatorial drugs and their targets as the initial probability. The converged probability distribution was used as the feature vector of the drug combination. Based on the probability distribution vectors, we subsequently trained the gradient tree boosting (GTB) classifier, which achieved an AUC of 0.949 in 10-fold cross-validation. We also compared our method to seven other typical classifiers, including kNN, SVM, Logistic regression, Naive Bayes, AdaBoost, Random Forest and LogitBoost. The performance comparison demonstrates that our proposed model significantly outperformed these traditional methods. From the perspective of network pharmacology, our method effectively makes use of the topological attributes and functional interactions of drug targets in the protein-protein network.

Drug combination dataset
The set of effective drug combinations was obtained from DCDB 2.0 [31], a typical drug combination database focused on collecting verified drug combinations to facilitate further exploration, including theoretical modeling and simulation of such beneficial drug combinations. In total, the current version (2.0) of DCDB includes 1363 drug combinations (330 approved and 1033 investigational, including 237 unsuccessful usages), covering 904 individual drugs and 805 targets. We selected the combinations that are approved or under trial in DCDB as positive samples. Note that the number of non-effective drug combinations is enormous, much larger than that of effective ones in the real world. Therefore, we generated negative samples by randomly pairing drugs, in a number that balances the positive and negative samples in our benchmark set. This strategy for generating negative samples has been widely adopted in the prediction of drug-target interactions and drug-disease associations [28,32]. Importantly, the drug set from which we drew pairwise combinations was expanded from the individual drugs in DCDB to include their 3 most associated drugs according to STITCH, yielding 3266 drugs in total. Finally, the benchmark drug combination set contains 1359 positive combinations and 1359 negative combinations.
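The negative sampling step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sample_negative_pairs`, the fixed random seed and the restriction to pairwise positives are assumptions made here for the example.

```python
import random


def sample_negative_pairs(drugs, positive_pairs, n_negatives, seed=0):
    """Draw random drug pairs that are not known positive combinations.

    Hypothetical helper: the paper only states that negatives are random
    pairs of drugs, balanced against the number of positives.
    """
    drugs = list(drugs)
    drug_set = set(drugs)
    rng = random.Random(seed)
    positives = {frozenset(p) for p in positive_pairs}
    # Number of distinct candidate pairs that are not positive combinations.
    total_pairs = len(drugs) * (len(drugs) - 1) // 2
    blocked = sum(1 for p in positives if len(p) == 2 and p <= drug_set)
    if n_negatives > total_pairs - blocked:
        raise ValueError("not enough candidate pairs for the requested negatives")
    negatives = set()
    while len(negatives) < n_negatives:
        pair = frozenset(rng.sample(drugs, 2))
        if pair not in positives:
            negatives.add(pair)  # the set removes duplicate draws
    return [tuple(sorted(p)) for p in sorted(negatives, key=sorted)]


# Toy usage with made-up drug identifiers.
toy_drugs = ["A", "B", "C", "D", "E"]
toy_positives = [("A", "B"), ("C", "D")]
toy_negatives = sample_negative_pairs(toy_drugs, toy_positives, n_negatives=2)
```

Rejection sampling is adequate here because the positive pairs are a tiny fraction of all possible pairs among several thousand drugs.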

Performance measures
We evaluated performance using 10-fold cross-validation. In particular, the benchmark set was randomly divided into ten subsets of roughly equal size. Each subset was used in turn as the test set, with the remaining nine subsets used as the training set. Each performance measure was averaged over the ten folds. Several performance measures were used in our experiments, including precision (PRE), recall (REC), F-measure, Matthews correlation coefficient (MCC) and the area under the receiver operating characteristic curve (AUC). They are formally defined as below:
\[ \mathrm{PRE} = \frac{TP}{TP + FP}, \qquad \mathrm{REC} = \frac{TP}{TP + FN} \]
\[ F\text{-measure} = \frac{2 \times \mathrm{PRE} \times \mathrm{REC}}{\mathrm{PRE} + \mathrm{REC}} \]
\[ \mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \]
in which TP and TN represent the numbers of correctly predicted positive and negative samples, and FP and FN represent the numbers of incorrectly predicted positive and negative samples, respectively. Additionally, the AUC score is computed by varying the cutoff on the predicted scores from the smallest to the greatest value.
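As a minimal sketch, the measures named above can be computed from the confusion counts using their standard definitions. The function names are hypothetical, and the AUC is computed here through the equivalent pairwise-ranking formulation rather than by explicitly sweeping the cutoff.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, F-measure and MCC from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"PRE": precision, "REC": recall, "F": f_measure, "MCC": mcc}


def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))
```

In a 10-fold setting these functions would be applied to each test fold and the results averaged.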

Impact of parameters on performance
To explore the impact of the parameter λ, which is the probability that the random walker jumps to the other type of network, we gradually increased its value from 0.1 to 0.9 in steps of 0.1. The aforementioned metrics obtained by 10-fold cross-validation are shown in Table 1, which demonstrates that λ has a moderate impact on the prediction performance of our proposed method. In terms of AUC, the values approximately follow a parabola that peaks at 0.944 for λ = 0.7, which was thus adopted in our subsequent experiments. For the other two parameters introduced in random walk with restart, the restart probability α and the tradeoff η, we conducted similar tuning to determine the values that achieve the best performance. As shown in Additional file 1: Tables S1 and S2, the AUC reached its highest value when α and η were equal to 0.2 and 0.9, respectively. According to the results, the restart probability α has a negligible effect on the AUC. Generally, the restart probability is a heuristic parameter that is selected without theoretical guidance or justification [33]. However, the heterogeneous network is built from drug-drug similarity, protein-protein similarity and known drug-protein associations, so its edges carry quantitative weights. From this perspective, since the random walk simulates the influence of drugs in the protein network, the converged state will be biased toward nodes with higher weights. Therefore, the restart probability may have a slight effect on the final distribution. As a result, we set the three parameters λ, α and η to 0.7, 0.2 and 0.9 in the following performance comparison experiments.

Performance comparison to typical classifiers
To demonstrate the performance of our method, we evaluated it on the benchmark combination set against seven other typical classifiers, including kNN, SVM, Logistic regression, Naive Bayes, Random Forest, AdaBoost and LogitBoost. Based on the derived feature distribution vectors, we implemented these competing classifiers separately using R packages [34] so that our work can be conveniently reproduced. For Naive Bayes, we adopted the R package e1071 [35] with its default settings. Logistic regression and SVM were also implemented based on the e1071 R package; logistic regression was run with default settings, while the misclassification penalty coefficient for SVM was varied from 10 to 10000 in steps of 500 to achieve the best performance. For kNN, the R package kknn [36] was used, and the parameter k (k=1, 3, 5, 7 and 9) was enumerated to tune its performance. For the distance metric of kNN, we tried Manhattan distance, Euclidean distance and Chebyshev distance and found that they yield similar performance; we therefore adopted Chebyshev distance (q=5) in the performance evaluation. The R package randomForest [37] was used to run the random forest algorithm, with the number of trees varied from 60 to 500 in steps of 20. For the boosting methods AdaBoost and LogitBoost, the R packages Adabag [38] and caTools were used, with the number of training iterations tuned from 10 to 100 in steps of 5 and 10, respectively. The performance measures of each comparative method, including precision, recall, F1, MCC and AUC, achieved with the fine-tuned parameters, are shown in Table 2. Our proposed method significantly outperformed the other classifiers in terms of almost all performance metrics.
To present a clear performance comparison, the ROC curves of GTB and the other seven classifiers are illustrated in Fig 1. The GTB classifier clearly outperforms all competing methods, achieving the highest AUC value of 0.95, followed by Random Forest and AdaBoost at 0.86. Naive Bayes performs worst, with an AUC value of only 0.508.

Performance improvement by heterogeneous network-derived features
To validate the effectiveness of the features extracted from the drug-protein heterogeneous network, we compared the performance obtained with the primary ontology features and with the heterogeneous network-based features. Because the number of individual drugs and target proteins differs between drug combinations, we cannot directly concatenate the drug fingerprints and protein GO annotations, as the resulting feature vectors would be inconsistent in dimension. Instead, we first unified the chemical fingerprints of the individual drugs in a combination, i.e. took the union of the individual fingerprint vectors, as well as the union of the GO terms of the target proteins of the individual drugs. The resulting performance is reported in Table 3. The performance of the GTB classifier with heterogeneous network-based features as input is vastly superior to that with primary ontology features. For example, the AUC value increased from 0.528 to 0.949 for the GTB classifier. Moreover, we conducted the same comparison for the other typical classifiers to validate the advantage of the features extracted from the drug-protein heterogeneous network. As shown in Tables 2 and 3, the performance of all these classifiers was greatly boosted by the features extracted through random walk with restart on the heterogeneous network.

Discussion
In this paper, we proposed a computational method for predicting effective drug combinations based on features derived from a drug-protein heterogeneous network by random walk with restart. To verify the proposed method, we conducted extensive experiments comparing its performance with other typical classifiers on the benchmark dataset we constructed, and the results clearly demonstrated that our method achieves state-of-the-art performance. Note that the input of the GTB classifier is the output of random walk with restart on the heterogeneous network, a probability distribution vector of only 6,074 dimensions. We therefore believe that the heterogeneous network-derived features are more informative and lower-dimensional than the high-dimensional primary ontology features, which may suffer from the curse of dimensionality during classification. As a result, the performance of GTB and the other classifiers is significantly improved. In addition, the majority of current methods for predicting drug combinations are limited in combination size, with pairwise drugs being the most common case. In contrast, our proposed method can handle larger drug combinations, which appreciably increases its practicality. It is worth noting that the protein network introduced in the random walk with restart helps to explore the biological mechanism of drug combinations in vivo. In fact, the final probability distribution of a drug combination derived by random walk with restart suggests, to some extent, the indications of that combination. Take as an example the pairwise combination of Docetaxel and Capecitabine, which has been approved by the FDA to treat metastatic breast cancer. When we ranked the protein nodes according to the probability distribution, the protein with the highest probability was ENSP00000315644 and the third was ENSP00000252029, both of which are encoded by the Tyms gene.
It has been shown that polymorphisms of the Tyms gene are associated with the etiology of neoplasia, including breast cancer. In addition, the fourth and fifth proteins are ENSP00000269571 and ENSP00000275493, encoded by Erbb2 and Egfr, respectively, both of which are highly linked to breast cancer. To further evaluate the potential of the protein network, we examined another drug combination, Atorvastatin and Proguanil, which currently has no official indication. The resulting probability distribution  [40]. Therefore, anemia may be a potential indication of the drug combination Atorvastatin and Proguanil. In summary, we conclude that the probability distribution derived by random walk can effectively reveal the indications of drug combinations. Because negative samples were randomly generated, we further examined only the positive samples that were misclassified. We found that the samples misclassified by our method have low similarity to other samples. In fact, most existing computational models for predicting drug-target interactions or drug-disease associations assume that similar compounds are likely to interact with similar target proteins and thereby exert similar therapeutic effects in the cellular micro-environment. These computational methods have achieved superior performance, greatly narrowed down the number of candidate drug targets and revealed new indications of approved drugs. Under this assumption, the prediction accuracy often relies on the close association of tested samples with known samples that have been validated by wet-lab experiments, such as drug combinations and drug-target interactions. In terms of network medicine, a drug molecule perturbs the cellular network via signal cascade reactions and the protein interaction network.
Many computational methods have taken into account this consideration, and adopted random walks and diffusion on network to capture the perturbation of the drugs.
However, there are always some samples located far from validated samples in the feature space. For instance, some new drugs have low similarity to other drugs, and some proteins have low similarity to proteins in other protein families. As a result, similarity-based or network diffusion-based computational methods tend to fail in predicting drug combinations or drug-target interactions that involve such drugs or proteins. Fortunately, the emergence of large-scale experimental data derived from high-throughput screening techniques can strongly motivate novel methods for predicting synergistic drugs or effective drug combinations.

Conclusion
In this paper, we proposed a gradient tree boosting (GTB) classifier based on heterogeneous network-derived features to predict effective drug combinations. The heterogeneous network integrates the drug similarity network, the protein similarity network and known drug-protein associations. We ran random walk with restart (RWR) on the heterogeneous network using the combinatorial drugs and their associated targets as the initial probability, and took the converged probability distribution as the feature vector of each drug combination. The heterogeneous network-derived features introduced in our method are more informative and richer than the primary ontology features. The GTB classifier trained on the heterogeneous network-derived features outperforms seven typical classifiers and traditional boosting algorithms. Moreover, our case studies show that our method is helpful in revealing the indications of drug combinations. From the perspective of network pharmacology, our method effectively exploits the topological attributes and interactions of drug targets in the overall biological network, which proves to be a systematic and reliable approach for drug discovery.

Overview of our methodology
We first constructed the benchmark drug combination set composed of positive samples derived from public databases and negative samples that were randomly generated. For the individual drugs included in the benchmark set, we collected a variety of related characteristics, including chemical fingerprints, drug targets and drug-protein associations, as shown in Fig. 2a. These ontology features of drugs and proteins were used to compute the drug-drug similarities and protein-protein similarities. Together with the known drug-protein associations, they were used to construct the drug-protein heterogeneous network. Next, the random walk with restart on the heterogeneous network proposed in our previous work [28] was run with each drug combination as the initial state, as illustrated in Fig. 2b-c. The probability distribution obtained when the random walk reaches steady state was used as the feature vector of the drug combination. Based on this feature representation of the drug combinations, the gradient tree boosting (GTB) classifier was trained to predict new effective drug combinations.

Drug-protein associations
We selected drug-protein associations from the STITCH database [41], a comprehensive database that collects compound-protein interactions from different sources: biochemical experiments, external databases, text mining and computational predictions. STITCH computes a confidence score for each interaction ranging from 0 to 1,000, which indicates the confidence of the compound-protein interaction as supported by the four types of evidence. We first used a confidence threshold of 0.5 (corresponding to a combined score of 500 in STITCH) to remove low-confidence target proteins, because we consider such targets likely to be spurious. Next, we selected the top 3 of the remaining target proteins of each drug. If a drug has fewer than 3 target proteins with a confidence score higher than 0.5, we took only those targets into account. In total, we obtained 210,235 drug-protein associations for 3,266 unique drugs (the drug set is built by selecting the top 3 similar drugs; see the following subsection for details). Formally, let $D = (d_1, d_2, \ldots, d_n)$ and $P = (p_1, p_2, \ldots, p_m)$ denote the drug and protein node sets, and let $A$ be the adjacency matrix of drug-protein associations, with element $a_{ij}$ equal to the confidence score if there is a validated interaction between drug $i$ and protein $j$, and $a_{ij} = 0$ otherwise.
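The filtering just described can be sketched as below. This is an illustration under stated assumptions: the input is taken to be (drug, protein, combined score) triples on the 0 to 1,000 scale, the threshold is applied as "greater than or equal to 500" (the text does not say whether the bound is strict), and `select_targets` is a hypothetical name.

```python
from collections import defaultdict


def select_targets(associations, min_score=500, top_k=3):
    """Keep, per drug, the top_k targets whose combined score passes min_score.

    associations: iterable of (drug, protein, combined_score) triples, with
    scores on the 0-1000 scale. Returns {drug: [(protein, confidence), ...]}
    with confidence rescaled to [0, 1].
    """
    by_drug = defaultdict(list)
    for drug, protein, score in associations:
        if score >= min_score:  # assumption: threshold is inclusive
            by_drug[drug].append((score, protein))
    selected = {}
    for drug, hits in by_drug.items():
        # Highest score first; protein id breaks ties deterministically.
        hits.sort(key=lambda h: (-h[0], h[1]))
        selected[drug] = [(protein, score / 1000.0) for score, protein in hits[:top_k]]
    return selected


# Toy usage with made-up identifiers.
toy_associations = [
    ("d1", "p1", 900), ("d1", "p2", 400), ("d1", "p3", 700),
    ("d1", "p4", 650), ("d1", "p5", 600), ("d2", "p1", 300),
]
toy_selected = select_targets(toy_associations)
```

Drugs with no target above the threshold simply do not appear in the result, which corresponds to a row of zeros in the association matrix.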

Drug-drug similarity network
We expanded the list of individual drugs by selecting the top 10 most similar drugs to each single agent included in DCDB, according to the chemical-chemical combined scores derived from STITCH [41]. After removal of duplicate drugs, 3266 unique drugs were obtained. Similar compounds are likely to interact with similar target proteins and thereby exert similar therapeutic effects in the cellular micro-environment [42], allowing us to find new drug combinations by introducing drugs similar to known ones. Therefore, we believe that the expanded list of drugs can increase the opportunity for discovery of novel drug combinations.
Fig. 2 Illustrative diagram of the proposed method. a Data collection from drug and protein-related databases; b Construction of drug-drug similarity network, protein-protein similarity network and drug-protein association network; c Random walk with restart on drug-protein heterogeneous network; d Feature representations of drug combinations via feature extraction process; e Training gradient tree boosting classifier
Next, we generated the chemical fingerprints of the drugs to calculate the similarity of each pair of drugs. As in our previous work [28], we applied the PaDEL software [43] to compute the chemical fingerprints from the SMILES string of a drug, obtaining an 880-dimensional binary vector for each drug. An element equal to 1 indicates that the drug contains the corresponding chemical substructure, and 0 otherwise. Subsequently, the Jaccard score, a widely used similarity measure, was calculated on the chemical fingerprints as the chemical similarity of pairwise drugs. The Jaccard score of two sets $A$ and $B$ is generally defined as the intersection size divided by the union size:
\[ J(A, B) = \frac{|A \cap B|}{|A \cup B|} \]
Further, the bipartite network projection algorithm, a method inspired by network-based resource-allocation dynamics [44], was adopted to compute another drug similarity measure based on known drug-protein associations. In the drug-protein bipartite network, each drug node equally allocates its original resource to its associated protein nodes, and the resource assigned to each protein node is then equally transferred back to its neighboring drugs. As a result, the proportion of resource conveyed between drug $d_i$ and drug $d_j$ in this allocation process represents the strength of association between the two drugs. Supposing the initial resource of each drug node is one unit, the second drug similarity measure, denoted by $S_{ij}$, can be formulated as below:
\[ S_{ij} = \frac{1}{k(d_j)} \sum_{l=1}^{m} \frac{a_{il}\, a_{jl}}{k(p_l)} \qquad (6) \]
in which $k(d_j)$ and $k(p_l)$ are the degrees of drug $d_j$ and protein $p_l$ in the drug-protein association network. Intuitively, the more associated protein nodes two drugs share, the higher their similarity. In particular, if the associated proteins of two drugs do not overlap, i.e. no common associated protein exists, the similarity is 0.
Finally, the two aforementioned drug-drug similarities were integrated into a comprehensive measure using the probability disjunction formula, i.e. one minus the product of the complements of the two individual similarity scores.

Protein-protein similarity network
Correspondingly, we constructed the protein-protein similarity network based on two different similarity measures, including protein sequence similarity and GO semantic similarity. By using R package biomaRt (2.40.4) [45,46], the protein sequences can be readily obtained from Ensembl genome database (2018 updated), which is dedicated to curating gene-related information to encourage genome analysis [47]. The sequence similarity S (p1) ij between protein p i and protein p j was computed by using the R package Protr (1.6-2) [48], in which the Smith-Waterman algorithm is applicable.
Similar drugs are supposed to interact with proteins that act in similar biological processes or have similar molecular functions or reside in similar compartments [49]. Therefore, the GO semantic similarity S (p2) ij between protein p i and protein p j was calculated using R package GOSemSim (2.10.0) [50]. All three types of ontology features are used in the calculation of semantic similarity.
Likewise, the probability disjunction was used to integrate the two aforementioned protein-protein similarities, which is formulated as below:
\[ S^{(p)}_{ij} = 1 - \left(1 - S^{(p1)}_{ij}\right)\left(1 - S^{(p2)}_{ij}\right) \]
where $S^{(p)}_{ij}$ is the comprehensively integrated similarity measurement between protein $p_i$ and protein $p_j$.

Random walk with restart on heterogeneous network
The drug-drug similarity network, protein-protein similarity network and drug-protein association network were combined to construct the drug-protein heterogeneous network $G = (V, E)$. The node set $V = D \cup P$ is the union of the drug and protein nodes. The edge set is $E = E_{dd} \cup E_{dp} \cup E_{pd} \cup E_{pp}$, where $E_{dd}$, $E_{pp}$, $E_{dp}$ and $E_{pd}$ are the drug-drug, protein-protein, drug-protein and protein-drug edge collections, respectively.
In order to obtain the feature representations of drug combinations, we extended our previous work in which random walk with restart on the heterogeneous network was developed for single drug repurposing [28]. More precisely, for a combination of drugs $d_i$ and $d_j$, we performed random walk with restart on the heterogeneous network with these two drugs and their known target proteins acting as seed nodes, as shown in Fig. 3. Since the initial probability distribution can easily be extended to more drugs and their targets, the number of individual drugs involved in the combination is not limited to 2 in our method. When the random walk process reaches steady state, the probability distribution vector can be regarded as the perturbation of the protein network by the combinatorial drugs. With the drug-protein heterogeneous network, the transition matrix $T$ can be defined as below:
\[ T = \begin{bmatrix} T^{(dd)} & T^{(dp)} \\ T^{(pd)} & T^{(pp)} \end{bmatrix} \]
where $T^{(dd)}$ and $T^{(pp)}$ are the probability transition matrices from drug nodes to drug nodes and from protein nodes to protein nodes during the random walk process; $T^{(dp)}$ denotes the probability transition matrix from drug nodes to protein nodes, and $T^{(pd)}$ denotes the probability transition matrix from protein nodes to drug nodes. Suppose that the random walker starts from a drug node; it then visits one of the drug's targeted proteins with probability λ, or visits another drug node with probability (1-λ) in the heterogeneous network. If λ=0, the random walker can only stay within the network where it starts. According to the drug-drug similarity, the transition probability from drug $d_i$ to drug $d_j$ can be defined as below:
\[ T^{(dd)}_{ij} = \begin{cases} S^{(d)}_{ij} \Big/ \sum_{k} S^{(d)}_{ik}, & \text{if } \sum_{l} a_{il} = 0 \\ (1-\lambda)\, S^{(d)}_{ij} \Big/ \sum_{k} S^{(d)}_{ik}, & \text{otherwise} \end{cases} \]
where $S^{(d)}_{ij}$ is the integrated similarity between the $i$th drug and the $j$th drug, and $a_{il}$ is the association confidence score between the $i$th drug and the $l$th protein. A sum of $a_{il}$ over $l$ equal to 0 indicates that the drug has no approved or predicted association with any protein.
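Similarly, the transition probability from protein $p_i$ to protein $p_j$ can be defined based on the protein-protein similarity as below:
\[ T^{(pp)}_{ij} = \begin{cases} S^{(p)}_{ij} \Big/ \sum_{k} S^{(p)}_{ik}, & \text{if } \sum_{l} a_{li} = 0 \\ (1-\lambda)\, S^{(p)}_{ij} \Big/ \sum_{k} S^{(p)}_{ik}, & \text{otherwise} \end{cases} \]
where $S^{(p)}_{ij}$ is the similarity between the $i$th protein and the $j$th protein, and $a_{li}$ is the association score between the $l$th drug and the $i$th protein.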
Accordingly, the transition probability from drug $d_i$ to protein $p_j$ is defined as:
\[ T^{(dp)}_{ij} = \begin{cases} \lambda\, a_{ij} \Big/ \sum_{l} a_{il}, & \text{if } \sum_{l} a_{il} \neq 0 \\ 0, & \text{otherwise} \end{cases} \]
The transition probability from protein $p_i$ to drug $d_j$ is defined as:
\[ T^{(pd)}_{ij} = \begin{cases} \lambda\, a_{ji} \Big/ \sum_{l} a_{li}, & \text{if } \sum_{l} a_{li} \neq 0 \\ 0, & \text{otherwise} \end{cases} \]
Let $P(t)$ be an $(n + m)$-dimensional probability vector at step $t$, in which $P(t)[i]$ represents the probability of the random walker visiting node $i$ (drug or protein). The random walk process can then be iteratively calculated as below:
\[ P(t+1) = (1 - \alpha)\, T^{\top} P(t) + \alpha P_0 \]
where α is the restart probability, and $P_0$ is the initial probability distribution vector of a set of seed nodes consisting of the combinatorial drugs and their targeted proteins. Take the drug combination $d_i$ and $d_j$ as an example: $d_i$ and $d_j$ are employed as the seed nodes in the drug network and each seed node is given equal probability 1/2. Giving the remaining drug nodes probability 0 yields the initial probability vector with respect to drugs. Correspondingly, the protein nodes related to drug $d_i$ and drug $d_j$ are used as seed nodes in the protein network, and equal probabilities are allocated to these protein nodes so that the probabilities sum to 1. As shown in Fig. 3, there are three targeted proteins and thus each protein is given initial probability 1/3. Let $P^{(d)}_0$ and $P^{(p)}_0$ be the initial probability vectors of drugs and proteins, respectively. The initial probability $P_0$ for the drug-centric random walk is formed by concatenating $P^{(d)}_0$ and $P^{(p)}_0$, weighted by a tradeoff parameter $\eta \in [0, 1]$ that balances the importance of the drug nodes and the protein nodes. In our experiments, η is set to 0.5. If the difference between two successive iterations is lower than 1e-10, the random walk is considered to have reached steady state. Once the random walk process converges, the probability distribution is used as the feature vector of the drug combination.
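The feature extraction procedure of this subsection can be sketched in plain Python as below. It is a minimal illustration on a made-up toy network, not the authors' implementation. Several points are assumptions made here and flagged in the comments: the transition probabilities follow the usual random walk with restart on heterogeneous networks, the seed vector places weight (1 - η) on the drug seeds and η on the protein seeds, and convergence is measured with the L1 norm.

```python
def build_transition(S_d, S_p, A, lam):
    """Row-stochastic transition matrix over n drugs followed by m proteins."""
    n, m = len(S_d), len(S_p)
    T = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        assoc_sum, sim_sum = sum(A[i]), sum(S_d[i])
        # Without known targets the walker stays in the drug network.
        within = 1.0 if assoc_sum == 0 else 1.0 - lam
        for j in range(n):
            if sim_sum:
                T[i][j] = within * S_d[i][j] / sim_sum
        for j in range(m):
            if assoc_sum:
                T[i][n + j] = lam * A[i][j] / assoc_sum
    for i in range(m):
        assoc_sum = sum(A[l][i] for l in range(n))
        sim_sum = sum(S_p[i])
        within = 1.0 if assoc_sum == 0 else 1.0 - lam
        for j in range(m):
            if sim_sum:
                T[n + i][n + j] = within * S_p[i][j] / sim_sum
        for j in range(n):
            if assoc_sum:
                T[n + i][j] = lam * A[j][i] / assoc_sum
    return T


def initial_distribution(n, m, seed_drugs, seed_proteins, eta):
    """Seed vector; the (1 - eta, eta) split is an assumption of this sketch."""
    p0 = [0.0] * (n + m)
    for d in seed_drugs:
        p0[d] = (1.0 - eta) / len(seed_drugs)
    for q in seed_proteins:
        p0[n + q] = eta / len(seed_proteins)
    return p0


def random_walk_with_restart(T, p0, alpha, tol=1e-10, max_iter=10000):
    """Iterate P(t+1) = (1 - alpha) * T^T P(t) + alpha * P0 until convergence."""
    p, size = list(p0), len(p0)
    for _ in range(max_iter):
        nxt = [(1.0 - alpha) * sum(T[i][j] * p[i] for i in range(size))
               + alpha * p0[j] for j in range(size)]
        diff = sum(abs(a - b) for a, b in zip(nxt, p))  # L1 norm (assumption)
        p = nxt
        if diff < tol:
            break
    return p


# Toy network: 3 drugs, 3 proteins; drug 2 has no known target.
S_d = [[0.0, 0.5, 0.2], [0.5, 0.0, 0.3], [0.2, 0.3, 0.0]]
S_p = [[0.0, 0.4, 0.1], [0.4, 0.0, 0.6], [0.1, 0.6, 0.0]]
A = [[0.9, 0.0, 0.0], [0.0, 0.8, 0.7], [0.0, 0.0, 0.0]]
T = build_transition(S_d, S_p, A, lam=0.7)
p0 = initial_distribution(3, 3, seed_drugs=[0, 1], seed_proteins=[0, 1, 2], eta=0.5)
features = random_walk_with_restart(T, p0, alpha=0.2)
```

Because every row of the transition matrix sums to 1 and the seed vector sums to 1, the iterate remains a probability distribution at every step, and the restart term makes the iteration a contraction, so it converges for any α in (0, 1].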

Building gradient tree boosting classifier
Based on the feature vectors produced by the random walk on the drug-protein heterogeneous network for each drug combination, we built a gradient tree boosting (GTB) classification model, also referred to as gradient boosted regression trees or gradient boosted decision trees (GBRT or GBDT). Gradient tree boosting is an effective machine learning method that has achieved good performance in both classification and regression problems [51][52][53].
In fact, Caruana and Niculescu-Mizil conducted a comprehensive performance evaluation on eight different binary classification problems, comparing the boosted trees algorithm with nine other typical classifiers, including SVMs, Neural Nets, Logistic regression, Naive Bayes, memory-based learning, Random Forests, Decision Trees, Bagged Trees and Boosted Stumps. They concluded that the boosted tree-based algorithm achieved the best performance [54]. Another empirical evaluation also demonstrated that boosted decision trees perform exceptionally well when the dimensionality of the input is not too high [55]. Therefore, we adopted the GTB algorithm to build our classification model. Formally, the decision function of GTB is initialized as:
\[ \theta_0(x) = \arg\min_{b} \sum_{i=1}^{N} L(y_i, b) \]
where $N$ is the number of drug combinations contained in the training set. The gradient tree boosting algorithm repeatedly constructs $K$ different classification subtrees $h(x, a_1), h(x, a_2), \ldots, h(x, a_K)$, each of which is separately trained on a subset of randomly selected samples from the training set, and then iteratively establishes the additive function $\theta_k(x)$:
\[ \theta_k(x) = \theta_{k-1}(x) + b_k\, h(x, a_k) \]
in which $b_k$ and $a_k$ are the weight and parameter vector of the $k$-th classification subtree $h(x, a_k)$. The loss function $L(y, \theta_k(x))$ is defined as:
\[ L(y, \theta(x)) = \log\left(1 + \exp(-y\,\theta(x))\right) \]
where $y$ is a binary value representing the real class of the combination and $\theta(x)$ is the decision function. In order to minimize the loss function $L(y, \theta_k(x))$, both $b_k$ and $a_k$ are iteratively optimized by applying grid search. In this paper, a grid search strategy was adopted to tune the hyperparameters of GTB by 10-fold cross-validation on the constructed drug combination dataset. Finally, the optimal number of trees of the GTB is 300, and the tuned depth of the trees is 13.
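To make the additive construction concrete, the following is a deliberately small gradient boosting sketch for the logistic loss given above. It is not the configuration used in the paper: it assumes labels coded as -1 and +1, uses depth-1 regression stumps instead of depth-13 trees, fits each stump to the negative gradient by least squares, uses every sample in every round, and replaces the optimized weight of each subtree with a fixed shrinkage factor. All function names are hypothetical.

```python
import math


def fit_stump(X, residuals):
    """Least-squares regression stump: (feature, threshold, left, right)."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[f] <= thr]
            right = [r for row, r in zip(X, residuals) if row[f] > thr]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, f, thr, lv, rv)
    if best is None:  # every feature is constant: predict the mean
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, row):
    f, thr, lv, rv = stump
    return lv if row[f] <= thr else rv


def fit_gtb(X, y, n_trees=50, shrinkage=0.5):
    """Boost stumps on L(y, theta) = log(1 + exp(-y * theta)), y in {-1, +1}.

    Requires at least one sample of each class.
    """
    n_pos = sum(1 for t in y if t == 1)
    n_neg = len(y) - n_pos
    theta0 = math.log(n_pos / n_neg)  # constant minimizing the summed loss
    theta = [theta0] * len(y)
    stumps = []
    for _ in range(n_trees):
        # Negative gradient of the loss with respect to theta.
        residuals = [yi / (1.0 + math.exp(yi * ti)) for yi, ti in zip(y, theta)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        theta = [ti + shrinkage * stump_predict(stump, row)
                 for ti, row in zip(theta, X)]
    return theta0, stumps, shrinkage


def decision_function(model, row):
    theta0, stumps, shrinkage = model
    return theta0 + shrinkage * sum(stump_predict(s, row) for s in stumps)


def mean_loss(model, X, y):
    return sum(math.log(1.0 + math.exp(-yi * decision_function(model, row)))
               for row, yi in zip(X, y)) / len(y)


# Toy one-dimensional data, linearly separable around 0.5.
X_toy = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y_toy = [-1, -1, -1, 1, 1, 1]
model = fit_gtb(X_toy, y_toy)
```

A positive value of `decision_function` corresponds to predicting an effective combination; in practice the random-walk feature vectors would take the place of the toy inputs.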